diogenes.display package

Submodules

diogenes.display.display module

Tools to visualize data and display results

class diogenes.display.display.Report(exp=None, report_path='report.pdf')

Bases: object

Creates pdf reports.

A Report can either be associated with a particular diogenes.grid_search.experiment.Experiment or simply be used to concatenate figures, text, and tables

Parameters:
  • exp (diogenes.grid_search.experiment.Experiment or None) – Experiment used to make figures. Optional.
  • report_path (str) – path of the generated pdf
add_fig(fig)

Adds matplotlib.figure.Figure to report

add_graph_for_best(func_name)

Adds a graph to report that gives performance of the best Trial

Parameters:func_name (str) – Name of a function that can be run on a Trial that returns a figure. For example ‘roc_curve’ or ‘prec_recall_curve’
add_graph_for_best_prec_recall()

Adds prec/recall for best Trial in an experiment

add_graph_for_best_roc()

Adds roc curve for best Trial in an experiment

add_heading(heading, level=2)

Adds a heading to the report

Parameters:
  • heading (str) – text of heading
  • level (int) – heading level (1 corresponds to html h1, 2 corresponds to html h2, etc)
add_legend()

Adds a legend that shows which trial number in a summary graph corresponds to which Trial

add_subreport(subreport)

Incorporates another Report into this one

Parameters:subreport (Report) – report to add
add_summary_graph(measure)

Adds a graph to report that summarizes across an Experiment

Parameters:measure (str) – Function of Experiment to call. The function must return a dict of Trial: score. Examples are ‘average_score’ and ‘roc_auc’
add_summary_graph_average_score()

Adds a graph to report that summarizes average_score across Experiment

add_summary_graph_roc_auc()

Adds a graph to report that summarizes roc_auc across Experiment

add_table(M)

Adds structured array to report

add_text(text)

Adds block of text to report

get_report_path()

Returns path of generated pdf

to_pdf(options={}, verbose=True)

Generates a pdf

Returns:

Path of generated pdf

exception diogenes.display.display.ReportError

Bases: exceptions.Exception

Error generated by Report

diogenes.display.display.crosstab(col1, col2, verbose=True)

Makes a crosstab of col1 and col2. This is represented as a structured array with the following properties:

  1. The first column is the value of col1 being crossed
  2. The name of every column except the first is the value of col2 being crossed
  3. To find the number of co-occurrences of x from col1 and y from col2, find the row that has ‘x’ in the first column and the column named ‘y’. The corresponding cell is the number of co-occurrences of x and y
Parameters:
  • col1 (np.ndarray) –
  • col2 (np.ndarray) –
Returns:

structured array

Return type:

np.ndarray
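
The layout described above can be illustrated with a minimal pure-Python sketch. It is not the diogenes implementation: it returns a list of column names and a list of rows rather than a structured array, and the function name crosstab_sketch and the label "col1_value" for the first column are assumptions made for illustration.

```python
from collections import Counter


def crosstab_sketch(col1, col2):
    """Return (column_names, rows) laid out like the crosstab described above."""
    # Count how often each (col1 value, col2 value) pair occurs.
    pair_counts = Counter(zip(col1, col2))
    row_values = sorted(set(col1))
    col_values = sorted(set(col2))
    # The first column holds the col1 value; the others are named after col2 values.
    column_names = ["col1_value"] + [str(v) for v in col_values]
    rows = [
        [x] + [pair_counts[(x, y)] for y in col_values]
        for x in row_values
    ]
    return column_names, rows


names, rows = crosstab_sketch(
    ["a", "a", "b", "b", "b"],
    ["x", "y", "x", "x", "y"],
)
```

Here the row whose first entry is "b" has 2 in the column named "x", because "b" and "x" occur together twice.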

diogenes.display.display.describe_cols(M, verbose=True)

Returns summary statistics for a numpy array

Parameters:M (numpy.ndarray) – structured array
Returns:structured array of summary statistics for M
Return type:numpy.ndarray
diogenes.display.display.feature_pairs_in_rf(rf, weight_by_depth=None, verbose=True, n=10)

Describes how often pairs of features appear on consecutive splits in each tree of a random forest

Parameters:
  • rf (sklearn.ensemble.RandomForestClassifier) – Fitted random forest
  • weight_by_depth (iterable or None) –

    Weights to give to each depth in the “occurrences weighted by depth” metric

    weight_by_depth is a vector. The 0th entry is the weight of being at depth 0; the 1st entry is the weight of being at depth 1, etc. If not provided, weights are linear with negative depth. If the provided vector is not as long as the number of depths, then the remaining depths are weighted with 0

  • verbose (boolean) – iff True, prints metrics to console
  • n (int) – Prints the top-n-scoring feature pairs to console if verbose==True
Returns:

A tuple with a number of metrics

  1. A Counter of co-occurring feature pairs at all depths
  2. A list of Counters of feature pairs. Element 0 corresponds to depth 0, element 1 corresponds to depth 1 etc.
  3. A dict where keys are feature pairs and values are the average depth of those feature pairs
  4. A dict where keys are feature pairs and values are the number of occurrences of those feature pairs weighted by depth

Return type:

(collections.Counter, list of collections.Counter, dict, dict)
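
As a rough illustration of how the four returned metrics relate to each other, the sketch below derives items 1, 3 and 4 from a list of per-depth Counters (item 2). It is not the diogenes implementation. The helper name summarize_pairs_by_depth is hypothetical, the average depth is assumed to be weighted by occurrence count, and weight_by_depth must be passed explicitly because the exact default weights are not specified here.

```python
from collections import Counter


def summarize_pairs_by_depth(counts_by_depth, weight_by_depth):
    """Combine per-depth pair counts into totals, average depths and weighted counts."""
    total = Counter()
    depth_sum = Counter()
    weighted = {}
    for depth, counts in enumerate(counts_by_depth):
        # Depths past the end of weight_by_depth get weight 0, as described above.
        weight = weight_by_depth[depth] if depth < len(weight_by_depth) else 0
        for pair, count in counts.items():
            total[pair] += count
            depth_sum[pair] += depth * count
            weighted[pair] = weighted.get(pair, 0) + weight * count
    # Assumption: average depth is the occurrence-weighted mean depth of each pair.
    average_depth = {pair: depth_sum[pair] / total[pair] for pair in total}
    return total, average_depth, weighted


counts_by_depth = [
    Counter({(0, 1): 2}),
    Counter({(0, 1): 1, (1, 2): 3}),
    Counter({(1, 2): 1}),
]
total, average_depth, weighted = summarize_pairs_by_depth(counts_by_depth, [1.0, 0.5])
```

Because only two weights are supplied for three depths, the single occurrence of the pair (1, 2) at depth 2 contributes nothing to its weighted count.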

diogenes.display.display.feature_pairs_in_tree(dt)

Lists subsequent features sorted by importance

Parameters:dt (sklearn.tree.DecisionTreeClassifier) –
Returns:Going from inside to out:
  1. Each int is a feature that a node split on
  2. If two ints appear in the same tuple, then there was a node that split on the second feature immediately below a node that split on the first feature
  3. Tuples appearing in the same inner list appear at the same depth in the tree
  4. The outer list describes the entire tree
Return type:list of list of tuple of int
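
To make the nested return structure concrete, here is a small sketch that builds a list of list of tuple of int from a toy tree. It does not use scikit-learn or diogenes: the (feature, left, right) tuple format and the function name pairs_by_depth are hypothetical, and indexing the inner lists by the depth of the parent node is an assumption.

```python
def pairs_by_depth(tree):
    """Collect (parent_feature, child_feature) tuples, grouped by parent depth.

    tree is (feature, left, right), with None standing for a leaf.
    This toy format is hypothetical and exists only for this sketch.
    """
    levels = []
    frontier = [tree]
    while frontier:
        level_pairs = []
        next_frontier = []
        for feature, left, right in frontier:
            for child in (left, right):
                if child is not None:
                    # The child splits immediately below a node that split on `feature`.
                    level_pairs.append((feature, child[0]))
                    next_frontier.append(child)
        if level_pairs:
            levels.append(level_pairs)
        frontier = next_frontier
    return levels


# Root splits on feature 0; its children split on features 1 and 2;
# the node splitting on feature 2 has one child that splits on feature 1.
toy_tree = (0, (1, None, None), (2, (1, None, None), None))
levels = pairs_by_depth(toy_tree)
```

In this toy case the outer list has two inner lists: the first holds the pairs directly under the root, the second holds the single pair one level further down.
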
diogenes.display.display.get_roc_auc(labels, score, verbose=True)

Returns area under ROC curve

Parameters:
  • labels (np.ndarray) – vector of ground truth
  • score (np.ndarray) – vector of scores assigned by classifier (e.g. clf.predict_proba(...)[:, 1] in sklearn)
  • verbose (boolean) – iff True, prints area under the curve
Returns:

area under the curve

Return type:

float
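
For intuition about the returned value, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way in plain Python. It is not how diogenes computes the value, and the function name roc_auc_sketch is hypothetical.

```python
def roc_auc_sketch(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


auc = roc_auc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With these inputs, three of the four positive/negative pairs are ordered correctly, so the result is 0.75.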

diogenes.display.display.get_top_features(clf, M=None, col_names=None, n=10, verbose=True)

Gets the top features for a fitted clf

Parameters:
  • clf (sklearn.base.BaseEstimator) – Fitted classifier with a feature_importances_ attribute
  • M (numpy.ndarray or None) – Structured array corresponding to fitted clf. Used here to determine column names
  • col_names (list of str or None) – List of column names corresponding to fitted clf.
  • n (int) – Number of features to return
  • verbose (boolean) – iff True, prints ranked features
Returns:

structured array with top feature names and scores

Return type:

numpy.ndarray
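
The ranking step can be sketched as follows, assuming the importances come from a fitted classifier's feature_importances_ attribute and are paired with column names in the same order. The function name top_features_sketch is hypothetical, and it returns a list of (name, score) tuples rather than a structured array.

```python
def top_features_sketch(importances, col_names, n=10):
    """Return the n (name, importance) pairs with the highest importance."""
    ranked = sorted(
        zip(col_names, importances),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]


# Importances are made-up values standing in for clf.feature_importances_.
top = top_features_sketch([0.1, 0.6, 0.3], ["age", "income", "zip"], n=2)
```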

diogenes.display.display.plot_box_plot(col, title=None, verbose=True)

Makes a box plot for a feature

Parameters:
  • col (np.array) –
  • title (str or None) – title of a plot
  • verbose (boolean) – iff True, display the graph
Returns:

Figure containing plot

Return type:

matplotlib.figure.Figure

diogenes.display.display.plot_correlation_matrix(M, verbose=True)

Plot correlation between variables in M

Parameters:
  • M (numpy.ndarray) – structured array
  • verbose (boolean) – iff True, display the graph
Returns:

Figure containing plot

Return type:

matplotlib.figure.Figure

diogenes.display.display.plot_correlation_scatter_plot(M, verbose=True)

Makes a grid of scatter plots representing relationship between variables

Each scatter plot is one variable plotted against another variable

Parameters:
  • M (numpy.ndarray) – structured array
  • verbose (boolean) – iff True, display the graph
Returns:

Figure containing plot

Return type:

matplotlib.figure.Figure

diogenes.display.display.plot_kernel_density(col, verbose=True)

Plots kernel density function of column

From: https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/

Parameters:
  • col (np.ndarray) –
  • verbose (boolean) – iff True, display the graph
Returns:

Figure containing plot

Return type:

matplotlib.figure.Figure

diogenes.display.display.plot_on_timeline(col, verbose=True)

Plots points on a timeline

Parameters:
  • col (np.array) –
  • verbose (boolean) – iff True, display the graph
Returns:

Figure containing plot

Return type:

matplotlib.figure.Figure

diogenes.display.display.plot_prec_recall(labels, score, title='Prec/Recall', verbose=True)

Plot precision/recall curve

Parameters:
  • labels (np.ndarray) – vector of ground truth
  • score (np.ndarray) – vector of scores assigned by classifier (e.g. clf.predict_proba(...)[:, 1] in sklearn)
  • title (str) – title of plot
  • verbose (boolean) – iff True, display the graph
Returns:

Figure containing plot

Return type:

matplotlib.figure.Figure
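
Each point on a precision/recall curve corresponds to one score threshold. The sketch below computes a single such point in plain Python. It is an illustration only, not the diogenes plotting code. The function name precision_recall_at is hypothetical, scores at or above the threshold are assumed to count as positive predictions, and precision is defined as 1.0 when nothing is predicted positive.

```python
def precision_recall_at(labels, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for l, p in zip(labels, predicted) if l == 1 and p)
    fp = sum(1 for l, p in zip(labels, predicted) if l == 0 and p)
    fn = sum(1 for l, p in zip(labels, predicted) if l == 1 and not p)
    # Assumption: precision is 1.0 when there are no positive predictions.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
strict = precision_recall_at(labels, scores, 0.5)
lenient = precision_recall_at(labels, scores, 0.35)
```

Lowering the threshold from 0.5 to 0.35 raises recall from 0.5 to 1.0 while precision drops, which is the trade-off the curve visualizes.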

diogenes.display.display.plot_roc(labels, score, title='ROC', verbose=True)

Plot ROC curve

Parameters:
  • labels (np.ndarray) – vector of ground truth
  • score (np.ndarray) – vector of scores assigned by classifier (e.g. clf.predict_proba(...)[:, 1] in sklearn)
  • title (str) – title of plot
  • verbose (boolean) – iff True, display the graph
Returns:

Figure containing plot

Return type:

matplotlib.figure.Figure

diogenes.display.display.plot_simple_histogram(col, verbose=True)

Makes a histogram of values in a column

Parameters:
  • col (np.ndarray) –
  • verbose (boolean) – iff True, display the graph
Returns:

Figure containing plot

Return type:

matplotlib.figure.Figure

diogenes.display.display.pprint_sa(M, row_labels=None, col_labels=None)

Prints a nicely formatted Structured array (or similar object) to console

Parameters:
  • M (numpy.ndarray or list of lists) – structured array or homogeneous array or list of lists to print
  • row_labels (list or None) – labels to put in front of rows. Defaults to row number
  • col_labels (list of str or None) – names to label columns with. If M is a structured array, its column names will be used instead
diogenes.display.display.table(col, verbose=True)

Creates a summary of the number of occurrences of each value in the column

Similar to R’s table

Parameters:col (np.ndarray) –
Returns:structured array
Return type:np.ndarray
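
The counting behaviour, comparable to R’s table, can be sketched with collections.Counter. This is not the diogenes implementation: the function name table_sketch is hypothetical, it returns a sorted list of (value, count) tuples instead of a structured array, and sorting by value is an assumption.

```python
from collections import Counter


def table_sketch(col):
    """Return (value, count) pairs for each distinct value in col, sorted by value."""
    return sorted(Counter(col).items())


counts = table_sketch(["b", "a", "b", "c", "b"])
```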

Module contents