diogenes.display package
Submodules
diogenes.display.display module
Tools to visualize data and display results
-
class
diogenes.display.display.
Report
(exp=None, report_path='report.pdf')¶ Bases:
object
Creates pdf reports.
A report can either be associated with a particular diogenes.grid_search.experiment.Experiment or simply be used to concatenate figures, text, and tables.
Parameters: - exp (diogenes.grid_search.experiment.Experiment or None) – Experiment used to make figures. Optional.
- report_path (str) – Path of the generated pdf
-
add_fig
(fig)¶ Adds matplotlib.figure.Figure to report
-
add_graph_for_best
(func_name)¶ Adds a graph to report that gives performance of the best Trial
Parameters: func_name (str) – Name of a function that can be run on a Trial that returns a figure. For example ‘roc_curve’ or ‘prec_recall_curve’
-
add_graph_for_best_prec_recall
()¶ Adds prec/recall for best Trial in an experiment
-
add_graph_for_best_roc
()¶ Adds roc curve for best Trial in an experiment
-
add_heading
(heading, level=2)¶ Adds a heading to the report
Parameters: - heading (str) – text of heading
- level (int) – heading level (1 corresponds to html h1, 2 corresponds to html h2, etc)
-
add_legend
()¶ Adds a legend that shows which trial number in a summary graph corresponds to which Trial
-
add_subreport
(subreport)¶ Incorporates another Report into this one
Parameters: subreport (Report) – report to add
-
add_summary_graph
(measure)¶ Adds a graph to report that summarizes across an Experiment
Parameters: measure (str) – Function of Experiment to call. The function must return a dict of Trial: score. Examples are ‘average_score’ and ‘roc_auc’
-
add_summary_graph_average_score
()¶ Adds a graph to report that summarizes average_score across Experiment
-
add_summary_graph_roc_auc
()¶ Adds a graph to report that summarizes roc_auc across Experiment
-
add_table
(M)¶ Adds structured array to report
-
add_text
(text)¶ Adds block of text to report
-
get_report_path
()¶ Returns path of generated pdf
-
to_pdf
(options={}, verbose=True)¶ Generates a pdf
Parameters: - options (dict) – options are pdfkit.from_url options. See https://pypi.python.org/pypi/pdfkit
- verbose (bool) – iff True, gives output regarding pdf creation
Returns: Path of generated pdf
-
exception
diogenes.display.display.
ReportError
¶ Bases:
exceptions.Exception
Error generated by Report
-
diogenes.display.display.
crosstab
(col1, col2, verbose=True)¶ Makes a crosstab of col1 and col2. This is represented as a structured array with the following properties:
- The first column is the value of col1 being crossed
- The name of every column except the first is the value of col2 being crossed
- To find the number of co-occurrences of x from col1 and y from col2, find the row that has ‘x’ in the first column and the column named ‘y’. The corresponding cell is the number of co-occurrences of x and y
Parameters: - col1 (np.ndarray) –
- col2 (np.ndarray) –
Returns: structured array
Return type: np.ndarray
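The layout described above can be illustrated with a minimal standard-library sketch. It is not the diogenes implementation: it returns plain lists rather than a structured array, and the header name `col1_value` and the helper names `crosstab_sketch` and `lookup` are hypothetical.

```python
from collections import Counter


def crosstab_sketch(col1, col2):
    """Return (header, rows) laid out like the crosstab described above."""
    counts = Counter(zip(col1, col2))
    row_values = sorted(set(col1))
    col_values = sorted(set(col2))
    # The first header cell labels the col1 values (name is an assumption);
    # every other header cell is a value of col2.
    header = ["col1_value"] + [str(v) for v in col_values]
    rows = [[x] + [counts[(x, y)] for y in col_values] for x in row_values]
    return header, rows


def lookup(header, rows, x, y):
    """Find the row whose first cell is x, then read the column named y."""
    col_index = header.index(str(y))
    for row in rows:
        if row[0] == x:
            return row[col_index]
    raise KeyError(x)


header, rows = crosstab_sketch(
    ["a", "a", "b", "b", "b"],
    ["x", "y", "x", "x", "y"],
)
print(lookup(header, rows, "b", "x"))
```

Here the lookup for ‘b’ and ‘x’ reads the row starting with ‘b’ and the column named ‘x’, mirroring the third property in the list above.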
-
diogenes.display.display.
describe_cols
(M, verbose=True)¶ Returns summary statistics for a numpy array
Parameters: M (numpy.ndarray) – structured array
Returns: structured array of summary statistics for M
Return type: numpy.ndarray
-
diogenes.display.display.
feature_pairs_in_rf
(rf, weight_by_depth=None, verbose=True, n=10)¶ Describes how often pairs of features appear in consecutive splits (a node and the node immediately below it) in each tree of a random forest
Parameters: - rf (sklearn.ensemble.RandomForestClassifier) – Fitted random forest
- weight_by_depth (iterable or None) –
Weights to give to each depth in the “occurrences weighted by depth” metric
weight_by_depth is a vector. The 0th entry is the weight of being at depth 0; the 1st entry is the weight of being at depth 1, etc. If not provided, weights are linear with negative depth. If the provided vector is not as long as the number of depths, then remaining depths are weighted with 0
- verbose (boolean) – iff True, prints metrics to console
- n (int) – Prints the top-n-scoring feature pairs to console if verbose==True
Returns: A tuple with a number of metrics
- A Counter of co-occurring feature pairs at all depths
- A list of Counters of feature pairs. Element 0 corresponds to depth 0, element 1 corresponds to depth 1 etc.
- A dict where keys are feature pairs and values are the average depth of those feature pairs
- A dict where keys are feature pairs and values are the number of occurrences of those feature pairs weighted by depth
Return type: (collections.Counter, list of collections.Counter, dict, dict)
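The depth weighting described for weight_by_depth can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `weighted_pair_counts` is a hypothetical helper, the per-depth counts are made-up data, and only the documented zero-padding rule is shown (the default weights are not reproduced here).

```python
from collections import Counter


def weighted_pair_counts(counts_by_depth, weight_by_depth):
    """Sum pair counts over depths, scaling each depth by its weight."""
    weights = list(weight_by_depth)
    # Depths beyond the end of weight_by_depth are weighted with 0.
    padded = weights + [0] * (len(counts_by_depth) - len(weights))
    weighted = {}
    for depth, counts in enumerate(counts_by_depth):
        for pair, count in counts.items():
            weighted[pair] = weighted.get(pair, 0) + padded[depth] * count
    return weighted


# Made-up per-depth counts: element 0 is depth 0, element 1 is depth 1, etc.
counts_by_depth = [
    Counter({(0, 1): 2}),
    Counter({(0, 1): 1, (1, 2): 3}),
    Counter({(1, 2): 4}),
]

# Counts across all depths, analogous to the first element of the tuple.
all_depths = sum(counts_by_depth, Counter())

# Only two weights are given, so depth 2 contributes nothing.
weighted = weighted_pair_counts(counts_by_depth, [1.0, 0.5])
print(weighted)
```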
-
diogenes.display.display.
feature_pairs_in_tree
(dt)¶ Lists subsequent features sorted by importance
Parameters: dt (sklearn.tree.DecisionTreeClassifier) –
Returns: Going from inside to out:
- Each int is a feature that a node split on
- If two ints appear in the same tuple, then there was a node that split on the second feature immediately below a node that split on the first feature
- Tuples appearing in the same inner list appear at the same depth in the tree
- The outer list describes the entire tree
Return type: list of list of tuple of int
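To make the nested return type concrete, the sketch below walks a hand-written value of the same shape (list of list of tuple of int). The data is hypothetical and was not produced by feature_pairs_in_tree; the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical value with the documented shape.
# Outer list: the whole tree. Inner list: pairs at one depth.
# Tuple (a, b): a node split on feature b immediately below a node
# that split on feature a.
pairs_by_depth = [
    [(0, 1), (0, 2)],
    [(1, 3), (2, 3), (2, 3)],
]

# One Counter per inner list, so element i describes the i-th depth.
counts_by_depth = [Counter(level) for level in pairs_by_depth]

# Flatten to a single list when depth does not matter.
all_pairs = [pair for level in pairs_by_depth for pair in level]

for depth, counts in enumerate(counts_by_depth):
    print(depth, dict(counts))
```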
-
diogenes.display.display.
get_roc_auc
(labels, score, verbose=True)¶ return area under ROC curve
Parameters: - labels (np.ndarray) – vector of ground truth
- score (np.ndarray) – vector of scores assigned by classifier (e.g. clf.predict_proba(...)[:, 1] in sklearn)
- verbose (boolean) – iff True, prints area under the curve
Returns: area under the curve
Return type: float
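For intuition about the returned value, the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below computes it that way in plain Python. This pairwise formulation is a swapped-in illustration, not how get_roc_auc is implemented, and `roc_auc_sketch` is a hypothetical name; it assumes binary labels encoded as 0 and 1.

```python
def roc_auc_sketch(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


auc = roc_auc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

The quadratic loop is fine for small vectors; for large ones a rank-based computation would be preferable.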
-
diogenes.display.display.
get_top_features
(clf, M=None, col_names=None, n=10, verbose=True)¶ Gets the top features for a fitted clf
Parameters: - clf (sklearn.base.BaseEstimator) – Fitted classifier with a feature_importances_ attribute
- M (numpy.ndarray or None) – Structured array corresponding to fitted clf. Used here to determine column names
- col_names (list of str or None) – List of column names corresponding to fitted clf.
- n (int) – Number of features to return
- verbose (boolean) – iff True, prints ranked features
Returns: structured array with top feature names and scores
Return type: numpy.ndarray
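The ranking step can be sketched without sklearn: pair each column name with its importance, sort in descending order, and keep the first n. This is an assumption about the general approach rather than the library's code; `top_features_sketch` is a hypothetical name, the importances are made-up numbers standing in for clf.feature_importances_, and the result is a list of tuples rather than a structured array.

```python
def top_features_sketch(importances, col_names, n=10):
    """Return the n (name, importance) pairs with the highest importance."""
    ranked = sorted(
        zip(col_names, importances),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:n]


# Made-up importances standing in for clf.feature_importances_.
top = top_features_sketch([0.1, 0.6, 0.3], ["age", "income", "zip"], n=2)
print(top)
```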
-
diogenes.display.display.
plot_box_plot
(col, title=None, verbose=True)¶ Makes a box plot for a feature
Parameters: - col (np.ndarray) –
- title (str or None) – title of a plot
- verbose (boolean) – iff True, display the graph
Returns: Figure containing plot
Return type: matplotlib.figure.Figure
-
diogenes.display.display.
plot_correlation_matrix
(M, verbose=True)¶ Plot correlation between variables in M
Parameters: - M (numpy.ndarray) – structured array
- verbose (boolean) – iff True, display the graph
Returns: Figure containing plot
Return type: matplotlib.figure.Figure
-
diogenes.display.display.
plot_correlation_scatter_plot
(M, verbose=True)¶ Makes a grid of scatter plots representing relationship between variables
Each scatter plot is one variable plotted against another variable
Parameters: - M (numpy.ndarray) – structured array
- verbose (boolean) – iff True, display the graph
Returns: Figure containing plot
Return type: matplotlib.figure.Figure
-
diogenes.display.display.
plot_kernel_density
(col, verbose=True)¶ Plots kernel density function of column
From: https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/
Parameters: - col (np.ndarray) –
- verbose (boolean) – iff True, display the graph
Returns: Figure containing plot
Return type: matplotlib.figure.Figure
-
diogenes.display.display.
plot_on_timeline
(col, verbose=True)¶ Plots points on a timeline
Parameters: - col (np.ndarray) –
- verbose (boolean) – iff True, display the graph
Returns: Figure containing plot
Return type: matplotlib.figure.Figure
-
diogenes.display.display.
plot_prec_recall
(labels, score, title='Prec/Recall', verbose=True)¶ Plot precision/recall curve
Parameters: - labels (np.ndarray) – vector of ground truth
- score (np.ndarray) – vector of scores assigned by classifier (e.g. clf.predict_proba(...)[:, 1] in sklearn)
- title (str) – title of plot
- verbose (boolean) – iff True, display the graph
Returns: Figure containing plot
Return type: matplotlib.figure.Figure
-
diogenes.display.display.
plot_roc
(labels, score, title='ROC', verbose=True)¶ Plot ROC curve
Parameters: - labels (np.ndarray) – vector of ground truth
- score (np.ndarray) – vector of scores assigned by classifier (e.g. clf.predict_proba(...)[:, 1] in sklearn)
- title (str) – title of plot
- verbose (boolean) – iff True, display the graph
Returns: Figure containing plot
Return type: matplotlib.figure.Figure
-
diogenes.display.display.
plot_simple_histogram
(col, verbose=True)¶ Makes a histogram of values in a column
Parameters: - col (np.ndarray) –
- verbose (boolean) – iff True, display the graph
Returns: Figure containing plot
Return type: matplotlib.figure.Figure
-
diogenes.display.display.
pprint_sa
(M, row_labels=None, col_labels=None)¶ Prints a nicely formatted Structured array (or similar object) to console
Parameters: - M (numpy.ndarray or list of lists) – structured array or homogeneous array or list of lists to print
- row_labels (list or None) – labels to put in front of rows. Defaults to row number
- col_labels (list of str or None) – names to label columns with. If M is a structured array, its column names will be used instead
-
diogenes.display.display.
table
(col, verbose=True)¶ Creates a summary of the number of occurrences of each value in the column
Similar to R’s table
Parameters: col (np.ndarray) –
Returns: structured array
Return type: np.ndarray
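The kind of summary this produces can be illustrated with collections.Counter. The sketch is a standard-library stand-in, not the diogenes implementation: `table_sketch` is a hypothetical name and it returns a sorted list of (value, count) tuples instead of a structured array.

```python
from collections import Counter


def table_sketch(col):
    """Return (value, count) pairs for each distinct value, sorted by value."""
    counts = Counter(col)
    return sorted(counts.items())


summary = table_sketch(["b", "a", "b", "c", "b"])
print(summary)
```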