Interactive visualizations with Bokeh
The tools above let you rapidly examine the performance of a model. However, we can go one step further and generate interactive visualizations, which let us inspect the tradeoffs between metrics at different thresholds more closely and select a classifier of interest more quickly.
For this section, we’ll use Bokeh, an interactive visualization library for Python. If you’ve read previous posts in this series, you’ll already be familiar with it.
Bokeh has become an incredibly useful way to generate interactive visualizations in Python. A major benefit of making plots interactive with Bokeh is that it works easily with pandas dataframes, and its HoverTool lets you attach additional dimensions of your data to each point without cluttering the plot. For example, we can configure a HoverTool that shows the threshold for each point on the ROC curve, or even additional metrics (such as F1 score or precision).
To generate a Bokeh plot, we’ll use ColumnDataSource to encode the output of roc_curve and pass that to the plot. Below is a simple example:
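from bokeh.models import ColumnDataSource
from sklearn.metrics import auc, roc_curve

# generate roc curve
fpr, tpr, thresholds = roc_curve(y_test, probas_[:,1], pos_label=pos_label_)
# calculate auc
roc_auc = auc(fpr, tpr)
# create CDS; column names match those used by the plotting code
# (the exact legend text format here is an assumption)
source_ROC = ColumnDataSource(data=dict(x_fpr=fpr, y_tpr=tpr, thresh=thresholds,
                                        auc_legend=[f'AUC: {roc_auc:.3f}'] * len(fpr)))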
In the above snippet, we get the outputs of the curve, add a “legend” which is the AUC (note we encode this as a vector with the same length as our other variables, since every column in a ColumnDataSource must have the same length), package them as a dictionary, and pass them to a ColumnDataSource, which we can then use to plot:
""" Very basic ROC curve plot """
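from bokeh.models import HoverTool
from bokeh.plotting import figure, show

# toolbox we want with our plot; the custom HoverTool is added once the points exist
p_tools = ['zoom_in', 'zoom_out', 'save', 'reset']
# create plot
p = figure(title='ROC curve', tools=p_tools)
# add roc curve line
p.line('x_fpr', 'y_tpr', line_width=1, color="blue", source=source_ROC)
# add explicit data points along line - note we apply legend here
# (legend_group replaces the older legend= keyword, which recent Bokeh releases reject)
points = p.scatter('x_fpr', 'y_tpr', size=5, color="orange", legend_group='auc_legend', source=source_ROC, name='ROC')
# create custom HoverTool that will show exact values, attached only to the points
# (renderers= replaces the older names= keyword, which recent Bokeh releases reject)
hover_ = HoverTool(renderers=[points], tooltips=[("TPR", "@y_tpr"), ("FPR", "@x_fpr"), ("Thresh", "@thresh")])
p.add_tools(hover_)
show(p)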
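To show additional metrics such as precision or F1 in the hover tooltip, one option is to add them as extra columns alongside the ROC values. The sketch below is an illustration, not the notebook’s code: the function names roc_columns_with_metrics and make_roc_figure are hypothetical, the metrics are computed by hand with NumPy at every distinct score (rather than at the thresholds roc_curve returns), and it assumes binary labels with both classes present.

```python
import numpy as np


def roc_columns_with_metrics(y_true, scores):
    """Return equal-length columns, one row per threshold (hypothetical helper)."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    # evaluate at every distinct score, from the strictest threshold to the loosest
    thresholds = np.unique(scores)[::-1]
    n_pos = int(y_true.sum())
    n_neg = int((~y_true).sum())
    cols = {"x_fpr": [], "y_tpr": [], "thresh": [], "precision": [], "f1": []}
    for t in thresholds:
        pred = scores >= t
        tp = int(np.sum(pred & y_true))
        fp = int(np.sum(pred & ~y_true))
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is an observed score
        recall = tp / n_pos
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        cols["x_fpr"].append(fp / n_neg)
        cols["y_tpr"].append(recall)
        cols["thresh"].append(float(t))
        cols["precision"].append(precision)
        cols["f1"].append(f1)
    return cols


def make_roc_figure(cols):
    """Build a ROC figure whose hover tooltip also shows precision and F1."""
    # Bokeh is imported here so the column computation above can be reused without it
    from bokeh.models import ColumnDataSource, HoverTool
    from bokeh.plotting import figure

    source = ColumnDataSource(data=cols)
    p = figure(title="ROC curve", x_axis_label="FPR", y_axis_label="TPR",
               tools="zoom_in,zoom_out,save,reset")
    p.line("x_fpr", "y_tpr", source=source, color="blue")
    points = p.scatter("x_fpr", "y_tpr", source=source, size=5, color="orange")
    # each tooltip entry reads one column of the source for the hovered point
    p.add_tools(HoverTool(renderers=[points], tooltips=[
        ("TPR", "@y_tpr{0.000}"), ("FPR", "@x_fpr{0.000}"),
        ("Thresh", "@thresh{0.000}"),
        ("Precision", "@precision{0.000}"), ("F1", "@f1{0.000}"),
    ]))
    return p
```

With the variables from the snippets above, this could be used as show(make_roc_figure(roc_columns_with_metrics(y_test, probas_[:, 1]))), assuming y_test is coded as 0/1 with 1 as the positive class.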
The code in our notebook is more detailed and will yield the following plot (GIF of interactive viz below):
If you navigate to the repo, you’ll see examples of different ROC curve implementations: for example, one in which cross-validation is performed and the mean ROC curve is shown, as well as a version that compares the performance of different classifiers:
Finally, I include code for generating combined interactive ROC and PR curves:
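For the cross-validated version, a common way to average ROC curves is to interpolate each fold’s TPR onto a shared FPR grid and take the mean at each grid point. The sketch below illustrates that idea under assumptions of my own and is not necessarily how the repo does it: mean_roc is a hypothetical name, and each fold is assumed to supply (fpr, tpr) arrays sorted by increasing FPR, such as those returned by roc_curve.

```python
import numpy as np


def mean_roc(fold_curves, n_points=101):
    """Average several ROC curves on a shared FPR grid (hypothetical helper).

    fold_curves: iterable of (fpr, tpr) pairs, each sorted by increasing fpr.
    Returns the grid, the mean TPR, and the standard deviation of TPR across folds.
    """
    mean_fpr = np.linspace(0.0, 1.0, n_points)
    interp_tprs = []
    for fpr, tpr in fold_curves:
        # linear interpolation of this fold's TPR at each grid FPR
        tpr_on_grid = np.interp(mean_fpr, fpr, tpr)
        tpr_on_grid[0] = 0.0  # pin the start of the curve to the origin
        interp_tprs.append(tpr_on_grid)
    interp_tprs = np.vstack(interp_tprs)
    mean_tpr = interp_tprs.mean(axis=0)
    mean_tpr[-1] = 1.0  # pin the end of the curve to (1, 1)
    std_tpr = interp_tprs.std(axis=0)
    return mean_fpr, mean_tpr, std_tpr
```

The returned arrays all have the same length, so they can be packaged into a ColumnDataSource in the same way as the single-curve example, with the standard deviation available as an extra hover column.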