Making sense of real-world data: ROC curves, and when to use them

Interactive visualizations with Bokeh

The tools above let you quickly examine a model's performance. We can go one step further, though, and generate interactive visualizations, which let us inspect the tradeoffs between metrics at different thresholds more deeply and select a classifier of interest more quickly.

For this section, we’ll use Bokeh, an interactive visualization library in Python. If you’ve read previous posts in this series, you’ll already be familiar with it.

Bokeh has become an incredibly useful way to generate interactive visualizations in Python. A major benefit of Bokeh is that it works easily with pandas DataFrames, and its HoverTool lets you add extra dimensions to your data without cluttering the plot. For example, we can configure a HoverTool that shows the threshold for each point on the ROC curve, or even additional metrics such as F1 score and precision.
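To make the extra-metrics idea concrete, here is a minimal NumPy sketch (not the notebook's code) that computes precision and F1 alongside TPR and FPR at every distinct score threshold, counting a sample as positive when its score is greater than or equal to the threshold. The function name threshold_metrics and the precision/f1 column names are hypothetical, and it assumes both classes are present in y_true. Unlike scikit-learn's roc_curve, it neither prepends a (0, 0) point nor drops intermediate thresholds, and its per-threshold loop is only meant for modestly sized test sets.

```python
import numpy as np


def threshold_metrics(y_true, scores):
    """Compute TPR, FPR, precision and F1 at every distinct score threshold.

    A prediction is positive when score >= threshold. Assumes binary 0/1
    labels with at least one positive and one negative. Returns a dict of
    equal-length columns, the shape a Bokeh ColumnDataSource expects.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.unique(scores)[::-1]  # distinct scores, highest first
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()

    tpr, fpr, precision, f1 = [], [], [], []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & y_true)
        fp = np.sum(pred & ~y_true)
        fn = n_pos - tp
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
        # t is one of the scores, so at least one prediction is positive
        precision.append(tp / (tp + fp))
        f1.append(2 * tp / (2 * tp + fp + fn))
    return dict(x_fpr=np.array(fpr), y_tpr=np.array(tpr), thresh=thresholds,
                precision=np.array(precision), f1=np.array(f1))
```

Because every returned array has the same length, the dict can be passed directly to ColumnDataSource(data=...), and the extra columns can then be referenced in the tooltips with entries such as ("Precision", "@precision") and ("F1", "@f1").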

To generate a Bokeh plot, we’ll use ColumnDataSource to encode the output of roc_curve and pass that to the plot. Below is a simple example:

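from bokeh.models import ColumnDataSource
from sklearn.metrics import auc, roc_curve
# generate roc curve (probas_: predicted class probabilities, e.g. from predict_proba, computed earlier in the notebook)
fpr, tpr, thresholds = roc_curve(y_test, probas_[:, 1], pos_label=pos_label_)
# calculate auc
roc_auc = auc(fpr, tpr)
# create CDS; all columns must have the same length, so repeat the AUC once per point
source_ROC = ColumnDataSource(data=dict(x_fpr=fpr,
                                        y_tpr=tpr,
                                        thresh=thresholds,
                                        auc_legend=[roc_auc] * len(tpr)))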

In the above snippet, we compute the outputs of roc_curve, add a “legend” column holding the AUC (note that we encode it as a list with the same length as our other variables, because every column in a ColumnDataSource must have the same length), package everything as a dictionary, and pass it to a ColumnDataSource, which we can then use to plot:

""" Very basic ROC curve plot """
from bokeh.models import HoverTool
from bokeh.plotting import figure, show
# create plot with the standard tools we want in the toolbox
p = figure(title='ROC curve', tools=['zoom_in', 'zoom_out', 'save', 'reset'])
# add roc curve line
p.line('x_fpr', 'y_tpr', line_width=1, color="blue", source=source_ROC)
# add explicit data points along line - note we apply the legend here, read from the auc_legend column
points = p.scatter('x_fpr', 'y_tpr', size=5, color="orange", legend_field='auc_legend', source=source_ROC)
# create custom HoverTool that shows exact values, attached only to the data points
hover_ = HoverTool(renderers=[points], tooltips=[("TPR", "@y_tpr"), ("FPR", "@x_fpr"), ("Thresh", "@thresh")])
# add custom HoverTool to the plot's toolbox
p.add_tools(hover_)
# show
show(p)
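For readers who want to see exactly what roc_curve and auc hand to the ColumnDataSource, the following is a minimal NumPy re-implementation under simplifying assumptions (binary 0/1 labels, no sample weights); roc_points and trapezoid_auc are hypothetical helper names. Sorting by score and taking cumulative sums of positives and negatives gives the TP and FP counts at each distinct threshold, and the area is computed with the trapezoidal rule, which is also what scikit-learn's auc uses. The sketch puts np.inf in front as the sentinel threshold for the (0, 0) point; recent scikit-learn versions do the same (older ones used the maximum score plus one). By default roc_curve also drops collinear intermediate points, so its arrays can be shorter than these.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return (fpr, tpr, thresholds) with thresholds in decreasing order."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest score first, stable
    y_sorted = y_true[order]
    s_sorted = scores[order]
    # last index of each group of tied scores = one ROC point per distinct score
    distinct = np.r_[np.nonzero(np.diff(s_sorted))[0], len(s_sorted) - 1]
    tps = np.cumsum(y_sorted)[distinct]   # true positives at each threshold
    fps = np.cumsum(~y_sorted)[distinct]  # false positives at each threshold
    # prepend the (0, 0) point with an infinite sentinel threshold
    fpr = np.r_[0.0, fps / fps[-1]]
    tpr = np.r_[0.0, tps / tps[-1]]
    thresholds = np.r_[np.inf, s_sorted[distinct]]
    return fpr, tpr, thresholds


def trapezoid_auc(x, y):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
```

On the toy labels [0, 0, 1, 1] with scores [0.1, 0.4, 0.35, 0.8], this yields an AUC of 0.75. That scalar AUC is what gets repeated once per point, for example with [roc_auc] * len(tpr), to fill the auc_legend column.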

The code in our notebook is more detailed and yields the following plot (a GIF of the interactive visualization is below):

Interactive ROC curve in Bokeh. Note that the HoverTool lets us see the exact TPR, FPR, and threshold at each data point (orange) along the curve (blue). At a threshold of 0.5 (blue dot), we achieve a TPR of 0.98 and an FPR of 0.39.
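The caption reads off the operating point at a threshold of 0.5 by hovering. Programmatically, the same point can be found in the roc_curve outputs with a binary search, as in this sketch. point_at_threshold is a hypothetical helper, and it assumes the arrays were computed with roc_curve(..., drop_intermediate=False) so that every distinct score appears as a threshold (with the default setting, the exact point for an arbitrary threshold may have been dropped). Because thresholds come back in decreasing order and a sample counts as positive when its score is at least the threshold, the operating point for t is the last entry whose threshold is still greater than or equal to t.

```python
import numpy as np


def point_at_threshold(fpr, tpr, thresholds, t):
    """Return the (fpr, tpr) operating point for the rule 'score >= t'.

    Assumes thresholds are sorted in decreasing order, as roc_curve returns
    them, and that no intermediate points were dropped.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    # -thresholds is increasing, so searchsorted applies; side="right" counts
    # how many listed thresholds are >= t
    idx = np.searchsorted(-thresholds, -t, side="right") - 1
    # t above every listed threshold: fall back to the first point, the (0, 0) sentinel
    idx = max(int(idx), 0)
    return float(fpr[idx]), float(tpr[idx])
```

The returned coordinates could then be drawn as a separate, differently colored glyph to highlight the chosen threshold on the interactive plot.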

If you navigate to the repo, you’ll see examples of different ROC curve implementations, such as one in which cross-validation is performed and the mean ROC curve is shown, and another that compares the performance of different classifiers:

CodePen.io embedding of the interactive visualization.

Screenshot from the CodePen.io embed above.
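The repo's cross-validated version isn't reproduced here, but a common way to get a mean ROC curve (the approach taken in scikit-learn's ROC-with-cross-validation example) is to interpolate each fold's TPR onto a shared FPR grid with np.interp and average across folds. The sketch below assumes each fold's (fpr, tpr) pair comes from roc_curve, with fpr in non-decreasing order; the name mean_roc and the 101-point default grid are hypothetical choices.

```python
import numpy as np


def mean_roc(fold_curves, n_grid=101):
    """Average several ROC curves on a common FPR grid.

    fold_curves: list of (fpr, tpr) pairs, one per cross-validation fold,
    each with fpr sorted in non-decreasing order (as roc_curve returns it).
    Returns the grid, the mean TPR, and the per-grid-point standard deviation.
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    interp_tprs = []
    for fpr, tpr in fold_curves:
        tpr_i = np.interp(grid, fpr, tpr)  # TPR of this fold at each grid FPR
        tpr_i[0] = 0.0                     # force each curve to start at the origin
        interp_tprs.append(tpr_i)
    interp_tprs = np.vstack(interp_tprs)   # shape: (n_folds, n_grid)
    mean_tpr = interp_tprs.mean(axis=0)
    mean_tpr[-1] = 1.0                     # and the mean curve to end at (1, 1)
    return grid, mean_tpr, interp_tprs.std(axis=0)
```

The standard deviation can be used to draw a shaded band (mean ± one standard deviation) around the mean curve, and the three arrays fit naturally into a single ColumnDataSource since they share the grid's length.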

Finally, I include code for generating combined interactive ROC and PR curves:

CodePen.io embedding of the interactive visualization.
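One detail worth knowing when feeding both curves into Bokeh: scikit-learn's precision_recall_curve returns one fewer threshold than precision/recall points (the final point, with precision 1 and recall 0, has no threshold), so the threshold column must be padded before it can sit in a ColumnDataSource next to the other columns. The sketch below pads with NaN; the function name roc_and_pr_columns and the column names are hypothetical, and it assumes 0/1 labels so the default positive label applies.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve


def roc_and_pr_columns(y_true, scores):
    """Build two dicts of equal-length columns, one per curve, each suitable
    for its own ColumnDataSource in a side-by-side ROC / PR layout."""
    fpr, tpr, roc_thresh = roc_curve(y_true, scores)
    precision, recall, pr_thresh = precision_recall_curve(y_true, scores)
    # the last (recall=0, precision=1) point has no threshold: pad with NaN
    # so every PR column has the same length
    pr_thresh = np.append(pr_thresh, np.nan)
    roc_cols = dict(x_fpr=fpr, y_tpr=tpr, thresh=roc_thresh)
    pr_cols = dict(x_recall=recall, y_precision=precision, thresh=pr_thresh)
    return roc_cols, pr_cols
```

The two dicts could then back two figures placed next to each other with bokeh.layouts.row, each with its own HoverTool showing the threshold column.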
