When you train a classifier on a dataset, it is using a specific algorithm to define a set of hyperplanes that separates the data points into specific classes. Where the algorithm switches from one class to the other are called decision boundaries. On one side a decision boundary, a datapoint is more likely to be called as one class — on the other side of the boundary, it’s more likely to be called as another.
Boundaries are fuzzy, but they illustrate where key ‘decision points’ are made by the model.
Importantly, decision boundaries are not confined to just the data points you provided — they span through the entire feature space you trained on. The model can predict a value for any possible combination of inputs in your feature space. If the data you train on is not ‘diverse’, the overall topology of the model (decision boundaries and classification regions) will generalize poorly to new instances.
This is important to know for models you throw into production, or try to reuse on orthogonal datasets. There is nothing inherent to a machine learning model that will warn you if the model is not appropriate for another dataset. There is nothing that will tell you ‘this data point is very different from the ones I learned on.’
Understanding the limitations of existing models and the decision boundaries they learned is helpful for repurposing and reapplication, especially in instances where retraining or transfer learning are not possible.
Training a classifier requires data and an algorithm. Choosing an algorithm is an iterative and often experimental process. Rarely am I able to correctly select the appropriate algorithm that will perform best on a particular dataset on my first try.
So why is that? Why is there no ‘one model to rule them all’? Can’t we just throw a neural net at every problem?
The “No Free Lunch Theorem” states that search and optimization algorithms with excellent performance for one class of problems will not excel at others. In other words, there is no universally-useful algorithm across all data. Selecting the right approach takes intuition, an understanding of the data and goals of the analysis, practice, and time.
Examining decision boundaries is a great way to learn how the training data you select affects performance and the ability for your model to generalize — especially if you’re someone who learns tactilely. Visualization of decision boundaries can illustrate how sensitive models are to each dataset. And it’s a great way to intuitively understand how specific algorithms work, and their limitations for specific datasets.
Decision Boundary Plots in Bokeh
In Part 1, I discussed using Bokeh to generate interactive PCA reports. Here I’ll discuss how to use Bokeh to generate decision boundary plots.
My goals for this visualization tool were three-fold. Given a model and a dataset:
- I want to see which data points were used for training, to understand if the distribution was appropriate
- I want to see which data points were used for testing, and ‘where’ the classifier is challenged with accurate predictions
- I want to see the decision boundaries for each class, to understand if/how the model will be useful for prediction for future, unseen data points
In addition, the code should be as generalizable as possible — it should accept any sci-kit learn classifier and any dataset with any number of classes. Note that a limitation of the approach is that it only works on two-dimensional data currently, so transforming the data (with e.g. PCA) is necessary. In the future, I may explore visualizing multi-dimensional decision boundaries in two/three human-interpretable dimensions.
Below, I’ll walk through key components of the visualization.