How to make data science experiments agile
The tension between long-term planning and short-term flexibility is everywhere, including data science methodology. Is it possible for product development teams to reconcile rapid iteration with the slow-moving behemoth of the deep research process, or must they pick one?
For our case study, we’ll take BrainQ. Spoiler alert: not only is it possible to reconcile fast and slow approaches to data science experiments, but the lessons BrainQ has learned along the way offer a roadmap your team can follow.
BrainQ’s mission is to treat neuro-disorders with AI-powered technologies. If you were about to humor our discussion of agility, that probably stopped you in your tracks — we’re dealing with a behemoth’s behemoth: medical research data science.
Traditional data science is bad enough; most practitioners agree that if you have your heart set on trustworthy statistical inference, there’s a lot of thinking and planning in your future. Every statistics 101 class teaches you to have your hypotheses, methodologies, and assumptions hammered out before your grueling data collection process even begins. Measure twice, cut once, and don’t even think about iterating!
Whenever you add even a whiff of medical research, it all gets slower. Now the process starts out coated in the superglue of regulatory approval and clinical trials coordination. Agility — with your process broken into small, predictable, iteration-friendly components — is the dream, but how do you inject an agile approach into deep medical research?
Data science lends itself well to both exploration and rigor, though not always at the same time.
The trick is a one-two punch. Data science lends itself well to both exploration and rigor, though not always at the same time. It turns out that data science best practices for exploring and triaging what’s worth pursuing with depth and rigor are all about nimbleness. Not everything has to be done slowly and carefully…
Fixing a broken mindset
The first thing to fix is the mindset that can come with classical training in statistics and data science. A typical university exam in statistics presents a series of hypotheses for budding students to test, along with mathematically phrased assumptions. Most of the finesse is in carefully (properly! rigorously!) testing them. From my first STAT101 midterm to my statistics PhD qualifying exams, the format I experienced was pretty much the same. This kind of thing makes up the bulk of our training, so it’s often the part a newly minted statistician treasures.
Have you ever noticed that the hypotheses are there all along?
Have you ever noticed that the hypotheses are there all along — nicely thought out by the professor — and students rarely have to question their genesis? Once the sacred question is in place, of course we have to pursue answering it with utmost seriousness. Now turn the whole thing on its head: you have to come up with the hypothesis and assumptions. How do you do that?
One option is to mimic what you’d be used to from class. Meditate in a closet and come up with the hypothesis and assumptions in advance. Design the data collection strategy and statistical testing in advance of any data. Get everything ready to go and then get it right in one shot.
Sounds good? We forgot humility. Chances are we made a mistake in the setup. As someone with over a decade of experience at this, one of the best lessons I’ve learned is: it’s too hard to think of everything up front.
It’s too hard to think of everything up front.
Locking in an approach up front and following it rigidly means we’ll end up with a perfect solution to the wrong question. (Lovingly known in statistics as a Type III error: correctly answering the wrong question.)
What you never see in class is how everything can crash and burn when you get the question itself wrong. Those life lessons are hard to simulate, and your head might pop as you imagine not imagining everything you forgot to imagine.
Permission to get agile
So if the inflexible approach that feels comfy from data science camp doesn’t work, what to do? Blend in some agile thinking, of course. Here’s the mind hack: allow your approach to be sloppy at first and burn some of your initial time, energy, and data on informing a good direction later.
Here’s the mind hack: allow your approach to be sloppy at first to inform a good direction later.
How do we go about doing that? Allow yourself phases where the only result you’re after is an idea of how to design your ultimate approach better. This means you’re encouraged to start with:
- Low-quality data: use small sample sizes, synthetic data, and non-randomly sampled data to gain insights about the data collection process itself.
- Quick-and-dirty models: seek an understanding of what the payoff from minimal effort looks like. Start with simple algorithms that you know will only give you a benchmark, not a final solution.
- Multiple comparisons: instead of picking a single hypothesis test, throw the kitchen sink at your data for inspiration. You’re doing this to discover signals worth basing your final approach on.
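To make the pilot phase concrete, here’s a minimal sketch of all three ideas at once, using only the Python standard library. Everything in it is illustrative: the tiny synthetic dataset stands in for real (expensive) data, the mean-prediction baseline is the quick-and-dirty model, and the correlation screen is the kitchen-sink pass over every feature. None of the names or numbers come from BrainQ’s actual pipeline.

```python
import random
import statistics

random.seed(0)

# Low-quality data: a deliberately small synthetic dataset, good only for
# learning about the process, never for final inference.
n = 30
features = {name: [random.gauss(0, 1) for _ in range(n)]
            for name in ["f1", "f2", "f3", "f4"]}
# The outcome secretly depends on f2, so the screen has something to find.
outcome = [0.8 * features["f2"][i] + random.gauss(0, 1) for i in range(n)]

# Quick-and-dirty model: always predict the mean outcome. This is a
# benchmark for "what does minimal effort buy us?", not a solution.
baseline = statistics.mean(outcome)
baseline_mse = statistics.mean((y - baseline) ** 2 for y in outcome)

def corr(xs, ys):
    """Pearson correlation, written out to keep the sketch dependency-free."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Multiple comparisons: correlate every feature with the outcome.
# Hits here are leads for the final, carefully designed study -- the
# uncorrected screen makes them unreliable as findings on their own.
leads = {name: round(corr(xs, outcome), 2) for name, xs in features.items()}
print("baseline MSE:", round(baseline_mse, 2))
print("candidate signals:", leads)
```

The point of running something like this is the comments, not the numbers: every output is explicitly labeled as input to the next design decision, which is all a pilot is for.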
This advice breaks pretty much every rule you learned in class.
If the statistician in you isn’t screaming yet, I admire your sangfroid. This advice breaks pretty much every rule you learned in class. So why am I endorsing these “bad behaviors”? Because it matters what project phase you’re in. I’m all about following the standard advice later, but the early pilot phase has different rules.
The important thing is to avoid rookie mistakes by remembering these two principles:
- Don’t take any findings from the early phase too seriously.
- Do collect a clean new dataset when you’re ready for the final version.
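When collecting a genuinely new dataset is the plan, the in-code analogue of the second principle is quarantining data you haven’t spent yet. A minimal sketch, assuming nothing about the actual data beyond it being a list of records, is to split once up front and let the exploratory phase touch only its own share:

```python
import random

random.seed(1)
records = list(range(100))  # stand-in for collected observations
random.shuffle(records)

# Spend the first chunk freely on exploration; quarantine the rest.
explore, confirm = records[:30], records[30:]

# Principle 1: anything found in `explore` is a hypothesis, not a finding.
# Principle 2: `confirm` is touched exactly once, by the final pre-planned
# test -- peeking at it during exploration spends it for good.
assert not set(explore) & set(confirm)
```

A held-out split is a cheaper substitute for fresh data collection; when the stakes are high (as in medical research), the clean new dataset itself is still the gold standard.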
Pilot studies in data science
You’re using your initial iterative exploratory efforts to inform your eventual approach (which you’ll take just as seriously as the most studious statistician would). The trick is to use the best of exploratory nimbleness to inform what’s worth considering along the way. If you’re used to the rigidity of traditional statistical inference, it’s time to rediscover the benefits of pilot studies in science and find ways to embed the equivalent into your data science.
The best source of inspiration for a bulletproof final version is the collection of lessons learned along the way to an MVP.
This is the approach that BrainQ has embraced, and it has worked wonders for them. If you’d like to learn more about the nitty-gritty of BrainQ’s process, check out the full case study on The Lever, a technically-oriented source of applied AI advice for startups, operated by Google Developers Launchpad and co-edited by Peter Norvig and myself.