Being Data Science Ready – Hacker Noon

How to accelerate your startup, build data equity, and control data debt

Data debt sticks around for a long time and is hard to handle. Don’t accumulate too much of it.

Sometimes development teams discover that they have accumulated so much technical debt that the only solution is to tear down large parts of the codebase and rebuild them from scratch.

That is very painful, but it can work. That’s because you only care about what’s in your code now. If an old version was cumbersome and buggy, that doesn’t matter anymore. Like in a game of chess, in software what matters is the current state of the board.

Data is not like that. The value of your data comes in part from quantity and history. If you revamp your data practices and improve them, that’s great. But you’re not going to throw away all your data, wipe down the database, and sit around and wait until you’ve collected some new data, are you?

Data debt sticks around for a long time. It’s hard to handle, and you can’t solve the problem by starting from scratch. So you should be careful about accumulating it.

The benefits of being data science ready

So data debt is bad. But reducing or preventing it takes effort. That trade-off is basically a law of nature.

So why not just worry about it later, when you hire your first data scientist? Or, how much should you invest in controlling your data debt now? What are the signs that you are data science ready?

Being data science ready has two immediate benefits: acceleration and equity.

Focusing your efforts on maximizing them will help you collect the right data, keep it in good shape, and prevent excessive data debt — while staying focused on the priorities of an early stage company.

Acceleration: you can’t learn what you don’t measure

In an early stage startup, the primary goal is to learn: understand the market need, compare distribution channels, experiment with improving conversion, etc.

The most successful framework to accelerate learning is the lean startup, which relies on a quick succession of build-measure-learn experiments, each designed to test a hypothesis about the market.

The key word here is measure. Acceleration comes from learning. Learning comes from measuring. Measuring requires data. Obviously, you have to collect the right data to properly learn from your experiments. Fortunately, that is the key to being data science ready.

The Proof of the Pudding is in the Analytics

Acceleration doesn’t require fancy data science. All it takes is rock solid analytics.

Let’s say you want to run an A/B test on a new feature of your product. Here are some questions you will want to answer: Which users exactly were in arm A and which in arm B? How many times did each user interact with the feature? For each of these interactions, how did the user respond? Are there differences between demographic groups, geographic areas, or other user characteristics?

Without answers to such questions, you can’t learn from the experiment. So making them easy to answer from your database is a great investment:

  • Make a list of questions like these that should be easy to answer. Make them very granular.
  • Write the relevant queries. Make sure that they are tested and documented.
  • When you make changes to the database, test the queries for slowdowns or altered output.
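
The three steps above can be sketched in a few lines. This is only an illustration, not a recommended schema: the tables `assignments` and `interactions`, their columns, and the function names are all hypothetical, and SQLite stands in for whatever database you actually use. The query answers "how many users, interactions, and clicks were there in each arm?", and the test pins its output on a tiny fixture so that a schema change that alters the result gets noticed.

```python
import sqlite3

# Hypothetical schema: who was in which arm, and how they responded.
SCHEMA = """
CREATE TABLE assignments (
    user_id INTEGER PRIMARY KEY,
    arm     TEXT NOT NULL CHECK (arm IN ('A', 'B')),
    country TEXT NOT NULL
);
CREATE TABLE interactions (
    user_id  INTEGER NOT NULL REFERENCES assignments(user_id),
    response TEXT NOT NULL  -- e.g. 'clicked' or 'dismissed'
);
"""

# Documented query: users, interactions, and clicks per experiment arm.
# LEFT JOIN keeps users who never touched the feature.
INTERACTIONS_PER_ARM = """
SELECT a.arm,
       COUNT(DISTINCT a.user_id)                AS users,
       COUNT(i.user_id)                         AS interactions,
       COALESCE(SUM(i.response = 'clicked'), 0) AS clicks
FROM assignments AS a
LEFT JOIN interactions AS i ON i.user_id = a.user_id
GROUP BY a.arm
ORDER BY a.arm
"""


def make_fixture_db():
    """Build a small in-memory database with known contents."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO assignments VALUES (?, ?, ?)",
        [(1, "A", "US"), (2, "A", "FR"), (3, "B", "US")],
    )
    conn.executemany(
        "INSERT INTO interactions VALUES (?, ?)",
        [(1, "clicked"), (1, "dismissed"), (3, "clicked")],
    )
    return conn


def interactions_per_arm(conn):
    """Run the documented query and return one row per arm."""
    return conn.execute(INTERACTIONS_PER_ARM).fetchall()


def test_interactions_per_arm():
    """Regression test: the query output on the fixture must not change."""
    conn = make_fixture_db()
    assert interactions_per_arm(conn) == [("A", 2, 2, 1), ("B", 1, 1, 1)]


if __name__ == "__main__":
    test_interactions_per_arm()
    print("query output unchanged")
```

Rerunning a test like this after every database change covers the "altered output" half of the last bullet. Checking for slowdowns would need a timing check against realistic data volumes, which this sketch leaves out.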

As you get more and more granular analytics at your fingertips, it will become very easy to do deep dives into user behavior. You will get easy access to data-oriented insights that will clarify your thinking about your product. That’s actually one of the main roles of a data scientist, and you’ll be doing it before you even hire one!

Equity: Your data is an asset

The second benefit of being data science ready is equity. For any modern company, data is a primary asset. It is taken into account in the valuation of your company. Many companies have been acquired just for their data. Access to vast amounts of data is one of the main reasons why tech giants like Google and Facebook are so powerful.

To be valuable, data has to be actionable.

If a data scientist joins your company and spends six months on cleaning data before she can do anything with it, then your data is not actionable. Your company is literally worth less because of that.

Give your future data scientist a voice

Think about your data from the perspective of your future data scientist

As a product developer, you are used to thinking about things from the point of view of the user. You do that even if you don’t personally interact with most of your users, or before you even have any users.

Sometimes the user is not the client paying for your product. Instead, the paying client could be the user’s employer or an advertiser. As you build your product, you think about how to provide value to them as well.

“Your data is an asset” means that at some point someone will be willing to pay for your data. That person is also a client. You should keep them in mind.

That client will be paying you because they can answer questions that they care about based on your data. Well, answering questions based on your data is exactly the job of a data scientist!

By thinking about your future data scientist you are building data equity.

Let’s call your future data scientist Dahlia. The best way to maintain data equity is to think about things from her perspective. Write user stories for her. Prioritize them in your backlog.

It doesn’t matter if you don’t know anything about machine learning. Dahlia wants to answer questions about your data. And that’s great, because you’re already answering many questions like that by having solid analytics!

All you have to do now is think about how Dahlia can learn from the analytics that you already have and use them to answer other questions about the data. So as you are writing all those queries, ask yourself:

  • How easy would it be for someone who doesn’t know the database to understand these queries?
  • Is all the information necessary to understand them contained in the database? Or do they rely on institutional knowledge, old documents on Google Drive, etc.?
  • Is the work that led to these queries properly recorded, ideally under version control in your codebase?
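
One way to approach the second question above is to keep a data dictionary inside the database itself and check it automatically. The sketch below is a minimal illustration under stated assumptions: the `data_dictionary` table, its columns, and the function names are hypothetical, and it relies on SQLite's `sqlite_master` catalog and `PRAGMA table_info`, so other databases would need their own catalog queries.

```python
import sqlite3


def make_example_db():
    """Build an in-memory database with a hypothetical data_dictionary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE data_dictionary (
        table_name  TEXT NOT NULL,
        column_name TEXT NOT NULL,
        description TEXT NOT NULL,
        PRIMARY KEY (table_name, column_name)
    );
    CREATE TABLE assignments (user_id INTEGER, arm TEXT);
    INSERT INTO data_dictionary VALUES
        ('assignments', 'user_id', 'Internal user identifier'),
        ('assignments', 'arm', 'Experiment arm, A or B');
    """)
    return conn


def undocumented_columns(conn):
    """Return sorted (table, column) pairs with no data_dictionary entry."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "AND name != 'data_dictionary'"
    )]
    documented = set(conn.execute(
        "SELECT table_name, column_name FROM data_dictionary"
    ))
    missing = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is its name.
        for column in conn.execute(f'PRAGMA table_info("{table}")'):
            if (table, column[1]) not in documented:
                missing.append((table, column[1]))
    return sorted(missing)


if __name__ == "__main__":
    conn = make_example_db()
    print(undocumented_columns(conn))
    # Adding a column without describing it shows up in the next check.
    conn.execute("ALTER TABLE assignments ADD COLUMN country TEXT")
    print(undocumented_columns(conn))
```

A check like this could run alongside your query tests, so that a column nobody has described is flagged while the person who added it still remembers what it means.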

Dahlia will thank you for thinking about her, and when she finally joins the team, she will be a force multiplier.

Bottom line

  • Do you have actionable analytics? Are they granular and well tested?
  • If a new data scientist joins your team, would she be able to quickly understand the queries behind the analytics? Would she be able to change them without breaking things?

If the answer to these questions is yes, then congratulations: you are data science ready. You are doing a great job controlling data debt. You are accelerating your learning cycles and building data equity, adding to the value of your company.

And if the answer is no? Then you should start thinking about reducing data debt. I will address how to do so in a future post.
