Most companies claim to make data-driven decisions
Startups are often proud to say that they do.
But in my experience, that’s not the whole truth. Sure, decisions are made based on data, but mostly the obvious ones, and only using whatever data happens to be available at the time.
Making data-driven decisions is hard
The first challenge is figuring out which of the data you already have is relevant. Next, you need to work out what other data you need and how to collect it.
It takes a lot of effort and practice to fully understand your metrics and how they relate to one another.
You also need to continuously review decisions as more and better data becomes available.
Finally, it takes experience to tell when a change in a metric is a real signal rather than just variance.
First, you must gather as much correct data as possible
You must base your decisions on data that is sufficient, correct and complete. Almost always, at least one of these is missing, and sometimes more than one.
Sufficient
Having sufficient data is difficult for two main reasons: you probably don’t have much traffic, and you have likely only just started building your tracking.
That’s why you should track more data than you think you need: it saves you from having to add tracking later and then wait a month before you have enough data.
Use a sample size calculator or an A/B test calculator to get an idea of how many data points you need.
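Sample size calculators differ in their inputs, but a common case is comparing two conversion rates. The Python sketch below uses the standard normal-approximation formula for a two-sided two-proportion z-test. The 5% significance level, the 80% power, the function name and the 5% to 6% example are assumptions chosen for illustration. They are not figures from this article.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_baseline: float, p_expected: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate data points needed in EACH variant to detect a change from
    p_baseline to p_expected (two-sided two-proportion z-test, normal approximation).
    Hypothetical helper; defaults are common conventions, not requirements."""
    if p_baseline == p_expected:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p_baseline * (1 - p_baseline) + p_expected * (1 - p_expected)
    n = (z_alpha + z_beta) ** 2 * variance / (p_baseline - p_expected) ** 2
    return math.ceil(n)  # round up: you can't collect a fraction of a data point


# Hypothetical example: a 5% conversion rate you hope to lift to 6%.
n = sample_size_per_variant(0.05, 0.06)
print(n)
```

The result is per variant, so a test with two variants needs roughly twice as many data points in total. The required sample grows quickly as the effect you want to detect shrinks: halving the difference roughly quadruples it.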
Correct
Correct has two meanings.
First, correct data means calculating it correctly and to a sufficient degree of accuracy. The average of 1,000 data points isn’t necessarily the same as the average of averages computed over smaller groups (say, one average per day). The two are only guaranteed to agree when every group contains the same number of data points. Similarly, do you need the average, or the median? I’ve had the book Statistics for Engineers recommended to me, but have not read it yet.
Secondly, it also means that, out of all the metrics you have, you have chosen the ones that correlate most closely with the decision you’re trying to make.
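The averaging pitfall is easier to see with numbers. The Python sketch below uses made-up order values for one quiet day and one busy day. It also shows how a single outlier drags the mean while barely moving the median.

```python
from statistics import mean, median

# Hypothetical order values for a quiet day and a busy day.
daily_orders = {
    "monday": [20.0, 30.0],  # 2 orders
    "tuesday": [10.0] * 8,   # 8 orders
}

all_orders = [value for orders in daily_orders.values() for value in orders]

# Every order weighs the same.
overall_average = mean(all_orders)
# Every day weighs the same, so Monday's 2 orders count as much as Tuesday's 8.
average_of_averages = mean([mean(orders) for orders in daily_orders.values()])

print(overall_average)      # 13.0
print(average_of_averages)  # 17.5

# Hypothetical order values with one unusually large order.
with_outlier = [10, 11, 12, 13, 500]
print(mean(with_outlier))    # 109.2
print(median(with_outlier))  # 12
```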
Complete
Complete data means that you can see the whole picture, including every piece the decision affects.
Then, you must analyze and understand your data
You must know which metrics are compounds of others rather than root metrics themselves. A very simple example is average order size: it’s the total order value divided by the number of orders. You can’t affect the average order size directly, but you can affect each of its components.
You also need to know which metrics are related, and how. For instance, basket size and number of orders for a particular customer might be inversely correlated: the more orders they place, the smaller each individual order tends to be.
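Here is a minimal Python sketch, with made-up numbers, of why watching the compound metric alone can mislead. The `Period` class and all figures are hypothetical. Two weeks report the same average order size even though one of them grew both root metrics by 25%. A third period's average rises only because there were fewer orders.

```python
from dataclasses import dataclass


@dataclass
class Period:
    total_order_value: float  # root metric: revenue in the period
    number_of_orders: int     # root metric: orders in the period

    @property
    def average_order_size(self) -> float:
        # Compound metric: derived entirely from the two roots above.
        return self.total_order_value / self.number_of_orders


# Hypothetical periods.
last_week = Period(total_order_value=10_000, number_of_orders=200)
this_week = Period(total_order_value=12_500, number_of_orders=250)     # both roots up 25%
fewer_orders = Period(total_order_value=10_000, number_of_orders=160)  # same revenue, fewer orders

print(last_week.average_order_size)     # 50.0
print(this_week.average_order_size)     # 50.0
print(fewer_orders.average_order_size)  # 62.5
```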
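One common way to check whether two metrics move together is Pearson’s correlation coefficient. It ranges from -1 (perfectly inverse) to 1 (perfectly aligned). The Python sketch below computes it by hand on hypothetical per-customer figures, which were invented to illustrate the inverse relationship described above.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical customers: how many orders each placed, and their average basket size.
orders_per_customer = [1, 2, 4, 6, 10]
average_basket_size = [80.0, 70.0, 45.0, 40.0, 25.0]

r = pearson(orders_per_customer, average_basket_size)
print(round(r, 2))  # -0.95
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient. A strong correlation is consistent with the relationship described, but on its own it doesn’t tell you which metric drives the other.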
Make a decision
So you have all the data, in a sufficient sample size. It’s been calculated correctly. It’s relevant and gives you the whole picture. You have analyzed it and understand it.
Now you can truly make a data-driven decision. Well done 👍
Write it down
You must write all of this down, both so you can communicate the decision to others and so you have a way to recall your original reasoning later:
- The decision you’re trying to make in the form of a question
- The assumptions you might have made
- The answer you’ve come up with
- And all the supporting data, with projections
The bigger the decision, the more thorough you should be.
The projections should include both the number you’ll move (i.e. the number you can directly influence) and how you expect the other, correlated numbers to move as a result.
For instance, if you plan to issue coupons to increase the number of users, write down both how many coupons you’ll issue and how many you expect to be redeemed.
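If you keep decision documents alongside code or notebooks, a small structured record can help keep them consistent. The Python sketch below mirrors the list of what to write down. The class names are hypothetical. The 1,000 coupons and 20% redemption rate are invented figures, not data.

```python
from dataclasses import dataclass, field


@dataclass
class Projection:
    metric: str
    expected: float
    directly_moved: bool  # True for the number you control, False for numbers you expect to follow


@dataclass
class DecisionRecord:
    question: str           # the decision, phrased as a question
    assumptions: list[str]  # what you took for granted
    answer: str             # what you decided
    projections: list[Projection] = field(default_factory=list)  # supporting numbers


# Hypothetical coupon decision; the 20% redemption rate is an invented assumption.
coupons_issued = 1_000
assumed_redemption_rate = 0.20

record = DecisionRecord(
    question="Should we issue coupons to increase the number of users?",
    assumptions=[f"About {assumed_redemption_rate:.0%} of coupons will be redeemed"],
    answer=f"Yes: issue {coupons_issued} coupons",
    projections=[
        Projection("coupons issued", coupons_issued, directly_moved=True),
        Projection("coupons redeemed", round(coupons_issued * assumed_redemption_rate),
                   directly_moved=False),
    ],
)

for p in record.projections:
    kind = "moved directly" if p.directly_moved else "expected to follow"
    print(f"{p.metric}: {p.expected} ({kind})")
```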
Review it periodically
You will then want to revisit each decision periodically, how often depending on its impact and uncertainty.
The more impactful and more uncertain a decision is, the more frequently it needs to be monitored to ensure that your assumptions and projections still hold.
If changes are needed, revise your decision document.
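Part of a review can be mechanical: compare each projection with what actually happened. The hypothetical Python helper below flags metrics that deviate from their projection by more than a relative tolerance. The 20% default and the figures are assumptions. What counts as "too far off" depends on the decision.

```python
def drifted_metrics(projected: dict[str, float], actual: dict[str, float],
                    tolerance: float = 0.2) -> list[str]:
    """Metrics whose observed value differs from its projection by more than
    `tolerance`, relative to the projection. Metrics without data are skipped."""
    drifted = []
    for metric, expected in projected.items():
        if metric not in actual:
            continue  # no data yet; check again at the next review
        if abs(actual[metric] - expected) > tolerance * abs(expected):
            drifted.append(metric)
    return drifted


# Hypothetical review of a coupon decision.
projected = {"coupons issued": 1_000, "coupons redeemed": 200}
actual = {"coupons issued": 1_000, "coupons redeemed": 120}
print(drifted_metrics(projected, actual))  # ['coupons redeemed']
```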
Practice till you can easily tell the signal from the noise
By constantly going through this process, you will learn how to tell when signals are strong enough and when they’re just variance.
The main issue is that with small data sets, even seemingly large shifts carry little meaning.
One way to counteract this is to track several related but independent numbers and look for changes that are consistent across them.
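To see why small data sets mislead, the Python sketch below runs a standard pooled two-proportion z-test on the same apparent doubling of a conversion rate. It does this twice: once with 100 data points per group and once with 10,000. The function name and figures are hypothetical.

```python
import math


def two_proportion_p_value(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)); erfc keeps precision in the far tail.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical: conversion "doubles" from 3% to 6% in both cases.
small = two_proportion_p_value(3, 100, 6, 100)
large = two_proportion_p_value(300, 10_000, 600, 10_000)
print(f"100 per group:    p = {small:.2f}")
print(f"10,000 per group: p = {large:.2g}")
```

With 100 data points per group the p-value is around 0.3, so a doubling like this would be unremarkable under pure noise. With 10,000 per group it is vanishingly small. The normal approximation is itself shaky when counts are this low, so for very small samples an exact test is a safer choice.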
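The intuition can be quantified with a simple sign test. If each metric were pure noise, it would be as likely to go up as down, so several independent metrics all moving the same way is unlikely by chance. The Python sketch below assumes the metrics really are independent and that a 50/50 split is a fair model of noise. Both are simplifications, and the function name is hypothetical.

```python
from math import comb


def sign_test_p_value(moved_as_expected: int, total_metrics: int) -> float:
    """One-sided probability that at least `moved_as_expected` of `total_metrics`
    independent metrics move in the expected direction purely by chance,
    assuming each is equally likely to go up or down."""
    hits = sum(comb(total_metrics, k) for k in range(moved_as_expected, total_metrics + 1))
    return hits / 2 ** total_metrics


print(sign_test_p_value(5, 5))  # 0.03125
print(sign_test_p_value(4, 5))  # 0.1875
```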
It’s a lot of work
Recognizing how important this is, and doing it constantly, consistently and well, is difficult, but it sets companies apart.
One of my favorite examples of a team making proper data-driven decisions is Great Britain’s cycling team, which dominated the 2012 Olympics. They embraced a philosophy of marginal gains, pursuing even the slightest improvement. For instance, they 3D-printed a custom holder for one team member’s cycling computer to give them a slightly better posture.
Early-stage startups, however, rarely have the bandwidth to take it to this extreme. Still, the closer you can get, the better off you’ll be.