In our fast-paced world, we want results, and we want them NOW. The sooner, the better—time is of the essence. But in certain situations, this isn’t a practical mindset. With any new data analytics project, you won’t necessarily see immediate results. As the old saying goes, Rome wasn’t built in a day. And neither are perfect predictive models. By their very nature, they take time to develop, time to improve, and time to show results. However, this investment of time is well worth the wait in the long run.
Seeing Results While Controlling for Outside Factors
When changing a business practice, the ability to see the immediate impact of that decision helps to justify the costs of its development and implementation. However, we must be aware of potential confounding variables: outside factors that can distort how we measure the new practice’s effectiveness, especially over a short period. A principle from probability theory known as “The Law of Large Numbers” explains why patience pays off. Put simply, as the number of trials (data points) increases, their average tends to come closer to the actual (expected) value. Let’s look at a real-world example to explain this phenomenon.
Suppose you own a cupcake shop. One day, you discover a new recipe that you believe is better than your current recipe, and you decide to start baking all of your cupcakes using the new recipe. Typically, you sell 100 cupcakes a day. On the first day you start using the improved recipe, you only sell 40 cupcakes. On the surface, this seems like an epic failure. Do you chalk it up to the new recipe? Perhaps. But it could be that there just weren’t many people in your area having birthdays that day, or the gloomy weather prevented people from venturing out for cupcakes.
The next day, you sell your usual 100 cupcakes. Do you conclude that your new recipe is only as good as your old recipe? Or do you decide to go back to the old recipe because your two-day sales figures are down?
Let’s say you decide to stick it out and continue using the new recipe. The next day you sell 125 cupcakes, then 115, then 150. People start telling their friends about your great new cupcakes, and the next month, you sell, on average, 200 cupcakes a day! Assuming nothing else about your business has changed, you can reasonably conclude that the new recipe has increased your sales, but it took some time for the overall effect to show.
If you had abandoned the new recipe after day 2, you would likely still be selling an average of 100 cupcakes a day. In retrospect, you didn’t have enough data at that point to make an informed decision. Collecting more data points helps average out factors outside of your control, such as the weather on a particular day.
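The Law of Large Numbers is easy to see in a quick simulation. The Python sketch below uses a fair six-sided die as a stand-in for any noisy measurement (it is not based on real sales data): individual rolls jump around, but the running average drifts toward the expected value of 3.5 as the number of rolls grows. The helper `running_averages` and the fixed seed are illustrative choices, not part of any particular analytics tool.

```python
import random


def running_averages(values):
    """Return the average of the first k values, for k = 1..len(values)."""
    total = 0.0
    averages = []
    for k, v in enumerate(values, start=1):
        total += v
        averages.append(total / k)
    return averages


# Hypothetical experiment: roll a fair six-sided die, whose expected value is 3.5.
rng = random.Random(42)  # fixed seed so the run is reproducible
rolls = [rng.randint(1, 6) for _ in range(100_000)]
avgs = running_averages(rolls)

# Early averages can be far from 3.5; later ones settle close to it.
for n in (2, 10, 100, 1_000, 100_000):
    print(f"after {n:>7} rolls: average = {avgs[n - 1]:.3f}")
```

With only a handful of rolls, the average can land well away from 3.5, much like judging a recipe on its first two days; with many rolls, it stays close to 3.5.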
A Statistical Explanation
This idea is especially important in predictive modeling. A predictive model uses statistical methods to compute a propensity score, which estimates the likelihood that an event will happen. A high score doesn’t mean an event will definitely occur, and a low score doesn’t mean it won’t; it just means that higher-scoring events are more likely to occur. Keep in mind that there are many variables outside of anyone’s knowledge or control: environmental, circumstantial, and of course, a person’s own free will. A model may predict that a customer is very likely to do something, but at the end of the day, other factors are involved.
For example, perhaps the model says that type-C customers have an 80% propensity to perform action A. This score means that, on average, about eight out of every ten type-C customers are expected to perform that action. When you start tracking the actions of type-C customers, perhaps the first two do NOT perform action A. If you drew conclusions at this point, you might decide that something must be wrong with the model.
However, if you continue tracking the behavior of type-C customers, you see that out of twenty type-C customers in total, sixteen HAVE performed action A. That is exactly 80%, which matches the model’s prediction.
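If we treat the 80% propensity score as an independent probability for each customer (a simplifying assumption; real customers may not behave independently), a binomial model shows why two early misses say little about the model. The sketch uses only Python’s standard library; `binomial_pmf` is an illustrative helper, not part of any analytics package.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = 0.8  # the example's propensity score, treated as each customer's chance of doing action A

# Chance that the first two type-C customers both skip action A: (1 - 0.8) ** 2
both_skip = binomial_pmf(0, 2, p)
print(f"P(first two both skip A) = {both_skip:.2f}")  # prints 0.04

# Distribution of how many of 20 type-C customers perform action A
for k in range(12, 21):
    print(f"{k:2d} of 20: {binomial_pmf(k, 20, p):.3f}")
```

Even when the model is exactly right, two misses in a row happen about 4% of the time, and over 20 customers the number who perform action A is spread across several values around 16, so no single small batch should be read as proof for or against the model.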
Another way to think about this is to remember the main principle behind “Big Data”: patterns emerge when you have data in large quantities. The same principle applies to judging a model’s results. It takes time to gather a sufficiently large collection of data points, so we must keep the big picture in mind rather than focusing on the short term.
Because of these external factors and the nature of statistics, a long-term, controlled experiment is recommended. Only after attempting to control for confounding variables, by comparing a control state to an experimental state and tracking the model’s effectiveness over time, can you be confident that the results you see are real.
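One simple way to compare a control state with an experimental state is a permutation test: if the new recipe made no difference, randomly shuffling the “control” and “experimental” labels should produce a difference in averages as large as the observed one fairly often. The sketch below is a hypothetical illustration, not a prescribed method. Apart from the first five experimental values (40, 100, 125, 115, 150), which come from the cupcake example, all sales figures are invented, and the test assumes the days are independent and otherwise comparable.

```python
import random
from statistics import fmean


def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(treatment) > mean(control).

    Returns the observed difference in means and the fraction of random
    relabelings whose difference is at least as large (an estimated p-value).
    """
    rng = random.Random(seed)
    observed = fmean(treatment) - fmean(control)
    pooled = list(control) + list(treatment)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign which days count as "treatment"
        diff = fmean(pooled[:n_treat]) - fmean(pooled[n_treat:])
        if diff >= observed:
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical daily cupcake sales: old recipe (control) vs. new recipe (experimental)
control = [100, 95, 102, 98, 104, 99, 101, 97, 103, 100]
treatment = [40, 100, 125, 115, 150, 160, 170, 185, 190, 200]

lift, p_value = permutation_test(control, treatment)
print(f"average lift: {lift:.1f} cupcakes/day, one-sided p ~ {p_value:.3f}")
```

A small p-value suggests the lift is unlikely to be explained by day-to-day noise alone. It does not rule out confounders that coincide with the change (a seasonal rush, for instance), which is why the control and experimental periods should be as comparable as possible.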
Predictive Models Improve Over Time
Additionally, one of the most interesting characteristics of predictive modeling is that a model can improve over time as new data is gathered, added, and integrated into it. If the model is updated as data accumulates, the propensity score on day 15 is likely to be more accurate than the score on day 1, simply because the model has had more data to learn from. Using the cupcake analogy, maybe you have a suggestion box in your store, and every day, a knowledgeable customer provides a suggestion for improving your recipe or your offerings. Every day, your cupcakes get better and better! By the end of the month, you would barely be able to keep up with the long lines of customers waiting to buy your new, improved cupcakes.
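Real predictive models are retrained in many different ways. As a deliberately simple stand-in, the sketch below keeps a running estimate of a single propensity (the share of type-C customers who perform action A) and updates it with each new observation. Its approximate standard error, sqrt(p(1 − p) / n), shrinks as the number of observations n grows, which is one concrete sense in which more data makes an estimate more reliable. The class `PropensityEstimate`, the true rate of 0.8, and the 20 customers per day are all hypothetical.

```python
import math
import random


class PropensityEstimate:
    """Hypothetical running estimate of a success rate, updated one observation at a time."""

    def __init__(self) -> None:
        self.successes = 0
        self.trials = 0

    def update(self, performed_action: bool) -> None:
        self.trials += 1
        self.successes += int(performed_action)

    @property
    def rate(self) -> float:
        return self.successes / self.trials

    @property
    def std_error(self) -> float:
        # Normal-approximation standard error of a proportion: sqrt(p(1 - p) / n)
        p = self.rate
        return math.sqrt(p * (1 - p) / self.trials)


# Simulate customers whose true propensity (unknown to the estimator) is 0.8
rng = random.Random(7)
estimate = PropensityEstimate()
for day in range(1, 31):
    for _ in range(20):  # hypothetical: 20 type-C customers observed per day
        estimate.update(rng.random() < 0.8)
    if day in (1, 15, 30):
        print(f"day {day:2d}: rate = {estimate.rate:.3f} +/- {estimate.std_error:.3f} (n={estimate.trials})")
```

The day-15 estimate rests on 300 observations instead of 20, so its uncertainty band is roughly a quarter as wide; a real model gains in a similar way, though usually through richer retraining than a single running average.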
The Time for Analytics is NOW!
Because it takes some time to develop the best model and see accurate results, it’s best to get ahead of your competitors and start your analytics efforts as soon as possible. If you don’t think you are ready to embark on a project now, at least get your company analytics-ready for the future: start thinking about what kinds of questions your company could benefit from answering through analytics; begin to take inventory of your data resources; and create a plan for improving your data collection techniques. While you may not see results in the short term, don’t lose sight of the fact that analytics is a powerful tool for long-term business success.
To quote a song popularized by the late George Harrison,
“It’s gonna take time
A whole lot of precious time
It’s gonna take patience and time,
To do it right, child”
Analytics takes time, but the results are worth it!