I created my first data mining model back in 2005. This was a basket analysis in which we determined which products are commonly found together (numerous connection) and which are almost never found apart from each other (strong connection). We used this to rearrange the shelves in five stores, putting the numerously connected products in corners of the store, driving as many people past other shelves as possible. Strongly connected products were kept on the same or adjoining shelves. This increased upsell by about 30%, so shortly after the evaluation, all 900 stores were rebuilt according to the new layout.
Ever since then, I’ve been in awe of what data mining, now often referred to as machine learning, can do. I am sure many of you have built such models and maybe even use them operationally. Having continued to use them to score customers, from churn likelihood to cross-sales potential to probability of accepting an offer and many other things, I sat down in 2013 and wondered what the next step could be. Here I was at a company with several well-working models, but even though they were labeled as “predictive”, none could actually tell me much about the future. So I started thinking: what if there was a way to hook up the models to the cash flow?
As it turns out, there was a way. So far, the models had been used to more or less categorize customers, such as into “potential churners” and “loyal customers”, based on some threshold of the probability to churn. However, behind the strict categorization are individual probabilities for churn. Even loyal customers have a probability to churn, albeit a low one. The first realization was that if we were to use the models more intelligently, we would have to give up bagging, boosting, and any other methods that may distort the probability distributions (back then there was no way to calibrate the resulting probabilities). We needed probabilities as close to the actual ones as possible. Among customers with a predicted 2% likelihood to churn in a year, after a year the outcome should be that 2% of them have churned.
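The requirement that predicted and observed churn rates agree can be checked with a simple reliability table. What follows is a minimal sketch in Python, not the code we used back then. The function name `calibration_table`, the number of bins, and the synthetic data are all assumptions made for illustration.

```python
import random

def calibration_table(probs, outcomes, n_bins=10):
    """Group predictions into equal-width bins and compare the mean
    predicted probability with the observed event rate in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue  # skip bins without any predictions
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        table.append((len(members), mean_pred, observed))
    return table

# Synthetic, perfectly calibrated data (an assumption for the demo):
# each outcome is drawn with exactly the predicted probability.
rng = random.Random(42)
probs = [rng.uniform(0.0, 0.3) for _ in range(50_000)]
outcomes = [1 if rng.random() < p else 0 for p in probs]

table = calibration_table(probs, outcomes)
for n, mean_pred, observed in table:
    print(f"n={n:6d}  predicted={mean_pred:.3f}  observed={observed:.3f}")
```

For a well-calibrated model the two rates should be close in every bin. A model tuned only for classification accuracy can show large gaps here even when its ranking of customers is good.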
Interestingly, this meant that the classification accuracy of the model went down, but it was more true to reality when looking at the population as a whole. With those individual and realistic probabilities in place, the next step was to use them to build a crystal ball, so that we could look into the future. I devised a game theoretical model, into which I could pour individuals and their probabilities for events happening in a given time frame, and it would return the individuals for whom these events actually happened. Iterate, and I could predict the same for the next time frame, and the next, and the next… We’ll call this a simulation.
This is where randomness comes into play. There is no way to say for sure which of the customers with a low 2% probability to churn actually will churn. That would just be too good. The game theoretical model will, however, spit out the correct number of such churners, taking all the other customers and their individual probabilities into account. Because of that, it works well on different aggregated levels, but as you increase the granularity the results become more stochastic. In order to get monetary results, the simulation was extended to take revenue and costs into account, along with a whole number of other things that could be calculated using traditional business logic. Apart from customer outflow through churn, there is also customer inflow. New customers were modeled as digital twins of existing customers. With all this in place, it was for the first time possible to forecast the revenue, among all the other things, far into the future.
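To give a feel for the iteration, here is a deliberately simplified sketch in Python. It is not the game theoretical model described above, whose details are not given here. Instead it uses plain independent random draws per customer and period, which only yields the expected number of churners on average. The `Customer` class, the `simulate` function, and all numbers are hypothetical, and costs and inflow are left out.

```python
import random
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: int
    churn_prob: float  # probability of churning in one period (hypothetical)
    revenue: float     # revenue per period while active (hypothetical)

def simulate(customers, n_periods, seed):
    """Return revenue per period and the number of customers left at the end."""
    rng = random.Random(seed)
    active = list(customers)
    revenue_by_period = []
    for _ in range(n_periods):
        # Revenue is booked for everyone active at the start of the period.
        revenue_by_period.append(sum(c.revenue for c in active))
        # Each active customer churns independently with its own probability.
        active = [c for c in active if rng.random() >= c.churn_prob]
    return revenue_by_period, len(active)

customers = [Customer(i, churn_prob=0.02, revenue=100.0) for i in range(10_000)]
revenue, survivors = simulate(customers, n_periods=5, seed=1)
print(revenue, survivors)
```

On the aggregated level the outcome is stable, with roughly 2% of the remaining customers leaving each period, while the identity of the churners depends entirely on the random draws.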
Running several simulations, with different random numbers, will actually tell you if your business is volatile or stable. Hopefully, the results from using different random numbers will not differ much, indicating that your business is stable. In reality, there is no perfectly stable business though. In one simulation your very best customer may churn early, whereas in another the same customer stays until the end. Even if the difference on the bottom line is slight, such a difference impairs comparability between simulations. The solution, provided that your business is quite stable, is still to use random numbers, but ones that remain fixed between simulations.
So, if you have a well working crystal ball, why would there be a need to do more than one simulation? Well, right now, the crystal ball has about one hundred thousand parameters; knobs that you can turn. Almost all of these are statistically determined, and a few are manually entered, but many are very interesting to fiddle with. Simulations are perfect to use when you want to do what-if analysis. Run a baseline simulation, based on the most likely future scenario, then twist some knobs, run again, and compare. This can also be used to get an idea of how sensitive your business is to a twist and which knobs matter the most.
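Fixed random numbers and the baseline-then-twist workflow fit together nicely. The sketch below, again in Python and again a simplification rather than the actual engine, gives every customer the same draws in every run, so that any difference between two runs comes from the knob that was turned. The helper names `fixed_draws`, `churn_period`, and `run`, the seeding scheme, and the churn rates are assumptions for illustration.

```python
import random

def fixed_draws(customer_ids, n_periods, seed):
    """One uniform draw per customer and period, reproducible from the seed."""
    draws = {}
    for cid in customer_ids:
        # A per-customer generator keeps draws stable even if customers are added.
        rng = random.Random(seed * 1_000_003 + cid)
        draws[cid] = [rng.random() for _ in range(n_periods)]
    return draws

def churn_period(churn_prob, customer_draws):
    """First period in which the draw falls below the churn probability, else None."""
    for period, u in enumerate(customer_draws):
        if u < churn_prob:
            return period
    return None

def run(churn_probs, draws):
    """Map each customer to the period in which they churn (None if they stay)."""
    return {cid: churn_period(p, draws[cid]) for cid, p in churn_probs.items()}

ids = range(5_000)
draws = fixed_draws(ids, n_periods=10, seed=7)

baseline_probs = {cid: 0.02 for cid in ids}
scenario_probs = {cid: 0.03 for cid in ids}  # the twisted knob: higher churn

baseline = run(baseline_probs, draws)
scenario = run(scenario_probs, draws)

churned_baseline = sum(t is not None for t in baseline.values())
churned_scenario = sum(t is not None for t in scenario.values())
print(churned_baseline, churned_scenario)
```

Because the draws are shared, every customer who churns in the baseline also churns in the higher-churn scenario, and never later. The comparison is therefore free of the noise you would get from drawing fresh random numbers for each run.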
I’ve run baselines, worst-case, best-case, different pricing, higher and lower churn, more or less inflow, changed demographics, stock market crashes, lost products, new products, possible regulations, and so forth, during the last six years with this simulation engine. All with more than fifty different measures forecasted, many monetary, to the celebration of management. Simulations replaced budgeting, simulations stress test the business on a yearly basis, simulations are used to price products, simulations are used to calculate ROI, simulations are used every time something unexpected happens in the market, and above all, simulations keep this company prepared.
We have turned “what-if” into “if-what”: action plans of “what” to do should the “if” come to pass. I believe this is the natural next step for all of you who are doing machine learning now, but who have not yet enriched it using game theoretical simulations. In all honesty, I am a bit perplexed why I haven’t heard of anyone else doing this yet. Amazon recently showed off a new forecasting engine, so maybe simulations will become more mainstream. On a side note, predicting 50 forecast units 30 periods into the future for 10 million entities, which is what we frequently do, would with Amazon’s pricing cost 50 * 30 * 10000000 / 1000 * $0.60 = $9 million per simulation. This alone is more than the cost of the entire simulation engine over its six-year lifetime so far.
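For those who want to check the arithmetic, here it is spelled out in Python. The price of $0.60 per 1,000 forecasts is the figure quoted above and is an assumption about the pricing at the time of writing, not a verified current price.

```python
forecast_units = 50              # measures forecasted per entity
periods = 30                     # periods into the future
entities = 10_000_000            # customers or other entities
price_per_1000_forecasts = 0.60  # USD, as quoted in the text (assumption)

forecasts = forecast_units * periods * entities
cost = forecasts / 1000 * price_per_1000_forecasts

print(f"{forecasts:,} forecasts -> ${cost:,.0f} per simulation")
# prints: 15,000,000,000 forecasts -> $9,000,000 per simulation
```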
If you want to know more about simulations, don’t hesitate to contact me. You can also read more on the homepage at http://www.uptochange.com. Up to Change is also sponsoring work on Anchor modeling.