When formalizing tradeoffs, our decisions typically get worse before they get better.
Those who climb the mountain of efficiency first pass through the valley of confusion.
Data scientists are often asked to provide a formula to calculate the expected costs and benefits of a decision, but this is hard. There are often many subtleties that we grasp intuitively but do not know how to formalize. For this reason, using a simple model is often worse than using no model at all.
Some examples of common mistakes:
- Causal effects assumed to be equal to correlations.
- Decisions depend on a sharp threshold of statistical significance.
- Decisions depend on average rather than marginal value (illustrated in the sketch after this list).
- Lack of accounting for opportunity cost.
- Lack of accounting for option value.
- Lack of accounting for network effects.
- Confusion over whether an outcome is intrinsically desirable or instrumentally desirable.
- Estimating the topline impact of product improvements directly from A/B test results (i.e. accounting only for effects on existing users, not new users).
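To make the average-versus-marginal distinction concrete, here is a minimal sketch with an invented concave value function. The average value per unit stays healthy long after the marginal unit stops paying for itself, so a decision rule based on averages keeps saying “do more” when it should say “stop.”

```python
# Toy example: total value of doing n units of some activity, with
# diminishing returns. The value function is invented purely for illustration.

def total_value(n: int) -> float:
    """Concave value function: each extra unit adds less than the last."""
    return 10 * (1 - 0.5 ** n)

for n in range(1, 6):
    average = total_value(n) / n                      # what a naive model reports
    marginal = total_value(n) - total_value(n - 1)    # what the decision depends on
    print(f"n={n}: average={average:.2f}, marginal={marginal:.2f}")
```

By n = 4 the average is still about 2.3 per unit while the marginal unit is worth only about 0.6; a model that divides total value by total units will overinvest.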
These are all avoidable mistakes, but it takes time and experience to model them correctly. In fact, many of the tools we use to formalize optimization problems are relatively recent discoveries: only over the last few centuries did we figure out how to write down probabilities and expected values, linear programming, and dynamic programming. But before these discoveries we were still able to make quite subtle and complex decisions. We built pyramids, welfare states, and aeroplanes; our informal and intuitive methods were sufficient.
This is a separate question from whether decisions should be formalized at all. There are some situations where no formalization of costs and benefits will be sufficient, for various reasons. My point is different: even when we truly care only about a well-defined outcome, it’s often better to rely on our instincts than to use a model we don’t have time to invest in deeply.
It’s a common idea that formalizing the costs and benefits of a decision can lead to worse decision-making. People talk about “McNamara’s fallacy,” but this can be read in two ways: either some problems are intrinsically unquantifiable, or they are merely difficult to quantify. In tech companies I think many problems are quantifiable in principle, but quantifying them is hard enough that it’s sensible to put it off until you’re confident you have a good model.
The process of formalizing a decision often looks like a dialogue. When you try to write down a model to represent the tradeoffs of a situation, it very often feels like a dialogue with your intuition: you first write a simple model, and see that it implies you should be making a radically different decision. You think through, intuitively, the implications of that decision, notice some important factor the model is missing, revise the model, and then look again at the implications.
Avoiding the valley of confusion. I have been talking about the choice between relying on judgment and relying on a model. There is an alternative route: instead of replacing judgment, the goal should be to augment it, broadly speaking by summarizing data in ways that inform our intuitions about tradeoffs. If we continue this process we may end up with a comprehensive formal model that we can trust.
Some examples:
If we are trying to improve decisions about which experiments to ship, we can build a table of prior experiments and, for each new experiment, summarize how it compares with prior ones. (My post on experiment interpretation expands on this.)
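As a sketch of what that table might enable (the column names and numbers here are hypothetical, not from the post mentioned above), a new experiment’s lift can be placed within the empirical distribution of prior lifts:

```python
import pandas as pd

# Hypothetical record of prior experiments: relative lift on the decision
# metric and whether each one shipped. All values are illustrative.
prior = pd.DataFrame({
    "experiment": ["exp_01", "exp_02", "exp_03", "exp_04", "exp_05"],
    "lift_pct": [0.4, -0.1, 1.2, 0.3, 0.05],
    "shipped": [True, False, True, True, False],
})

def summarize_new(lift_pct: float) -> str:
    """Place a new experiment's lift within the distribution of prior ones."""
    pctile = (prior["lift_pct"] < lift_pct).mean() * 100
    return (f"Lift of {lift_pct:+.2f}% is at the {pctile:.0f}th percentile "
            f"of {len(prior)} prior experiments.")

print(summarize_new(0.5))
```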
If we are trying to improve decisions about the value of metrics as surrogates for causal effects, we can start by visualizing the distribution of each metric and the correlations between them, to help build judgment about how they relate to user experience.
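A minimal sketch of that starting point, with simulated data standing in for real per-user metrics (the metric names and distributions are assumptions):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Simulated per-user metrics; in practice these would come from logs.
rng = np.random.default_rng(0)
metrics = pd.DataFrame({
    "sessions": rng.poisson(5, 1000),
    "minutes": rng.gamma(2.0, 10.0, 1000),
    "actions": rng.poisson(12, 1000),
})

# Marginal distributions: one histogram per metric.
metrics.hist(bins=30)
plt.tight_layout()
plt.show()

# Pairwise correlations, as a quick table rather than a full model.
print(metrics.corr().round(2))
```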
If we are trying to improve the evaluation of a given initiative, we should start by mapping out prior initiatives and comparing their inputs (e.g. headcount, time) and outputs (metric movements), to benchmark our estimates.
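A sketch of such a benchmark, with invented initiatives and numbers; lift per person-month is one crude way to compare inputs against outputs:

```python
import pandas as pd

# Hypothetical ledger of past initiatives: inputs (headcount, duration)
# and outputs (metric lift). All names and numbers are illustrative.
initiatives = pd.DataFrame({
    "initiative": ["search_revamp", "onboarding_v2", "perf_push"],
    "headcount": [6, 3, 4],
    "months": [9, 4, 6],
    "metric_lift_pct": [1.8, 0.9, 0.6],
})

# Lift per person-month gives a crude benchmark for proposed work.
initiatives["person_months"] = initiatives["headcount"] * initiatives["months"]
initiatives["lift_per_person_month"] = (
    initiatives["metric_lift_pct"] / initiatives["person_months"]
)
print(initiatives.sort_values("lift_per_person_month", ascending=False))
```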
General considerations that apply to all evaluations:
- All models should be framed as aids to human judgment, and it’s always legitimate to override their recommendations.
- Model output should not be presented as a single number but as a visualization that shows the broad patterns in the data (a sketch follows this list).
- We should summarize what decisions other people have made in similar situations, both inside and outside the company.
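To illustrate the visualization point above, here is a minimal sketch that simulates a model’s uncertain inputs and plots the full distribution of estimated impact instead of reporting its mean alone (all distributions and numbers are invented):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical model of the impact of shipping a feature. Instead of
# reporting one expected number, simulate uncertain inputs and show the
# whole distribution of outcomes. All parameters below are invented.
rng = np.random.default_rng(1)
lift_pct = rng.normal(0.4, 0.3, 10_000)   # uncertain relative lift, in %
users = rng.normal(2e6, 2e5, 10_000)      # uncertain size of the user base
impact = lift_pct / 100 * users           # e.g. additional weekly active users

plt.hist(impact, bins=50)
plt.axvline(impact.mean(), linestyle="--")  # the single number we would otherwise report
plt.xlabel("Estimated impact (additional users)")
plt.title("Distribution of modeled impact, not a point estimate")
plt.show()
```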