Thinking About Tradeoffs? Draw an Ellipse

Author: Tom Cunningham, Integrity Institute

Published: October 25, 2023

This material was first presented at MIT CODE 2021. Thanks to Sean Taylor among others for comments.

Thinking about tradeoffs? Draw an ellipse. When making a tradeoff between two outcomes, \(X\) and \(Y\), it’s useful to sketch out what the tradeoff looks like, and an ellipse is a good first-order approximation. The ellipse helps visualize the most interesting parameter: the tightness, i.e. how much the chosen outcome varies as you vary the weights.

Choosing launch criteria? Draw an ellipse. You can think of the set of features (or experiments) as each having some metric impact, \(\Delta X\) and \(\Delta Y\). It is useful to sketch the combined impacts of \(\Delta X\) and \(\Delta Y\) that are achievable (the Pareto frontier), and it will often look like an ellipse. Additionally you can prove that the Pareto frontier will be exactly an ellipse if the set of experiment effects has a joint Normal distribution.

Choosing ranking weights? Draw an ellipse. When ranking items in a recommender there’s always some tradeoff between different outcomes. We can prove that the aggregate tradeoff is exactly an ellipse if the outcome is additive across items and the individual item-level predictions are Normally distributed.

Allocating headcount? Draw an ellipse. When you shuffle headcount around a company it’s hard to precisely measure the impact on different goals, but it’s still useful to sketch ellipses to characterize the tradeoffs you face. Most usefully, this helps characterize the effect of changing goals on within-team and between-team reallocation of effort.

Tight and Loose Tradeoffs

Suppose we are considering how much weight to put on metric \(X\) vs metric \(Y\) when making a product choice. We can characterize the effect of variation in the weight by drawing a Pareto frontier: the efficient outcome will be the point on the Pareto frontier where its slope equals the (negative) ratio of the weights on \(X\) and \(Y\), i.e. where it is tangent to an indifference curve. We can distinguish between “tight” and “loose” Pareto frontiers. The first figure shows a tight tradeoff: it doesn’t much matter whether we maximize \(X\), \(Y\), or a weighted average; we end up in roughly the same place anyway. The second figure shows a loose tradeoff: the outcome depends substantially on the relative weight we put on \(X\) and \(Y\). Thus if, among experiments, we observe a high positive correlation between \(\Delta X\) and \(\Delta Y\), then the choice of shipping criteria is relatively unimportant: we would ship much the same experiments anyway. Similarly if, among items in a recommender, we observe a high positive correlation between the predictions of the two outcomes, then the choice of weights is relatively unimportant: we would show the same items anyway.

[Figures: a tight Pareto frontier and a loose Pareto frontier, each with tangent indifference lines.]
Ellipses for Experiments

Suppose we have a set of experiments, each of which can be characterized by its impact on two metrics, \(\Delta X\) and \(\Delta Y\). We visualize such a set of experiments at right.

If the set of features is separable, meaning that the impact of each feature is independent of which other features are launched, then a natural question is the shape of the Pareto frontier formed by all possible combinations of experiments.

We show below that, if the distribution of experiment effects is mean-zero and jointly Normal, then the expected Pareto frontier will be an ellipse, and it will have exactly the shape of an isovalue of the density of experiments. Thus knowing the variances and covariance of experiment results allows us to characterize the nature of the Pareto frontier we face.
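To make this concrete, here is a minimal simulation sketch (mine, not from the original post; the seed and parameter values are arbitrary): for each weight \(w\) we launch exactly the experiments with positive weighted impact \(\Delta X + w\,\Delta Y\), and check that the resulting totals all lie on a single isovalue of the Normal density.

import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
sigma1, sigma2, rho, n = 1.0, 0.7, 0.3, 100_000
cov = [[sigma1**2, rho * sigma1 * sigma2],
       [rho * sigma1 * sigma2, sigma2**2]]
dx, dy = rng.multivariate_normal([0, 0], cov, size=n).T

# For each relative weight w on Y, the optimal bundle launches exactly the
# experiments with positive weighted impact dx + w*dy; record the totals and
# the quadratic form whose isovalues are the candidate ellipses.
ks = []
for w in np.tan(np.linspace(0.01, np.pi / 2 - 0.01, 50)):  # w from ~0 to ~100
    launch = dx + w * dy > 0
    X, Y = dx[launch].sum() / n, dy[launch].sum() / n
    ks.append(X**2 / sigma1**2 + Y**2 / sigma2**2
              - 2 * rho * X * Y / (sigma1 * sigma2))

print(np.std(ks) / np.mean(ks))  # near zero: the frontier traces one isovalue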

Ellipses for Ranking

Suppose we are choosing a fixed number of items to show to a user based on predictions of two outcomes, \(x_1\) and \(x_2\) (e.g. likes and comments, streams and dislikes, etc.). A natural question is the shape of the Pareto frontier formed by alternative selections of items.

We show below that, if the predictions are well calibrated, the outcomes are independent, and the predictions have a joint Normal distribution, then the Pareto frontier will be an ellipse, and it will have exactly the shape of an isovalue of the density of predictions. Thus knowing the variances and covariance of the predictions allows us to exactly characterize the aggregate tradeoffs we face.
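A matching sketch for the ranking case (again mine, with illustrative parameters; the 10% quantity constraint is an arbitrary choice): select the top share of items by score \(x_1 + w x_2\) and trace the aggregate outcomes as \(w\) varies.

import numpy as np

rng = np.random.default_rng(2)
s1, s2, rho, n, q = 1.0, 0.7, 0.2, 200_000, 0.10  # show the top q of items
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
x1, x2 = rng.multivariate_normal([0, 0], cov, size=n).T

frontier = []
for w in np.linspace(0.0, 5.0, 40):        # relative weight on x2
    v = x1 + w * x2                        # ranking score
    shown = v >= np.quantile(v, 1 - q)     # fixed quantity: top q by score
    # aggregate outcome per item: X_i = P(shown) * E[x_i | shown]
    frontier.append((x1[shown].mean() * q, x2[shown].mean() * q))

Plotting `frontier` traces out an arc of the ellipse derived in the Model section below.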

Ellipses for Company Strategy

Tradeoffs are looser higher in the decision hierarchy. We can distinguish between different levels of decision in the hierarchy of the firm: \[\substack{\text{company objective}\\\text{(choose headcount)}} > \substack{\text{team objective}\\\text{(choose projects)}} > \substack{\text{shipping objective}\\\text{(choose experiments)}} > \substack{\text{algorithm objective}\\\text{(choose items)}} \] We can think of each successive level as holding more variables fixed, and so we expect the Pareto frontiers to become successively tighter (the Le Chatelier principle). We thus expect the tradeoff to be loosest at the level of overall company objectives, where we reallocate headcount. For this reason we should expect that, if the company as a whole pivots from metric \(X\) to metric \(Y\), the principal effect will be a reallocation of effort between products rather than within products.

Different product areas have different Pareto frontiers. Typically two different product areas will have substantially different ability to affect different metrics, and we may observe a situation like that shown on the right: team A primarily moves metric \(X\), team B primarily moves metric \(Y\).

[Figure: team A’s Pareto frontier mostly moves metric \(X\); team B’s mostly moves metric \(Y\).]
We can also draw a combined Pareto frontier, adding up the effects of the two teams. In this case neither individual Pareto frontier shows a substantial effect from changing weights (if we restrict weights to be positive), and so the combined Pareto frontier likewise shows little response to a change in weights. Note that I have not derived an expression for the combined Pareto frontier; here I am using a purely geometric argument. It would be lovely to formalize this part of the argument but I have not yet been able to.

[Figure: the combined Pareto frontier formed by adding the two teams’ frontiers.]
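While there is no closed form here, the combined frontier can at least be traced numerically (a sketch of mine, with made-up quarter-ellipse frontiers for the two teams): since the achievable sets add, the combined frontier point in any direction is the sum of each team’s frontier point in that direction.

import numpy as np

# Hypothetical team frontiers as quarter-ellipse arcs: team A is wide in X,
# team B is tall in Y (axis lengths chosen purely for illustration).
t = np.linspace(0, np.pi / 2, 400)
team_a = np.stack([4 * np.cos(t), 1 * np.sin(t)], axis=1)
team_b = np.stack([1 * np.cos(t), 4 * np.sin(t)], axis=1)

combined = []
for theta in np.linspace(0.01, np.pi / 2 - 0.01, 100):
    d = np.array([np.cos(theta), np.sin(theta)])  # direction = weight vector
    # each team independently maximizes the same weighted objective
    pa = team_a[np.argmax(team_a @ d)]
    pb = team_b[np.argmax(team_b @ d)]
    combined.append(pa + pb)
combined = np.array(combined)  # the combined (summed) frontier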
Greater investment will shift Pareto frontiers out. Here we visualize reallocating employees from team B (the frontier shifts in) to team A (the frontier shifts out).

[Figure: team A’s frontier shifts out and team B’s shifts in as headcount is reallocated.]
A combined company Pareto frontier will be loose. Here the green curve represents all the possible outcomes as you shift resources between teams A and B: we have now turned a tight tradeoff into a loose tradeoff. This reflects the fact that a change in company objectives will show up mainly as reallocation of effort between teams rather than within teams.

[Figure: the combined company Pareto frontier (green) traced by shifting resources between teams A and B.]
Appendix: Simulations

In this section I draw samples of \((X,Y)\) from different joint distributions and then plot the resulting Pareto frontiers.

In each pair of plots, the left-hand plot shows the raw distribution and the right-hand plot shows the Pareto frontier.

The first pair confirms the analytical solution for Gaussian outcomes. The other pairs show that the Pareto frontiers for some other distributions are relatively egg-shaped, i.e. not too far from ellipses.

Joint Normal with positive correlation:

Independent Laplace:

Common Laplace factor with independent Gaussian noise:

Independent uniform:

Common uniform factor plus independent uniform noise:
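The plots themselves are not reproduced in this version. Here is a minimal sketch of how they could be regenerated (my reconstruction, not the original code; the sample size, parameters, and the helper name combination_frontier are all illustrative):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
n = 20_000

def combination_frontier(pts, n_dirs=200):
    """Average totals from launching every point with a positive weighted
    sum, as the weight vector sweeps the positive quadrant."""
    out = []
    for t in np.linspace(0.01, np.pi / 2 - 0.01, n_dirs):
        d = np.array([np.cos(t), np.sin(t)])
        out.append(pts[pts @ d > 0].sum(axis=0) / len(pts))
    return np.array(out)

samples = {
    "joint Normal, rho = 0.5":
        rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], n),
    "independent Laplace": rng.laplace(size=(n, 2)),
    "Laplace factor + Gaussian noise":
        rng.laplace(size=(n, 1)) + rng.normal(size=(n, 2)),
    "independent uniform": rng.uniform(-1, 1, size=(n, 2)),
    "uniform factor + uniform noise":
        rng.uniform(-1, 1, size=(n, 1)) + rng.uniform(-1, 1, size=(n, 2)),
}

fig, axes = plt.subplots(len(samples), 2, figsize=(7, 16))
for (name, pts), (left, right) in zip(samples.items(), axes):
    left.scatter(pts[:, 0], pts[:, 1], s=1)
    left.set_title(name, fontsize=9)
    f = combination_frontier(pts)
    right.plot(f[:, 0], f[:, 1])
    right.set_title("Pareto frontier", fontsize=9)
plt.tight_layout()
plt.show()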

Model

Suppose we have a set of items with characteristics \(x_1\) and \(x_2\), distributed jointly Normally: \[\binom{x_1}{x_2}\sim N\left(\binom{0}{0}, \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}\right).\]

We additionally let each item have a score, \(v\), which is simply a weighted sum of the two characteristics (normalizing the weight on the first characteristic to be 1): \[v=x_1 + wx_2.\]

We can write the covariance matrix of the characteristics and the score as follows:

\[Cov\begin{bmatrix}x_1\\x_2\\v\end{bmatrix}= \begin{bmatrix} \sigma^2_1 & \sigma_1\sigma_2\rho & \sigma_1^2+ w\rho\sigma_1\sigma_2 \\ \sigma_1\sigma_2\rho & \sigma_2^2 & \rho\sigma_1\sigma_2+w\sigma_2^2 \\ \sigma_1^2+w\rho\sigma_1\sigma_2 & \rho\sigma_1\sigma_2+w\sigma_2^2 & \sigma_1^2+w^2\sigma_2^2 + 2\rho w\sigma_1\sigma_2 \\ \end{bmatrix}\]
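This matrix is easy to sanity-check numerically (my sketch; the parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(4)
s1, s2, rho, w = 1.0, 0.7, 0.2, 1.5
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
x1, x2 = rng.multivariate_normal([0, 0], cov, size=1_000_000).T
v = x1 + w * x2

print(np.cov([x1, x2, v]))                   # empirical 3x3 covariance matrix
print(s1**2 + w * rho * s1 * s2,             # Cov(x1, v)
      rho * s1 * s2 + w * s2**2,             # Cov(x2, v)
      s1**2 + w**2 * s2**2 + 2 * rho * w * s1 * s2)  # Var(v)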

We wish to know the total number of actions of each type, \(X_1\) and \(X_2\), for a given score threshold \(\bar{v}\):

\[\begin{aligned} X_1 &=P(v\geq \bar{v})E[x_1|v\geq \bar{v}] \\ X_2 &=P(v\geq \bar{v})E[x_2|v\geq \bar{v}]. \end{aligned}\]

We first calculate the conditional expectations:

\[\begin{aligned} E[x_1|v\geq \bar{v}] =& \sigma_1 \frac{Cov(x_1,v)}{\sqrt{Var(x_1)Var(v)}} \frac{\phi(\frac{\bar{v}}{\sqrt{Var(v)}})}{1-\Phi(\frac{\bar{v}}{\sqrt{Var(v)}})} \\ =& \sigma_1 \frac{\sigma_1^2+w\rho\sigma_1\sigma_2} {\sqrt{\sigma_1^2(\sigma_1^2+w^2\sigma_2^2 + 2\rho w\sigma_1\sigma_2)}} \frac{\phi(\frac{\bar{v}}{\sqrt{Var(v)}})}{1-\Phi(\frac{\bar{v}}{\sqrt{Var(v)}})}\\ =& \frac{\sigma_1^2+w\rho\sigma_1\sigma_2} {\sqrt{\sigma_1^2+w^2\sigma_2^2 + 2\rho w\sigma_1\sigma_2}} \frac{\phi(\frac{\bar{v}}{\sqrt{Var(v)}})}{1-\Phi(\frac{\bar{v}}{\sqrt{Var(v)}})} \end{aligned}\] using the standard formula for the mean of a truncated Normal, with inverse Mills ratio \(\phi/(1-\Phi)\).

Next we will assume that the expected quantity of items is fixed. This implies that both \(P(v\geq \bar{v})\) and \(\frac{\bar{v}}{\sqrt{Var(v)}}\) will be constant, and we will define: \[\begin{aligned} \gamma\equiv &\frac{\phi\left(\frac{\bar{v}}{\sqrt{Var(v)}}\right)} {1-\Phi\left(\frac{\bar{v}}{\sqrt{Var(v)}}\right)}P(v\geq \bar{v}) = \phi\left(\frac{\bar{v}}{\sqrt{Var(v)}}\right) \\ X_1 =& \frac{\sigma_1^2+w\rho\sigma_1\sigma_2} {\sqrt{\sigma_1^2+w^2\sigma_2^2 + 2\rho w\sigma_1\sigma_2}}\gamma \\ X_2 =& \frac{w\sigma_2^2+\rho\sigma_1\sigma_2} {\sqrt{\sigma_1^2+w^2\sigma_2^2 + 2\rho w\sigma_1\sigma_2}}\gamma \end{aligned}\] (the simplification of \(\gamma\) uses \(P(v\geq\bar{v})=1-\Phi(\bar{v}/\sqrt{Var(v)})\)).
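These expressions can be checked directly by simulation (my sketch, using the simplification \(\gamma=\phi(\bar{v}/\sqrt{Var(v)})\); the parameters and the 10% quantity are arbitrary):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
s1, s2, rho, w, q = 1.0, 0.7, 0.2, 1.5, 0.10  # q = fraction of items selected
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
x1, x2 = rng.multivariate_normal([0, 0], cov, size=1_000_000).T

v = x1 + w * x2
sel = v >= np.quantile(v, 1 - q)              # threshold keeping the top q

# Monte Carlo: X_i = P(v >= vbar) E[x_i | v >= vbar] = E[x_i 1{v >= vbar}]
X1_mc, X2_mc = (x1 * sel).mean(), (x2 * sel).mean()

# Closed form, with gamma = phi(vbar / sd_v) under the fixed-quantity assumption
sd_v = np.sqrt(s1**2 + w**2 * s2**2 + 2 * rho * w * s1 * s2)
gamma = norm.pdf(norm.ppf(1 - q))
X1_cf = gamma * (s1**2 + w * rho * s1 * s2) / sd_v
X2_cf = gamma * (w * s2**2 + rho * s1 * s2) / sd_v

print(X1_mc, X1_cf)  # should agree up to Monte Carlo error
print(X2_mc, X2_cf)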

We thus have expressions for \(X_1\) and \(X_2\) as a function of the relative weight \(w\). We wish to rearrange these to express \(X_1\) directly in terms of \(X_2\). To help we turn to Mathematica, with the following input:

F1[w_, p_, s1_, s2_, g_] := g (s1^2 + w p s1 s2)/Sqrt[s1^2 + w^2 s2^2 + 2 w p s1 s2]
F2[w_, p_, s1_, s2_, g_] := g (w s2^2 + p s1 s2)/Sqrt[s1^2 + w^2 s2^2 + 2 w p s1 s2]
(* eliminate the weight w to express X1 in terms of X2 *)
sol = Solve[{X1 == F1[w, p, s1, s2, g], X2 == F2[w, p, s1, s2, g]}, {X1, w}];
Simplify[First[sol]]

This returns a large expression (in the Mathematica notation, \(p=\rho\), \(s_1=\sigma_1\), \(s_2=\sigma_2\), and \(g=\gamma\)):

\[\begin{aligned} X_1(X_2) &= \frac{ g^2 p s_1 s_2^3 X_2 - p s_1 s_2 X_2^3 + g^3 s_2^4 \sqrt{\frac{-g^2 (-1 + p^2) s_1^2 s_2^2}{g^2 s_2^2 - X_2^2}} - g s_2^2 X_2^2 \sqrt{-\frac{g^2 (-1 + p^2) s_1^2 s_2^2}{ g^2 s_2^2 - X_2^2}} + X_2 \sqrt{(-1 + p^2) s_1^2 s_2^2 X_2^2 (-g^2 s_2^2 + X_2^2)} }{g^2 s_2^4 - s_2^2 X_2^2}\end{aligned}\]

We can however substantially simplify this: \[\begin{aligned} X_1 &= \frac{ s_1s_2X_2 (g^2 p s_2^2 - p X_2^2) + g^2s_2^2(g^2 s_2^2- X_2^2) \sqrt{-\frac{(p^2-1) s_1^2 s_2^2}{g^2 s_2^2 - X_2^2}} - X_2^2s_1s_2 \sqrt{(p^2-1) (X_2^2-g^2 s_2^2)} }{g^2 s_2^4 - s_2^2 X_2^2} \\ &= \frac{ s_1s_2X_2 p(g^2 s_2^2 - X_2^2) + g^2s_1s_2^3 \sqrt{(p^2-1)(X_2^2-g^2 s_2^2)} - X_2^2s_1s_2 \sqrt{(p^2-1) (X_2^2-g^2 s_2^2)} }{s_2^2(g^2 s_2^2 - X_2^2)} \\ &= \frac{s_1}{s_2}X_2 p + \frac{ s_1s_2(g^2s_2^2 - X_2^2 ) \sqrt{(p^2-1) (X_2^2-g^2 s_2^2)} }{s_2^2(g^2 s_2^2 - X_2^2)} \\ &= X_2\, p \frac{s_1}{s_2} +\frac{s_1}{s_2}\sqrt{(p^2-1) (X_2^2-g^2 s_2^2)}, \end{aligned}\] i.e., back in our original notation, \[X_1 = X_2\,\rho\frac{\sigma_1}{\sigma_2} + \frac{\sigma_1}{\sigma_2}\sqrt{(1-\rho^2)(\gamma^2\sigma_2^2 - X_2^2)}.\]

We now wish to show that this curve is equal to an isovalue of the joint distribution of \(x_1\) and \(x_2\) (illustrated at right). Following Bertsekas and Tsitsiklis (2002), “Introduction to Probability”, Section 4.7, we can write an isovalue of the joint Normal distribution of \((x_1,x_2)\) as:

\[k = \frac{x_1^2}{\sigma_1^2}+\frac{x_2^2}{\sigma_2^2}-2\rho\frac{x_1x_2}{\sigma_1\sigma_2}.\]

Solving this quadratic we can write: \[\begin{aligned} x_1 &= x_2 \rho \frac{\sigma_1}{\sigma_2} \pm \frac{\sigma_1}{\sigma_2}\sqrt{-x_2^2+x_2^2\rho^2+k\sigma_2^2} \\ &= x_2 \rho \frac{\sigma_1}{\sigma_2} \pm \frac{\sigma_1}{\sigma_2}\sqrt{k\sigma_2^2-(1-\rho^2)x_2^2}. \end{aligned}\]

We can see that the two curves will be equal where \(k=(1-\rho^2)\gamma^2\).
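This claim can also be verified symbolically (my check, in sympy rather than Mathematica): substituting the parametric forms of \(X_1(w)\) and \(X_2(w)\) into the quadratic form shows that it is constant in \(w\) and equal to \((1-\rho^2)\gamma^2\).

import sympy as sp

w, p, s1, s2, g = sp.symbols("w p s1 s2 g", positive=True)
D = s1**2 + w**2 * s2**2 + 2 * w * p * s1 * s2         # Var(v)
X1 = g * (s1**2 + w * p * s1 * s2) / sp.sqrt(D)
X2 = g * (w * s2**2 + p * s1 * s2) / sp.sqrt(D)

# quadratic form whose isovalues are the ellipses of the joint density
k = X1**2 / s1**2 + X2**2 / s2**2 - 2 * p * X1 * X2 / (s1 * s2)
print(sp.simplify(k))  # simplifies to g**2*(1 - p**2): constant in w, as claimed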