Ranking by Engagement

Tom Cunningham, May 8 2023.1

  • 1 tom.cunningham@gmail, @testingham. I worked at FB for 5 years and Twitter for 1 year, and am now affiliated with the Integrity Institute. This note is entirely based on public information. Thanks for comments from Jeff Allen, Jacquelyn Zehner, David Evan Harris, Jonathan Stray, and others.

  • Six observations on ranking by engagement:

    1. Internet platforms rank content primarily by the predicted probability of engagement. Platforms show each user the items that are most likely to make the user click, or reply, or retweet, etc.2

    2. Platforms rank by engagement because it increases user retention. In experiments comparing engagement-ranked feeds to unranked (“chronological”) feeds, users with engagement-ranked feeds consistently show substantially higher long-run retention (DAU) and time-spent. Platforms care about engagement not in itself but as a means to an end, and when faced with a tradeoff between engagement and retention they would choose retention.

    3. Engagement is negatively related to quality. The content with the highest predicted engagement very often scores low on various measures of objective quality: clickbait, spam, scams, misleading headlines, copied content, inauthentic content, and misinformation. Intuitively this is because engagement only measures immediate appeal, and the most appealing content can be the most disappointing. Low-quality content typically hurts retention, and as a consequence platforms often supplement their engagement-based ranking algorithms with a range of proxies for content quality.

    4. Sensitive content is often both engaging and retentive. Engagement-ranked feeds often increase the prevalence of various types of “sensitive” content: nudity, bad language, abuse, hate speech, hyper-partisan politics, etc. However, unlike with low-quality content, reducing the prevalence of sensitive content often hurts retention, implying that sensitivity is positively correlated with retention.

    5. Sensitive content is often preferred by users. Platforms have run many experiments asking users directly about their preferences over content. The results have been mixed, and platforms have often been disappointed to find that users express fairly positive attitudes towards content that the platform considers sensitive.

    6. Platforms don’t want sensitive content but don’t want to be seen to be removing it. Platform decision-makers often have principled reasons for limiting the distribution of certain types of sensitive content. Additionally there are instrumental reasons: sensitive content attracts negative attention from the media, advertisers, app stores, politicians, regulators, and investors. But platforms are also liable to get negative attention if they make substantive judgments about the sensitivity of content, especially when it has some political dimension. As a consequence platforms often target sensitive content indirectly by using proxies, and they prefer to justify their decision-making by appealing to user preferences or to user retention.

  • 2 In this note I’m using “engagement” to refer to individual actions, not user-level metrics like time-spent or DAU.

  • In an appendix I formalize the argument. I show that all of these observations can be expressed as covariances between different properties of content, e.g. between retentiveness, predicted engagement rate, and measures of content quality. From those covariances we can derive Pareto frontiers and visualize how platforms are trading off between different outcomes.
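  • As a rough preview of that formalization (the shorthand below is mine, not necessarily the notation used in the appendix), observations 3 and 4, for example, amount to signed covariances taken across pieces of content:

        Cov(engagement, quality) < 0           (observation 3)
        Cov(sensitivity, retentiveness) > 0    (observation 4)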




    Argument in Detail

    1. Talking about ranking is complicated. To help simplify things I bucket attributes of content into five types:

      1. Engagement: the predicted probability of a user clicking, commenting, retweeting, etc., on a specific piece of content.
      2. Retentiveness: the causal effect of seeing the content on a specific user’s long-term retention (e.g. DAU). Unlike the other attributes this can never be directly observed, only inferred from experiments.
      3. Quality: some objective measure of quality, e.g. whether fact-checked, whether the headline is misleading, whether the linked website has a high ad-load, whether the source is trustworthy, etc..
      4. Sensitivity: whether the content could be offensive, harmful, corrosive – e.g. nudity, bad language, abuse, hate speech.
      5. Preference: the user’s response to a survey question, e.g. “do you want to see more of this type of content?”

      Note that “quality” and “sensitivity” apply to pieces of content, while the other three attributes apply to the relationship between a user and a piece of content.
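      To make the taxonomy concrete, here is a minimal sketch (the names and types are my own, purely illustrative) of how these five attributes might be attached to a (user, item) pair:

      ```python
      from dataclasses import dataclass

      @dataclass
      class ContentAttributes:
          """Illustrative container for the five attribute types above."""
          engagement: float     # predicted probability the user clicks/comments/reshares this item
          retentiveness: float  # inferred causal effect of the impression on the user's long-run DAU
          quality: float        # objective quality score (fact-checks, headline, source trust, ...)
          sensitivity: float    # how offensive/harmful the item could be (nudity, abuse, hate speech, ...)
          preference: float     # the user's survey response, e.g. "do you want to see more of this?"
      ```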

    2. Social media platforms rank their content primarily by predicted engagement. The core ranking model for most social platforms is a weighted average of predicted engagement rates.3

      However, ranking functions also include hundreds of other tweaks incorporating non-engagement features, upranking or downranking content depending on, for example, the media type (photo/text/video), the relationship between the user and the author (whether you follow this person), various predictions of objective quality (classifiers predicting whether the content is spam, offensive, adult, misinformation, etc.), or other features (network centrality, off-platform popularity, etc.). They also often have diversity rules to prevent the content that is shown from being too similar.4

      Ranking by popularity is common for other media: we look at lists of bestsellers, most popular, highest grossing, most watched, or top charting. Attention is limited and it would be inefficient to offer people a random selection of everything that’s available.
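      To make the mechanics of this section concrete, here is a heavily simplified sketch of such a scoring function. The event names, weights, thresholds, and multipliers are all made up for illustration and do not correspond to any real platform’s values:

      ```python
      def rank_candidates(candidates, weights, already_shown_authors):
          """Toy feed ranker: weighted engagement score, quality tweaks, one diversity rule."""
          scored = []
          for item in candidates:
              # Core score: weighted sum of predicted engagement probabilities,
              # e.g. p(click), p(comment), p(reshare).
              score = sum(weights[event] * p for event, p in item["p_engage"].items())

              # Non-engagement tweaks: downrank content that quality classifiers flag.
              if item.get("p_clickbait", 0.0) > 0.5:
                  score *= 0.5
              if item.get("p_spam", 0.0) > 0.5:
                  score *= 0.1

              # A crude diversity rule: demote authors already shown in this session.
              if item["author"] in already_shown_authors:
                  score *= 0.7

              scored.append((score, item))

          # Show the highest-scoring items first.
          return [item for score, item in sorted(scored, key=lambda pair: -pair[0])]
      ```

      Real ranking systems differ in countless ways (learned weights, hundreds of features and tweaks, separate candidate-generation stages), but the basic shape (engagement predictions in, one score per item out) is the common core described above.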

    3. Predicted engagement rates are mostly historical engagement rates. By far the most important predictors of whether a user will engage with a piece of content are (1) this user’s hist