How Much Has Social Media Affected Polarization?

Author: Tom Cunningham

Published: August 7, 2023

TL;DR: The experiments run by Meta during the 2020 elections were not big enough to test the theory that social media has made a substantial contribution to polarization in the US. Nevertheless there are other reasons to doubt it.

Summary

Thanks to Dean Eckles, Solomon Messing, Jeff Allen, & Brandon Silverman for discussion which led to this post. I put together the spreadsheet summary of results with Dean and Solomon. See also a post by Dean.

Three new experiments show that changing Facebook’s feed ranking algorithm for 1.5 months has an effect on affective polarization of less than 0.03 standard deviations. This is small compared to a growth of 1.1 standard deviations in nationwide affective polarization over the last 40 years.1

  • 1 Boxell et al. (2022); see below for discussion of whether these standard deviations are comparable.

Small effects in these experiments are consistent with large effects in aggregate. The aggregate contribution of social media to polarization will differ from these experimental estimates in a number of ways: depth, breadth, duration, timing, category, and population. My rough attempts to account for these considerations suggest the aggregate effect is likely 10 or 20 times larger than the effects these experiments could measure, and so small effects in the experiments are consistent with large effects in aggregate.

Put simply: these experiments measure the effect of reducing an individual user's exposure (not their friends' and family's) to political content on Facebook by 15% for 1.5 months, and they were run in a period after Facebook had already sharply reduced the amount of partisan content circulating. Thus we should expect them to measure only a small fraction of the cumulative impact of social media, and in fact these results are consistent with social media being entirely responsible for the growth of polarization in the US.

    Nevertheless other evidence implies that social media has probably not made a huge contribution to US polarization. If we wish to evaluate the balance of evidence relating social media to polarization there are many other sources which are probably more informative than these experiments. I give a rough sketch below and it seems to me social media probably does not account for a majority share, mainly because (1) polarization had been growing for 20 years prior to social media’s introduction, and much of the growth since 2014 was in people without internet access; (2) a lot of partisan discourse continues to spread outside of social media, e.g. through cable TV and talk radio; (3) other countries do not show a similar increase in affective polarization.

Discussion of these results has been distressingly non-quantitative. The majority of discussion of these results (in papers, editorials, on Twitter) has been about whether these changes “have an effect” or “do not have an effect.” Interpreted sympathetically these statements are compressed ways of saying “an effect larger than 0.03 standard deviations.” However I think taking this shortcut so consistently has led to far too little thought about what we have learned from these experiments that we didn't already know, and about the overall balance of evidence regarding the effects of social media. I give a number of examples below.

    The Experiments

Last week’s papers reported the results of three experiments on Facebook’s News Feed. The experiments (Guess et al. (2023b), Guess et al. (2023a), Nyhan et al. (2023)) were run between September and December 2020, and half-way through participants were asked about their feelings towards members of their own party and the opposing party, e.g. “how warm do you feel about Republicans on a scale of 0-100?”2 The answers were aggregated into an index of “affective polarization.” The three treatment arms were: (1) rank items on News Feed chronologically, (2) remove reshares from News Feed, and (3) downrank like-minded items on News Feed; each was evaluated against the affective polarization survey.

  • 2 Although the treatments ran for 3 months (24 Sep–23 Dec 2020), the survey responses were collected during the experiment and the average survey measure was measured after around 1.5 months of treatment: see Figure S2 in the Supplementary Appendix.

All three experiments found effects on polarization of less than 0.03 standard deviations (SDs). The 95% confidence intervals on affective polarization are approximately \(\pm\) 0.03 SDs, and the point estimates are all smaller than that (i.e. none of the effects is statistically significant). Dean Eckles, Solomon Messing, and I put together a spreadsheet summary of the results from all the experiments reported so far, along with other results from the literature on the political effects of media.

They also measured effects on a number of other off-platform outcomes: removing reshares did lower news knowledge by 0.07 standard deviations, but effects on all other outcomes (factual discernment, issue polarization, perceived legitimacy, self-reported turnout) were not significant, with similar-sized confidence intervals.

    Extrapolating to the Cumulative Effect of Social Media

    Many people have interpreted these results as implying that social media has not had much effect on overall polarization. E.g. one of the experimental papers says:3

  • 3 Guess et al. (2023b)

“these findings suggest that social media algorithms may not be the root cause of phenomena such as increasing political polarization.”

Here I try to extrapolate from these experiments to the long-run aggregate effect of social media. The comparison is between two extremes but there are a lot of other intermediate estimands that we could alternatively use, e.g. the effect of permanently disabling just Facebook for everybody, or the effect of temporarily disabling all social networks for an individual user.

    These are difficult judgment calls. I have tried my best to be neutral and discuss evidence on either side but it’s likely I’m forgetting some important considerations.

I work through six ways in which the experimental results will differ from the aggregate impact of social media:

    1. Depth. Whether changing one feature or disabling the app entirely.
    2. Breadth. Whether changing the experience for one user or for all users.
    3. Duration. Whether changing the experience for 1.5 months or for the whole history of social media.
    4. Timing. Whether changing the experience in Oct 2020, or the average effect over 2004-2020.
    5. Category. Whether changing the experience just for Facebook or for all social media.
    6. Population. Whether we are estimating the effect for all US adults or just Facebook users.

I try to give quantitative estimates for each of these six differences, and this makes me think that having tight confidence intervals on the effects of the experiments (plus or minus 0.03 SDs) is still consistent with the aggregate effect of social media being as large as 1 SD or more.4

  • 4 I am considering the effect-size rather than the uncertainty; you could separately do a similar exercise to propagate the uncertainty, and I think this would make the experiments seem even more under-powered to measure the aggregate effect.

(1) Depth: the experiments have small effects on exposure. Each of the experiments reported has an effect on overall Facebook time-spent of less than 25%, and on exposure to political material of less than 15%. Thus the effect of complete withdrawal from Facebook seems likely to be at least 2X larger than the effects measured by any of these experiments (a rough scaling calculation is sketched after the table below). The most natural causal path from Facebook use to polarization is exposure to partisan or misleading political media. An additional experiment which deactivated people’s accounts was also run, but its results are not yet public (as of August 4).

effects on metric5            time-spent   political impressions   cross-cutting impressions   untrustworthy impressions
baseline                      ?            14pp                    21pp                        3pp
rank chronologically          -21%         +12%                    -10%                        +60%
remove reshares               -5%          -14%                    -3%                         -32%
downrank likeminded posts     -1%          -5%                     +7%                         ?
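As a rough illustration (the linear-scaling assumption here is mine, not something the experiments establish): if the treatments reduce exposure to political content by about 15%, then scaling linearly up to complete withdrawal would imply an effect of roughly \[ \frac{\text{effect of complete withdrawal}}{\text{effect of a 15\% reduction in exposure}} \;\approx\; \frac{100\%}{15\%} \;\approx\; 7, \] which is comfortably above the conservative 2X factor used in the “putting it all together” calculation below.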

(2) Breadth: the experiments exclude network effects. The effects of social media on polarization likely work not just through direct exposure but also downstream, via people’s interactions with friends and family, in both online and offline conversations. Thus it seems likely the aggregate effect could be 2X or more the individual effect.

(3) Duration: the experiments only measure short-run effects. These experiments measured the effect of a News Feed change on polarization after around 1.5 months, while most American adults have been using Facebook for perhaps 10 years.6 It is hard to judge how quickly we should expect polarization attitudes to respond to treatment, and I have not found useful academic literature. The national polarization trends documented in Boxell et al. (2022) seem fairly stable despite a volatile news cycle, suggesting attitudes change relatively slowly. If the half-life of adjustment were 1.5 months (which seems quite short to me) then the effects measured in these experiments would be half of the long-run effect (spelled out in the sketch below). It seems likely that the effects of exposure do not decay at a constant rate: there is a short-run component that decays quickly (the effect of salience), and a long-run component that decays slowly.7

  • 6 The polarization survey measures were collected in waves 3, 4, and 5. From eyeballing Figure S3 (and assuming the response rate declines over time) it appears the average response would be collected around 1.5 months after the treatment began.

  • 7 The only discussion of dynamic effects I could find was the wave-by-wave results for survey questions in the supplementary appendices. I think it would be useful to show the dynamic effects for on-platform behavior because (1) there is good reason to expect the cumulative treatment effects to change dramatically over time, and (2) the experiments are sufficiently well-powered that dynamic effects should be easy to observe.

Guess et al. (2023b) notes this limitation:

    “It is possible that such downstream effects require a more sustained intervention period … although our approximately 3-month study had a much longer duration than that of most experimental research in political communication.”

Although if I understand correctly this is slightly misleading: the outcome variables were measured after 1.5 months of exposure on average, not 3 months.
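To spell out the half-life arithmetic (a simple constant-decay sketch; as noted above, the true adjustment process probably mixes fast and slow components): if attitudes adjust towards their long-run level with half-life \(h\), then after \(t\) months of treatment the measured effect is a fraction \(1-2^{-t/h}\) of the long-run effect: \[ t = h = 1.5 \text{ months} \;\Rightarrow\; 1 - 2^{-1} = \tfrac{1}{2}, \qquad h = 15 \text{ months (purely illustrative)} \;\Rightarrow\; 1 - 2^{-0.1} \approx 0.07 . \] So if the true half-life were much longer than 1.5 months, these experiments would capture only a small fraction of the long-run effect.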

(4) Timing: the experiments were run during the lead-up to the 2020 election. The experiments ran from late September to late December 2020. If we compare this to the average experience on Facebook over the previous decade there are reasons to expect relatively smaller effects on polarization:

• Following the 2016 elections Facebook invested very heavily in integrity systems, reducing the prevalence of many types of bad content by factors of between 2X and 10X, especially misinformation and hyperpartisan political content. In May 2020 Guy Rosen claimed that Facebook had taken “a number of important steps to reduce the amount of content that could drive polarization on our platform” over the prior years.

• Allcott et al. (2019) estimate that exposure to misinformation on Facebook (measured by engagement with domains known to host misinformation) roughly doubled over 2015 and 2016, then roughly halved over 2017 and 2018. The data end in late 2018 but I believe the trend continued downward.

    • Meta’s Community Standards reports show a decline in prevalence of most types of harmful content by a factor of between 2 and 5 over roughly 2017 to 2022 (see chart here).

    • Prior to and during the 2020 election Facebook implemented a series of extra “break the glass” measures with the effect of suppressing extreme or fringe political content.

    • During election seasons there tends to be significantly more political content circulating. This might mean that Facebook would have a relatively larger impact on polarization in this period. However if the influence of social media on polarization depends on the share of exposure to partisan or polarizing content (rather than the level) then the effect would be the same in election season as outside election season.

In fact Facebook has further reduced the prevalence of political and fringe content since 2020:

• The share of politics on News Feed was reduced by 50%.8
• The prevalence of hate speech fell by a factor of 5 between 2020 and 2022 (from 0.1% to 0.02%) (ref).
• Engagement on US right-wing politics pages fell by a factor of 4 from 2021 to 2022 (ref).
• The prevalence of engagement-bait among the top 20 most-viewed posts fell from 100% to 5% between 2021Q3 and 2022Q3 (ref).
  • 8 The WSJ reported that in late 2021 “Mr. Zuckerberg and the board chose the most drastic [option], instructing the company to demote posts on “sensitive” [(politics and health)] topics as much as possible in the newsfeed that greets users when they open the app”, and that in 2022 “politics accounts for less than 3% of total content views in users’ newsfeed, down from 6% around the time of the 2020 election.” The article reports that these experiments reduced daily visitation (daily active users) by 0.2%.

(5) Category: the experiments affected only Facebook, but in 2020 Facebook probably accounted for only around 1/4 of all the partisan political content people were exposed to on social media, once we include YouTube, TikTok, Instagram, Twitter, Snapchat, Reddit, and the long tail of niche social networks. In contrast, if we are estimating the cumulative effect (2004-2020) then Facebook would likely account for a significantly larger share of exposure to political content.

(6) Population: the experiments measure outcomes only for Facebook users. I believe that the “population average treatment effects” reported in the papers are weighted to match the Facebook-using population, not the voting population. This is a reason for the experimental effect-size to be larger than the aggregate effect: I would guess around 2/3 of the US adult population is active on Facebook at least once a month, and so the aggregate effect-size could be smaller by roughly that factor, as sketched below.9

  • 9 The supplement to the Science paper mentions Facebook had 231 million monthly active users.
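A minimal sketch of where the 2/3 adjustment comes from (assuming, for this step only, that non-users are unaffected; spillovers to non-users are instead covered under breadth above): \[ \text{effect on all US adults} \;\approx\; \tfrac{2}{3}\times\text{effect on users} \;+\; \tfrac{1}{3}\times 0 \;=\; \tfrac{2}{3}\times\text{effect on users}. \]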

Putting it all together. If factors 1-5 each contributed a 2X amplification, and factor 6 contributed a 2/3 shrinkage, then the cumulative effect of social media on polarization up to 2020 would be roughly 20X larger than the experimentally-measured effect, i.e. the effective confidence intervals would be 0.6 SDs rather than 0.03 SDs. In other words these experiments would not be sufficiently well-powered to rule out social media being responsible for the entire growth of polarization since 2014.
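Written out, that back-of-the-envelope multiplication is \[ \underbrace{2\times 2\times 2\times 2\times 2}_{\text{factors 1–5}} \;\times\; \underbrace{\tfrac{2}{3}}_{\text{factor 6}} \;\approx\; 21 \;\approx\; 20, \qquad 0.03\ \text{SD}\times 20 \;\approx\; 0.6\ \text{SD}. \]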

Note: adjusting for standard deviation size. The papers all report effect sizes on affective polarization in units of standard deviations. I wasn’t sure whether these are standard deviations of the cross-sectional distribution, or of the residual variance after controlling for pre-treatment values. If the latter then I would guess they are perhaps half the size of the cross-sectional SD, based on my professional experience (and Supplementary Appendix page S-140). If correct this would halve the estimated effect-size when expressed in cross-sectional standard deviations, i.e. it would close the gap by a factor of 2.

    Other Evidence on Media and Polarization

Here is a rough sketch of the evidence related to affective polarization. I do not consider myself an expert on this literature and would welcome corrections or additions. On balance this evidence seems to imply that social media has not been the primary contributor to US affective polarization, but I think a thorough analysis of this evidence would be really valuable.

    News Sources
From Pew data I would guess social media accounts for around 25% of all exposure to political news, and probably a higher share of exposure to partisan political news. Cable TV and political talk radio probably account for similar shares of overall exposure to partisan media. This seems the strongest evidence that social media is not the primary driver of affective polarization.
    Professional Opinion
The political science literature talks about “the paradox of minimal effects,” and the economics-of-media literature generally seems to have reached a consensus that most persuasive effects of media are small. However this might apply only to marginal effects.
    Other Experiments
    Allcott et al. (2020) is often interpreted as finding an effect on affective polarization but it does not (see below). Broockman and Kalla (2022) finds a null effect. I don’t know of other good experiments on affective polarization.
    National Trends
In the US affective polarization grew steadily over 1978-2020, for a total of 1.1 SDs over 40 years. Other countries do not show a consistent trend, and there is no clear connection with internet access or online news consumption.
    Demographic Trends
Over 1996-2012 affective polarization grew the most in the demographic groups least likely to use the internet and social media. I’m not aware of more recent data.

    Discussion

    Trends in affective polarization. Boxell et al. (2022) document affective polarization across a dozen countries, 1978-2020:

1. In the US the affective polarization index increased from around 25 to 50, “an increase of 1.08 standard deviations as measured in the 1978 distribution.” (I’m not sure whether the SD itself also increased.)

    2. Across the world there’s no clear trend: some countries increased, other countries decreased. This weakens the simple argument that polarization has increased at the same time as social media use.

3. In the US the trend seems to be almost entirely due to increasing negative feelings about the opposing party.

    The US timeseries can be seen online from the ANES.

    Observational data finds that much of the growth in polarization in the US was among people who were not online. Boxell et al. (2017) say

    “the growth in polarization in recent years [1996-2012] is largest for the demographic groups least likely to use the internet and social media”

    Content on Meta platforms. Guess et al. (2023b) has data from the control group in their 2020 experiments:

    Share of Impressions Facebook Instagram
    Political content 14% 5%
    Political news content 6% -
    Content from untrustworthy sources 3% 1%
    Uncivil content 3% 2%

    Pew 2022 has data on where people get their news from:

% of US adults who regularly get news from
    television 65%
    news websites 63%
    search 60%
    social media 50%
    radio 47%
    print 33%
    podcasts 23%

Radio show popularity. Around half of the top 20 most-listened-to radio shows in the US are conservative talk shows, with around 90M weekly listeners in total (this double-counts people who listen to multiple shows). Data from 2021.

Television. Fox News is cable TV’s most-watched network, with around 5M regular viewers (source from 2016).

Time spent on social media. Statista reports average time spent on social networks of around 150 minutes per person per day.

The academic literature has identified other possible causes of polarization. Some potential causes: Southern realignment, the 1968 changes to the primary system, the Obama presidency, and the Tea Party movement (though each of these could be, in part, proximate rather than root causes). Martin and Yurukoglu (2017) argue that a large part of the recent growth is due to cable news:

“the cable news channels can explain an increase in political polarization of similar size to that observed in the US population over [2000-2008]. … In absolute terms, however, this increase is fairly small.”

    See also Haidt and Bail’s long document Social Media and Political Dysfunction: A Collaborative Review

Does Allcott et al. (2020) find that Facebook use increases polarization? This paper reports on an experiment that paid people to stop using Facebook for a month. It finds an effect of -0.16 SDs (\(\pm\) 0.08) on a measure described as “political polarization,” however there are some subtleties:

    1. Unlike the questions used in typical population surveys the questions were explicitly about their feelings during the period of the experiment, e.g. “Thinking back over the last 4 weeks, how warm or cold did you feel towards the parties and the president on the feeling thermometer?”

2. Polarization is measured by a composite of different measures. By far the largest effect was on the “congenial news exposure” question: “over the last 4 weeks how often did you see news that made you better understand the point of view of the Democrat (Republican) party?” The score is the difference between the answer for one’s own party and for the opposing party. It seems to me unsurprising that deactivating Facebook would affect one’s exposure to such news, but this would not normally be called a measure of “polarization” in the literature. The paper mentions in a footnote that “the effect on the political polarization index is robust to excluding each of the seven individual component variables,” but it turns out that removing “congenial news exposure” halves the effect-size and shifts the p-value from 0.00 to 0.09 (i.e. from highly significant to non-significant). I’m not sure I would describe this finding as “robust.”

3. The paper finds no significant effect on its two “affective polarization” measures (-0.08 \(\pm\) 0.08 SD, and 0 \(\pm\) 0.04 SD); however the 2020 papers which cite Allcott et al. (2020) seem to treat it as finding that Facebook has a positive effect on “polarization,” without noting the null effect on affective polarization.

    Quantitative vs Qualitative Description of Results

Throughout these papers and in the public discussion the findings have been described in qualitative terms, i.e. as “positive,” “negative,” or “neutral.” Implicitly these terms refer to whether the results are statistically significant (p<0.05), which depends on whether the estimated effect is larger than the margin of error. These statements only make sense given some implicit understanding of how wide the confidence intervals are, yet I do not think that implicit understanding exists: I’m fairly confident that most people reading these statements (and many people making them) do not know quantitatively what the thresholds are.10

  • 10 It’s worth stating that all of these treatments will have some non-zero effect, so it is never literally correct to say “this treatment has no effect on polarization”; it can only be understood as a roundabout way of saying “this treatment has a small effect,” for some definition of “small.”

    1. Titles and abstracts used qualitative descriptions. The titles and abstracts all used qualitative language, e.g. “did not reduce” or “did not significantly affect” or “had no measurable effects.” None of the abstracts of the papers gave information on the size of the effects that were ruled out.

    2. Hypotheses used qualitative descriptions. The pre-analysis plans contained a series of hypotheses, e.g.:

      H1: Decreased exposure to content shared by like-minded friends, Pages, and groups decreases affective polarization.

      H1: Reverse chronological feed will reduce polarization and negative perceptions of out-groups.

The terms “decrease” and “reduce” presumably implicitly refer to statistically significant changes, i.e. to the width of the confidence intervals, but I could find no discussion of how wide those intervals were expected to be.

3. Public discussion used qualitative descriptions. Almost all discussion in editorials and on Twitter described the results in qualitative terms (whether or not there was an effect), not in quantitative terms.

    4. Elicitation of priors used qualitative descriptions. I was at an SSRC conference a few days before the results were released and there was a poll taken to predict the results. For “polarization” the options were (as I recall) “no effect”, “small increase”, “substantial increase”, etc., where I believe “increase” was intended to be interpreted as “statistically significant increase.” However as I recall we were not told the width of the confidence intervals when asked to make predictions. I think this is a bad way of eliciting priors: whether something is significant depends on the width of the confidence intervals as much as the effect-size. Thus an equivalent way of phrasing the question would be “do you think these experiments are sufficiently powered?”11

    5. Power calculations used qualitative descriptions.

Guess et al. (2023b) said that there was sufficient power to detect “small” effects, without explaining why they regarded 0.03 SD as small.12 The supplement and pre-analysis plan do not mention “power” and do not appear to discuss the quantitative interpretation of these effect-sizes.

      They do cite a previous paper:

“In all cases, we could rule out effect sizes smaller than those found in previous research [citation to Allcott 2020].”

However I believe this is a misinterpretation: Allcott et al. (2020) does test for affective polarization but finds a non-significant effect. As discussed above, that paper reports a significant effect for “polarization,” but the significance is due solely to responses to a question about “congenial news exposure” over the prior 4 weeks, which I think is quite different from polarization.

      The Supplementary Appendix to Guess et al. (2023b) says the sample size was chosen to detect an effect size of 1.5 percentage points in vote choice (p. S-139), however it is not clear why this effect size was chosen.

      Guess et al. (2023a) says explicitly that they (or someone) expected a significant effect:

      “Contrary to expectations, the treatment does not significantly affect political polarization or any measure of individual-level political attitudes.”

      However I did not find a discussion of why they expected an effect of that size.

  • 11 A similar phenomenon occurs in forecasting: if someone asks you a question for which you lack crucial context, like “what is the chance of the Grockles winning the Kaplooey cup?”, you can still give an answer, but it will be based on your judgment of the person asking the question, not on the substance of the question itself.

  • 12 “The large samples … allowed for adequate statistical power to detect small effects (for example, for affective polarization, we were powered to detect population average treatment effects with Cohen’s d = 0.032 or larger for both Facebook and Instagram).”

Good quantitative work. Some of the authors of these 2020 papers have written other papers which I think take a much more useful approach: they use observational data, focus on quantitative outcomes, and perform back-of-the-envelope calculations to reconcile evidence from different sources, e.g. Boxell et al. (2022), Boxell et al. (2017), Allcott and Gentzkow (2017).

    References

    Allcott, H., Braghieri, L., Eichmeyer, S., Gentzkow, M., 2020. The welfare effects of social media. American Economic Review 110, 629–676.
Allcott, H., Gentzkow, M., 2017. Social media and fake news in the 2016 election. Journal of Economic Perspectives 31, 211–236.
    Allcott, H., Gentzkow, M., Yu, C., 2019. Trends in the diffusion of misinformation on social media. Research & Politics 6, 2053168019848554.
    Boxell, L., Gentzkow, M., Shapiro, J.M., 2022. Cross-Country Trends in Affective Polarization. The Review of Economics and Statistics 1–60. https://doi.org/10.1162/rest_a_01160
    Boxell, L., Gentzkow, M., Shapiro, J.M., 2017. Greater internet use is not associated with faster growth in political polarization among US demographic groups. Proceedings of the National Academy of Sciences 114, 10612–10617.
Broockman, D., Kalla, J., 2022. Consuming cross-cutting media causes learning and moderates attitudes: A field experiment with Fox News viewers. https://doi.org/10.31219/osf.io/jrw26
    Guess, A.M., Malhotra, N., Pan, J., Barberá, P., Allcott, H., Brown, T., Crespo-Tenorio, A., Dimmery, D., Freelon, D., Gentzkow, M., 2023a. Reshares on social media amplify political news but do not detectably affect beliefs or opinions. Science 381, 404–408.
    Guess, A.M., Malhotra, N., Pan, J., Barberá, P., Allcott, H., Brown, T., Crespo-Tenorio, A., Dimmery, D., Freelon, D., Gentzkow, M., González-Bailón, S., Kennedy, E., Kim, Y.M., Lazer, D., Moehler, D., Nyhan, B., Rivera, C.V., Settle, J., Thomas, D.R., Thorson, E., Tromble, R., Wilkins, A., Wojcieszak, M., Xiong, B., Jonge, C.K. de, Franco, A., Mason, W., Stroud, N.J., Tucker, J.A., 2023b. How do social media feed algorithms affect attitudes and behavior in an election campaign? Science 381, 398–404. https://doi.org/10.1126/science.abp9364
Nyhan, B., Settle, J., Thorson, E., Wojcieszak, M., Barberá, P., Chen, A.Y., Allcott, H., Brown, T., Crespo-Tenorio, A., Dimmery, D., Freelon, D., Gentzkow, M., González-Bailón, S., Guess, A.M., Kennedy, E., Kim, Y.M., Lazer, D., Malhotra, N., Moehler, D., Pan, J., Thomas, D.R., Tromble, R., Rivera, C.V., Wilkins, A., Xiong, B., Jonge, C.K. de, Franco, A., Mason, W., Stroud, N.J., Tucker, J.A., 2023. Like-minded sources on Facebook are prevalent but not polarizing. Nature 620, 137–144. https://doi.org/10.1038/s41586-023-06297-w