Bias, Skew, and Search Engines Are Sufficient to Explain Online Toxicity
Henry Farrell, Cosma Shalizi
DOI: https://doi.org/10.1145/3624715
IF: 22.7
2024-03-26
Communications of the ACM
Abstract: U.S. political discourse seems to have fissioned into discrete bubbles, each reflecting its own distorted image of the world. Many blame machine-learning algorithms that purportedly maximize "engagement" (serving up content that keeps YouTube or Facebook users watching videos or scrolling through their feeds) for radicalizing users or strengthening their partisanship. Sociologist Shoshana Zuboff [15] even argues that "surveillance capitalism" uses optimized algorithmic feedback for "automated behavioral modification" at scale, writing the "music" that users then "dance" to.

There is debate over whether such algorithms in fact maximize engagement (their objective functions also typically contain other desiderata). More recent research [3] offers an alternative explanation, suggesting that people consume this content because they want it, independent of the algorithm. It is impossible to tell which is right, because we cannot readily distinguish the consequences of machine learning from users' preexisting proclivities. How much demand comes from algorithms that maximize engagement or some other commercially valuable objective function, and how much would persist if people got information some other way? Even if we cannot answer this question in any definitive way, we need to do the best we can.

There are many possible interface technologies that can help organize vast distributed repositories of knowledge and culture like the Web. These include:

- Traditional systems of categorization (such as the Dewey Decimal System, or the original Yahoo!).
- Systems such as Wikipedia and Reddit, in which human volunteers collate, organize, present, and revise information, providing an information resource, a means for searching it, and human-selected links to external sources.
- "Traditional" search algorithms like Google's original PageRank algorithm [7], which rank items in terms of relevance, estimated as a function of the text of the options and the query, the number and "quality" of inbound links, etc.
- Modern social media algorithms: machine-learning-driven systems that rank content to maximize some observable notion of users' engagement with it or some other profit-related measure, updating the ranking model depending on user responses to the options presented.
- Large language model-driven interfaces that generate outputs based on a set of statistical weights that lossily summarize some larger corpus of text and associated data.

If some of these interfaces lead to the kinds of toxicity (most particularly, distorted or false beliefs) that plague online political discussion in the U.S., we really want to know it. For example, if Zuboff is right, our politics would be much better if we had not adopted the kinds of social media algorithms that she worries about, and might be dramatically improved if we reverted to earlier, simpler interfaces. If social media algorithms are primarily to blame for fractured discourse, then curbing them might make the Internet safer for democracy. If people still find distorted information when "algorithmic rabbit holes" [3] are not there, then curbing such algorithms would have less benefit, and perhaps even none at all. Answering such questions involves comparing different interfaces with each other, to figure out which kinds of social and political consequences might be associated with each kind of interface.

A Thought Experiment: The Internet without Modern Algorithms

Without good data (and appropriate statistical tools: social networks can seem designed to impede causal inference), we will resort to a thought experiment. How would the Internet affect democracy if modern social media algorithms were not a key interface through which people find content?
Specifically, what would have happened if machine learning had not been used, and we had remained in the Internet circa 2012? A thought experiment like this uses a simple model to compare the likely outcomes associated with different interfaces. Such models have obvious limitations. They strip out most of the features of complex phenomena, focusing on some causal relationships rather than others. But they also force modelers to clarify their intuitions, and can have considerable explanatory benefits if they focus on the right causal relationships. Scholars of complexity such as Scott Page [8] advocate acquiring a rich portfolio of models, but urge that each individual model, to be useful, must be "simple enough that within it we can apply logic." We want our thought experiment to be psychologically realistic. Whe -Abstract Truncated-
computer science, theory & methods, software engineering, hardware & architecture