Recommender systems increasingly determine which articles we read, which videos we watch, and even which friends we hear from. Collectively, they direct billions of hours of human attention every day.
These systems are generally optimized for very noisy proxies of user value, like click-through rate, time-in-app, and daily active users. While these proxies are good enough to generate an incredible amount of economic value, we should expect them to become misaligned with user values when enough optimization pressure is exerted. Indeed, researchers have documented a number of negative side effects of recommender-driven platforms, including addiction, political polarization, harm to relationships, and reduced cognitive capacity.
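This Goodhart-style failure can be illustrated with a minimal simulation. The sketch below is purely illustrative, with made-up numbers: each item has a true value to the user, and the recommender sees only a noisy proxy of it (think of a predicted click-through rate). Under mild selection pressure the proxy tracks value reasonably well, but the more aggressively we select on the proxy, the more the winners are items whose noise term happens to be large, so the gap between proxy score and delivered value widens.

```python
import random

random.seed(0)

# Each item has a true value to the user, plus a proxy score that is the
# true value corrupted by independent noise (e.g. a noisy CTR estimate).
items = []
for _ in range(10_000):
    true_value = random.gauss(0, 1)
    proxy = true_value + random.gauss(0, 1)  # noisy proxy of value
    items.append((true_value, proxy))

def proxy_value_gap(k):
    """Select the top-k items by proxy score and return how much the
    proxy over-estimates their true value, on average."""
    top = sorted(items, key=lambda item: item[1], reverse=True)[:k]
    mean_proxy = sum(p for _, p in top) / k
    mean_value = sum(v for v, _ in top) / k
    return mean_proxy - mean_value

# Weaker selection (top 50%) vs. stronger selection (top 0.1%):
# the over-estimate grows as optimization pressure increases.
for k in (5_000, 100, 10):
    print(f"top {k:>5} by proxy: proxy - true value = {proxy_value_gap(k):.2f}")
```

Running this, the proxy–value gap grows steadily as the selection gets narrower: conditioning on an extreme proxy score increasingly selects for extreme noise rather than extreme value, which is the statistical heart of the misalignment worry above.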
Aligning recommender systems is the process of making these systems robustly optimize for their users’ values and interests. This includes individual values such as well-being, learning, and growth, but also collective values such as diversity, fairness, justice, and tradition.
There are two dovetailing motivations to work on the problem of aligning recommenders. The first is the incredible value that could be created in the near-term by directing human attention in a way that helps humans make better decisions, perhaps decisions that they would endorse upon reflection. The second motivation is that today’s recommenders are the forerunners of the far more powerful AI systems that humanity will go on to build. These advanced AI systems will potentially direct far more than just human attention, making decisions about and automating complex human tasks like teaching, strategic planning, and government policy. The output of these systems will be far harder to supervise, but the cost of their misalignment would be far greater, possibly catastrophic. Recommenders are an ideal test bed in which to evaluate proposed methods for aligning advanced AI systems, because they are relatively easy to supervise while engaging with the full complexity of human values.
For a more detailed description of recommender alignment and related issues in AI safety, see:
Aligning Recommender Systems as Cause Area (EA Forum post)
What are you optimizing for? Aligning Recommender Systems with Human Values (paper published at ICML 2020)
Human Compatible: Artificial Intelligence and the Problem of Control (book by Stuart Russell on AI safety)
What failure looks like (Alignment Forum post).
Why create a Recommender Alignment Newsletter?
Many people are working on tasks adjacent to the problem of aligning recommenders, including
Recommender systems researchers working in academia, usually on public datasets and without access to real users.
Product managers, designers, and engineers working on production recommender systems at technology companies.
Social scientists studying the effects of recommenders from a variety of perspectives.
Legal scholars and policymakers designing recommender regulation.
Analysts at think tanks and other civil society organizations working to mitigate the harm of misaligned recommenders.
AI safety researchers working on the alignment of general AI systems.
This newsletter is dedicated to bringing the views and insights of these disparate fields into conversation with each other. Our goal is to cover every important new development in the field of recommender alignment, ranging in scope from abstract AI alignment proposals to product changes in real-world systems.
Please help us by sending any relevant links or resources, and any feedback about the newsletter’s structure that would make it more useful to you, our readers!
Paper Summaries: Two Studies of Recommender Interfaces
For this first issue we’re highlighting two papers on recommender user interfaces. Relative to work on recommender algorithms and objectives, interfaces that give users more control are an under-researched way to improve alignment.
Designing for the better by taking users into account (RecSys 2019)
Ivan’s summary: The authors evaluate a prototype news recommender with a variety of control mechanisms. They argue for the value of doing loosely structured, qualitative, focus-group-style discussions as a way to discover and understand the diversity of user opinions, needs, and concerns. If you don’t do this first, you won’t know what to look for in quantitative user studies and experiments.
Some takeaways from the user feedback they received:
Users generally distrusted the recommender system’s intentions, and worried about “loss of human agency due to recommender systems”, diminishing their critical thinking and creativity and making them vulnerable to manipulation.
Users liked seeing a dashboard summarizing their reading history in an accessible format split by categories, and the ability to “nudge” the recommender to show more of a particular category. Quoting one user: “what I like about it is that you see your manifest behavior, but next to it, you have your ambitions, ‘I should read more about art’, well now you can express that, and change the settings, I really like that.”
Explicit control settings are seen as very valuable but too inconvenient to use on an everyday basis, e.g. “I would use this [heavyweight interface] on a Saturday afternoon, not on weekdays”.
Ivan’s summary: Like the previous paper, this one describes user studies testing out new interfaces on a small group of highly motivated users. In the context of a news recommender, the authors give users a preview of how their feedback actions (likes, clicks) will change the stories they see. Users strongly prefer this interface to traditional recommender interfaces with feedback that is both invisible and delayed, and say it increases their sense of control over the system and decreases decision anxiety.
Developments in Production Recommender Systems
Jack Bandy and Nick Diakopoulos performed an audit of how using Twitter’s curation algorithm (“Top Tweets”), instead of a chronological Twitter feed, shifts the distribution of tweets displayed in the application. Technical details of their audit implementation can be found here. Full paper here.
Facebook is testing new ways for users to give feedback on News Feed posts, beyond engagement signals and “Worth your time” surveys. See also their 2019 blog post on Using Surveys to Make News Feed More Personal.
In January Twitter introduced Birdwatch, a pilot program to crowdsource misinformation detection. The program is live at twitter.com/i/birdwatch and they claim it will eventually be deployed to the production version of Twitter. More companies should adopt a strategy of maintaining multiple public versions of their recommender system, allowing them to try out experimental features and user interfaces while minimizing risks to their main product. This mitigates the difficulty of changing interfaces in large production systems, which is one of the biggest blockers for recommender alignment in practice.
Written by Ivan Vendrov, Chris Painter, and Jonathan Stray. Thanks to Jeremy Nixon and Dylan Hadfield-Menell for helpful comments and discussions related to this post.