Participatory ML and Recommender Alignment
Two real examples of improving alignment by widening participation
Most modern recommender systems are technocracies: nearly all the important design decisions are made by a small group of engineers and product managers, and users can influence the system only through a narrow and rigid feedback channel (clicking, rating, unsubscribing). This paucity of feedback leaves the system blind to most user values, and under strong enough optimization pressure those values get obliterated (by an argument along the lines of Consequences of Misaligned AI: optimizing an incomplete objective under resource constraints drives the omitted features toward their minimum). This dynamic explains a number of catastrophic failures in technocratic polities, as memorably chronicled by James Scott in Seeing Like a State, and it is also a major source of misalignment in technocratic recommender systems.
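To see the mechanism concretely, here is a toy sketch (my own construction, not from the cited paper): if each item's quality is split between a measured attribute and an omitted one, so the two trade off, then selecting purely on the measured attribute actively pushes the omitted one toward its minimum.

```python
# Toy illustration (not from the paper): clicks are measured, accuracy is not,
# and a fixed "quality budget" makes the two trade off across items.
import numpy as np

rng = np.random.default_rng(0)
clicks = rng.random(100)
accuracy = 1.0 - clicks          # resource constraint: the two attributes trade off

top = np.argsort(clicks)[-10:]   # recommend the 10 most clickable items
print(f"mean clicks: {clicks[top].mean():.2f}, mean accuracy: {accuracy[top].mean():.2f}")
# -> clicks near 1.0, accuracy near 0.0: the unmeasured value is obliterated.
```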
From this perspective on recommender alignment, the solution is to build participatory recommender systems - ones that distribute power widely among their stakeholders in an effort to capture and respect a wider range of values. I believe this is a deeply underrated area of recommender system design, and so am highlighting two papers that describe real-world deployments of participatory recommendation:
ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia
This paper describes the system Wikipedia uses to flag vandalism & suggest work to editors, probably the most impactful participatory ML system in the world today. The authors describe it as a “meta-algorithmic system” - one that allows community members to easily build, audit, modify, and compose algorithms, resulting in a rich ecosystem of cooperating users and algorithms. The paper lays out a ton of useful patterns of participatory design, including:
Enabling the creation of a wide variety of models that can be freely modified, reused, and composed together by anyone, rather than trying to build the “one true model”.
Collecting explicit labels from the affected communities instead of relying on implicit feedback.
Giving users insight into models by showing counterfactual predictions (e.g. how would the vandalism prediction change if the submitter were anonymous?); this helps surface model biases and weaknesses, which in turn informs how the models are used (see the first sketch after this list).
Letting users request models optimized for custom metrics (e.g. maximize recall @ 90% precision) as appropriate for their use case (see the second sketch after this list).
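To make the counterfactual-prediction pattern concrete, here is a minimal sketch (my own illustration, not the ORES API): score the same edit twice with a single feature flipped and report the change.

```python
# Minimal sketch of the counterfactual-prediction pattern. `score` stands in
# for any function mapping a feature dict to a vandalism probability; the
# feature name "is_anonymous" is a placeholder, not an ORES identifier.
def counterfactual_delta(score, features, flip_key="is_anonymous"):
    """How much does the prediction change if one feature is flipped,
    e.g. treating the submitter as anonymous?"""
    flipped = {**features, flip_key: not features[flip_key]}
    return score(flipped) - score(features)
```

A large delta suggests the model leans heavily on that feature - exactly the kind of bias a patroller would want to know about before acting on the model's output.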
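And for the custom-metric pattern, here is a sketch of how "maximize recall @ 90% precision" can be turned into a threshold choice (again my own illustration, assuming scikit-learn and a labeled validation set; the paper describes the user-facing interface, not this code):

```python
# Pick the decision threshold with the highest recall among all thresholds
# whose precision is at least `min_precision`. Illustrative sketch, not ORES code.
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_recall_at_precision(y_true, scores, min_precision=0.90):
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall have one more entry than thresholds; drop the last point.
    ok = precision[:-1] >= min_precision
    if not ok.any():
        raise ValueError("No threshold reaches the requested precision.")
    best = np.argmax(np.where(ok, recall[:-1], -np.inf))
    return thresholds[best]
```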
One weakness (which the authors themselves acknowledge) is that training a new model still generally requires an ML expert; people without an ML background have to ask the ORES team to train a model for them, which limits participation. Our next paper addresses this problem by providing flexible tools for users to build their own models.
WeBuildAI: Participatory Framework for Algorithmic Governance
This paper describes a recommender system for matching food donations with potential recipients, built by the authors for a “food rescue” nonprofit that connects (1) grocery stores with expiring food, (2) nonprofits that distribute the food, and (3) volunteers who transport it. There are a number of values at stake - minimizing volunteer travel time, distributing the food equally, prioritizing recipients in poorer neighborhoods - with no clear way to trade off between them.
The basic WeBuildAI process is illustrated by this diagram from the paper:
The most interesting step is the first one, when they get representatives from all stakeholder groups to build models representing their preferences over food distribution decisions. I love their description of the challenges with building human preference models:
“Building a model that embodies an individual’s beliefs on policy gives rise to three challenges. First, people need to determine what information, or features, should be used in algorithms. Second, the individual needs to form a stable policy that applies across a broad spectrum of situations. This process requires people to examine their judgments in different contexts until they reach an acceptable coherence among their beliefs, or reflective equilibrium. Third, people without expertise in algorithms need to be able to express their beliefs in terms of an algorithmic model.”
To address these challenges, they have each participant build two models - a set of explicit rules, and an ML model learned from a series of pairwise comparisons. These models turned out to be substantially different: thinking in terms of abstract features and rules produces different decisions than choosing between concrete cases. Most (10/15) of the participants chose the ML model as the better representation of their beliefs, though this result may have been confounded by the ML model generally being built last. The paper goes into great depth on the experience of the participants, many of whom felt they learned more about the system and even about their own values. This is a hallmark of participatory design - it augments and improves human judgment instead of replacing it with algorithms.
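As a rough illustration of the second kind of model (my own sketch, not the authors' implementation): a belief model can be learned from pairwise comparisons by fitting a logistic regression on feature differences, a Bradley-Terry-style approach.

```python
# Sketch: learn a participant's preference weights from pairwise comparisons
# (Bradley-Terry-style). This is an illustrative reconstruction, not the
# WeBuildAI code; feature shapes and names are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_belief_model(pairs, choices):
    """pairs: (n, 2, d) array - feature vectors of the two options shown.
    choices: (n,) array in {0, 1} - index of the option the participant picked.
    Returns a model whose coefficients act as the participant's feature weights."""
    diffs = pairs[:, 0, :] - pairs[:, 1, :]        # compare option 0 to option 1
    prefers_first = (choices == 0).astype(int)     # 1 if option 0 was chosen
    return LogisticRegression(fit_intercept=False).fit(diffs, prefers_first)

def preference_score(model, option_features):
    """Higher score = more preferred under the learned belief model."""
    return option_features @ model.coef_.ravel()
```

At decision time, each stakeholder's model scores the candidate recipients, and the paper aggregates these individual models with a voting rule.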
Interesting Links
Many interesting papers on participatory approaches to ML can be found at the ICML 2020 workshop on the topic.
Some inspiring mockups illustrating concrete humane design patterns that iOS could implement. And for contrast, a taxonomy of "dark patterns" (h/t Chris) meant to deceive or manipulate users.
Finally, Breadboard (h/t @jonathanstray) is a platform that allows researchers to run experiments on social networks with real users, addressing a key gap in academic recommender systems research: the difficulty of running controlled experiments with real users.
As always, please send us any interesting links related to recommender alignment!