Monday, October 19, 2020

Agency and AI

There are probably other ways to come to define agency, but I come at this from the angle of thinking about artificial intelligence.

One version of our (perhaps not-so-distant) AI-affected future is that one AI will completely take over and rule the whole world. (The other version is that there will be multiple AIs (and perhaps humans) which collectively, perhaps as something scene-like, analogous to the international system of national governments, rule or prefer not to rule the whole world.)

If one AI takes over the whole world, we would hope that it was programmed in such a way that it was aligned with human well-being. But there are at least two problems. One is the technical challenge of getting an AI to actually be aligned in any particular way. It isn't always easy to code concepts, even ones which you can more or less live according to, yourself, as a human. The other is that people don't really know what human well-being is. I can think of a few different approaches to well-being: humans should feel pleasure, should not feel pain, should feel "ought-to-be-ness" (a concept to be discussed in my upcoming review of The Feeling of Value), should have their preferences satisfied, should have their trust maximized, should be "good people" (according to Christianity? What kind of Christianity? Or a kind of Islam? Or another religion? Or a branch of secular thinking?), should relate properly to reality (what does that mean?), should relate properly to God (which one?).

Given this profusion of options, one might conclude that it is unlikely that this is even the complete list, and also that if you wanted to code all of these, it would be difficult, and you would have to make a lot of decisions about how to balance goals that seemed to contradict each other in particular cases. And, at best, one strain of humanity would actually like what you were coming up with. As you designed your AI, you would be writing the constitution of the future world's culture and government, but it's likely that many or most people wouldn't approve of the decisions you were making.

So you might think that the wise and humble thing to do would be to try to make an AI that ceded as much decision-making ability to humans as possible. That way, over the coming centuries, humans could all participate in a process that humans have been participating in for hundreds of years, of proposing one or another version of human well-being and living by it, hoping to win over their neighbors. Perhaps someday we would have settled on one version of human well-being, and AI could maximize that. (Or maybe even then it should hang back, and we should hang back a little ourselves, because human consensus does not prove alignment with ultimate reality.) But until then, the AI would simply facilitate that process by which we make decisions.

So the AI might try to maximize agency. Agency, in this particular formulation, is "the ability to secure one's own decision-making ability and minimal physical liberty". The AI would try to keep people from dying, and help build them up to be capable of seeing things from their own point of view, so that they could make their own decisions if they chose to, and then at least minimally act on them, by not being impeded from moving their bodies. The AI would take into account the ways in which people's actions impair each other's decision-making abilities (psychological coercion, causing trauma, causing pain above a certain threshold, etc.), and do its best to create a society with fewer such impairments.

So there could be such a thing as "agency utilitarianism", whereby we try to build up the greatest number of people in the greatest way, to be able to function and make their own decisions, and sometimes to defend themselves, if they want to.

We already have world-ruling AI of a sort: states. We also have elite cultures, such as those capable of designing powerful technologies or policies, or those capable of significantly influencing those designers. People with power or "eliteness" are in a position to make decisions on behalf of other people. The greater the power, and the greater the differential between elites and non-elites, the more that elites or states resemble world-ruling AI. States and elites, like AI, also have the problem of not really knowing what human well-being is, and of not necessarily being aligned with it, whatever they take it to be. (The people who design AI are themselves part of the elite.)

It's hard to "code" a particular definition of well-being in systems made up of human beings, just as it is in AI. How would we actually get a state, an elite culture, or a culture at large (all of which rule over individual citizens) to adhere to one correct, fully-featured definition of well-being? If we try to coordinate them all on one idea, the simpler the better. So perhaps the idea to implement should be for each citizen to be able to seek a true definition of well-being, to pursue it, and to offer to share it with others, in ways that do not impede others' pursuit of the same. Thus agency would be the value for rulers to maximize, while citizens may seek anything more intense and true.


  1. Have you read "Human Compatible: Artificial Intelligence and the Problem of Control"? The author proposes a control mechanism for AI similar to what you describe here, though it's more about doing what humans want than maximizing their agency. The clever bit is that the AI starts out with very low certainty about what someone wants and then gradually learns (via Bayesian updating) as it goes. It's a pretty good book; I reviewed it here:

    I also like your idea of the state as a primitive AI. I may have to use that...

  2. I haven't read the book, but did encounter the strategy for alignment (CIRL) on the EA Forum.