The Alignment Problem: Machine Learning and Human Values

Book Review

Jun 01, 2023

The book The Alignment Problem by Brian Christian has been on my reading list for a while now. It came up again in a Product by Design podcast interview I was doing as we discussed the implications of AI, good and bad. And with my own increased interest in AI (don’t forget to read and share the Next Horizon newsletter for all things AI), it felt like a natural fit for our monthly book review.


Overview

The book explores the problems that can arise as we build ever more sophisticated AI systems.

In the book, Brian Christian delves into the history of AI, the current state of the field (which has evolved significantly), and potential future developments. He interviews many researchers and practitioners in AI, providing a comprehensive overview of the “alignment problem” and its potential solutions.

The alignment problem refers to the difficulty of ensuring that the goals of an AI system or machine learning model align with our human objectives or human values.

The disconnect between the intention and results—between what we put into the machine and the purpose we really desire—defines the essence of the alignment problem. Of course, it’s not just an AI problem. The alignment problem exists everywhere, especially where we have rewards and incentives. But this becomes especially relevant as we train machines and models.

This is a significant issue in AI ethics and safety, as misaligned AI might lead to unintended and harmful outcomes, or might perpetuate biases and stereotypes that are part of our culture, whether we know it or not.

The book explores the alignment problem across multiple dimensions, including the technical challenges of defining and implementing human values in machines, the philosophical questions around what those values should be, and the social implications of increasingly powerful AI systems.

So what are some key ideas for us?

Key Takeaways

Bias in AI

The book goes through many of the best examples of bias in AI.

One example comes from Amazon’s resume filtering system for job applicants. Amazon trained an AI model to screen resumes, with the idea of automating the search so that recruiters could focus only on the top resumes once the AI had identified them.

The problem was that the AI had been trained on successful candidates from the past 10 years. That would seem like a logical choice, until you realize that most of the industry, especially in technical roles, had been (and is) dominated by men.

So the AI picked up on this pattern and screened out female candidates, since it had “learned” that male candidates were preferable. Even when Amazon adjusted the AI so that it would not penalize specific female terms, it could still pick up on subtle clues in resumes and screen out women. Amazon eventually had to abandon the use of its AI.
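To make the mechanism concrete, here is a minimal sketch in Python. It is not Amazon’s system; the data, the `has_proxy_term` flag, and the scoring rule are all made up for illustration. The point is only that gender never appears in the data, yet a resume phrase correlated with gender is enough for a naive model to reproduce the bias in past hiring decisions.

```python
# Hypothetical, hand-made hiring history: (has_proxy_term, skilled, hired).
# has_proxy_term stands in for a resume phrase that happens to correlate
# with gender. Gender itself is NOT a column anywhere in this data.
HISTORY = [
    # Resumes without the proxy term
    (False, True, True), (False, True, True), (False, True, True),
    (False, False, False), (False, True, True), (False, False, True),
    # Resumes with the proxy term
    (True, True, False), (True, True, True), (True, True, False),
    (True, False, False),
]


def hire_rate(history, has_proxy_term, skilled):
    """Fraction of matching past resumes that led to a hire."""
    outcomes = [
        hired
        for proxy, skill, hired in history
        if proxy == has_proxy_term and skill == skilled
    ]
    return sum(outcomes) / len(outcomes)


# A naive "model": score a new resume by the historical hire rate of
# similar resumes. Compare two equally skilled candidates.
score_without_term = hire_rate(HISTORY, has_proxy_term=False, skilled=True)
score_with_term = hire_rate(HISTORY, has_proxy_term=True, skilled=True)

print(f"skilled, no proxy term:   {score_without_term:.2f}")
print(f"skilled, with proxy term: {score_with_term:.2f}")
```

In this toy data, two equally skilled candidates get different scores purely because of the proxy term, which is why removing the explicit signal alone does not remove the bias.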

Whenever a model is trained on limited data (which is always, since you can’t train a model on all the data, otherwise it ceases to be a model), it is prone to errors. So it is critical that we understand and identify potential biases or limitations and do our best to mitigate them.

Which leads to the second key point:

Transparency and Interpretability

The book’s chapter on transparency is one of my favorites.

The gist of the chapter is that rule-based models are much simpler and easier to understand, but often yield worse results. Neural networks (the basis of many modern AI models) often perform far better but are far more opaque. We rarely understand how such a model reaches its conclusions since it is a “black box”, often with many layers in the model itself.

There are obviously many problems if humans cannot understand and interpret what AI models are doing.

As the book suggests, for example, if a clinic or hospital were to use AI to triage patients, we would need to understand exactly why certain patients were being admitted and others were being treated and sent home.

The book illustrates this with a real case. A model was trained to predict outcomes for pneumonia patients, and because patients who also had asthma were almost always admitted and treated aggressively, they tended to recover. Based on that outcome, the model suggested treating asthma patients with complicating factors as outpatients, when the reason they recovered was that doctors had admitted them. So the AI had learned the wrong lesson and made the wrong suggestion.
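A small Python sketch shows how this kind of wrong lesson can come straight out of the data. The records below are invented for illustration and are not the figures from the book. In this toy history every asthma patient was admitted, so a model that only looks at recovery rates sees asthma as a sign of lower risk, while the data contains no evidence at all about what happens when an asthma patient is sent home.

```python
# Hypothetical toy records: (has_asthma, admitted, recovered).
RECORDS = (
    [(True, True, True)] * 19 + [(True, True, False)] * 1      # asthma, admitted
    + [(False, True, True)] * 27 + [(False, True, False)] * 3  # no asthma, admitted
    + [(False, False, True)] * 42 + [(False, False, False)] * 8  # no asthma, sent home
)


def recovery_rate(records, has_asthma):
    """Share of patients in the group who recovered, ignoring treatment."""
    outcomes = [rec for asthma, _, rec in records if asthma == has_asthma]
    return sum(outcomes) / len(outcomes)


def outpatient_count(records, has_asthma):
    """How many patients in the group were NOT admitted."""
    return sum(
        1
        for asthma, admitted, _ in records
        if asthma == has_asthma and not admitted
    )


asthma_rate = recovery_rate(RECORDS, has_asthma=True)
other_rate = recovery_rate(RECORDS, has_asthma=False)
asthma_outpatients = outpatient_count(RECORDS, has_asthma=True)

print(f"recovery rate with asthma:    {asthma_rate:.2f}")
print(f"recovery rate without asthma: {other_rate:.2f}")
print(f"asthma patients sent home in the data: {asthma_outpatients}")
```

The higher recovery rate for asthma patients here reflects the care they received, not lower underlying risk. Checking how many of them were ever treated as outpatients (zero in this toy data) is the kind of inspection that an interpretable model makes possible.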

In order to correct these mistakes, we need to understand the models. We need to visualize what is happening and interpret it so we can make changes. Because we can’t send asthma patients home when they actually need to be admitted to the hospital.

Human Value Alignment

Finally, how do we align our AI models with our values? How do we properly define our values? And how do we train AI in a way that it learns the right things and achieves the right outcomes?

The book is not only about AI, but explores many of the philosophical and psychological questions that AI brings to the forefront for us. Which is partly why I enjoyed it so much.

As the WSJ put it:

Mr. Christian notes that computers may one day be able not only to learn our behaviors but also intuit our values—figure out from our actions what it is we’re trying to optimize. This possibility offers the hope of robust cooperative human-machine learning—an area of especially promising research at the moment—but it raises a number of thorny concerns: What if an algorithm intuits the “wrong” values, based on its best read of who we currently are but perhaps not who we aspire to be? Do we, Mr. Christian asks, really want our computers inferring our values from our browser histories?

Can we even align values in such a way? Humans are complex and often contradictory, even individually. Are we capable of defining and aligning AI to certain values? Could AI do it?

This, of course, gets to the heart of the problem. And there are no easy answers.

So…

The Alignment Problem by Brian Christian was an excellent read. If you are interested in AI, including its history, its future implications, and the philosophical ramifications, then this is a book for you. It will give you a great background on the problems we’ve seen so far and what future problems might look like, as well as what we need to think about and do to address our AI future.