In her latest column for Quanta Magazine, SFI Professor Melanie Mitchell considers the implications of a machine learning technique called “Inverse Reinforcement Learning” (IRL). Researchers have used the technique to train machines to play video games by observing humans and to do backflips in response to human feedback. By inferring objectives from human behavior, rather than relying on explicitly specified goals of the kind that go awry in the famous thought experiment involving a superintelligence tasked with producing paper clips, IRL proponents hope to bring AI into better alignment with human ethics.
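The core idea of IRL can be made concrete in a few lines. The sketch below is a toy feature-matching variant, in the spirit of Abbeel and Ng's apprenticeship learning; the five-state chain environment, the update rule, and all numbers are illustrative assumptions, not the video-game or backflip systems the column mentions. The learner never sees a reward signal, only expert demonstrations, and infers a reward under which the expert's behavior looks optimal.

```python
import numpy as np

N_STATES = 5  # states 0..4 on a chain; the expert's (hidden) goal is state 4

def rollout(policy, start=0, horizon=10):
    """Run a deterministic policy (state -> -1/+1 step) and record states."""
    s, visited = start, [start]
    for _ in range(horizon):
        s = min(max(s + policy[s], 0), N_STATES - 1)
        visited.append(s)
    return visited

def visit_frequencies(trajectories):
    """State-visitation frequencies: the feature expectations when each
    state has a single indicator feature."""
    counts = np.zeros(N_STATES)
    for traj in trajectories:
        for s in traj:
            counts[s] += 1
    return counts / counts.sum()

# Expert demonstrations: the expert always steps right, toward state 4.
expert = {s: +1 for s in range(N_STATES)}
mu_expert = visit_frequencies([rollout(expert) for _ in range(20)])

# Feature-matching loop: nudge the reward weights toward states the
# expert visits more often than the current learner does.
weights = np.zeros(N_STATES)
for _ in range(50):
    # Greedy learner: step toward whichever neighbor looks more rewarding.
    learner = {}
    for s in range(N_STATES):
        left, right = max(s - 1, 0), min(s + 1, N_STATES - 1)
        learner[s] = +1 if weights[right] > weights[left] else -1
    mu_learner = visit_frequencies([rollout(learner) for _ in range(20)])
    weights += 0.1 * (mu_expert - mu_learner)  # gradient step

print(np.round(weights, 3))
# The inferred reward peaks at state 4: the learner has recovered the
# expert's implicit goal without ever being told what it was.
```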
“An essential first step toward teaching machines ethical concepts is to enable machines to grasp humanlike concepts in the first place,” writes Mitchell. But “ethical notions such as kindness and good behavior are much more complex and context-dependent than anything IRL has mastered so far.”
Without a better scientific theory of intelligence, we may be ill-equipped to tackle AI’s most important problem.
Read the column, “What Does It Mean to Align AI With Human Values?” in Quanta (December 13, 2022).
EXCERPT
Computers frequently misconstrue what we want them to do, with unexpected and often amusing results. One machine learning researcher, for example, while investigating an image classification program’s suspiciously good results, discovered that it was basing classifications not on the image itself, but on how long it took to access the image file — the images from different classes were stored in databases with slightly different access times. Another enterprising programmer wanted his Roomba vacuum cleaner to stop bumping into furniture, so he connected the Roomba to a neural network that rewarded speed but punished the Roomba when the front bumper collided with something. The machine accommodated these objectives by always driving backward.
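The Roomba anecdote is a textbook case of specification gaming: the reward captures a proxy (front-bumper collisions) rather than the intent (don't collide at all). A minimal sketch, with invented sensor names and penalty values, shows why driving backward is the reward-maximizing move:

```python
def reward(speed: float, front_bumper_hit: bool) -> float:
    """Hypothetical Roomba reward: pay for speed, penalize collisions,
    but only collisions the FRONT bumper can sense."""
    return speed - (10.0 if front_bumper_hit else 0.0)

# Driving forward into furniture trips the front bumper and is punished;
# driving backward at the same speed collides just as often, but the
# front bumper never fires, so the collisions are invisible to the reward.
print(reward(speed=0.3, front_bumper_hit=True))   # -9.7
print(reward(speed=0.3, front_bumper_hit=False))  #  0.3
```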
But the community of AI alignment researchers sees a darker side to these anecdotes. In fact, they believe that the machines’ inability to discern what we really want them to do is an existential risk. To solve this problem, they believe, we must find ways to align AI systems with human preferences, goals and values...