In her latest column for Quanta Magazine, SFI Professor Melanie Mitchell considers the implications of a machine learning technique called “Inverse Reinforcement Learning” (IRL). Researchers have used the technique to train machines to play video games by observing humans and to do backflips in response to human feedback. By inferring objectives from human behavior, rather than relying on explicitly specified goals of the kind that go awry in the famous thought experiment involving a superintelligence tasked with producing paper clips, IRL proponents hope to bring AI into better alignment with human ethics.
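The core idea of IRL can be made concrete in a few lines. The sketch below is a toy feature-matching variant, in the spirit of Abbeel and Ng's apprenticeship learning; the five-state chain environment, the update rule, and all numbers are illustrative assumptions, not the video-game or backflip systems the column mentions. The learner never sees a reward signal, only expert demonstrations, and infers a reward under which the expert's behavior looks optimal.

```python
import numpy as np

N_STATES = 5  # states 0..4 on a chain; the expert's (hidden) goal is state 4

def rollout(policy, start=0, horizon=10):
    """Run a deterministic policy (state -> -1/+1 step) and record states."""
    s, visited = start, [start]
    for _ in range(horizon):
        s = min(max(s + policy[s], 0), N_STATES - 1)
        visited.append(s)
    return visited

def visit_frequencies(trajectories):
    """State-visitation frequencies: the feature expectations when each
    state has a single indicator feature."""
    counts = np.zeros(N_STATES)
    for traj in trajectories:
        for s in traj:
            counts[s] += 1
    return counts / counts.sum()

# Expert demonstrations: the expert always steps right, toward state 4.
expert = {s: +1 for s in range(N_STATES)}
mu_expert = visit_frequencies([rollout(expert) for _ in range(20)])

# Feature-matching loop: nudge the reward weights toward states the
# expert visits more often than the current learner does.
weights = np.zeros(N_STATES)
for _ in range(50):
    # Greedy learner: step toward whichever neighbor looks more rewarding.
    learner = {}
    for s in range(N_STATES):
        left, right = max(s - 1, 0), min(s + 1, N_STATES - 1)
        learner[s] = +1 if weights[right] > weights[left] else -1
    mu_learner = visit_frequencies([rollout(learner) for _ in range(20)])
    weights += 0.1 * (mu_expert - mu_learner)  # gradient step

print(np.round(weights, 3))
# The inferred reward peaks at state 4: the learner has recovered the
# expert's implicit goal without ever being told what it was.
```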
“An essential first step toward teaching machines ethical concepts is to enable machines to grasp humanlike concepts in the first place,” writes Mitchell. But “ethical notions such as kindness and good behavior are much more complex and context-dependent than anything IRL has mastered so far.”
Without a better scientific theory of intelligence, we may be ill-equipped to tackle AI’s most important problem.
Read the column, “What Does It Mean to Align AI With Human Values?” in Quanta (December 13, 2022).
EXCERPT
Computers frequently misconstrue what we want them to do, with unexpected and often amusing results. One machine learning researcher, for example, while investigating an image classification program’s suspiciously good results, discovered that it was basing classifications not on the image itself, but on how long it took to access the image file — the images from different classes were stored in databases with slightly different access times. Another enterprising programmer wanted his Roomba vacuum cleaner to stop bumping into furniture, so he connected the Roomba to a neural network that rewarded speed but punished the Roomba when the front bumper collided with something. The machine accommodated these objectives by always driving backward.
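The Roomba anecdote is a textbook case of specification gaming: the reward captures a proxy (front-bumper collisions) rather than the intent (don't collide at all). A minimal sketch, with invented sensor names and penalty values, shows why driving backward is the reward-maximizing move:

```python
def reward(speed: float, front_bumper_hit: bool) -> float:
    """Hypothetical Roomba reward: pay for speed, penalize collisions,
    but only collisions the FRONT bumper can sense."""
    return speed - (10.0 if front_bumper_hit else 0.0)

# Driving forward into furniture trips the front bumper and is punished;
# driving backward at the same speed collides just as often, but the
# front bumper never fires, so the collisions are invisible to the reward.
print(reward(speed=0.3, front_bumper_hit=True))   # -9.7
print(reward(speed=0.3, front_bumper_hit=False))  #  0.3
```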
But the community of AI alignment researchers sees a darker side to these anecdotes. In fact, they believe that the machines’ inability to discern what we really want them to do is an existential risk. To solve this problem, they believe, we must find ways to align AI systems with human preferences, goals and values...