Study: Visual Analogies for AI

Examples of the visual tasks used to test intelligence in humans versus AI. (image: Appendix A in "The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain")

September 25, 2023

The field of artificial intelligence has long been stymied by the lack of an answer to its most fundamental question: What is intelligence? AIs such as GPT-4 have highlighted this uncertainty: some researchers believe that GPT models are showing glimmers of genuine intelligence but others disagree.

To address these arguments, we need concrete tasks to pin down and test the notion of intelligence, argue SFI researchers Arseny Moskvichev, Melanie Mitchell, and Victor Vikram Odouard in a new paper in Transactions on Machine Learning Research. The authors provide just that — and find that even the most advanced AIs still lag far behind humans in their ability to abstract and generalize concepts.

The team created evaluation puzzles — based on a domain developed by Google researcher François Chollet — that focus on visual analogy-making, capturing basic concepts such as above, below, center, inside, and outside. Human and AI test-takers were shown several patterns demonstrating a concept and then asked to apply that concept to a different image. The accompanying figure shows tests of the notion of sameness.

Examples of some of the visual tasks the authors of a recent study posed in testing intelligence. (image: Appendix A in "The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain")

These visual puzzles were very easy for humans: for example, human test-takers solved the puzzles on the notion of sameness 88 percent of the time. But GPT-4 struggled, getting only 23 percent of these puzzles right. The researchers conclude that, currently, AI programs are still weak at visual abstract reasoning.
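To make the setup concrete, here is a minimal Python sketch of how a puzzle of this kind and its scoring could be represented. It is an illustration only, not the authors' code: the `Task` class, the toy grids, the "keep only the top row" concept, and the exact-match scoring rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A grid is a list of rows; each integer stands for a cell color (assumed encoding).
Grid = List[List[int]]


@dataclass
class Task:
    """Hypothetical container for one puzzle."""
    demonstrations: List[Tuple[Grid, Grid]]  # (input, output) pairs illustrating the concept
    test_input: Grid                         # new grid the test-taker must transform
    test_output: Grid                        # expected answer, hidden from the test-taker


def is_correct(predicted: Grid, task: Task) -> bool:
    # Assumed scoring rule: the answer counts only if every cell matches.
    return predicted == task.test_output


def accuracy(predictions: List[Grid], tasks: List[Task]) -> float:
    # Fraction of tasks answered correctly.
    correct = sum(is_correct(p, t) for p, t in zip(predictions, tasks))
    return correct / len(tasks)


# Toy puzzle (invented): the concept is "keep only the top row".
toy_task = Task(
    demonstrations=[([[1, 0], [0, 2]], [[1, 0], [0, 0]])],
    test_input=[[3, 4], [5, 0]],
    test_output=[[3, 4], [0, 0]],
)

# One right answer and one wrong answer on the same toy puzzle.
predictions = [[[3, 4], [0, 0]], [[3, 4], [5, 0]]]
score = accuracy(predictions, [toy_task, toy_task])
print(f"accuracy: {score:.0%}")
```

Percentages like those reported above would come from running this kind of tally over every puzzle in a concept group, once for human answers and once for the AI's answers.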

“We reason a lot by analogies, so that’s why it’s such an interesting question,” Moskvichev says. The team’s use of novel visual puzzles ensured that the machines hadn’t encountered them before. GPT-4 was trained on large portions of the internet, so it was important to avoid anything it might have already seen, to be certain it wasn’t just parroting existing text rather than demonstrating its own understanding. That’s why recent results, such as an AI’s ability to score well on the bar exam, aren’t a good test of its true intelligence.

The team believes that as time goes on and AI algorithms improve, developing evaluation routines will become progressively more difficult and more important. Rather than trying to create a single test of AI intelligence, we should design more carefully curated datasets focusing on specific facets of intelligence. “The better our algorithms become, the harder it is to figure out what they can and can’t do,” Moskvichev says. “So we need to be very thoughtful in developing evaluation datasets.”

Read the study, "The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain," in Transactions on Machine Learning Research (August 2023).
