As AI increasingly influences our human experience, we need to better understand its behavior. In new research published in Proceedings of the National Academy of Sciences, SFI External Professor Matthew Jackson (Stanford University) and co-authors presented two recent versions of ChatGPT, GPT-3 and GPT-4, with a common personality test. In what the authors call a "Turing test of whether AI chatbots are behaviorally similar to humans," they also asked the bots to describe their moves in a suite of behavioral games that have been used to predict real-world economic and ethical behaviors, and then compared the bots' responses to those of humans from around the world. The study found that while GPT-3's behavior was detectably non-human, GPT-4 was statistically indistinguishable from humans. When GPT-4's behaviors differed from those of typical humans, it often chose strategies more consistent with altruism, fairness, empathy, and reciprocity.

Just as we need to understand the "personalities" of AI bots as we put them in positions of decision-making, we need to understand how these interactions might affect us, says Jackson. "The more we understand early on — regarding where to expect great things from AI and where to expect bad things — the better we steer things in a beneficial direction," he says.

Read the paper “A Turing test of whether AI chatbots are behaviorally similar to humans” in PNAS (February 22, 2024).
