We’re interacting with artificial intelligence (AI) online more than ever before, and more often than we realize. To find out how hard it is to spot, researchers asked people to converse with four agents, one human and three AI models, to see whether they could tell the difference.
The “Turing test” was first proposed by computer scientist Alan Turing in 1950 as the “Imitation Game”: a test of whether a machine can exhibit intelligent behavior indistinguishable from a human’s. To pass the Turing test, a machine must be able to converse with a person and convince them it is human.
To replicate the test, the scientists asked 500 people to talk to four respondents: a human, the 1960s AI program ELIZA, and both GPT-3.5 and GPT-4, the models that power ChatGPT. Each conversation lasted five minutes, after which participants judged whether they had been talking to a human or an AI. In the study, published on the preprint server arXiv on May 9, participants judged GPT-4 to be human 54% of the time.
ELIZA, a system with pre-programmed responses but no large language model (LLM) or neural network architecture, was judged to be human only 22% of the time, compared with 50% for GPT-3.5 and 67% for the human participants.
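To make that contrast concrete, here is a minimal sketch of the kind of rule-based, pattern-matching chatbot ELIZA represents. The rules and phrasings below are invented for illustration, not taken from ELIZA’s actual script; the principle, though, is the same: fixed patterns mapped to canned response templates, with no model behind them.

```python
import random
import re

# Illustrative ELIZA-style rules: a regex pattern mapped to canned response
# templates. There is no learning and no language model, only substitution.
RULES = [
    (re.compile(r"\bi feel (.*)", re.IGNORECASE),
     ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (re.compile(r"\bi am (.*)", re.IGNORECASE),
     ["Why do you say you are {0}?", "How does being {0} make you feel?"]),
    (re.compile(r"\bmy (.*)", re.IGNORECASE),
     ["Tell me more about your {0}."]),
]

DEFAULT_RESPONSES = ["Please go on.", "Can you elaborate on that?"]

def respond(user_input: str) -> str:
    """Return a canned response by matching the input against fixed rules."""
    for pattern, templates in RULES:
        match = pattern.search(user_input)
        if match:
            fragment = match.group(1).rstrip(".!?")
            return random.choice(templates).format(fragment)
    # No rule matched: fall back to a generic prompt, exactly the kind of
    # behavior that makes the illusion wear thin after a few minutes.
    return random.choice(DEFAULT_RESPONSES)

if __name__ == "__main__":
    print(respond("I feel nervous talking to machines."))
    # e.g. "Why do you feel nervous talking to machines?"
    print(respond("My computer seems to understand me."))
    # "Tell me more about your computer seems to understand me." (the
    # ungrammatical echo is the kind of slip that gives a rule-based bot away)
```

A responder like this can sound plausible for a few exchanges, but, as Watson notes below, its limitations surface quickly, which is consistent with ELIZA’s 22% score.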
“Machines, like humans, can confabulate, cobbling together plausible post-hoc justifications for things,” Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science.
“AI is susceptible to cognitive biases, can be tricked or manipulated, and is becoming increasingly deceptive. All these factors mean that human-like weaknesses and quirks are expressed in AI systems, making them more human-like than previous approaches, which offered little more than a list of canned responses.”
The work builds on decades of attempts to get AI agents to pass the Turing test, and reflects general concerns that AI systems that are perceived as human would have “far-reaching social and economic consequences.”
The scientists also acknowledge the legitimate criticism that the Turing test is too simplistic, noting that “stylistic and socio-emotional factors play a larger role in passing the Turing Test than traditional notions of intelligence” and suggesting that we’ve been looking for machine intelligence in the wrong places.
“Innate intelligence alone has its limits. What really matters is being intelligent enough to understand a situation and the skills of others, and having the empathy to connect those elements,” Watson said. “Competence is only part of AI’s value; the ability to understand the values, preferences, and limitations of others is just as essential. These are the qualities that will let AI serve as a loyal, trustworthy concierge in our lives.”
Watson added that the study highlights how much AI has changed in the GPT era, and that it illustrates a challenge for future human-machine interaction: we are likely to become increasingly paranoid about the nature of those interactions, especially in sensitive matters.
“ELIZA was severely limited by its canned responses. It might fool you for five minutes, but its limitations quickly became apparent,” she said. “Large language models are infinitely flexible, able to synthesize responses on a wide range of topics, speak in specific languages or social dialects, and express themselves with character-driven personalities and values. It’s a major step forward from anything hand-programmed by a human, no matter how sophisticated and careful that programming may be.”