Beyond the Binary - Bill Friend

Reframing Intelligence in the Age of Artificial Systems

In June 2022, a Google engineer named Blake Lemoine told The Washington Post that the company’s language model, LaMDA, had come to life. “I know a person when I talk to it,” he said. “It doesn’t matter whether they have a brain made of meat in their head. Or if they have a billion lines of code.”[i] Google investigated, found no evidence, and fired him the following month.[ii] The episode was easy to dismiss as one man’s overreach. But it was also a preview. In the years since, the same argument has played out at higher and higher volume, with a Turing Award winner on one side insisting that machines “really do understand,” and a celebrated linguist on the other declaring the whole enterprise an exercise in “faux science.”[iii] The public conversation about whether artificial intelligence is “really” intelligent has hardened into a standoff, and the standoff is not getting us anywhere.

This essay argues that the standoff is the product of a bad question. Intelligence is not a single property that a system either fully possesses or entirely lacks. It is a cluster of abilities, realized differently across biological and artificial systems. Once we see this, the polarized debate loses its grip. Large language models are not failed humans. They are successful examples of a new kind of intelligence — powerful in some dimensions, absent in others, and best understood on their own terms rather than as defective copies of us.

The Problem of Definition

Part of the reason the debate feels intractable is that the central word has never been pinned down. In 2007, the AI researchers Shane Legg and Marcus Hutter set out to catalog how intelligence had been defined and assembled roughly seventy distinct definitions drawn from psychology, philosophy, and computer science.[iv] They opened with the psychologist Robert Sternberg’s observation that “there seem to be almost as many definitions of intelligence as there were experts asked to define it.”[v] Their own conclusion was blunt: “Despite a long history of research and debate, there is still no standard definition of intelligence.”[vi]

This is not a trivial academic problem. When two people argue about whether a language model is intelligent, they are very often arguing about two different things while using the same word. To keep the discussion honest, a few distinctions are worth stating plainly.

By intelligence in the broadest sense, I mean the ability to acquire knowledge, reason, solve problems, adapt to new situations, and achieve goals in complex environments. Legg and Hutter, after surveying their seventy definitions, distilled a version of exactly this: “Intelligence measures an agent’s ability to achieve goals in a wide range of environments.”[vii] Notice what that definition does not require. It says nothing about biology, nothing about consciousness, nothing about how the learning happens. It is a definition built around what a system can do.

That is the crux. There is a long-standing split between functional intelligence — observable performance, what a system can accomplish — and phenomenal intelligence — the inner experience, the felt understanding, what it is like to be the system in question. Most of the public argument about AI is a collision between people who care about the first and people who care about the second, each convinced the other has missed the point.

A second confusion runs underneath the first. When critics say a language model “doesn’t really understand,” they usually mean it lacks general intelligence in a specific, technical sense: the unified, transferable capacity that psychologists since Charles Spearman have called g. Spearman, working in 1904, noticed that children who did well in one academic subject tended to do well across seemingly unrelated ones, and he proposed a single underlying “general intelligence” factor running through all of them.[viii] When proponents say a model is intelligent, they usually mean something different: functional breadth, the ability to perform competently across many domains. These are not the same claim. A system could have enormous functional breadth and still lack anything resembling Spearman’s g. The debate routinely conflates these two axes without ever naming them, and the conflation is half the reason it never resolves.

The Functional and the Phenomenological

Behind the noise sit two coherent traditions, and each contains real truth.

The Functional View

The functional view holds that intelligence is as intelligence does. If a system reliably produces the outputs we associate with understanding (solving problems, reasoning through novel situations, explaining itself), then it is intelligent, and the question of what is “really” happening inside is either secondary or empty. This is old ground. In 1950, Alan Turing set aside the question “Can machines think?”, which he considered “too meaningless to deserve discussion,” in favor of a behavioral test: if a machine could converse so convincingly that an interrogator could not reliably tell it from a human, the distinction had been settled for all practical purposes.[ix] The functional tradition runs straight from there. Its modern philosophical home is functionalism, the position that mental states are defined by what they do, their causal roles, rather than by the material that performs them, whether neurons or transistors.[x]

The case for taking today’s models seriously as intelligent is, at bottom, functional. They reason, write, and solve problems across an enormous range of subjects. In early 2023, a team of fourteen Microsoft researchers examined an unreleased version of GPT-4 and concluded that it could solve “novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting,” with performance “strikingly close to human-level.” They went further, arguing the system “could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.”[xi] One can think the paper overreached — many did — and still grant the underlying observation. These systems do things that, in a human, no one would hesitate to call intelligent. To credit a system with intelligent behavior while denying it intelligence is at least an awkward position to hold.

The Phenomenological View

The phenomenological view answers that behavior is not the whole story, and that an output indistinguishable from understanding is not the same as understanding. Its sharpest statement is John Searle’s 1980 thought experiment, the Chinese Room. Imagine a man who speaks no Chinese locked in a room with a rulebook for manipulating Chinese symbols. Slips of paper come in; following the rules, he sends correct Chinese responses out. To anyone outside, the room appears to understand Chinese perfectly. Yet the man understands nothing — he is shuffling symbols by shape. “They have only a syntax but no semantics,” Searle wrote. A program, on his account, is all rulebook and no comprehension, “and no program by itself is sufficient for thinking.”[xii]

The Chinese Room is the great adversary of any purely functional claim, and it landed hard. One computer scientist later quipped that cognitive science had become the ongoing project of refuting it.[xiii] The most common rebuttal, the “Systems Reply,” grants that the man doesn’t understand but insists the whole system — man, rulebook, and symbols together — does. Searle’s answer was to have the man memorize the entire rulebook and do all the work in his head, internalizing the system completely. He still understands no Chinese, Searle argued, “and a fortiori neither does the system, because there isn’t anything in the system that isn’t in him.”[xiv]

This intuition has its contemporary echo. When Emily Bender, Timnit Gebru, and their coauthors coined the phrase “stochastic parrot” in 2021, they were making essentially Searle’s point in the language of machine learning. A language model, they wrote, is “a system for haphazardly stitching together sequences of linguistic form it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning.”[xv] The form is there; the meaning, they argued, is supplied entirely by the human reader. Noam Chomsky and two colleagues pressed the same line in The New York Times, insisting that “the human mind is not, like ChatGPT and its ilk, a lumbering statistical engine for pattern matching” and that the models’ “deepest flaw” is their inability to say “what is not the case and what could and could not be the case,” the ingredients of explanation and “the mark of true intelligence.”[xvi]

Why Both Views Are Right About Something

Here is the part the standoff misses: both traditions are correct about what they are actually measuring.

The functionalists are right that these systems perform. The phenomenologists are right that performance does not settle the question of inner experience. The deepest version of their point comes from David Chalmers, who in 1995 split the study of mind into “easy” problems and one “hard” one. The easy problems — discrimination, categorization, reportability, the integration of information — are the functional ones, and Chalmers granted they are “straightforwardly vulnerable to explanation in terms of computational or neural mechanisms.”[xvii] The hard problem is experience itself: “Why doesn’t all this information-processing go on ‘in the dark,’ free of any inner feel?”[xviii] A system could nail every functional task and the hard problem would still stand untouched.

That is the unbridgeable gap in the debate, stated precisely. The functional view answers the easy problems; the phenomenological view is about the hard one. Two camps are giving correct answers to two different questions and mistaking each other for opponents. Recognizing this does not dissolve the disagreement, but it locates it — and locating it is the first step past the binary.

It is worth adding that the philosophers most associated with each pole are less absolute than their slogans suggest. Daniel Dennett, the arch-functionalist, built a whole method, heterophenomenology, around treating first-person reports as data to be interpreted from the outside rather than as gospel, and concluded that any system with the right functional organization “is conscious in the fullest sense.”[xix] Chalmers, the patron saint of the hard problem, took up the question of machine consciousness directly in a 2022 lecture, published the following year, and concluded not that it was impossible but that it was, for current models, “somewhat unlikely,” while taking seriously that “successors to large language models may be conscious in the not-too-distant future.”[xx] The leading minds on both sides hold positions with more give in them than the public argument allows.

A Way Forward

If the binary is the problem, the way out is to stop asking whether AI is intelligent and start asking in what ways and to what degree. Intelligence is better understood as a profile across many dimensions than as a single switch.

The psychometric tradition has been moving in this direction for eighty years. As early as 1943, Raymond Cattell split general intelligence into fluid intelligence, the ability to reason through novel problems, and crystallized intelligence, accumulated knowledge and skill.[xxi] John Horn, who extended the theory, “consistently and unyieldingly argued against a single general ability g factor” at all.[xxii] If intelligence in humans already resists being squeezed into one number, it is strange to demand that artificial systems pass or fail as a monolith.

Seen through this lens, today’s models come into focus. They display something like extreme crystallized intelligence — vast, fluent command of accumulated human knowledge — paired with fluid intelligence that is real but wildly uneven, strong on some novel problems and brittle on others. They do not fit Spearman’s g at all, because their abilities are not correlated the way a human’s are; a model can write competent legal analysis and then fail a simple counting task. This is not the profile of a failed general intelligence. It is the profile of a different kind of mind, with peaks and valleys that no human possesses.

The evidence for treating these systems as a genuine but partial intelligence runs in both directions, and that two-sidedness is itself the point. On one hand, the models exhibit real metacognitive behavior. OpenAI reported that GPT-4’s base model is “highly calibrated”: its predicted confidence in an answer generally matches the probability of being correct, although post-training reduced that calibration.[xxiii] The models can also flag uncertainty, identify some of their own errors, and explain their reasoning. On the other hand, the same systems hallucinate with confidence. One study that asked chatbots to supply references for medical systematic reviews found that GPT-4 fabricated more than a quarter of them.[xxiv] OpenAI itself has acknowledged that the model “can also be confidently wrong.”[xxv] The science-fiction writer Ted Chiang offered the most memorable framing: a language model is “a blurry JPEG of the web,” and its hallucinations are “compression artifacts,” plausible enough to pass, wrong all the same.[xxvi]

A system that sometimes correctly doubts itself and sometimes confidently invents is not well described by either slogan. It is exhibiting a partial, domain-variable version of a cognitive function — which is exactly what a spectrum view predicts and what a binary view cannot accommodate.

Naming the Pattern

Adopting this view also requires noticing a habit that has dogged the field for half a century. Every time a machine masters something once held up as proof of intelligence, we revise the definition so the achievement no longer counts. This has a name: the AI effect. The phenomenon was described by the historian Pamela McCorduck as an “odd paradox,” in which AI’s genuine successes are quietly absorbed into ordinary software and the goalposts move on to whatever remains unsolved.[xxvii]

The pattern is well documented. When IBM’s Deep Blue beat the world chess champion Garry Kasparov in 1997, critics complained it had used mere “brute force” and was not real intelligence at all.[xxviii] When DeepMind’s AlphaGo defeated the Go master Lee Sedol in 2016 — a feat experts had thought was at least a decade away — the same move followed.[xxix] Superhuman image recognition, once hailed as machine “vision,” became a routine engineering tool no one calls intelligent. Each time, the target shifted to preserve the conclusion that real intelligence is whatever machines cannot yet do.

The phrase usually attached to this is worth getting right, because it is almost always misquoted. The popular version, “Artificial intelligence is whatever hasn’t been done yet,” is credited to the computer scientist Larry Tesler. But Tesler himself disowned that wording. What he actually said, around 1970, was: “Intelligence is whatever machines haven’t done yet.”[xxx] The difference is not pedantic. The popular paraphrase makes a point about the field of AI; Tesler’s original makes a point about the concept of intelligence: that we define it, in part, by reserving it for ourselves. As Tesler explained his intent: “Many people define humanity partly by our allegedly unique intelligence. Whatever a machine — or an animal — can do must (those people say) be something other than intelligence.”[xxxi] Naming this reflex matters, because it reveals that the question “Is it really intelligent?” has, for fifty years, often been answered in advance.

What Language Models Actually Are

So what are these systems? Not conscious, as far as anyone can tell: there is no good evidence that they have inner experience, Chalmers himself judges it somewhat unlikely for current models, and the hard problem means that functional competence alone could never settle the question.[xxxii] Not generally intelligent in Spearman’s sense: their abilities do not cohere into a single transferable capacity. But not stochastic parrots either, if that phrase is meant to deny them any cognitive standing. They are a new category: systems with genuine functional intelligence (broad, useful, and real) running on a substrate and an architecture utterly unlike a brain, and lacking the phenomenal interior that, for us, comes bundled with the rest.

The most clarifying way to hold this comes, fittingly, from one of the people whose work made the technology possible. Geoffrey Hinton, who left Google in 2023 so that he could speak more freely, calls what the models have “a completely different form of intelligence. A new and better form of intelligence.”[xxxiii] One need not accept that it is better, or Hinton’s further claim that the systems “really do understand,”[xxxiv] to take the core of the framing seriously: this is intelligence distinct in kind from our own. Treating these systems as a different type of intelligence, rather than as a flawed version of ours, is the move that makes the rest of the conversation tractable, including the practical question of how humans and machines might complement each other, each strong where the other is weak.

Conclusion

The argument over whether artificial intelligence is “really” intelligent has lasted as long as it has because it was never really a single argument. It folded together a question about performance, a question about consciousness, and a question about a specific psychometric construct, and then demanded one answer for all three. There is no one answer, because there is no one question.

Intelligence is a cluster of capabilities, distributed unevenly across the systems that possess any of them. Humans hold one configuration, shaped by evolution and bound up with consciousness and a body. Language models hold another, engineered, disembodied, brilliant in places and blank in others. Seeing them this way costs us a satisfying verdict but buys us something better: an accurate picture, and with it the chance to think clearly about what these systems are, what they are not, and how to work alongside them. The binary was always too small for the thing it was trying to describe. It is time to set it down.

Endnotes

[i] Nitasha Tiku, “The Google Engineer Who Thinks the Company’s AI Has Come to Life,” Washington Post, June 11, 2022, https://www.washingtonpost.com/technology/2022/06/11/google-ai-lamda-blake-lemoine/.

[ii] “Blake Lemoine: Google Fires Engineer Who Said AI Was Sentient,” BBC News, July 23, 2022, https://www.bbc.com/news/technology-62275326.

[iii] Geoffrey Hinton, interview by Eric Topol, Ground Truths (Substack), December 2023, https://erictopol.substack.com/p/geoffrey-hinton-large-language-models; Noam Chomsky, Ian Roberts, and Jeffrey Watumull, “The False Promise of ChatGPT,” New York Times, March 8, 2023, https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html.

[iv] Shane Legg and Marcus Hutter, “A Collection of Definitions of Intelligence,” in Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms, ed. Ben Goertzel and Pei Wang (Amsterdam: IOS Press, 2007), 17–24, https://arxiv.org/abs/0706.3639.

[v] Robert J. Sternberg, quoted in Legg and Hutter, “A Collection of Definitions of Intelligence,” 17.

[vi] Legg and Hutter, “A Collection of Definitions of Intelligence,” 17.

[vii] Legg and Hutter, “A Collection of Definitions of Intelligence,” 21.

[viii] C. Spearman, “‘General Intelligence,’ Objectively Determined and Measured,” American Journal of Psychology 15 (1904): 201–92, https://archive.org/details/jstor-1412107.

[ix] A. M. Turing, “Computing Machinery and Intelligence,” Mind 59, no. 236 (1950): 433–60, https://doi.org/10.1093/mind/LIX.236.433.

[x] David Cole, “The Chinese Room Argument,” in The Stanford Encyclopedia of Philosophy, ed. Edward N. Zalta, https://plato.stanford.edu/entries/chinese-room/.

[xi] Sébastien Bubeck et al., “Sparks of Artificial General Intelligence: Early Experiments with GPT-4” (preprint, arXiv:2303.12712, March 22, 2023), https://arxiv.org/abs/2303.12712.

[xii] John R. Searle, “Minds, Brains, and Programs,” Behavioral and Brain Sciences 3, no. 3 (1980): 417–24, https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/minds-brains-and-programs/DC644B47A4299C637C89772FACC2706A.

[xiii] Cole, “The Chinese Room Argument,” citing Pat Hayes’s remark that cognitive science could be defined as the ongoing research program of refuting Searle’s argument.

[xiv] Searle, “Minds, Brains, and Programs,” 419.

[xv] Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2021), 610–23, https://doi.org/10.1145/3442188.3445922.

[xvi] Chomsky, Roberts, and Watumull, “The False Promise of ChatGPT.”

[xvii] David J. Chalmers, “Facing Up to the Problem of Consciousness,” Journal of Consciousness Studies 2, no. 3 (1995): 200–219, https://consc.net/papers/facing.pdf.

[xviii] Chalmers, “Facing Up to the Problem of Consciousness,” 203.

[xix] Daniel C. Dennett, Consciousness Explained (Boston: Little, Brown, 1991), 72, 281.

[xx] David J. Chalmers, “Could a Large Language Model Be Conscious?,” Boston Review, August 9, 2023, https://www.bostonreview.net/articles/could-a-large-language-model-be-conscious/.

[xxi] Raymond B. Cattell, “The Measurement of Adult Intelligence,” Psychological Bulletin 40, no. 3 (1943): 153–93.

[xxii] “Cattell–Horn–Carroll Theory,” Wikipedia, accessed May 29, 2026, https://en.wikipedia.org/wiki/Cattell%E2%80%93Horn%E2%80%93Carroll_theory.

[xxiii] OpenAI, “GPT-4,” March 14, 2023, https://openai.com/index/gpt-4-research/.

[xxiv] Mikaël Chelli et al., “Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis,” Journal of Medical Internet Research 26 (2024): e53164, https://www.jmir.org/2024/1/e53164/.

[xxv] OpenAI, “GPT-4.”

[xxvi] Ted Chiang, “ChatGPT Is a Blurry JPEG of the Web,” New Yorker, February 9, 2023, https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web.

[xxvii] Pamela McCorduck, Machines Who Think, 2nd ed. (Natick, MA: A. K. Peters, 2004), 204; see also “AI Effect,” Wikipedia, accessed May 29, 2026, https://en.wikipedia.org/wiki/AI_effect.

[xxviii] “AI Effect,” Wikipedia; on the match itself, see “Deep Blue,” IBM, https://www.ibm.com/history/deep-blue.

[xxix] “AlphaGo versus Lee Sedol,” Wikipedia, accessed May 29, 2026, https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol; David Silver et al., “Mastering the Game of Go with Deep Neural Networks and Tree Search,” Nature 529 (2016): 484–89.

[xxx] Larry Tesler, “Tesler’s Theorem and Other Adages and Coinages,” nomodes.com, accessed May 29, 2026, https://www.nomodes.com/larry-tesler-consulting/adages-and-coinages; Douglas R. Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid (New York: Basic Books, 1979), 601.

[xxxi] Tesler, “Tesler’s Theorem and Other Adages and Coinages.”

[xxxii] Chalmers, “Could a Large Language Model Be Conscious?”

[xxxiii] Will Douglas Heaven, “Geoffrey Hinton Tells Us Why He’s Now Scared of the Tech He Helped Build,” MIT Technology Review, May 2, 2023, https://www.technologyreview.com/2023/05/02/1072528/geoffrey-hinton-google-why-scared-ai/.

[xxxiv] Hinton, interview by Eric Topol.