Is this AGI?

Recently, OpenAI announced their newest language model, GPT-3. Olle Häggström picked this up on his blog with the question: Is GPT-3 one more step towards artificial general intelligence (AGI)?

The blog linked to a video created by a PhD student at ETH Zurich who went over the paper in detail. His views are unequivocal: at various places he comments on how the model has simply “memorized” the training data and how many of its results then amount to no more than a very crude but very large “pattern matching” and “copy-paste” exercise. Nevertheless, Olle leans towards the view that this is yet another step in the direction of AGI. He cites a well-known tech blogger called gwern but no researcher in NLP or linguistics.

I took some questions from the linked blog of gwern and posed them in slightly modified form to my colleagues who are active researchers in natural language processing (NLP), linguistics and cognitive science at Chalmers (Richard Johansson) and Gothenburg University (Shalom Lappin and Asad Sayeed). Here are the questions and their responses:

  • Does GPT-3 represent an advance towards AGI or is it simply an impressive engineering achievement?

    Shalom Lappin: Clearly a software engineering advance. It is part of a family of powerful transformer deep learning systems, which process input and achieve robust learning outcomes, through attentional heads. It does well across a wide range of AI tasks, but it is hardly “intelligent” in the requisite human sense.

Asad Sayeed: It is an impressive engineering achievement, but apparently not greatly different in kind from other recent work.

Richard Johansson: GPT-3 is an interesting demonstration of what can be achieved with large-scale language models trained on massive amounts of data. But strictly speaking, GPT-3 is an increment over previous language models and it is not *qualitatively* different from them. In a nutshell, the major paradigm shift in NLP took place in the early 1990s with the shift to data-driven models, and the rest is basically tweaking. Some researchers have pointed out that we don’t have to rely on the particular curve-fitting technique that is fashionable today (“neural” approaches) to get good language models. See for instance Goldberg’s example in this blog post, where he shows that simple old-school statistical models also perform surprisingly well.

I personally expect that narrow AI will continue its rapid growth, but it’s likely that we will see new shifts in what underlying mathematical methods are popular. Maybe in some years, we will shift to mathematical techniques that don’t come with any associated “nerve”-related terminological baggage and the AGI discussions might then be rendered moot.
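To make the remark about “simple old-school statistical models” concrete, here is a minimal sketch of a character-level n-gram language model in Python. I am assuming this is the kind of model Richard has in mind; it is not Goldberg’s code, and the function names, the `~` padding symbol and the default `order` are all my own illustrative choices. The model only counts which character follows each short context and then samples from those counts.

```python
import random
from collections import Counter, defaultdict


def train_char_ngram(text, order=4):
    """Count which character follows each `order`-character context.

    Hypothetical helper for illustration; `order` must be at least 1.
    """
    counts = defaultdict(Counter)
    # "~" is an arbitrary padding symbol marking the start of the text.
    padded = "~" * order + text
    for i in range(len(text)):
        context = padded[i:i + order]
        next_char = padded[i + order]
        counts[context][next_char] += 1
    return counts


def generate(counts, order=4, length=200, seed=0):
    """Sample text one character at a time from the counted contexts."""
    rng = random.Random(seed)
    context = "~" * order
    out = []
    for _ in range(length):
        followers = counts.get(context)
        if not followers:
            break  # context never seen in training: stop generating
        chars, weights = zip(*followers.items())
        next_char = rng.choices(chars, weights=weights)[0]
        out.append(next_char)
        context = (context + next_char)[-order:]
    return "".join(out)
```

Trained on a large enough body of text with a context of a handful of characters, a model like this can produce locally plausible output even though it contains nothing beyond a table of counts.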

  • Did you predict capabilities like those of GPT-3? Is it therefore not very surprising to you?

    Shalom Lappin: Of course. It is a natural extension of DNNs in the transformer mode.

 Asad Sayeed: No, I’m not surprised. For example, in terms of writing realistic news articles, AllenAI’s GROVER already showed that you could do this.

 Richard Johansson: As in many areas of narrow AI, progress in NLP has been faster in the last few years than most NLP researchers anticipated. Ten years ago, I would not have predicted language models with the quality of GPT-3. However, with the perspective of the last couple of years, GPT-3 is not particularly surprising, considering the performance of its predecessors (GPT-2, BERT, XLM, etc.). Again, we are talking about incremental improvements.

In my view, a take-home message of the developments in recent years is that many tasks that were previously considered “AI-complete” can be solved by applying fairly straightforward narrow AI models, if enough data is available. This was probably surprising to many in the field, but unexpectedly rapid progress in narrow AI does not tell us anything about how AGI is supposed to emerge.

  • What specific task, what specific number, etc., would convince you that there is genuine AGI-type progress in understanding natural language?

    Shalom Lappin: A robust dialogue management system that could conduct fluent NL conversations with humans, changing topics, initiating discussions, updating its knowledge state with new information, moving seamlessly across disparate subjects, asking relevant questions, and understanding figurative language.

Asad Sayeed: If someone builds a system that can contribute cogently and interactively to a discussion about the social origins of its own deficiencies, that would suggest that we’ve moved closer to AGI.

The question challenges us to come up with a benchmark or task that would indicate progress towards AGI with what I believe to be a not-so-subtle hint (especially considering the source material, the gwern post…) that we are ingrates who are impossible to satisfy given the benchmarks that already exist. There is the presupposition that success at these benchmarks represents progress towards AGI. I believe that there is an unacknowledged problem that the tasks and benchmarks themselves are designed around a language modeling approach. That narrow (but in absolute terms, very large) slice of human behaviour is what these models are increasingly good at, and people are implicitly constructing tasks that these models are designed for, give or take increasing scale and model “tweaks” (multihead self-attention being considered a very large tweak).

Richard Johansson: The notion of AGI is poorly defined. It is also difficult to come up with a benchmark that can’t be “gamed” by providing a language model with enough data of the right type. But if I had to try to measure AGI in language models, I would probably design a set of questions showing that the model has some basic understanding of the physical world and of people’s mental states. The questions would need to be designed so that they require creative problem-solving and so that it would be difficult to come up with answers by pattern-matching or parroting what has been expressed in a huge collection of text.

However, we should also note that unrestricted language understanding requires full intelligence comparable to an adult human, which is a very high bar. But even if we scale down the ambitions, we are equally far from constructing AI systems comparable to e.g. mice or birds in intentional behavior and understanding of the physical world.

  • Are you still pretty sure AGI (in understanding language) is very far away? Why?

    Shalom Lappin: I think that it is likely to the point of near impossibility. The idea of the singularity strikes me as a ludicrous science fiction fantasy. I strongly doubt that humans are able to devise computational devices that can replicate and surpass their general cognitive abilities. These are complex and opaque in a way that, I suspect, defies human modelling ability.

 Asad Sayeed: It is very far away. In the NLP domain, there are very interesting advances, but they ultimately involve cognitive tasks (often very complicated ones) in isolation or very artificial applied tasks (e.g. machine translation), neither of which represents a realistic picture of human behaviour, assuming that AGI aims at replicating in some sense human capabilities. The definition of AGI is of course part of the problem: neurocognitively, we have only scratched the surface of how to define what human cognitive capabilities even *are*.

Richard Johansson: The implicit assumption in AGI discussions seems to be that if we improve narrow AI models enough, we will cross some threshold and AGI will “emerge”. Assuming we can even define AGI precisely, so far I haven’t seen any rigorous explanation of how this “emergence” is supposed to occur.

Interestingly, in Section 6 of the OpenAI paper, titled “Broader Impacts”, the authors themselves make no mention at all of AGI. Instead they list a number of concrete ways in which their technology poses dangers: malicious misuse in disseminating misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing and social engineering pretexting, fairness and bias problems (including race, gender and religion) and massive use of energy resources. These are the real and present dangers of such AI technologies that we should focus on addressing today, not AGI or superintelligence.
