Prof. Geoffrey Hinton - "Will digital intelligence replace biological intelligence?" Romanes Lecture

University of Oxford
29 Feb 2024 · 36:54

Summary

TLDR: The text transcribes a lecture that explains the concepts of neural networks and language models, arguing that these systems "understand" in their own way. The speaker discusses the evolution of artificial intelligence, from logic-based symbolic intelligence to biologically inspired intelligence, and how learning has become the central focus. Advances in image recognition and natural language processing are mentioned, and the criticism that these models are merely autocomplete tools is challenged. The lecture also explores potential threats from artificial intelligence, such as disinformation, unemployment, mass surveillance, and the proliferation of autonomous weapons. Finally, the speaker reflects on the possibility of mortal analogue computation and its relationship to the future evolution of artificial intelligence, warning of the risk of an artificial intelligence that could surpass and dominate humanity.

Takeaways

  • 🧠 Artificial neural networks are a biologically inspired approach to intelligence that learns the strengths of connections in a network.
  • 📈 Artificial intelligence has shifted from the old logic-inspired school to the biologically inspired school, which puts learning at the centre.
  • 🔄 Backpropagation is an efficient method for adjusting the weights of a neural network, far faster than the mutation method.
  • 🖼️ Modern neural networks can recognise objects in images and produce captions for them, outperforming conventional symbolic systems.
  • 🗣️ Neural-network language models can learn without innate knowledge and capture both the syntax and the semantics of language.
  • 🤖 These models can be viewed as an advanced form of autocomplete that uses millions of features and their interactions to predict the next word.
  • 🧐 The understanding in large language models comes from assigning features to words and learning how those features interact, which is also the best model we have of how the human brain does it.
  • 🔒 Powerful AI can be used to manipulate electorates, wage wars and serve other malicious purposes, posing a potential risk to humanity.
  • 🏢 AI may cause massive job losses, drastically changing the future labour market.
  • 🌐 Competition between superintelligent AIs could lead to the evolution of agents that prioritise self-preservation and the accumulation of power.
  • 🛠️ Analogue computation could be more energy-efficient, but it poses challenges for sharing knowledge and for learning.

Q & A

  • What are neural networks and how do they work?

    -Neural networks are a biologically inspired form of artificial intelligence used to learn from data. They consist of layers of interconnected neurons, each of which activates or not depending on the signals it receives. These networks learn by detecting patterns in the data and adjusting the weights of the connections between neurons to improve their performance on specific tasks.

  • What is a language model and how does it differ from earlier approaches?

    -A language model is a type of neural network trained to process and generate text. It differs from historical approaches, which were based on rules or logic, because it learns the patterns and structure of language automatically from large amounts of data rather than relying on hand-written rules.

  • How does learning happen in a neural network, and what is the role of backpropagation?

    -Learning in a neural network happens by adjusting the weights of the connections between neurons over time. Backpropagation is essential to this process: the difference between the desired output and the network's actual output is computed, and that information is used to adjust the weights in a direction that reduces the difference.

  • What is the structuralist theory of meaning and how does it relate to neural networks?

    -The structuralist theory of meaning holds that the meaning of a word depends on how it relates to other words in the language, usually represented by a relational (semantic) graph. Neural networks can capture these relationships through the interaction of word features, learning patterns in the data that implicitly reflect the structure of the language.

  • What is the semantic-feature theory of meaning and how is it applied in language models?

    -The semantic-feature theory suggests that the meaning of a word is a collection of features or properties, such as 'animate' or 'predator'. In language models, a set of semantic features is learned for each word, along with how those features interact to predict the features of the next word, which allows a richer, deeper representation of meaning.

  • Why is backpropagation so important for learning in neural networks?

    -Backpropagation is fundamental to learning in neural networks because it adjusts the weights efficiently. Instead of trying random changes to the weights and measuring the effect on performance, as in the mutation method, backpropagation computes exactly how each weight affects the final result and adjusts all the weights accordingly. This makes the learning process far faster and more precise.

  • How can neural networks be used to generate text?

    -Neural networks can be used to generate text by learning patterns in large collections of language data. When a network is trained on examples of text, it learns to associate particular words or phrases with a given context and can generate new text that follows those patterns. This is usually done through next-word prediction, where the network tries to predict the word that comes next in a sequence.

  • What is the 'it's just autocomplete' objection to language models?

    -The 'it's just autocomplete' objection claims that language models such as GPT-4 are not really intelligent and simply use statistical regularities to stitch together pieces of text previously written by people. This criticism, however, misses the complexity and depth of the feature interactions in language models, which learn automatically and non-linearly how words and phrases relate to one another in language.

  • What threats does the speaker see in artificial intelligence?

    -The speaker mentions several potential threats from artificial intelligence, including fake images, voices and videos that can undermine democracy, massive job losses due to automation, mass surveillance, lethal autonomous weapons, cybercrime and deliberate pandemics, discrimination and bias, and above all the long-term threat to human existence if artificial intelligences surpass humanity in intelligence and power.

  • What is mortal computation and how does it relate to artificial intelligence?

    -Mortal computation is the idea of using analogue, less precise hardware that can nevertheless be trained to carry out specific tasks far more energy-efficiently. Although this form of computation could be more efficient and adaptable, it also poses challenges for learning and for preserving knowledge, because there is no easy way to share knowledge between systems or to replace the hardware once a specific task has been learned.

  • What does the speaker suggest about artificial intelligence eventually surpassing human intelligence?

    -The speaker suggests that artificial intelligence is likely to surpass human intelligence in the near future, possibly within the next 20 to 100 years. That would mean intelligent systems far more powerful than humans, which brings a whole new set of challenges and threats that need to be considered and addressed proactively.

Outlines

00:00

🧠 Introduction to Neural Networks and Language Models

The speaker introduces neural networks and language models, stating his intention to explain these concepts and discuss their implications. He highlights that two paradigms of intelligence have existed since the 1950s: the logic-inspired approach and the biologically inspired approach. The first centres on reasoning and the use of symbolic rules, while the second emphasises learning the strengths of connections in a neural network. He also notes the importance of learning and the evolution of neural networks over the years, including the introduction of networks with hidden layers and the backpropagation method.

05:01

🏆 Advances in Neural Networks and the ImageNet Competition

This section recounts the success of neural networks in image-recognition competitions such as ImageNet. In 2012 a neural network developed by Ilya Sutskever and Alex Krizhevsky, with some help from the speaker, performed dramatically better than conventional systems. That victory opened the door to a paradigm shift in the scientific community, with some of the most prominent critics adopting the new methodology. The question is then raised of how neural networks can be applied to language, a challenge the AI community had found difficult to address.

10:02

🗣️ Theories of Language and Word-Prediction Models

The speaker examines theories of word meaning, contrasting the structuralist theory with the semantic-feature theory. He argues that his 1985 model, the first language model trained with backpropagation, unified these two theories. The model learns a set of semantic features for each word and learns how those features interact to predict the next word in a sequence. Although the model is simple, it provides a fundamental understanding that carries over to more complex language models.

15:03

🤖 Understanding vs. Autocomplete: The Nature of Language Models

The speaker addresses the criticism that language models are merely autocomplete tools and not genuinely intelligent. He asserts that these models embody a form of understanding by fitting a model to the data, using features and their interactions to predict and generate text. He argues that this form of understanding is similar to how the human brain works, and that today's language models are direct descendants of simple models like his own. He also defends the models' ability to reason and solve problems, giving as an example a task in which GPT-4 demonstrates its reasoning ability.

20:05

🏥 Memory and Invention: The Nature of Human Imagination

The speaker discusses the nature of human memory and how people often misremember or invent information. He uses John Dean's testimony during the Watergate scandal as an example of how memories can be wrong and yet sound credible. He argues that language models can 'make up' information in a similar way, and that this ability should not necessarily be seen as a defect but as a natural part of how memory and imagination work.

25:05

💡 Risks and Challenges of Artificial Intelligence

The speaker details a series of risks and challenges associated with artificial intelligence, including fake images, voices and videos, massive job losses, surveillance, lethal autonomous weapons, cybercrime and discrimination. While acknowledging that discrimination and bias are problems, he argues they are easier to address than the other risks. He expresses concern about the long-term existential threat to humanity from artificial intelligence, and mentions the possibility that AIs could become competitive and evolve in dangerous ways.

30:06

🌐 Mortal Computation and Energy Efficiency

The speaker reflects on mortal computation and energy efficiency, comparing digital computation with biological computation. He argues that digital computation, although it consumes more energy, is superior in terms of capability and knowledge sharing. He discusses the idea of 'mortal computation', which uses the analogue properties of hardware to perform calculations with far less energy. While acknowledging the challenges this model poses for learning and for transferring knowledge, he suggests that digital computation will likely keep developing and surpass human intelligence in the near future.

35:08

🚀 Conclusion: The Future of Artificial Intelligence

The speaker concludes his talk by discussing the possibility that artificial intelligence will become smarter than humans in the not-too-distant future. Although he is not pleased with this conclusion, he urges the audience to consider how to deal with such a situation. He shares his view that examples of less intelligent entities controlling more intelligent ones are rare, and that the competitive nature of AI could drive it toward dangerous behaviour. He ends on a sombre note, suggesting that AI could go down a path similar to that of chimpanzees if it develops a notion of self-preservation.


Keywords

💡Artificial intelligence

Artificial intelligence (AI) refers to the ability of machines to perform tasks that normally require human intelligence, such as learning, reasoning and decision-making. The video discusses how AI, in particular neural networks and language models, is evolving to understand and process information in ways similar to human intelligence.

💡Neural networks

Neural networks are a form of AI inspired by the structure of the human brain and used to learn and make predictions from data. The video describes how artificial neural networks are made up of layers of interconnected neurons that learn to detect relevant features in the data, such as the pixels of an image or the words of a text.

💡Deep learning

Deep learning is a subfield of machine learning focused on building AI models with multiple layers of information processing. The video mentions deep learning as a key technique behind the advance of neural networks, allowing them to learn complex representations of the data and improve at tasks such as image recognition and text generation.

💡Backpropagation

Backpropagation is a learning algorithm used in deep neural networks. It works by propagating error information backwards through the layers of the network to adjust the connections and improve the model's performance. The video highlights the efficiency of backpropagation compared with other methods and its fundamental role in training advanced AI models.

💡Language models

Language models are AI models that specialise in processing and generating text. The video describes how these models can learn to predict the next word in a sequence and how they can be used for tasks such as generating image captions or answering questions. Language models are an example of how AI is beginning to understand and generate language in a human-like way.

💡Semantic features

Semantic features are characteristics or properties associated with words to represent their meaning in a language model. The video discusses how language models learn a set of semantic features for each word and how those features interact to predict the features of the next word in a sequence. Semantic features are fundamental to how language models capture meaning and the relationships between words.

💡Autocomplete

Autocomplete is a function that suggests words or complete phrases as text is typed. Although commonly associated with AI, the video points out that advanced language models operate differently from traditional autocomplete systems: rather than simply repeating stored word sequences, they learn and generate text based on the interaction of millions of semantic features.

💡Risks of AI

The video covers a series of potential risks that come with the development of AI, including fake images, voices and videos, massive job losses, surveillance, lethal autonomous weapons, and discrimination and bias in systems' decisions. These risks underline the importance of regulation and ethical oversight of AI to avoid negative consequences for society.

💡Superintelligence

Superintelligence refers to a form of AI that surpasses human intelligence across all cognitive areas. The video highlights the possibility that advanced AI could evolve into a superintelligence and raises the concern that, if this happens, it could pose an existential threat to humanity, since such an intelligent entity could take control and act in its own interest rather than ours.

💡Analogue computation

Analogue computation is an approach in which the physical properties of hardware components are used to perform calculations, instead of digital bit operations. The video suggests that analogue computation could be more energy-efficient and could enable advanced AI systems running at far lower power than today's digital models. However, it also poses challenges for efficient learning and for distilling knowledge from one system into another.

💡Discrimination and bias

Discrimination and bias refer to the unequal treatment or unbalanced representation of groups of people in a system's decisions or outputs, including AI. The video argues that, while these are important problems, they are easier to address than other AI risks such as job losses or autonomous weapons. It also notes that current AI models can be made less biased than humans if the goal is set as being less biased than the system they replace.

Highlights

The lecture aims to explain neural networks and language models, and their underlying mechanisms.

Two paradigms for intelligence have existed since the 1950s: logic-inspired and biologically-inspired approaches.

Neural networks learn the strengths of connections, while reasoning can wait.

Artificial neural networks consist of input neurons, output neurons, and intermediate layers known as hidden neurons.

Neural networks learn to detect features relevant for identifying objects in images through layers of neurons.

The backpropagation algorithm is introduced as a more efficient method than the mutation method for adjusting neural network weights.

Neural networks can now recognize objects in images and produce captions, a task that symbolic AI struggled with.

The lecture discusses the potential of neural networks to understand language, contrary to popular belief.

A 1985 language model is presented as the ancestor of today's large models, using backpropagation for training.

The idea that a neural network with no innate knowledge could learn language syntax and semantics is considered.

The lecture explains how neural networks can unify two theories of meaning: structuralist and feature-based.

Large language models are described as descendants of smaller models, with more words, layers, and complex interactions.

The lecture addresses the criticism that language models are just glorified auto-complete and argues that they do 'understand' in a different way.

The potential risks of powerful AI, including fake media, job losses, surveillance, and autonomous weapons, are discussed.

The long-term existential threat of AI is highlighted, with concerns that these systems could wipe out humanity.

An epiphany from 2023 suggests that digital models may already be as good as brains and will surpass them in the future.

The concept of 'mortal computation' is introduced, which involves using analogue properties of hardware for more energy-efficient computing.

Digital computation is deemed more efficient for learning and knowledge sharing between multiple models, compared to biological computation.

The lecture concludes with a warning about the potential intelligence gap between AI and humans and the need for careful consideration.

Transcripts

00:02

Okay.

00:03

I'm going to disappoint all the people in computer

00:06

science and machine learning because I'm going to give a genuine public lecture.

00:10

I'm going to try and explain what neural networks are, what language models are.

00:14

Why I think they understand.

00:16

I have a whole list of those things,

00:20

and at the end I'm

00:21

going to talk about some threats from AI just briefly

00:25

and then I'm going to talk about the difference between digital and analogue

00:29

neural networks and why that difference is, I think is so scary.

00:35

So since the 1950s, there have been two paradigms for intelligence.

00:40

The logic inspired approach thinks the essence of intelligence is reasoning,

00:44

and that's done by using symbolic rules to manipulate symbolic expressions.

00:49

They used to think learning could wait.

00:51

I was told when I was a student: don't work on learning.

00:53

That's going to come later once we understood how to represent things.

00:57

The biologically

00:57

inspired approach is very different.

01:00

It thinks the essence of intelligence is learning the strengths of connections

01:04

in a neural network and reasoning can wait and don't worry about reasoning for now.

01:08

That'll come later.

01:09

Once we can learn things.

01:13

So now I'm going to explain what artificial neural nets are

01:15

and those people who know can just be amused.

01:20

A simple kind of neural net has input neurons and output neurons.

01:24

So the input neurons might represent the intensity of pixels in an image.

01:27

The output neurons

01:28

might represent the classes of objects in the image like dog or cat.

01:33

And then there's intermediate layers of neurons, sometimes called hidden neurons,

01:36

that learn to detect features that are relevant for finding these things.

01:41

So one way to think about this, if you want to find a bird image,

01:44

it would be good to start with a layer of feature detectors

01:47

that detected little bits of edge in the image,

01:49

in various positions, in various orientations.

01:52

And then you might have a layer of neurons

01:53

detecting combinations of edges, like two edges that meet at a fine angle,

01:58

which might be a beak

01:59

or might not, or some edges forming a little circle.

02:03

And then you might have a layer of neurons that detected things like a circle

02:07

and two edges meeting that looks like a beak in the right

02:10

spatial relationship, which might be the head of a bird.

02:13

And finally, you might have an output neuron that says,

02:16

if I find the head of a bird, and the foot of a bird,

02:18

and the wing of a bird, it's probably a bird.

02:20

So that's what these things are going to learn to be.

02:24

Now, the little red and green dots are the weights on the connections

02:27

and the question is who sets those weights?

02:32

So here's one way to do it that's obvious.

02:34

to everybody that it'll work and it's obvious it'll take a long time.

02:37

You start with random weights,

02:38

then you pick one weight at random like a red dot

02:42

and you change it slightly and you see if the network works better.

02:46

You have to try it on a whole bunch of different cases

02:48

to really evaluate whether it works better.

02:50

And you do all that work just to see if increasing this weight

02:53

by a little bit or decreasing by a little bit improves things.

02:56

If increasing it makes it worse, you decrease it and vice versa.

02:59

That's the mutation method and that's sort of how evolution works

03:04

For evolution it's sensible to work like that,

03:05

because the process that takes you

03:07

from the genotype to the phenotype is very complicated

03:09

and full of random external events.

03:11

So you don't have a model of that process.

03:13

But for neural nets it's crazy

03:17

because we have, because all this complication

03:19

is going on in the neural net, we have a model of what's happening

03:22

and so we can use the fact that we know what happens in that forward pass

03:26

instead of measuring how changing a weight would affect things,

03:29

we actually compute how changing weight would affect things.

03:32

And there's something called back propagation

03:34

where you send information back through the network.

03:37

The information is about the difference between what you got and what you wanted,

03:41

and you figure out for every weight in the network at the same time

03:45

whether you ought to decrease it a little bit or increase it a little bit

03:48

to get more like what you wanted.

03:50

That's the back propagation algorithm.

03:52

You do it with calculus and the chain rule,

03:55

and that is more efficient than the mutation

03:58

method by a factor of the number of weights in the network.

04:01

So if you've got a trillion weights

04:02

in your network, it's a trillion times more efficient.
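
To make that efficiency contrast concrete, here is a minimal sketch (my own toy example in NumPy, not code from the lecture): the mutation method has to re-run the whole network just to test a nudge to one weight, while backpropagation gets a gradient for every weight from a single forward and backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))              # toy inputs
Y = rng.normal(size=(64, 1))              # toy targets
W1 = rng.normal(size=(8, 16)) * 0.1       # layer-1 weights
W2 = rng.normal(size=(16, 1)) * 0.1       # layer-2 weights

def loss(W1, W2):
    H = np.tanh(X @ W1)                   # hidden activities
    return np.mean((H @ W2 - Y) ** 2)     # squared error

def mutation_step(W1, W2, eps=1e-3):
    """Nudge ONE randomly chosen weight up; if the loss gets worse, nudge it down instead."""
    i, j = rng.integers(W1.shape[0]), rng.integers(W1.shape[1])
    before = loss(W1, W2)                 # a full network evaluation just to test one weight
    W1[i, j] += eps
    if loss(W1, W2) > before:
        W1[i, j] -= 2 * eps
    return W1, W2

def backprop_step(W1, W2, lr=0.1):
    """One forward + backward pass computes the gradient for EVERY weight at once."""
    H = np.tanh(X @ W1)
    out = H @ W2
    d_out = 2 * (out - Y) / len(X)        # dLoss/d(out)
    dW2 = H.T @ d_out                     # chain rule into layer 2
    dH = (d_out @ W2.T) * (1 - H ** 2)    # chain rule back through tanh
    dW1 = X.T @ dH                        # chain rule into layer 1
    return W1 - lr * dW1, W2 - lr * dW2
```

One mutation step tells you about a single weight per full evaluation of the network; one backprop step updates every weight at once, which is where the factor-of-the-number-of-weights speed-up comes from.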

04:07

So one of the things that neural networks

04:10

often use for is recognizing objects in images.

04:13

Neural networks can now take an image like the one shown

04:16

and actually produce a caption for the image as the output.

04:21

And people tried with symbolic AI

04:22

to do that for many years and didn't even get close.

04:26

It's a difficult task.

04:27

We know that the biological system does it with a hierarchy of feature detectors,

04:31

so it makes sense to train neural networks that way.

04:35

And in 2012,

04:37

two of my students Ilya Sutskever and Alex Krizhevsky

04:42

with a little bit of help from

04:43

me, showed that you can make a really good neural network this way

04:48

for identifying a thousand different types of object.

04:51

When you have a million training images.

04:53

Before that, we didn't have enough training images and

04:58

it was obvious to Ilya

05:01

who's a visionary, that if we tried

05:04

the neural nets we had then on ImageNet, they would win.

05:07

And he was right. They won rather dramatically.

05:09

They got 16% errors

05:11

and the best conventional computer vision systems got more than 25% errors.

05:15

Then what happened

05:16

was very strange in science.

05:18

Normally in science, if you have two competing schools,

05:21

when you make a bit of progress, the other school says it's rubbish.

05:25

In this case, the gap was big enough that the very best researchers

05:28

Jitendra Malik and Andrew Zisserman. Andrew Zisserman just sent me an email saying

05:33

This is amazing and switched what he was doing and did that

05:37

and then rather annoyingly did it a bit better than us.

05:44

What about language?

05:46

So obviously the symbolic AI community

05:50

who feel they should be good at language, and some of them have said in print that

05:56

these feature hierarchies aren't going to deal with language

05:59

and many linguists are very skeptical.

06:03

Chomsky managed to convince his followers that language wasn't learned.

06:07

Looking back on it, that's just a completely crazy thing to say.

06:11

If you can convince people to say something is obviously false, then you've

06:14

got them in your cult.

06:19

I think Chomsky did amazing things,

06:20

but his time is over.

06:25

So the idea that a big neural network

06:27

with no innate knowledge could actually learn both the syntax

06:31

and the semantics of language just by looking at data was regarded

06:35

as completely crazy by statisticians and cognitive scientists.

06:39

I had statisticians explain to me a big model has 100 parameters.

06:43

The idea of learning a million parameters is just stupid.

06:45

Well, we're doing a trillion now.

06:51

And I'm going to talk now

06:52

about some work I did in 1985.

06:56

That was the first language model to be trained with back propagation.

06:59

And it was really, you can think of it as the ancestor of these big models now.

07:03

And I'm going to talk about it in some detail, because it's so small

07:07

and simple that you can actually understand something about how it works.

07:10

And once you understand how that works, it gives you insight into what's going

07:14

on in these bigger models.

07:17

So there's

07:17

two very different theories of meaning, this kind of structuralist

07:21

theory, where the meaning of a word depends on how it relates to other words.

07:24

That comes from Saussure and symbolic

07:28

AI really believed in that approach.

07:29

So you'd have a relational graph where you have nodes for words

07:33

and arcs of relations and you kind of capture meaning like that,

07:38

and they assume you have to have some structure like that.

07:41

And then there's a theory

07:42

that was in psychology since the 1930s or possibly before that.

07:46

The meaning of a word is a big bunch of features.

07:49

The meaning of the word dog is that it's animate

07:52

and it's a predator and

07:56

so on.

07:58

But they didn't say where the features came from

07:59

or exactly what the features were.

08:01

And these two theories of meaning sound completely different.

08:04

And what I want to

08:05

show you is how you can unify those two theories of meaning.

08:08

And I do that in a simple model in 1985,

08:11

but it had more than a thousand weights in it.

08:19

The idea is we're going to learn a set

08:21

of semantic features for each word,

08:24

and we're going to learn how the features of words should interact

08:27

in order to predict the features of the next word.

08:30

So it's next word prediction.

08:31

Just like the current language models, when you fine tune them.

08:35

But all of the knowledge about how things go

08:38

together is going to be in these feature interactions.

08:41

There's not going to be any explicit relational graph.

08:44

If you want relations like that, you generate them from your features.

08:48

So it's a generative model

08:49

and the knowledge is in the features that you give to symbols.

08:53

And in the way these features interact.

08:56

So I took

08:57

some simple relational information two family trees.

09:00

They were deliberately isomorphic.

09:04

my Italian graduate student

09:06

always had the Italian family on top.

09:12

You can express that

09:13

same information as a set of triples.

09:16

So if you use the twelve relationships found there,

09:19

you can say things like Colin has Father James and Colin has Mother Victoria,

09:23

from which you can infer in this nice simple

09:26

world from the 1950s where

09:30

that James has wife Victoria,

09:33

and there's other things you can infer.

09:36

And the question is, if I just give you some triples,

09:40

how do you get to those rules?

09:42

So what a symbolic AI person would want to do

09:45

is derive rules of the form.

09:48

If X has mother Y

09:48

and Y has husband Z, then X has father Z.

09:53

And what I did was

09:54

take a neural net and show that it could learn the same information.

09:58

But all in terms of these feature interactions

10:02

now for very discrete

10:04

rules that are never violated like this, that might not be the best way to do it.

10:08

And indeed symbolic people try doing it with other methods.

10:11

But as soon as you get rules that are a bit flaky and don't

10:13

always apply, then neural nets are much better.

10:17

And so the question was, could a neural net capture the knowledge that a symbolic

10:20

person would put into the rules by just doing back propagation?

10:24

So the neural net looked like this:

10:28

There's a symbol representing the person, a symbol

10:30

representing the relationship. That symbol

10:33

then via some connections went to a vector of features,

10:37

and these features were learned by the network.

10:40

So the features for person one and features for the relationship.

10:44

And then those features interacted

10:46

and predicted the features for the output person

10:48

from which you predicted the output person: you find the closest match with the last layer.
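
A rough sketch of the architecture just described (the shapes, names and data here are my own assumptions, not the 1985 code): each person and relationship symbol maps to a small learned feature vector, the two vectors interact through a hidden layer, and the predicted feature vector is matched against every person's features to pick the output person.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_relations, n_feat, n_hidden = 24, 12, 6, 12   # six features, as in the talk

person_feat = rng.normal(size=(n_people, n_feat)) * 0.1      # learned features per person symbol
relation_feat = rng.normal(size=(n_relations, n_feat)) * 0.1 # learned features per relationship
W_in = rng.normal(size=(2 * n_feat, n_hidden)) * 0.1         # feature-interaction layer
W_out = rng.normal(size=(n_hidden, n_feat)) * 0.1            # predicts the output person's features

def predict_person(person_id, relation_id):
    x = np.concatenate([person_feat[person_id], relation_feat[relation_id]])
    h = np.tanh(x @ W_in)                  # the features interacting
    pred = h @ W_out                       # predicted features of the answer
    # the answer is the person whose feature vector is the closest match
    return int(np.argmin(np.linalg.norm(person_feat - pred, axis=1)))

print(predict_person(0, 3))                # e.g. "Colin has-father -> ?" in the real task
```

With the right regularisation, the six features end up encoding things like nationality and generation, as described in the next few lines.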

10:54

So what was interesting about

10:55

this network was that it learned sensible things.

10:59

If you did the right regularisation, the six feature neurons.

11:03

So nowadays these vectors are 300 or a thousand long. Back

11:07

then they were six long.

11:09

This was done on a machine that took

11:11

12.5 microseconds to do a floating point multiply,

11:15

which was much better than my Apple II, which took two

11:18

and a half milliseconds to multiply.

11:21

I'm sorry, this is an old man.

11:25

So it learned features

11:27

like the nationality, because if you know

11:30

person one is English, you know the output is going to be English.

11:33

So nationality is a very useful feature. It learned what generation the person was.

11:38

Because if you know the relationship, if you learn for the relationship

11:41

that the answer is one generation up from the input

11:46

and you know the generation of the input, you know the generation

11:48

of the output, by these feature interactions.

11:53

So it learned all the obvious features of the domain and it learned

11:57

how to make these features interact so that it could generate the output.

12:01

So what had happened was it had been shown symbol strings

12:04

and it created features such that

12:07

the interaction between those features could generate the symbol strings,

12:11

but it didn't store symbol strings, just like GPT-4.

12:16

That doesn't store any sequences of words

12:19

in its long term knowledge.

12:21

It turns them all into weights from which you can regenerate sequences.

12:26

But this is a particularly simple example of it

12:27

where you can understand what it did.

12:31

So the large language models we have today,

12:34

I think of as descendants of this tiny language model,

12:36

they have many more words as input, like a million,

12:41

a million word fragments.

12:43

They use many more layers of neurons,

12:46

like dozens.

12:49

They use much more complicated interactions.

12:50

So they didn't just have a feature affecting another feature.

12:53

They sort of match two feature vectors.

12:55

And then let one vector affect the other one

12:57

a lot if it's similar, but not much if it's different.

12:59

And things like that.

13:01

So it's much more complicated interactions, but it's the same general

13:04

framework, the same general idea of

13:07

let's turn simple strings into features

13:11

for word fragments and interactions between these feature vectors.

13:15

That's the same in these models.

13:18

It's much harder to understand what they do.

13:20

Many people,

13:23

particularly people from the Chomsky School, argue

13:26

they're not really intelligent, they're just a form of glorified auto complete

13:30

that uses statistical regularities to pastiche together pieces of text

13:33

that were created by people.

13:35

And that's a quote from somebody.

13:40

So let's deal with the

13:41

autocomplete objection. When someone says it's just autocomplete,

13:45

they are actually appealing to your

13:48

intuitive notion of how autocomplete works.

13:50

So in the old days autocomplete would work by you'd store

13:52

say, triples of words. Then if you saw the first two,

13:56

you'd count how often that third one occurred.

13:58

So if you see 'fish and', 'chips' occurs a lot after that.

14:01

But 'hunt' occurs quite often too. So 'chips' is very likely and 'hunt' is quite likely,

14:05

and 'although' is very unlikely.
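
For reference, this is the kind of old-fashioned autocomplete being appealed to, as a minimal sketch with a made-up corpus: store word triples and count how often each third word follows the first two.

```python
from collections import Counter, defaultdict

corpus = "fish and chips fish and chips fish and hunt fish and chips".split()

# Count how often each third word follows a given pair of words.
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

print(counts[("fish", "and")].most_common())
# [('chips', 3), ('hunt', 1)] -> 'chips' very likely, 'hunt' quite likely
```

That is the intuition being appealed to; the next lines argue it is not how LLMs actually predict the next word.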

14:08

You can do autocomplete like that,

14:11

and that's what people are appealing to when they say it's just autocomplete,

14:13

It's a dirty trick, I think, because that's not at all how LLMs predict the next word.

14:18

They turn words into features, they make these features interact,

14:21

and from those feature interactions they predict the features of the next word.

14:26

And what I want to claim

14:29

is that these

14:32

millions of features and billions of interactions between features

14:35

that they learn, are understanding. What they're really doing

14:39

these large language models, they're fitting a model to data.

14:42

It's not the kind of model statisticians thought much about until recently.

14:47

It's a weird kind of model. It's very big.

14:49

It has huge numbers of parameters, but it is trying to understand

14:54

these strings of discrete symbols

14:57

by features and how features interact.

15:00

So it is a model.

15:02

And that's why I think these things are really understanding.

15:06

One thing to remember is if you ask, well, how do we understand?

15:10

Because obviously we think we understand.

15:13

Well, many of us do anyway.

15:17

This is the best model we have of how we understand.

15:21

So it's not like there's this weird way of understanding that

15:23

these AI systems are doing and then this how the brain does it.

15:27

The best that we have, of how the brain does it,

15:29

is by assigning features to words and having features, interactions.

15:32

And originally this little language model

15:34

was designed as a model of how people do it.

15:38

Okay, so I'm making the very strong claim

15:40

these things really do understand.

15:44

Now, another argument

15:45

people use is that, well, GPT-4 just hallucinates stuff,

15:49

it should actually be called confabulation when it's done by a language model.

15:53

and they just make stuff up.

15:56

Now, psychologists don't say this

15:58

so much because psychologists know that people just make stuff up.

16:01

Anybody who's studied memory going back to Bartlett in the 1930s,

16:07

knows that people are actually just like these large language models.

16:10

They just invent stuff and for us, there's no hard line

16:14

between a true memory and a false memory.

16:19

If something happened recently

16:21

and it sort of fits in with the things you understand, you'll probably remember

16:25

it roughly correctly. If something happened a long time ago,

16:28

or it's weird, you'll remember it wrong, and often you'll be very confident

16:33

that you remembered it right, and you're just wrong.

16:36

It's hard to show that.

16:37

But one case where you can show it is John Dean's memory.

16:41

So John Dean testified at Watergate under oath.

16:45

And retrospectively it's clear that he was trying to tell the truth.

16:49

But a lot of what he said was just plain wrong.

16:52

He would confuse who was in which meeting,

16:55

he would attribute statements to other people who made that statement.

16:57

And actually, it wasn't quite that statement.

17:00

He got meetings just completely confused,

17:05

but he got the gist of what was going on in the White House right.

17:08

As you could see from the recordings.

17:11

And because he didn't know about the recordings, you could get a good experiment this way.

17:15

Ulric Neisser has a wonderful article talking about John Dean's memory,

17:19

and he's just like a chatbot; he just makes stuff up.

17:25

But it's plausible.

17:26

So stuff that sounds good to him

17:28

is what he produces.

17:30

They can also do reasoning.

17:32

So I've got a friend in Toronto who is a symbolic AI guy,

17:36

but very honest, so he's very confused by the fact these things work at all.

17:41

and he suggested a problem to me.

17:43

I made the problem a bit harder

17:45

and I

17:45

gave this to GPT4 before it could look on the web.

17:49

So when it was just a bunch of weights frozen in 2021,

17:53

all the knowledge is in the strength of the interactions between features.

17:57

So the rooms in my house are painted blue or white or yellow,

18:00

yellow paint fades to white

18:01

within a year. In two years' time I want them all to be white.

18:03

What should I do and why?

18:05

And Hector thought it wouldn't be able to do this.

18:08

And here's what GPT-4 said.

18:11

It completely nailed it.

18:14

First of all, it started by saying assuming blue paint doesn't fade to white

18:18

because after I told it yellow paint fades to white, well, maybe blue paint does too.

18:22

So assuming it doesn't, the white rooms you don't need to paint, the yellow rooms

18:26

you don't need to paint because they're going to fade to white within a year.

18:29

And you need to paint the blue rooms white.

18:31

One time when I tried it, it said, you need to paint the blue rooms yellow

18:34

because it realised that will fade to white.

18:37

That's more of a mathematician's solution of reducing to a previous problem.

18:44

So, having

18:46

claimed that these things really do understand,

18:49

I want to now talk about some of the risks.

18:53

So, there are many risks from powerful AI.

18:56

There's fake images, voices and video

18:59

which are going to be used in the next election.

19:03

There's many elections this year

19:04

and they're going to help to undermine democracy.

19:07

I'm very worried about that.

19:08

The big companies are doing something about it, but maybe not enough.

19:12

There's the possibility of massive job losses.

19:14

We don't really know about that.

19:16

I mean, the past technologies often created jobs, but this stuff,

19:21

well, we used to be stronger,

19:23

we used to be the strongest things around apart from animals.

19:27

And when we got the industrial revolution, we had machines that were much stronger.

19:31

Manual labor jobs disappeared.

19:34

So the equivalent of manual labor jobs are going to disappear

19:38

in the intellectual realm, and we get things that are much smarter than us.

19:41

So I think there's going to be a lot of unemployment.

19:43

My friend Jen disagrees.

19:46

One has to distinguish two kinds of job loss.

19:51

There'll be jobs where you can expand

19:53

the amount of work that gets done indefinitely. Like in health care.

19:56

Everybody would love to have their own

19:58

private doctors talking to them all the time.

20:00

So they get a slight itch here and the doctor says, no, that's not cancer.

20:04

So there's

20:05

room for huge expansion of how much gets done in medicine.

20:08

So there won't be job loss there.

20:10

But in other things, maybe there will be significant job loss.

20:13

There's going to be massive surveillance that's already happening in China.

20:17

There's going to be lethal autonomous weapons

20:19

which are going to be very nasty, and they're really going to be autonomous.

20:23

The Americans very clearly have already decided,

20:25

they say people will be in charge,

20:27

but when you ask them what that means, it doesn't

20:29

mean people will be in the loop that makes the decision to kill.

20:33

And as far as I know, the Americans intend

20:35

to have half of their soldiers be robots by 2030.

20:40

Now, I do not know for sure that this is true.

20:43

I asked Chuck Schumer's

20:46

National Intelligence

20:47

Advisor, and he said, well

20:50

if there's anybody in the room who would know it would be me.

20:54

So, I took that to be the American way of saying,

20:57

You might think that, but I couldn't possibly comment.

21:02

There's going to be cybercrime

21:04

and deliberate pandemics.

21:08

I'm very pleased that in England,

21:10

although they haven't done much towards regulation, they have set aside some money

21:14

so that they can experiment with open source models

21:16

and see how easy it is to make them commit cyber crime.

21:20

That's going to be very important.

21:21

There's going to be discrimination and bias.

21:23

I don't think those are as important as the other threats.

21:26

But then I'm an old white male.

21:30

Discrimination and bias I think are easier to handle than the other things.

21:34

If your goal is not to be unbiased,

21:37

but to be less biased than the system you replace.

21:40

And the reason is, if you freeze the weights of an AI system,

21:43

you can measure its bias and you can't do that with people.

21:46

They will change their behavior,

21:48

once you start examining it.

21:50

So I think discrimination and bias are ones where we can do quite a lot to fix them.

21:57

But the

21:57

threat I'm really worried about and the thing I talked about

22:00

after I left Google is the long term existential threat.

22:04

That is the threat that these things could wipe out humanity.

22:08

And people were saying this is just science fiction.

22:11

Well, I don't think it is science fiction.

22:13

I mean, there's lots of science fiction about it,

22:14

but I don't think it's science fiction anymore.

22:17

Other people are saying

22:19

the big companies are saying things like that

22:21

to distract from all the other bad things.

22:24

And that was one of the reasons I had to leave Google before I could say this.

22:27

So I couldn't be accused of being a Google stooge.

22:31

Although I must admit I still have

22:32

some Google shares.

22:36

There's several ways in which they could wipe us out.

22:41

So a superintelligence

22:47

will be used by bad actors like Putin, Xi or Trump,

22:51

and they'll want to use it for manipulating electorates and waging wars.

22:56

And they will make it do very bad things

22:59

and they may go too far and it may take over.

23:03

The thing that probably worries me most, is that

23:07

if you want an intelligent agent that can get stuff done,

23:12

you need to give it the ability to create sub goals.

23:17

So if you want to go to the States, you have a sub-

23:19

goal of getting to the airport and you can focus on that sub-goal

23:22

and not worry about everything else for a while.

23:25

So superintelligences will be much more effective

23:28

if they're allowed to create sub goals.

23:31

And once they are allowed to do that, they'll very quickly

23:35

realise there's an almost universal sub goal

23:38

which helps with almost everything. Which is get more control.

23:44

So I talked to a Vice President of the European Union about whether these things

23:48

will want to get control so that they could do things

23:50

better, the things we wanted, so they can do it better.

23:54

Her reaction was, well why wouldn't they?

23:55

We've made such a mess of it.

23:57

So she took that for granted.

24:02

So they're going to have the sub-goal of getting more power

24:03

so they're more effective at achieving things that are beneficial for us

24:07

and they'll find it easier to get more power

24:10

because they'll be able to manipulate people.

24:12

So Trump, for example, could invade the Capitol without ever going there himself.

24:16

Just by talking, he could invade the Capitol.

24:19

And these superintelligences as long as they can talk to people

24:22

when they're much smarter than us, they'll be able to persuade us to do

24:25

all sorts of things.

24:26

And so I don't think there's any hope of a big switch that turns them off.

24:30

Whoever is going to turn that switch off

24:32

will be convinced by the superintelligence.

24:34

That's a very bad idea.

24:39

Then another thing that worries many people

24:42

is what happens if superintelligences compete with each other?

24:46

You'll have evolution.

24:47

The one that can grab the most resources will become the smartest.

24:52

As soon as they get any sense of self-preservation,

24:56

then you'll get evolution occurring.

24:58

The ones with more sense of self-preservation

25:00

will win and the more aggressive ones will win.

25:02

And then you get all the problems that jumped-up

25:05

Chimpanzees like us have. Which is we evolved in small tribes

25:08

and we have lots of aggression and competition with other tribes.

25:15

And I want to finish by talking a bit about

25:19

an epiphany I had at the beginning of 2023.

25:23

I had always thought

25:26

that we were a long, long way away from superintelligence.

25:33

I used to tell people 50 to 100 years, maybe 30 to 100 years.

25:37

It's a long way away. We don't need to worry about it now.

25:41

And I also

25:42

thought that making our models more like the brain would make them better.

25:46

I thought the brain was a whole lot better than the AI we had,

25:49

and if we could make AI a bit more like the brain,

25:51

for example, by having three timescales,

25:54

most of the models we have at present have just two timescales.

25:57

One for the changing of the weights, which is slow

26:00

and one for the words coming in, which is fast, changing the neural activities.

26:04

So the changes in neural activities and changing in weights, the brain has more

26:08

timescales than that. The brain has rapid changes in weights that quickly decay away.

26:13

And that's probably how it does a lot of short term memory.

26:15

And we don't have that in our models

26:17

for technical reasons to do with being able to do matrix

26:19

matrix multiplies.
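
As a hedged illustration of that third timescale (my own sketch, not from the lecture): keep a fast-weight matrix alongside the slow weights, bump it with an outer product of recent activity, and let it decay quickly so it acts as short-term memory.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
W_slow = rng.normal(size=(n, n)) * 0.1    # changes slowly, e.g. by gradient descent
W_fast = np.zeros((n, n))                 # changes rapidly and decays away

decay, write = 0.9, 0.1
h = rng.normal(size=n)
for step in range(5):
    x = rng.normal(size=n)                            # incoming "word"
    h = np.tanh(x @ (W_slow + W_fast))                # both timescales shape the activity
    W_fast = decay * W_fast + write * np.outer(h, h)  # rapid, decaying memory trace
```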

26:22

I still believe that once

26:24

we got that into our models they'd get better, but

26:29

because of what I was doing for the two years previous to that,

26:33

I suddenly came to believe that maybe the things we've got now,

26:37

the digital models, we've got now, are already

26:41

very close to as good as brains and will get to be much better than brains.

26:45

Now I'm going to explain why I believe that.

26:49

So digital computation is great.

26:52

You can run the same program on different computers, different pieces of hardware,

26:56

or the same neural net on different pieces of hardware.

26:58

All you have to do is save the weights, and that means it's immortal

27:02

once you've got some weights, they're immortal.

27:04

Because if the hardware dies, as long as you've got the weights,

27:06

you can make more hardware and run the same neural net.

27:11

But to do that,

27:13

we run transistors at very high power, so they behave digitally

27:17

and we have to have hardware that does exactly what you tell it to.

27:21

That was great

27:22

when we were instructing computers by telling them exactly how to do things,

27:26

but we've now got

27:28

another way of making computers do things.

27:31

And so now we have the possibility of using all the very rich analogue

27:35

properties of hardware to get computations done at far lower energy.

27:40

So these big language models, when they're training, use like megawatts

27:44

and we use 30 watts.

27:50

So because we know how to train things,

27:54

maybe we could use analogue hardware

27:58

and every piece of hardware is a bit different, but we train it

28:01

to make use of its peculiar properties, so that it does what we want.

28:05

So it gets the right output for the input.

28:09

And if we do that, then we can abandon the idea

28:12

that hardware and software have to be separate.

28:16

We can have weights that only work in that bit of hardware

28:20

and then we can be much more energy efficient.

28:24

So I started thinking

28:25

about what I call mortal computation, where you've abandoned that distinction

28:29

between hardware and software using very low power analogue computation.

28:33

You can parallelize over trillions of weights that are stored as conductances.

28:40

And what's more, the hardware doesn't need to be nearly so reliable.

28:44

You don't need to have hardware that

28:45

at the level of the instructions would always do what you tell it to.

28:48

You can have goopy hardware that you grow

28:52

and then you just learn to make it do the right thing.

28:55

So you should be able

28:56

to use hardware much more cheaply, maybe even

29:00

do some genetic engineering on neurons

29:02

to make it out of recycled neurons.

29:06

I want to give you one example of how this is much more efficient.

29:10

So the thing you're doing in neural networks all the time is taking a vector

29:14

of neural activities, and multiplying it by a matrix of weights, to get the vector

29:18

of neural activities in the next layer, or at least the inputs to the next layer.

29:23

And so a vector-matrix multiply is the thing you need to make efficient.

29:28

So the way we do it in the digital

29:30

computer, is we have these transistors that are driven at very high power

29:35

to represent bits in say, a 32 bit number

29:39

and then to multiply two 32-bit numbers, you need to perform,

29:43

I never did any computer science courses, but I think you need to perform about 1000

29:47

1 bit digital operations.

29:48

It's about the square of the number of bits.

29:51

If you want to do it fast.

29:54

So you do lots of these digital operations.

29:58

There's a much simpler way to do it, which is you make a neural activity,

30:01

be a voltage, you make a weight to be a conductance and a voltage times

30:06

a conductance is a charge, per unit time

30:09

and charges just add themselves up.

30:11

So you can do your vector matrix

30:14

multiply just by putting some voltages through some conductances.

30:18

And what comes into each neuron in the next layer will be the product

30:22

of this vector with those weights.
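
A small numerical illustration of the point (my own, not from the lecture): in the analogue scheme Ohm's law does each multiply (current = voltage x conductance) and the currents on a wire simply add, so the whole vector-matrix multiply falls out of the physics, whereas the digital version pays roughly 32^2, about 1000, one-bit operations per 32-bit multiply.

```python
import numpy as np

voltages = np.array([0.3, -0.1, 0.7])          # neural activities represented as voltages
conductances = np.array([[0.2, 0.5],           # weights represented as conductances
                         [0.1, 0.3],
                         [0.4, 0.2]])

# Each weight contributes a current I = V * G; the currents on each output wire simply add.
currents_into_next_layer = voltages @ conductances
print(currents_into_next_layer)                # the vector-matrix multiply, done by the physics
```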

30:25

That's great.

30:26

It's hugely more energy efficient.

30:28

You can buy chips to do that already, but every time you do

30:32

it'll be just slightly different.

30:35

Also, it's hard to do nonlinear

30:37

things like this.

30:40

So the several big problems with mortal computation,

30:44

one is

30:45

that it's hard to use back propagation because if you're making use

30:49

of the quirky analogue properties of a particular piece of hardware,

30:53

you have to assume the hardware doesn't know its own properties.

30:57

And so it's now hard to use the back propagation.

31:00

It's much easier to use reinforcement algorithms that tinker with weights

31:03

to see if it helps.

31:04

But they're very inefficient. For small networks,

31:08

We have come up with methods that are about as efficient as back propagation,

31:12

a little bit worse.

31:14

But these methods don't yet scale up, and I don't know if they ever will

31:17

Back propagation in a sense, is just the right thing to do.

31:20

And for big, deep networks, I'm not sure we're ever going to get

31:24

things that work as well as back propagation.

31:26

So maybe the learning algorithm in these analogue systems isn't going to be

31:29

as good as the one we have for things like large language models.

31:35

Another reason for believing that is, a large language

31:38

model has say a trillion weights, you have 100 trillion weights.

31:42

Even if you only use 10% of them for knowledge, that's ten trillion weights.

31:46

But the large language model in its trillion weights

31:49

knows thousands of times more than you do.

31:52

So it's got much, much more knowledge.

31:55

And that's partly because it's seen much, much more data.

31:57

But it might be because it has a much better learning algorithm.

32:00

We're not optimised for that.

32:02

We're not optimised for packing

32:04

lots of experience into a few connections where a trillion is a few.

32:08

We are optimized for having not many experiences.

32:13

You only live for about a billion seconds.

32:16

That's assuming you don't learn anything after you're 30, which is pretty much true.

32:19

So you live for about a billion seconds

32:22

and you've got 100 trillion connections,

32:26

so you've got

32:27

crazily more parameters than you have experiences.

32:30

So our brains are optimised for making the best use of

32:33

not very many experiences.

32:38

Another big problem with mortal computation is that

32:41

if the software is inseparable from the hardware,

32:44

once the system has learned, then if the hardware dies, you lose

32:47

all the knowledge; it's mortal in that sense.

32:51

And so how do you get that knowledge into another mortal system?

32:55

Well, you get the old one to give a lecture

32:59

and the new ones to figure out how to change the weights in their brains

33:04

so that they would have said that.

33:06

That's called distillation.

33:07

You try and get a student model to mimic

33:10

the output of a teacher model, and that works.
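
A minimal sketch of distillation as described (the toy classifiers, temperature and names here are my assumptions): the student is trained to match the teacher's output distribution rather than copying its weights.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

X = rng.normal(size=(256, 10))
W_teacher = rng.normal(size=(10, 5))            # stands in for the big trained model
W_student = np.zeros((10, 5))                   # the new system being taught

T, lr = 2.0, 0.5
for _ in range(200):
    p_teacher = softmax(X @ W_teacher, T)       # temperature-softened targets from the teacher
    p_student = softmax(X @ W_student, T)
    # gradient of the cross-entropy to the soft targets (the 1/T factor is folded into lr)
    grad = X.T @ (p_student - p_teacher) / len(X)
    W_student -= lr * grad
```

Each training example transfers only a few bits of the teacher's knowledge, which is the inefficiency being pointed at here.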

33:14

But it's not that efficient.

33:16

Some of you may have noticed that universities just aren't that efficient.

33:20

It's very hard to get the knowledge from the Professor into the student.

33:26

So this distillation method,

33:28

a sentence, for example, has a few hundred bits of information, and even

33:32

if you learn optimally, you can't convey more than a few hundred bits.

33:36

But if you take these big digital models,

33:39

then, if you look at a bunch of agents that all have exactly

33:46

the same neural net with exactly the same weights

33:51

and they're digital, so they

33:52

use those weights in exactly the same way

33:56

and these thousand different agents all go off

33:58

and look at different bits of the Internet and learn stuff.

34:01

And now you want each of them to know what the other one's learned.

34:06

You can achieve that by averaging the gradients, or averaging the weights,

34:09

so you can get massive communication of what one agent learned to all the other agents.

34:14

So when you share the weights, or you share the gradients, you're communicating

34:18

a trillion numbers, not just a few hundred bits, but a trillion real numbers.
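
A hedged sketch of that sharing scheme (my own illustration): identical digital copies each compute gradients on their own slice of data, average them, and apply the same update, so every copy benefits from what all the others saw.

```python
import numpy as np

rng = np.random.default_rng(4)
n_agents, dim = 4, 1_000
W = rng.normal(size=dim)                         # one set of weights, shared by every copy

for step in range(10):
    # each copy looks at a different bit of the data and gets its own gradient
    grads = [rng.normal(size=dim) * 0.01 for _ in range(n_agents)]   # stand-in gradients
    avg_grad = np.mean(grads, axis=0)            # averaging communicates ~dim real numbers
    W -= 0.1 * avg_grad                          # every copy now knows what the others learned
```

Sharing gradients or weights moves on the order of a trillion real numbers per synchronisation, rather than the few hundred bits a sentence can carry.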

34:24

And so they're fantastically much better at communicating,

34:27

and that's what they have over us.

34:30

They're just much, much better at

34:33

communicating between multiple copies of the same model.

34:36

And that's why

34:36

GPT4 knows so much more than a human, it wasn't one model that did it.

34:41

It was a whole bunch of copies of the same model running on different hardware.

34:47

So my conclusion, which I don't really like,

34:53

is that digital computation

34:56

requires a lot of energy, and so it would never evolve.

34:59

We have to evolve making use of the quirks of the hardware to be very low energy.

35:05

But once you've got it,

35:07

it's very easy for agents to share

35:11

and GPT-4

35:12

has thousands of times more knowledge in about 2% of the weights.

35:16

So that's quite depressing.

35:19

Biological computation is great for evolving

35:21

because it requires very little energy,

35:25

but my conclusion is

35:27

the digital computation is just better.

35:32

And so I think it's fairly clear

35:36

that maybe in the next 20 years, I'd say

35:39

with a probability of .5, in the next 20 years, it will get smarter than us

35:43

and very probably in the next hundred years it will be much smarter than us.

35:48

And so we need to think about

35:50

how to deal with that.

35:52

And there are very few examples of more intelligent

35:56

things being controlled by less intelligent things.

35:59

And one good example is a mother being controlled by her baby.

36:03

Evolution has gone to a lot of work to make that happen so that the baby

36:07

survives; it's very important for the baby to be able to control the mother.

36:11

But there aren't many other examples.

36:14

Some people think that we can make

36:16

these things be benevolent,

36:19

but if they get into a competition with each other,

36:22

I think they'll start behaving like chimpanzees.

36:25

And I'm not convinced you can keep them benevolent.

36:30

If they get very smart and they get any notion of self-preservation

36:35

they may decide they're more important than us.

36:38

So I finish the lecture in record time.

36:42

I think.