In conversation | Geoffrey Hinton and Joel Hellermark
Summary
TL;DR: This conversation covers a wide range of topics in deep learning and AI. It discusses how to select talent and the role intuition plays in that choice; looks back on research years at Carnegie Mellon and the University of Edinburgh; and explores the connections between neural networks, deep learning, and how the brain works. It touches on early explorations of AI, including collaborations with Terry Sejnowski and Peter Brown, and an early interest in how neural networks adjust their weights. It also discusses the potential of large language models and how they encode information by finding common structure, which enables creative analogies and reasoning. It closes with the role of GPUs in neural network training and thoughts on the future of computing.
Takeaways
- 🧠 Deep insights into how the brain learns and how AI has developed, stressing the connection between the brain's learning mechanisms and AI algorithms.
- 🤖 Early explorations of AI, including the initial interest in neural networks and machine learning, and the challenges and disappointments of early research.
- 🔍 The importance of intuition in choosing talent and research directions, and how the collaboration with Ilya drove the field forward.
- 🤝 Collaborations with different scholars, such as Terry Sejnowski and Peter Brown, and how those collaborations shaped the progress of AI.
- 📚 Early disappointment with philosophy and physiology, and the turn toward AI and neural network research.
- 💡 The influence of Donald Hebb's and John von Neumann's work on AI research, and their interest in neural networks and brain-style computation.
- 🧐 Early intuitions about large neural networks and a recognition of their potential to go beyond simple symbol processing.
- 🔗 How models are trained by predicting the next symbol or word, and how this forces them to understand.
- 🔢 How gradients and optimizers are used to improve neural networks, and how Ilya's intuition here helped push the research forward.
- 🌐 The importance of multimodal learning and how it helps models better understand space and objects.
- 🚀 The role of GPUs in training large neural networks, and how this technology propelled the whole field of AI.
Q & A
What was the lab environment like at Carnegie Mellon?
- At Carnegie Mellon, students were still in the lab programming on Saturday nights, because they believed they were working on the future of computer science. That was a sharp contrast with the culture in England, where researchers would head to the pub after six.
What was it like studying the brain at Cambridge?
- Studying the brain at Cambridge was disappointing: they only taught how neurons conduct action potentials, without really explaining how the brain works. I then turned to philosophy but found no satisfying answers there either, and finally studied AI at Edinburgh, where I became interested in simulating how the brain operates.
What led you to become interested in AI?
- I was influenced by a book by Donald Hebb, who was very interested in how learning adjusts the connection strengths in a neural network. A book by von Neumann also influenced me; it explored how the brain computes and how that differs from ordinary computers.
What was your collaboration with Terry Sejnowski like?
- Terry Sejnowski and I worked very closely together on Boltzmann machines, meeting every month to do research and talk. Many of the technical results were interesting, but in the end we concluded that it is not how the brain works.
What was it like when Ilya Sutskever first came to you?
- Ilya first came to me on a Sunday; he knocked on my door and told me he had been frying fries over the summer but would rather work in my lab. I told him to make an appointment, but he wanted to talk right then. He went on to prove that both his intuition and his mathematical ability were outstanding.
What is your view of large language models?
- I think that by predicting the next symbol, large language models are forced to understand what has already been said. Some people argue these models merely predict the next symbol, but the prediction actually requires a degree of reasoning, and the models are gradually becoming more creative and intelligent.
What is your view of multimodal models?
- Multimodal models can combine vision, sound, and other data sources to improve their understanding of space and objects. They can learn not only from language but also from video and image data, which significantly improves their reasoning and understanding.
What are your main concerns about the direction of AI?
- My main concerns include AI's applications in healthcare and its possible social impact. AI has the potential to make healthcare far more efficient, but it could also be misused, for example for mass surveillance or manipulating public opinion. We need to develop the technology while treating its potential downsides with caution.
What are your insights into training large neural networks?
- I think backpropagation is the right thing to do in large-scale neural network training: obtain the gradient and use it to adjust the parameters. The approach has been extremely successful in practice. Alternatives may exist, but backpropagation has proven effective in both theory and practice.
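To make "obtain the gradient and adjust the parameters" concrete, here is a minimal numpy sketch of backpropagation with plain gradient descent on a toy one-hidden-layer network. The data, sizes, and learning rate are all illustrative, not anything from the conversation.

```python
# Minimal sketch: compute the gradient of a loss with respect to the weights
# (backpropagation) and step the parameters against it. All values are toy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))            # toy inputs
y = rng.normal(size=(64, 1))             # toy regression targets

W1 = rng.normal(scale=0.1, size=(10, 32))
W2 = rng.normal(scale=0.1, size=(32, 1))
lr = 0.01

for step in range(500):
    # forward pass
    h = np.tanh(X @ W1)
    pred = h @ W2
    loss = np.mean((pred - y) ** 2)

    # backward pass: chain rule from the loss back to each weight matrix
    d_pred = 2 * (pred - y) / len(X)
    dW2 = h.T @ d_pred
    d_h = (d_pred @ W2.T) * (1 - h ** 2)  # tanh' = 1 - tanh^2
    dW1 = X.T @ d_h

    # gradient step: move the parameters in the direction that lowers the loss
    W1 -= lr * dW1
    W2 -= lr * dW2
```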
How do you select and develop talent effectively?
- Sometimes intuition matters a great deal: my first meeting with Ilya, for example, made his talent apparent. I also think a lab needs a diverse mix of students; some are very strong technically, others very creative, and different kinds of students working together produce better research.
Outlines
🤖 A journey into AI and neural networks
This section recounts the researcher's time at Carnegie Mellon and his early explorations of AI and neural networks. He recalls his disappointment studying physiology and philosophy at Cambridge, since neither answered his questions about how the brain works. He eventually moved to the University of Edinburgh to study AI, drawn in by Donald Hebb's and John von Neumann's books on neural networks and how the brain computes. He believed the brain learns not through logical rules but by changing the strengths of connections in a neural network.
👨‍💼 Research collaborations and the role of intuition in picking talent
This section covers the researcher's collaboration with Terry Sejnowski, who was not at Carnegie Mellon, and their joint work on neural networks. He stresses the role of intuition in selecting talent, describing how he picked a student like Ilya by intuition and noting Ilya's early interest in and instincts for mathematics and AI. He also mentions collaborating with Peter Brown, a statistician whose work on hidden Markov models had an important influence.
🚀 Where neural networks meet intuition
This exchange shows the interplay between the researcher and Ilya, a student with deep insight into neural networks and optimizers. They discuss gradient descent and the use of function optimizers, and how quickly Ilya understood and questioned existing methods for training neural networks. The researcher shares how much fun working with Ilya was and how they solved problems together, pushing the field of AI forward.
🧠 How neural networks learn and understand
The researcher discusses how neural networks learn language by predicting the next symbol, arguing that this forces the model to understand and thereby to reason in a way similar to humans. He stresses that large neural networks can reason, and may become more creative as they scale. He also cites AlphaGo as an example of reinforcement learning producing innovation beyond existing knowledge within a specific domain.
🔍 Reasoning and multimodal learning
This exchange explores how extending neural networks to multimodal data (images, video, and sound) strengthens their understanding and reasoning. The researcher argues that multimodal learning will make models far better at spatial understanding and help uncover deep connections between different domains. He also discusses whether the human brain evolved for language and how language interacts with cognition.
💡 Innovation and the future of neural networks
The researcher shares his views on where neural networks are headed: they will become more efficient by discovering common structure across different things, and may surpass humans in creativity. He also discusses training models to self-correct to improve their reasoning, and predicts how multimodal models will change the field of AI.
🔧 Computation and hardware
This exchange looks back at how the researcher foresaw the potential of GPUs for training neural networks and his early work in that area. He also discusses the future of computing, including the trade-offs between analog and digital computation and how to make AI systems more efficient and energy-frugal.
🌟 Time scales in brains and neural networks
The researcher examines the difference in time scales between the brain and neural networks, noting that the brain changes weights on many time scales while current models typically have only two. He argues that future neural networks will need more time scales to come closer to how the brain works.
🤔 Consciousness and emotion in neural networks
This exchange asks whether neural networks can emulate human consciousness and emotion. The researcher suggests that if neural networks can self-reflect and have persistent memory, they may develop something like human emotional experience. He shares his views on emotion and consciousness and how they relate to action and constraint.
🎯 Future directions for neural network research
The researcher shares his views on future research directions, including his curiosity about whether the brain uses backpropagation and his interest in learning across multiple time scales. He also discusses how to choose good research problems and stresses the role of curiosity in driving research.
🏆 Achievements and reflections
In this exchange, the researcher reflects on his achievements in neural networks, particularly his work on the Boltzmann machine learning algorithm. He expresses pride in that work even though it may not be practical, and discusses what he focuses on now and his thoughts about the future.
Keywords
💡 Neural networks
💡 Intuition
💡 Gradients
💡 Backpropagation
💡 Multimodal data
💡 Hidden Markov models
💡 Creativity
💡 Self-learning
💡 Simulation
💡 Analogy
💡 Consciousness
Highlights
At Carnegie Mellon, students and researchers were full of conviction about the future of computer science, believing their work would change its course.
Disappointment studying physiology at Cambridge: the teaching covered only how neurons conduct action potentials and did not explain how the brain works.
Turning to philosophy for an understanding of how the mind works, and being disappointed again.
Studying artificial intelligence (AI) at Edinburgh was more interesting, because theories could be tested in simulation.
Donald Hebb's book was an important influence on understanding how connection strengths in neural networks are learned.
John von Neumann's book reflected his interest in how the brain's way of computing differs from conventional computers.
During the Edinburgh years, a firm conviction that the brain learns by modifying the connections in a neural network.
The collaboration with Terry Sejnowski of Johns Hopkins, working together on neural networks and how the brain works.
The collaboration with statistician Peter Brown, and learning about hidden Markov models from him.
Ilya's arrival and his intuition about backpropagation: the idea of handing the gradient to an optimizer.
Ilya's independent thinking and early interest in AI shaped the development of his intuitions.
In AI research, increases in the scale of data and computation mattered more than new algorithms.
The character-level prediction paper demonstrated deep learning models' ability to understand text.
Deep learning models understand the problem by predicting the next symbol; it is not mere symbol prediction.
Large language models encode information by finding common structure, which makes them more efficient.
The development of multimodal models will improve spatial understanding, reasoning, and creativity.
On the relationship between language and cognition there are three different views; the newest holds that language symbols are converted into rich embedding vectors, and understanding arises from the interaction of those vectors.
The early intuition about using GPUs for neural network training, and its impact on the field of computing.
The discussion of fast weights and their potential role in the brain.
The discussion of simulating consciousness, and the possibility that AI assistants develop human-like emotions and self-reflection.
On choosing research problems: curiosity-driven research and questioning what everyone agrees on.
The long-standing question of whether the brain uses backpropagation, and what it implies for future research.
Concerns about the potential harms of AI, including misuse for killer robots, manipulation of public opinion, or mass surveillance.
The possible impact of AI assistants on the research process, including improving research efficiency and helping with thinking.
On developing intuition: accept facts critically and trust your own intuitions.
On current research directions: large models and training on multimodal data are a promising direction.
The achievement he is proudest of: developing the learning algorithm for Boltzmann machines, even though it may be impractical.
Transcripts
Have you reflected a lot on how to select talent, or has that mostly been intuitive to you? Ilya just shows up and you're like, "this is a clever guy, let's work together." Or have you thought a lot about that? Can we... are we recording? Should we roll this? Yeah, let's roll this. Okay, we're good. Yeah. Okay, sound is working.
So I remember when I first got to Carnegie Mellon from England. In England, at a research unit, it would get to be 6:00 and you'd all go for a drink in the pub. At Carnegie Mellon, I remember after I'd been there a few weeks, it was Saturday night, I didn't have any friends yet, and I didn't know what to do, so I decided I'd go into the lab and do some programming, because I had a Lisp machine and you couldn't program it from home. So I went into the lab at about 9:00 on a Saturday night, and it was swarming: all the students were there, and they were all there because what they were working on was the future. They all believed that what they did next was going to change the course of computer science. It was just so different from England, and that was very refreshing.

Take me back to the very beginning, Geoff, at Cambridge, trying to understand the brain. What was that like?

It was very disappointing. I did physiology, and in the summer term they were going to teach us how the brain worked. All they taught us was how neurons conduct action potentials, which is very interesting but doesn't tell you how the brain works, so that was extremely disappointing. I switched to philosophy; I thought maybe they'd tell us how the mind worked. That was very disappointing too. I eventually ended up going to Edinburgh to do AI, and that was more interesting: at least you could simulate things, so you could test out theories.
Do you remember what intrigued you about AI? Was it a paper, was it any particular person who exposed you to those ideas?

I guess it was a book I read by Donald Hebb that influenced me a lot. He was very interested in how you learn the connection strengths in neural nets. I also read a book by John von Neumann early on, who was very interested in how the brain computes and how it's different from normal computers.

And did you get the conviction that these ideas would work out at that point? What was your intuition back in the Edinburgh days?

It seemed to me there has to be a way that the brain learns, and it's clearly not by having all sorts of things programmed into it and then using logical rules of inference; that just seemed to me crazy from the outset. So we had to figure out how the brain learns to modify connections in a neural net so that it could do complicated things. Von Neumann believed that; Turing believed that. Von Neumann and Turing were both pretty good at logic, but they didn't believe in this logical approach.
And what was your split between studying the ideas from neuroscience and just doing what seemed to be good algorithms for AI? How much inspiration did you take early on?

So I never did that much study of neuroscience. I was always inspired by what I'd learned about how the brain works: there's a bunch of neurons, they perform relatively simple operations, they're nonlinear, but they collect inputs, they weight them, and then they give an output that depends on that weighted input. And the question is, how do you change those weights to make the whole thing do something good? It seems like a fairly simple question.
What collaborations do you remember from that time?

The main collaboration I had at Carnegie Mellon was with someone who wasn't at Carnegie Mellon. I was interacting a lot with Terry Sejnowski, who was in Baltimore at Johns Hopkins, and about once a month either he would drive to Pittsburgh or I'd drive to Baltimore; it's 250 miles away. We would spend a weekend together working on Boltzmann machines. That was a wonderful collaboration. We were both convinced it was how the brain worked. It was the most exciting research I've ever done, and a lot of technical results came out that were very interesting, but I think it's not how the brain works.

I also had a very good collaboration with Peter Brown, who was a very good statistician. He worked on speech recognition at IBM and then came as a more mature student to Carnegie Mellon, just to get a PhD, but he already knew a lot. He taught me a lot about speech, and he in fact taught me about hidden Markov models. I think I learned more from him than he learned from me; that's the kind of student you want. When he taught me about hidden Markov models, I was doing backprop with hidden layers, only they weren't called hidden layers then, and I decided that the name they use in hidden Markov models is a great name for variables that you don't know what they're up to. So that's where the name "hidden" in neural nets came from: Peter and I decided it was a great name for the hidden layers in neural nets. But I learned a lot from Peter about speech.
Take us back to when Ilya showed up at your office.

I was in my office, probably on a Sunday, and I was programming, I think, and there was a knock on the door. Not just any knock; it was a sort of urgent knock. So I went and answered the door, and there was this young student there. He said he'd been cooking fries over the summer, but he'd rather be working in my lab. So I said, well, why don't you make an appointment and we'll talk? And Ilya said, how about now? That was sort of Ilya's character. So we talked for a bit, and I gave him a paper to read, which was the Nature paper on backpropagation. We made another meeting for a week later, and he came back and said, "I didn't understand it." I was very disappointed; I thought he seemed like a bright guy, and it's only the chain rule, it's not that hard to understand. And he said, oh no, no, I understood that. I just don't understand why you don't give the gradient to a sensible function optimizer. Which took us quite a few years to think about.
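Ilya's question, rendered as code: rather than hand-rolled update rules, hand the loss and its gradient to an off-the-shelf optimizer. This is a hypothetical sketch using SciPy's L-BFGS; `loss_and_grad` is a stand-in for a network's forward and backward pass over a flattened parameter vector.

```python
import numpy as np
from scipy.optimize import minimize

def loss_and_grad(theta):
    # toy quadratic bowl; a real use would run backprop here
    loss = 0.5 * np.sum((theta - 3.0) ** 2)
    grad = theta - 3.0
    return loss, grad

theta0 = np.zeros(5)
# jac=True tells SciPy the function returns (loss, gradient) together
result = minimize(loss_and_grad, theta0, jac=True, method="L-BFGS-B")
print(result.x)  # converges to ~[3, 3, 3, 3, 3]
```

The same pattern works with any gradient-based optimizer that accepts a callable returning the objective and its gradient.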
And it kept on like that with Ilya; his raw intuitions about things were always very good.

What do you think enabled those intuitions for Ilya?

I don't know. I think he always thought for himself. He was always interested in AI from a young age, and he's obviously good at math. But it's very hard to know.

And what was the collaboration between the two of you like? What part would you play, and what part would Ilya play?

It was a lot of fun. I remember one occasion when we were trying to do a complicated thing with producing maps of data. I had a kind of mixture model, so you could take the same bunch of similarities and make two maps, so that in one map "bank" could be close to "greed" and in another map "bank" could be close to "river", because in one map you can't have it close to both, right? "River" and "greed" are a long way apart. So we'd have a mixture of maps. We were doing it in MATLAB, and this involved a lot of reorganization of the code to do the right matrix multiplies, and Ilya got fed up with that. So he came in one day and said, "I'm going to write an interface for MATLAB, so I program in this different language, and then I have something that just converts it into MATLAB." And I said, no, Ilya, that'll take you a month to do; we've got to get on with this project, don't get diverted by that. And Ilya said, it's okay, I did it this morning.

That's quite incredible.
And throughout those years, the biggest shift wasn't necessarily just the algorithms but also the scale. How did you view that scale over the years?

Ilya got that intuition very early. Ilya was always preaching that you just make it bigger and it'll work better, and I always thought that was a bit of a cop-out, that you're going to have to have new ideas too. It turns out Ilya was basically right. New ideas help; things like Transformers helped a lot. But it was really the scale of the data and the scale of the computation. Back then we had no idea computers would get, like, a billion times faster; we thought maybe they'd get a hundred times faster. We were trying to do things by coming up with clever ideas that would have just solved themselves if we had had bigger scale of data and computation.

In about 2011, Ilya and another graduate student called James Martens and I had a paper using character-level prediction: we took Wikipedia and tried to predict the next HTML character. That worked remarkably well, and we were always amazed at how well it worked. That was using a fancy optimizer on GPUs. We could never quite believe that it understood anything, but it looked as though it understood, and that just seemed incredible.

Can you take us through how models are trained to predict the next word, and why it is the wrong way of thinking about them?
Okay, I don't actually believe it is the wrong way. In fact, I think I made the first neural net language model that used embeddings and backpropagation. It was very simple data, just triples. It was turning each symbol into an embedding, then having the embeddings interact to predict the embedding of the next symbol, and from that predicting the next symbol; and then it was backpropagating through that whole process to learn these triples. And I showed it could generalize. About ten years later, Yoshua Bengio used a very similar network and showed it worked with real text, and about ten years after that, linguists started believing in embeddings. It was a slow process.
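As a rough illustration of the architecture Hinton describes, here is a minimal sketch in modern PyTorch (an assumption on our part; the original work long predates these tools): each symbol gets an embedding, the two context embeddings interact through a hidden layer to predict the third symbol, and backpropagation trains the embeddings themselves.

```python
import torch
import torch.nn as nn

VOCAB, DIM = 100, 16                       # illustrative sizes
embed = nn.Embedding(VOCAB, DIM)
hidden = nn.Linear(2 * DIM, 64)
out = nn.Linear(64, VOCAB)
opt = torch.optim.SGD(
    [*embed.parameters(), *hidden.parameters(), *out.parameters()], lr=0.1
)

def train_step(a, b, c):
    """One triple (a, b, c) of symbol ids: given (a, b), predict c."""
    ctx = torch.cat([embed(a), embed(b)], dim=-1)  # let the two embeddings interact
    logits = out(torch.tanh(hidden(ctx)))          # predict the next symbol
    loss = nn.functional.cross_entropy(logits, c)
    opt.zero_grad()
    loss.backward()                                # gradients reach the embeddings too
    opt.step()
    return loss.item()

# e.g. train_step(torch.tensor([5]), torch.tensor([9]), torch.tensor([2]))
```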
it's not just predicting the next symbol
is if you ask well what does it take to
predict the next symbol particularly if
you ask me a question and then the first
word of the answer is the next symbol um
you have to understand the question so I
think by predicting the next
symbol it's very unlike oldfashioned
autocomplete oldfashioned autocomplete
you'd store sort of triples of words and
then if you sort a pair of words you see
how often different words came third and
that way you can predict the next symbol
and that's what most people think auto
complete is like it's no longer at all
like that um to predict the next symbol
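The old-fashioned autocomplete Hinton contrasts this with fits in a few lines: store triples of words and, given a pair, return whichever third word followed them most often. A toy sketch with an illustrative corpus:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat sat on the chair".split()

# count, for every pair of adjacent words, which word came third
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def autocomplete(w1, w2):
    following = counts.get((w1, w2))
    return following.most_common(1)[0][0] if following else None

print(autocomplete("cat", "sat"))  # -> "on"
```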
To predict the next symbol, you have to understand what's been said. So I think you're forcing it to understand by making it predict the next symbol, and I think it's understanding in much the same way we are. A lot of people will tell you these things aren't like us; they're just predicting the next symbol, they're not reasoning like us. But actually, in order to predict the next symbol, it's going to have to do some reasoning. And we've seen now that if you make them big, without putting in any special stuff to do reasoning, they can already do some reasoning, and I think as you make them bigger, they're going to be able to do more and more reasoning.

Do you think I'm doing anything other than predicting the next symbol right now?

I think that's how you're learning. I think you're predicting the next video frame, you're predicting the next sound. I think that's a pretty plausible theory of how the brain's learning.

What enables these models to learn such a wide variety of fields?
What these big language models are doing is looking for common structure, and by finding common structure they can encode things using the common structure, and that's more efficient. Let me give you an example. If you ask GPT-4, "why is a compost heap like an atom bomb?", most people can't answer that; most people haven't thought about it. They think atom bombs and compost heaps are very different things. But GPT-4 will tell you: well, the energy scales are very different, and the time scales are very different, but the thing that's the same is that when the compost heap gets hotter, it generates heat faster, and when the atom bomb produces more neutrons, it produces more neutrons faster. So it gets the idea of a chain reaction. I believe it's understood that they're both forms of chain reaction, and it's using that understanding to compress all that information into its weights. If it's doing that, then it's going to be doing that for hundreds of things where we haven't seen the analogies yet, but it has. And that's where you get creativity from: from seeing analogies between apparently very different things. So I think GPT-4 is going to end up, when it gets bigger, being very creative. This idea that it's just regurgitating what it's learned, just pasting together text it's learned already, is completely wrong. It's going to be even more creative than people.
I think you'd argue that it won't just repeat the human knowledge we've developed so far but could also progress beyond that. I think that's something we haven't quite seen yet; we've started seeing some examples of it, but to a large extent we're still at the current level of science. What do you think will enable it to go beyond that?

Well, we've seen that in more limited contexts. If you take AlphaGo, in that famous competition with Lee Sedol, there was move 37, where AlphaGo made a move that all the experts said must have been a mistake, but actually later they realized it was a brilliant move. So that was creativity within that limited domain. I think we'll see a lot more of that as these things get bigger.

The difference with AlphaGo, as well, was that it was using reinforcement learning, which subsequently enabled it to go beyond the current state. It started with imitation learning, watching how humans play the game, and then through self-play it developed way beyond that. Do you think that's the missing component?

I think that may well be a missing component, yes. The self-play in AlphaGo and AlphaZero is a large part of why they could make these creative moves. But I don't think it's entirely necessary.
There's a little experiment I did a long time ago where you train a neural net to recognize handwritten digits...

I love that example, the MNIST example.

...and you give it training data where half the answers are wrong. And the question is, how well will it learn? You make half the answers wrong once and keep them like that, so it can't average away the wrongness by seeing the same example sometimes with the right answer and sometimes with the wrong answer: for half of the examples, whenever it sees the example, the answer is always wrong. So the training data has 50% error, but if you train up backpropagation, it gets down to 5% error or less. In other words, from badly labeled data it can get much better results; it can see that the training data is wrong. That's how smart students can be smarter than their advisor: their advisor tells them all this stuff, and for half of what their advisor tells them they think, "no, rubbish," and they listen to the other half, and then they end up smarter than the advisor. So these big neural nets can actually do much better than their training data, and most people don't realize that.
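A sketch of that noisy-label experiment, assuming PyTorch and torchvision as stand-ins for whatever was originally used: corrupt a fixed half of the MNIST training labels once, train with ordinary backpropagation, and measure error on the clean test set. The claim is that test error lands far below the 50% label noise.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

train = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
test = datasets.MNIST("data", train=False, download=True)

# Corrupt a fixed half of the training labels, once and permanently:
# add a random offset of 1..9 mod 10, so the label is guaranteed wrong.
g = torch.Generator().manual_seed(0)
bad = torch.randperm(len(train), generator=g)[: len(train) // 2]
train.targets[bad] = (train.targets[bad] + torch.randint(1, 10, (len(bad),), generator=g)) % 10

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)

for epoch in range(5):                 # half the labels seen here are wrong
    for x, y in loader:
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

with torch.no_grad():                  # evaluate on the uncorrupted test set
    x = test.data.float().div(255).flatten(1)
    err = (model(x).argmax(1) != test.targets).float().mean()
print(f"test error: {err:.1%}")        # far below the 50% noise in the training labels
```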
So how do you expect these models to add