NVIDIA'S HUGE AI Chip Breakthroughs Change Everything (Supercut)
Summary
TLDR This video showcases a major transformation in the computing industry, particularly the rise of accelerated computing and generative AI. The talk highlights a new model of software development, in which computer engineers program together with AI supercomputers, and its far-reaching impact across industries. It introduces the $200,000 H100 computing system and its production process, emphasizing its potential in big-data processing and AI. It also explores the challenges, history, and future direction of accelerated computing, and how NVIDIA is redefining the data center with the Grace Hopper superchip and the concept of AI factories to meet explosive growth in computing demand. The video closes with a look at how AI technology will shape the future, cutting energy use while raising computing efficiency.
Takeaways
- 🚀 Software development has entered a new era: it is no longer done by computer engineers alone, but in collaboration with AI supercomputers.
- 🌐 We have reached the tipping point of accelerated computing and generative AI, a shift that is touching every industry.
- 💡 Full production of the H100 marks the concrete realization of this transformation; its system board integrates 35,000 components and eight Hopper GPUs, weighs 60-65 pounds, and requires robots to handle.
- 💰 At $200,000 the H100 is the world's single most expensive computer, yet it replaces an entire room of other machines, hence "the more you buy, the more you save."
- 🔬 Accelerated computing and generative AI redefine how software is written and how computation is done, a transition nearly three decades in the making.
- 🏭 NVIDIA's accelerated computing applied to large language models demonstrates the cost efficiency and performance gains of GPU servers.
- 🌟 The Grace Hopper superchip, with nearly 200 billion transistors, represents a new stage of accelerated processors, with enormous memory and high speed.
- 📈 NVIDIA connects 256 Grace Hopper Superchips into a single 1-exaFLOPS system in which every GPU can see 144 TB of memory.
- 🌍 In the future every major company will operate an AI factory that produces and refines the company's intelligence, a direct application of accelerated computing and AI.
- 🔧 NVIDIA MGX is an open, modular server design specification built for the needs of accelerated computing and for smooth transitions across multiple hardware generations.
Q & A
How is the H100 produced?
-Production of the H100 integrates 35,000 components onto the system board, including eight Hopper GPUs. Assembly relies heavily on robots because the unit weighs 60 to 65 pounds and the insertion pressure is extremely high, demanding precise handling.
Why is the H100 described as one of the most expensive computers in the world?
-The H100 costs $200,000, and its performance is strong enough to replace a room full of other computers, which is why it is called the world's single most expensive computer.
What transition are accelerated computing and generative AI driving in the computer industry?
-They mark two fundamental trends. With CPU performance scaling at an end and deep learning on the rise, accelerated computing and generative AI are redefining how software is developed and driving a leap in computing capability.
Why are GPU servers considered expensive?
-Although a GPU server has a high upfront cost, its accelerated computing capability delivers far higher performance and efficiency, especially for large language models and other demanding workloads.
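The keynote's own numbers make this concrete: $10M buys roughly 1,000 CPU servers that need 11 GWh to process a large language model, or 48 GPU servers that do it in 3.2 GWh with 44x the throughput. A minimal back-of-envelope sketch in Python (figures from the talk; the per-work metrics and variable names are ours):

```python
# Back-of-envelope comparison using the figures quoted in the keynote.
BUDGET_USD = 10_000_000

cpu = {"servers": 1_000, "energy_gwh": 11.0, "relative_perf": 1.0}
gpu = {"servers": 48,    "energy_gwh": 3.2,  "relative_perf": 44.0}

for name, dc in (("CPU", cpu), ("GPU", gpu)):
    energy_per_work = dc["energy_gwh"] / dc["relative_perf"]   # GWh per unit of work
    dollars_per_work = BUDGET_USD / dc["relative_perf"]        # $ per unit of work
    print(f"{name}: {dc['servers']} servers, "
          f"{energy_per_work:.3f} GWh/work, ${dollars_per_work:,.0f}/work")

# Same $10M buys ~44x the work at ~3.4x less total energy,
# i.e. roughly 150x less energy per unit of work.
ratio = (cpu["energy_gwh"] / cpu["relative_perf"]) / (gpu["energy_gwh"] / gpu["relative_perf"])
print(f"energy per unit of work: ~{ratio:.0f}x lower on the GPU side")
```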
What role does the Nvidia AI operating system play in AI?
-Nvidia AI is described as the only AI operating system in the world, handling deep learning end to end, from data processing through training and optimization to deployment and inference; it is the engine of today's AI.
Why will every major company have an AI factory in the future?
-As accelerated computing and AI spread, companies will need their own AI factories to produce and refine corporate intelligence in order to stay competitive and innovative.
What are the characteristics of the Grace Hopper superchip?
-Grace Hopper has nearly 200 billion transistors and almost 600 GB of memory that is coherent between the CPU and GPU, so memory can be shared without unnecessary copying, which suits very large datasets.
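To see why coherence matters, here is a deliberately simplified Python sketch (our illustration, not NVIDIA's API): a copy-based model must move the whole dataset before the "GPU" can touch it, while a coherent model lets both sides reference one buffer.

```python
# Toy model of coherent CPU/GPU memory vs. explicit copies.
# Purely illustrative; real coherence is provided in hardware by the
# chip-to-chip interconnect, not by Python objects.

class CopyBasedSystem:
    """CPU and GPU each own a buffer; data must be copied across."""
    def __init__(self, data):
        self.cpu_buffer = list(data)
        self.gpu_buffer = None
        self.bytes_copied = 0

    def gpu_read(self, i):
        if self.gpu_buffer is None:              # must copy everything first
            self.gpu_buffer = list(self.cpu_buffer)
            self.bytes_copied += 8 * len(self.cpu_buffer)
        return self.gpu_buffer[i]

class CoherentSystem:
    """One shared buffer; CPU and GPU reference the same memory."""
    def __init__(self, data):
        self.shared_buffer = list(data)
        self.bytes_copied = 0                    # no copies ever needed

    def gpu_read(self, i):
        return self.shared_buffer[i]

data = range(1_000_000)
for system in (CopyBasedSystem(data), CoherentSystem(data)):
    system.gpu_read(42)
    print(type(system).__name__, "bytes copied:", system.bytes_copied)
```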
Why is this a new era of computing?
-This era of computing can understand multimodal information and, through large language models, lowers the programming barrier so that everyone can be a programmer. It also powers both past and future applications, signaling a huge leap in technology and its uses.
How does Nvidia extend accelerated computing and generative AI with Grace Hopper?
-By linking Grace Hopper chips together with NVLink, Nvidia built the world's largest GPU, the DGX GH200: 256 Grace Hopper superchips offering unprecedented compute and memory.
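The headline figures are consistent with simple arithmetic; a small sanity check in Python (the chip counts, 144 TB, and 1 exaFLOP come from the keynote; the per-chip values are our derived approximations of the "almost 600 GB" quoted):

```python
# Sanity-checking the DGX GH200 headline numbers from the keynote.
CHIPS = 256                       # Grace Hopper superchips in one system
CHIPS_PER_POD = 8                 # connected through NVLink switch trays
PODS = 32                         # pods joined by a second switch layer

assert CHIPS_PER_POD * PODS == CHIPS

TOTAL_MEMORY_TB = 144             # memory every GPU can address (keynote)
mem_per_chip_gb = TOTAL_MEMORY_TB * 1000 / CHIPS
print(f"~{mem_per_chip_gb:.0f} GB per chip")      # ~562 GB, i.e. 'almost 600 GB'

TOTAL_EXAFLOPS = 1.0              # Transformer-engine deep learning compute
pflops_per_chip = TOTAL_EXAFLOPS * 1000 / CHIPS
print(f"~{pflops_per_chip:.1f} PFLOPS per chip")  # ~3.9 PFLOPS each
```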
What is innovative about Nvidia MGX for accelerated computing system design?
-Nvidia MGX is an open, modular server design specification built for the mechanical, thermal, and electrical demands of high-density accelerated computing. It is standardized across generations, so future GPUs, CPUs, and DPUs drop into the same chassis, shortening time to market and protecting the investment of different data centers.
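As a rough illustration of what "modular" means here, the configurations named in the keynote can be described as variations on one chassis; the Python data model below is our sketch (the `MGXConfig` type and its field names are hypothetical, the part lists come from the talk):

```python
# Hypothetical data model of MGX configurations mentioned in the keynote.
# MGXConfig and its fields are illustrative, not an NVIDIA API.
from dataclasses import dataclass, field

@dataclass
class MGXConfig:
    name: str
    cpu: str
    gpus: list = field(default_factory=list)
    dpus: list = field(default_factory=list)
    nics: list = field(default_factory=list)
    cooling: str = "air"

configs = [
    MGXConfig("Omniverse OVX", cpu="x86",
              gpus=["L40S"] * 4, dpus=["BlueField-3"], nics=["CX-7"] * 2),
    MGXConfig("Grace Omniverse", cpu="Grace",
              gpus=["L40S"] * 4, dpus=["BlueField-3"], nics=["CX-7"] * 2),
    MGXConfig("Hopper NVLink generative AI inference", cpu="x86",
              gpus=["Hopper"]),
    MGXConfig("Grace Hopper dense", cpu="Grace",
              gpus=["Hopper"], cooling="liquid"),
    MGXConfig("Grace Superchip general purpose", cpu="Grace x2"),
]

for c in configs:   # same chassis spec, different module mix
    print(f"{c.name}: cpu={c.cpu}, gpus={len(c.gpus)}, cooling={c.cooling}")
```

The point of the specification is that the chassis, thermal, and electrical design stay fixed while the module mix changes across generations.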
Outlines
💻 A new era for the computer industry
A new era for the computer industry marks a fundamental change in software development: computer engineers now program software together with AI supercomputers. We have reached the tipping point of accelerated computing and generative AI, and full-volume production of the H100 signals that this will touch every industry. The H100 system board carries 35,000 components and eight Hopper GPUs, weighs 60-65 pounds, and must be lifted and inserted by robots. The machine costs $200,000 yet replaces an entire room of other computers, embodying "the more you buy, the more you save." The H100 introduces a transformative way of computing, and its accelerated computing for large language models shows astonishing performance.
🚀 The tipping point of accelerated computing and generative AI
With accelerated computing and generative AI at a tipping point, utilization of NVIDIA GPUs is at an all-time high: nearly every cloud and data center is overextended. The GPU is no longer just a graphics processor; it has been reinvented to excel at tensor processing, and NVIDIA AI has become the only AI operating system, handling deep learning end to end from data processing through training and optimization to deployment and inference. With NVLink and InfiniBand, GPUs are interconnected into ever larger systems, advancing AI research with a giant leap every two years. AI supercomputers are also a new kind of factory, foreshadowing a future in which every major company runs its own AI factory producing corporate intelligence.
🌐 Grace Hopper superchip in full production
Full production of the Grace Hopper superchip marks a new milestone for accelerated processors: nearly 200 billion transistors and a huge pool of high-speed memory that is coherent between CPU and GPU, eliminating unnecessary copies and letting the GPU work on much larger datasets. Practically the entire computer sits on one module, using the same low-power DDR memory as a phone but optimized for high-resilience data center duty. With NVLink, eight Grace Hopper Superchips form a pod, and 256 of them are joined into one supercomputing system delivering 1 exaFLOPS of deep learning compute with 144 TB of shared memory, an unprecedented scale.
🤖 AI data centers for the future
As AI and accelerated computing advance, the world's data centers will be transformed into efficient facilities built for accelerated computing and generative AI. NVIDIA introduced NVIDIA MGX, an open modular server design specification purpose-built for accelerated computing and adaptable to many applications and configurations. Several server configurations were shown, including the Omniverse OVX server and a liquid-cooled Grace Hopper server, all aimed at higher compute density and efficiency. At equal performance, a Grace Superchip server draws far less power than a traditional CPU server (see the sketch below), pushing AI into new territory and making the data center the core of future computing.
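A quick check of the power claim with the keynote's numbers (580 W for a Grace Superchip server versus 1,090 W for a comparable x86 server at the same performance); a minimal Python sketch, with the 1 MW budget being our hypothetical:

```python
# ISO-performance power comparison quoted in the keynote.
GRACE_WATTS = 580      # Grace Superchip server at a given performance level
X86_WATTS   = 1090     # latest-generation x86 server at the same performance

print(f"power ratio: {X86_WATTS / GRACE_WATTS:.2f}x")   # ~1.88x, 'basically half'

# Flip it around for a power-constrained data center: at a fixed power
# budget, the denser server yields proportionally more throughput.
BUDGET_WATTS = 1_000_000   # hypothetical 1 MW budget
print(f"x86 servers:   {BUDGET_WATTS // X86_WATTS}")     # ~917 servers
print(f"Grace servers: {BUDGET_WATTS // GRACE_WATTS}")   # ~1724 servers
```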
🌟 The outlook for AI and accelerated computing
Looking ahead, the convergence of AI and accelerated computing will let every cloud data center become an AI data center, extending AI broadly to enterprises across every industry. With the Spectrum-X components and the NVIDIA MGX modular accelerated computing systems, NVIDIA aims to bring generative AI to the world's enterprises, working with the Taiwan ecosystem on hardware and software innovation. Grateful for the partnerships of recent years, NVIDIA is confident in the future of AI data centers, with accelerated computing and generative AI set to keep leading the industry toward broader applications and impact.
Keywords
💡accelerated computing
💡generative AI
💡Grace Hopper
💡H100
💡hyperscale
💡AI supercomputer
💡modular design
💡NVLink
💡BlueField
💡full stack
Transcripts
this is the new computer industry
software is no longer programmed just by
computer Engineers software is
programmed by computer Engineers working
with AI supercomputers we have now
reached the Tipping Point of accelerated
Computing we have now reached the
Tipping Point of generative Ai and we
are so so so excited to be in full
volume production of the h100 this is
going to touch literally every single
industry let's take a look at how h100
is produced
[Music]
okay
[Music]
35,000 components on that system board
eight
Hopper gpus
let me show it to you
all right this
I would I would lift this but I I um I
still have the rest of the keynote I
would like to give this is 60 pounds 65
pounds it takes robots to lift it of
course and it takes robots to insert it
because the insertion pressure is so
high and has to be so perfect
this computer is two hundred thousand
dollars and as you know it replaces an
entire room of other computers it's the
world's single most expensive computer
that you can say the more you buy the
more you save
this is what a compute tray looks like
even this is incredibly heavy
see that
so this is the brand new h100 with the
world's first computer that has a
Transformer engine in it
the performance is utterly incredible
there are two fundamental transitions
happening in the computer industry today
all of you are deep within it and you
feel it there are two fundamental Trends
the first trend is because CPU scaling
has ended the ability to get 10 times
more performance every five years has
ended the ability to get 10 times more
performance every five years at the same
cost is the reason why computers are so
fast today
that trend has ended it happened at
exactly the time when a new way of doing
software was discovered deep learning
these two events came together and is
driving Computing today
accelerated Computing and generative AI
this way of doing software this way of doing
computation is a reinvention from the
ground up and it's not easy accelerated
Computing has taken us nearly three
decades to accomplish
well this is how accelerated Computing
works
this is accelerated Computing used for
large language models basically the core
of generative AI this example is a 10
million dollar server and so 10 million
dollars gets you nearly a thousand CPU
servers and to train to process this
large language model takes 11 gigawatt
hours 11 gigawatt hours okay and this is
what happens when you accelerate this
workload with accelerated Computing and
so with 10 million dollars for a 10
million dollar server you buy 48 GPU
servers it's the reason why people say
that GPU servers are so expensive
remember people say GPU servers are so
expensive however the GPU server is no
longer the computer the computer is the
data center
your goal is to build the most cost
effective data center not build the most
cost effective server
back in the old days when the computer
was the server that would be a
reasonable thing to do but today the
computer is the data center so for 10
million dollars you buy 48 GPU servers
it only consumes 3.2 gigawatt hours and
44 times the performance
let me just show it to you one more time
this is before and this is after and
this is
we want dense computers not big ones we
want dense computers fast computers not
big ones let me show you something else
this is my favorite
if your goal if your goal is to get the
work done
and this is the work you want to get
done ISO work
okay this is ISO work all right look at
this
look at this look at this before
after you've heard me talk about this
for so many years
in fact every single time you saw me
I've been talking to you about
accelerated computing
and now
why is it that finally it's the Tipping
Point because we have now addressed so
many different domains of science so
many Industries and in data processing
in deep learning classical machine
learning
so many different ways for us to deploy
software from the cloud to Enterprise to
Super Computing to the edge
so many different configurations of gpus
from our hgx versions to our Omniverse
versions to our Cloud GPU and Graphics
version so many different versions now
the utilization is incredibly High
the utilization of Nvidia GPU is so high
almost every single cloud is
overextended almost every single data
center is overextended there are so many
different applications using it so we
have now reached the Tipping Point of
accelerated Computing we have now
reached the Tipping Point of generative
AI
people thought that gpus would just be
gpus they were completely wrong we
dedicated ourselves to Reinventing the
GPU so that it's incredibly good at
tensor processing and then all of the
algorithms and engines that sit on top
of these computers we call Nvidia AI the
only AI operating system in the world
that takes you from data
processing to training to optimization
to deployment and inference
end to end deep learning processing it
is the engine of AI today
we connected gpus to other gpus using
NVLink to build one giant GPU and we
connected those gpus together using
infiniband into larger scale computers
the ability for us to drive the
processor and extend the scale of
computing
made it possible
for the AI research organization the
community to advance AI at an incredible
rate
so every two years we take giant leaps
forward and I'm expecting the next leap
to be giant as well
this is the new computer industry
software is no longer programmed just by
computer Engineers software is
programmed by computer Engineers working
with AI supercomputers these AI
supercomputers
are a new type of factory
it is very logical that a car industry
has factories they build things so you
can see cars
it is very logical that computer
industry has computer factories you
build things that you can see computers
in the future
every single major company will also
have ai factories
and you will build and produce your
company's intelligence
and it's a very sensible thing
we are intelligence producers already
it's just that the intelligence
producers the intelligence are people in
the future we will be intelligence
producers artificial intelligence
producers and every single company will
have factories and the factories will be
built this way
using accelerated Computing and
artificial intelligence we accelerated
computer Graphics by 1 000 times in five
years
Moore's Law is probably currently
running at about two times
a thousand times in five years a
thousand times in five years is one
million times in ten we're doing the
same thing in artificial intelligence
now question is what can you do when
your computer is one million times
faster
what would you do if your computer was
one million times faster well it turns
out that we can now apply the instrument
of our industry to so many different
fields that were impossible before
this is the reason why everybody is so
excited
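As a quick check of the compounding claim, 1,000x over five years squared is 1,000,000x over ten; a short Python sketch of the arithmetic (reading "about two times" as roughly 2x per five-year window is our assumption):

```python
# Compounding check for the keynote's speedup claims.
graphics_5yr = 1_000   # computer graphics speedup over five years
moores_5yr   = 2       # cited Moore's Law pace, assumed per five years

print(f"graphics, 10 years: {graphics_5yr ** 2:,}x")        # 1,000,000x
print(f"Moore's Law, 10 years: {moores_5yr ** 2}x")         # 4x

# Implied annual growth rates:
print(f"graphics per year: {graphics_5yr ** (1/5):.2f}x")   # ~3.98x/year
print(f"Moore's Law per year: {moores_5yr ** (1/5):.2f}x")  # ~1.15x/year
```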
there's no question that we're in a new
Computing era
there's just absolutely no question
about it every single Computing era you
could do different things that weren't
possible before and artificial
intelligence certainly qualifies this
particular Computing era is special in
several ways one
it is able to understand information of
more than just text and numbers it can
Now understand multi-modality which is
the reason why this Computing Revolution
can impact every industry
every industry two
because this computer
doesn't care how you program it
it will try to understand what you mean
because it has this incredible large
language model capability and so the
programming barrier is incredibly low we
have closed the digital divide
everyone is a programmer now you just
have to say something to the computer
third
this computer
not only is it able to do amazing things
for the future
it can do amazing things for every
single application of the previous era
which is the reason why all of these
apis are being connected into Windows
applications here and there in browsers
and PowerPoint and word every
application that exists will be better
because of AI
you don't have to just AI this
generation this Computing era does not
need
new applications it can succeed with old
applications and it's going to have new
applications
the rate of progress the rate of
progress because it's so easy to use
is the reason why it's growing so fast
this is going to touch literally every
single industry and at the core with
just as with every single Computing era
it needs a new Computing approach
the last several years I've been talking
to you about the new type of processor
we've been creating
and this is the reason we've been
creating it
ladies and gentlemen
Grace Hopper is now in full production
this is Grace Hopper
nearly 200 billion transistors in this
computer oh
look at this this is Grace Hopper
this this processor
this processor is really quite amazing
there are several characteristics about
it this is the world's first accelerated
processor
accelerated Computing processor that
also has a giant memory it has almost
600 gigabytes of memory that's coherent
between the CPU and the GPU and so the
GPU can reference the memory the CPU can
reference the memory and any
unnecessary copying back
and forth could be avoided
the amazing amount of high-speed memory
lets the GPU work on very very large
data sets this is a computer this is not
a chip practically the Entire Computer
is on here this uses
low power DDR memory just like your cell
phone except this has been optimized and
designed for high resilience data center
applications so let me show you what
we're going to do so the first thing is
of course we have the Grace Hopper
Superchip
put that into a computer the second
thing that we're going to do is we're
going to connect eight of these together
using NVLink this is an NVLink switch
so eight of these connect
into three switch trays into an eight
Grace Hopper pod
these eight Grace Hopper pods each one
of the grace Hoppers are connected to
the other Grace Hopper at 900 gigabytes
per second
eight of them connected together
as a pod and then we connect 32 of them
together
with another layer of switches
and in order to build in order to build
this
256 Grace Hopper Super Chips connected
into one exaflops one exaflops you know
that countries and Nations have been
working on exaflops Computing and just
recently achieved it
256 Grace Hoppers for deep learning is
one exaflop Transformer engine and it
gives us
144 terabytes of memory that every GPU
can see
this is not 144 terabytes distributed
this is 144 terabytes connected
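The two-level NVLink topology described above can be sketched in a few lines of Python; this toy model (ours, with made-up names) just wires up pods and checks that every chip ends up on one fabric:

```python
# Toy model of the two-level NVLink topology described in the keynote:
# 8 Grace Hopper superchips per pod, 32 pods joined by a second switch
# layer, for 256 chips in one fabric. Names and structure are illustrative.
CHIPS_PER_POD, PODS = 8, 32

fabric = {}                      # chip id -> set of directly-reachable chips
for pod in range(PODS):
    members = [pod * CHIPS_PER_POD + i for i in range(CHIPS_PER_POD)]
    for chip in members:         # level 1: NVLink switch inside the pod
        fabric[chip] = set(members) - {chip}

all_chips = set(fabric)
for chip in fabric:              # level 2: switch layer joins every pod
    fabric[chip] = all_chips - {chip}

assert len(fabric) == 256
assert all(len(peers) == 255 for peers in fabric.values())
print("one fabric:", len(fabric), "chips, each sees every other chip")
```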
why don't we take a look at what it
really looks like play please
[Applause]
this
is
150 miles of cables
fiber optic cables
2,000 fans
70,000 cubic feet per minute
it probably
recycles the air in this entire room in
a couple of minutes
forty thousand pounds
four elephants
one GPU
if I can get up on here this is actual
size
so this is this is our brand new
Grace Hopper AI supercomputer it is one
giant GPU
utterly incredible we're building it now
and we're so we're so excited that
Google Cloud Meta and Microsoft will be
the first companies in the world to have
access
and they will be doing
exploratory research on the pioneering
front the boundaries of artificial
intelligence with us so this is the DGX
GH200 it is one giant GPU
okay I just talked about how we are
going to extend the frontier of AI
data centers all over the world and all
of them over the next decade will be
recycled
and re-engineered into accelerated data
centers and generative AI capable data
centers but there are so many different
applications in so many different areas
scientific computing
data processing cloud and video and
Graphics generative AI for Enterprise
and of course the edge each one of these
applications have different
configurations of servers
different focus of applications
different deployment methods and so
security is different operating system
is different how it's managed it's
different
well this is just an enormous number of
configurations and so today we're
announcing in partnership with so many
companies here in Taiwan the Nvidia mgx
it's an open modular server design
specification and the design for
Accelerated Computing most of the
servers today are designed for general
purpose Computing the mechanical thermal
and electrical is insufficient for a
very highly dense Computing system
accelerated computers take as you know
many servers and compress it into one
you save a lot of money you save a lot
of floor space but the architecture is
different and we designed it so that
it's multi-generation standardized so
that once you make an investment our
next generation gpus and Next Generation
CPUs and next generation dpus will
continue to easily configure into it so
that we can best time to Market and best
preservation of our investment different
data centers have different requirements
and we've made this modular and flexible
so that it could address all of these
different domains now this is the basic
chassis let's take a look at some of the
other things you can do with it this is
the Omniverse ovx server
it has x86 four L40S BlueField-3 two
CX-7 six PCI Express slots this is the
grace Omniverse server
Grace same four L40S BF3 BlueField-3 and
two CX-7s okay this is the grace Cloud
Graphics server
this is the hopper NVLink generative AI
inference server
and of course Grace Hopper liquid cooled
okay for very dense servers and then
this one is our dense general purpose
Grace Superchip server this is just CPU
and has the ability to accommodate four
CPU four Grace CPUs or two Grace
Superchips enormous amounts of
performance in ISO performance Grace
only consumes 580 Watts for the whole
for the whole server versus the latest
generation CPU servers x86 servers 1090
Watts it's basically half the power at
the same performance or another way of
saying
you know at the same power if your data
center is power constrained you get
twice the performance most data centers
today are power limited and so this is
really a terrific capability
we're going to expand AI into a new
territory
if you look at the world's data centers
the data center is now the computer and
the network defines what that data
center does largely there are two types
of data centers today there's the data
center that's used for hyperscale where
you have application workloads of all
different kinds the number of CPUs you
the number of gpus you connect to it is
relatively low the number of tenants is
very high the workloads are Loosely
coupled
and you have another type of data center
they're like super Computing data
centers AI supercomputers where the
workloads are tightly coupled
the number of tenants far fewer and
sometimes just one
its purpose is high throughput on very
large Computing problems
and so super Computing centers and Ai
supercomputers and the world's cloud
hyperscale cloud are very different in
nature
the ability for ethernet to interconnect
components from almost anywhere is
the reason why the world's internet was
created if it required too much
coordination how could we have built
today's internet so ethernet's profound
contribution is this lossy capability
its resilient capability and because of that
it basically can connect almost anything
together
however a super Computing data center
can't afford that you can't interconnect
random things together because that
billion dollar supercomputer the
difference between 95 percent
networking throughput achieved versus 50
is effectively 500 million dollars
now it's really really important to
realize that in a high performance
Computing application every single GPU
must finish their job so that the
application can move on
in many cases where you do all
reductions you have to wait until the
results of every single one so if one
node takes too long everybody gets held
back
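The straggler effect described here is easy to see in a toy simulation; the Python below (our illustration) shows how one slow node sets the pace for a synchronous all-reduce step:

```python
# Toy straggler model for a synchronous all-reduce step: the step
# finishes only when the slowest participant finishes.
import random

random.seed(0)
NODES = 256

# Most nodes take ~1.0 time units; one laggard takes 3x as long.
times = [random.uniform(0.95, 1.05) for _ in range(NODES)]
print(f"mean node time: {sum(times) / NODES:.2f}")
print(f"step time, healthy: {max(times):.2f}")

times[17] *= 3.0                                       # a single slow node...
print(f"step time, one straggler: {max(times):.2f}")   # ...gates everyone
```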
the question is how do we introduce
a new type of ethernet that's of course
backwards compatible with everything but
it's engineered in a way that achieves
the type of capabilities that let us
bring AI workloads to any of the
world's data centers first
adaptive routing adaptive routing
basically says based on the traffic that
is going through your data center
depending on which one of the ports of
that switch is over congested it will
tell Bluefield 3 to send and will send
it to another Port Bluefield 3 on the
other end would reassemble it and
present the data to the GPU without any
CPU intervention second congestion
control congestion control it is
possible for certain ports
to become heavily congested in which
case each switch will see how the
network is performing and communicate to
the senders please don't send any more
data right away
because you're congesting the network
that congestion control requires
basically an overarching system which
includes software the switch working
with all of the endpoints to overall
manage the congestion or the traffic and
the throughput of the data center this
capability is going to increase
ethernet's overall performance
dramatically
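Both mechanisms are straightforward to sketch; this small Python simulation (entirely our illustration, not Spectrum-X code) routes each packet to the least-congested port and throttles senders when queues build up:

```python
# Toy sketch of the two mechanisms described above: adaptive routing
# (pick the least-congested port per packet) and congestion control
# (tell senders to pause when queues build up). Illustrative only.
import random

random.seed(1)
PORTS, QUEUE_LIMIT = 4, 8
queues = [0] * PORTS          # packets waiting on each switch port
paused = False                # congestion signal back to the senders

for step in range(200):
    if not paused:
        # Adaptive routing: enqueue the new packet on the emptiest port.
        best = min(range(PORTS), key=lambda p: queues[p])
        queues[best] += 1
    # Each port drains at most one packet per step, with occasional stalls.
    for p in range(PORTS):
        if queues[p] and random.random() < 0.9:
            queues[p] -= 1
    # Congestion control: pause senders while any queue is over the limit.
    paused = max(queues) > QUEUE_LIMIT

print("final queue depths:", queues, "| paused:", paused)
```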
now one of the things that very few
people realize
is that today there's only one software
stack that is Enterprise secure and
Enterprise grade
that software stack is CPU
and the reason for that is because in
order to be Enterprise grade it has to
be Enterprise secure and has to be
Enterprise managed and Enterprise
supported over 4,000 software packages
is what it takes for people to use
accelerated Computing today in data
processing and training and optimization
all the way to inference so for the very
first time we are taking all of that
software
and we're going to maintain it and
manage it like red hat does for Linux
Nvidia AI Enterprise will do it for all
of nvidia's libraries now Enterprise can
finally have an Enterprise grade and
Enterprise secure software stack this is
such a big deal otherwise
even though the promise of accelerated
Computing is possible for many
researchers and scientists is not
available for Enterprise companies and
so let's take a look at the benefit for
them this is a simple image processing
application if you were to do it on a
CPU versus on a GPU running on
Enterprise Nvidia AI Enterprise you're
getting
31.8 images per minute or basically 24
times the throughput or you only pay
five percent of the cost
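The quoted figures imply the CPU baseline directly; a short derivation in Python (only the 31.8 images/min, 24x, and 5% numbers come from the talk):

```python
# Derived from the keynote's figures: the GPU path delivers 31.8
# images/min, 24x the CPU throughput, at ~5% of the cost for the work.
gpu_ipm, speedup, cost_fraction = 31.8, 24, 0.05

cpu_ipm = gpu_ipm / speedup
print(f"implied CPU baseline: {cpu_ipm:.2f} images/min")        # ~1.3
print(f"cost for the same work: {cost_fraction:.0%}, "
      f"i.e. ~{1 / cost_fraction:.0f}x cheaper")
```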
this is really quite amazing this is the
benefit of accelerated Computing in the
cloud but for many companies Enterprises
is simply not possible unless you have
this stack
Nvidia AI Enterprise is now fully
integrated into AWS Google cloud and
Microsoft Azure and Oracle Cloud it is
also integrated into the world's machine
learning operations pipeline as I
mentioned before AI is a different type
of workload and this new type of
software has a whole new software
industry and this software industry
a hundred of them we have now
connected with Nvidia AI Enterprise
I told you several things I told you
that we are going through two
simultaneous Computing industry
transitions accelerated Computing and
generative AI
two
this form of computing is not like the
traditional general purpose Computing it
is full stack
it is Data Center scale because the data
center is the computer and it is domain
specific for every domain that you want
to go into every industry you go into
you need to have the software stack and
if you have the software stack then the
utility the utilization of your machine
the utilization of your computer will be
high
so number two
it is full stack data center scale and
domain specific we are in full
production of the engine of generative
AI and that is HGX H100 meanwhile
this engine that's going to be used for
AI factories will be scaled out using
Grace Hopper the engine that we created
for the era of generative AI we also
took Grace Hopper connected with a 256 node
NVLink and created the largest GPU in
the world DGX
GH200
we're trying to extend generative Ai and
accelerated Computing in several
different directions at the same time
number one we would like to of course
extend it in the cloud
so that every cloud data center can be
an AI data center not just AI factories
and hyperscale but every hyperscale data
center can now be a generative AI Data
Center and the way we do that is the
Spectrum X it takes four components to
make Spectrum X possible the switch
the Bluefield 3 NIC the interconnects
themselves the cables are so important
in high speed high-speed Communications
and the software stack that goes on top
of it we would like to extend generative
AI to the world's Enterprise and there
are so many different configurations of
servers and the way we're doing that
with partnership with our Taiwanese
ecosystem the mgx modular accelerated
Computing systems we put Nvidia into
Cloud so that every Enterprise in the
world can engage us to create generative
AI models and deploy it in an Enterprise
grade Enterprise secure way in every
single Cloud I want to thank all of you
for your partnership over the years
thank you
[Applause]