NVIDIA'S HUGE AI Chip Breakthroughs Change Everything (Supercut)
Summary
TLDR: Huang discusses the computing industry reaching a tipping point with accelerated computing and AI, enabled by NVIDIA's new Grace Hopper and H100 chips. He explains how NVIDIA's full stack of hardware and software will power the next generation of AI across industries and use cases, from cloud to enterprise. Key announcements include connecting 256 Grace Hopper chips into an exascale AI supercomputer, new modular server designs optimized for AI, and an enterprise-grade software stack to make AI more accessible.
Takeaways
- Nvidia has reached a tipping point in accelerated computing and generative AI.
- Software is now programmed by engineers working with AI supercomputers.
- H100 is a new AI supercomputer touching every industry.
- Accelerated computing is reinventing software from the ground up.
- Nvidia AI is an AI operating system for end-to-end deep learning.
- Grace Hopper, the new AI superchip, has nearly 200 billion transistors.
- The Grace Hopper superchip scales to 256 nodes for 1 exaflop of AI performance.
- Spectrum-X extends accelerated computing and AI to data centers.
- Nvidia AI Enterprise brings a secure, enterprise-grade AI stack.
- Nvidia partners to enable modular accelerated computing systems.
Q & A
What marks the tipping point in computing according to the transcript?
-The tipping point is marked by the convergence of accelerated computing and generative AI, signaling a fundamental shift in how software is developed and in the capabilities of computing systems.
What is the significance of the H100 mentioned in the transcript?
-The H100, mentioned as being in full volume production, is significant because it represents a leap in computing technology with 35,000 components and eight Hopper GPUs, aimed at impacting every industry due to its advanced capabilities.
Why is the H100 described as the world's single most expensive computer?
-The H100 is described as the world's single most expensive computer, priced at $200,000, because it replaces an entire room of computers with its advanced capabilities, making it a cost-effective solution despite its high price.
What are the two fundamental transitions happening in the computer industry as described?
-The two fundamental transitions in the computer industry are the end of CPU scaling, which limits performance improvements from traditional methods, and the discovery of a new way of doing software through deep learning, driving today's computing.
How does accelerated computing transform the processing of large language models?
-Accelerated computing transforms the processing of large language models by significantly reducing the resources needed, from 11 gigawatt hours and nearly a thousand CPU servers to 3.2 gigawatt hours and 48 GPU servers, increasing efficiency and performance.
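The figures quoted in this answer can be checked with simple division. A minimal sketch, using only the numbers stated in the talk (11 GWh and roughly 1,000 CPU servers versus 3.2 GWh and 48 GPU servers for the same $10M budget):

```python
# Rough arithmetic behind the keynote's CPU-vs-GPU comparison.
# All input figures come from the talk; the ratios are derived.

cpu_energy_gwh = 11.0   # energy to train the model on ~1,000 CPU servers
gpu_energy_gwh = 3.2    # energy on 48 GPU servers, same $10M budget
cpu_servers = 1000
gpu_servers = 48

energy_ratio = cpu_energy_gwh / gpu_energy_gwh  # ~3.4x less energy
server_ratio = cpu_servers / gpu_servers        # ~21x fewer servers

print(f"energy reduction: {energy_ratio:.1f}x")
print(f"server reduction: {server_ratio:.1f}x")
```

The ~3.4x energy reduction understates the claimed gain: the talk also asserts 44x the performance from those 48 servers, so the efficiency gap per unit of work is far larger.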
What is the role of NVIDIA AI in the context of the transcript?
-NVIDIA AI is described as the only AI operating system in the world that spans from data processing to training, optimization, and deployment, underpinning the development and application of AI technologies across various industries.
How does the Grace Hopper AI supercomputer differ from traditional computing systems?
-The Grace Hopper AI supercomputer differs by integrating accelerated computing processors with large, coherent memory spaces, enabling efficient handling of very large datasets and reducing unnecessary data copying, signifying a major advancement in AI-driven computing.
What is the envisioned future role of AI factories according to the transcript?
-AI factories are envisioned as a fundamental part of major companies, where they will build and produce their company's intelligence through accelerated computing and artificial intelligence, marking a shift towards widespread artificial intelligence production.
What does the comparison of Moore's Law with the advancements in computer graphics and AI imply?
-The comparison implies that the advancements in computer graphics and AI, accelerated by a factor of a thousand times in five years, vastly outpace the progress predicted by Moore's Law, indicating a revolutionary pace of technological improvement in these areas.
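The extrapolation in this answer follows from compounding: 1,000x over one five-year span squared gives 1,000,000x over ten years. A quick sketch comparing that with Moore's-law-style doubling (the ~2x-per-two-years rate is a conventional approximation, not a figure from the talk):

```python
# Compounding the claimed 1,000x-in-5-years acceleration versus
# a Moore's-law-style 2x every 2 years over the same decade.

five_year_speedup = 1_000
ten_year_speedup = five_year_speedup ** 2  # growth compounds: 1,000 * 1,000

moores_law_10yr = 2 ** (10 / 2)            # five doublings in ten years = 32x

print(f"accelerated computing, 10 years: {ten_year_speedup:,}x")
print(f"Moore's law, 10 years: {moores_law_10yr:.0f}x")
```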
How does the concept of the data center as the computer change the approach to building computing infrastructure?
-The concept of the data center as the computer changes the approach by emphasizing the importance of building cost-effective, highly efficient data centers over individual servers, focusing on the collective power and efficiency of the data center infrastructure for computational tasks.
Outlines
Introducing the new H100 AI supercomputer
The paragraph introduces the H100, a new $200K AI supercomputer that replaces an entire room of computers. It has 35,000 components, eight Hopper GPUs, and a 60-65 lb system board that requires robots for assembly and insertion, prompting Huang's quip that "the more you buy, the more you save" on the world's single most expensive computer.
Reaching the tipping point of accelerated computing
The paragraph explains how accelerated computing with GPUs has reached a tipping point after decades of development across scientific domains, industries, and applications. Combined with the end of CPU scaling and emergence of deep learning, accelerated computing and generative AI represent fundamental industry transitions.
AI supercomputers program software and touch every industry
The paragraph discusses how in the new computer industry, software is programmed by engineers working with AI supercomputers. These AI factories produce intelligence and will exist at every major company. The low programming barrier enables anyone to be a programmer. AI will improve all applications, succeeding without needing new apps, but also enabling new apps.
Introducing Grace Hopper, the world's first accelerated processor
The paragraph introduces Grace Hopper, the world's first accelerated computing processor for AI, with an integrated GPU and nearly 600GB of coherent memory. Connecting 256 Grace Hopper superchips (pods of eight, joined by a second layer of switches) yields one exaflop of Transformer-engine AI performance and 144TB of shared memory.
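The quoted 144TB shared-memory figure can be sanity-checked against the per-chip memory. The 576GB-per-superchip value below is an assumption chosen to make the arithmetic land on the quoted total; the talk itself only says "almost 600 gigabytes":

```python
# Back-of-envelope check of the DGX GH200 memory figure:
# 256 superchips times an assumed 576 GB of coherent memory each.

chips = 256
per_chip_gb = 576          # assumption; the talk says "almost 600 GB"

total_gb = chips * per_chip_gb  # 147,456 GB
total_tb = total_gb / 1024      # 144 TB (binary units)

print(f"{total_tb:.0f} TB of NVLink-connected memory")
```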
Extending AI with accelerated computing networks
The paragraph contrasts hyperscale vs supercomputing data centers and explains how Ethernet connectivity needs to be reengineered for adaptive routing and congestion control to support tightly coupled AI workloads without slowing down collective communications.
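The adaptive-routing idea described here can be illustrated with a toy sketch: rather than pinning a flow to one fixed port, the sender steers each packet to the least congested port. The port names and queue-depth model below are invented for illustration; the real mechanism lives in Spectrum-4 switch silicon and BlueField-3 DPU firmware, not application code:

```python
# Toy model of per-packet adaptive routing: always pick the
# output port with the shallowest queue, breaking ties randomly.
import random

def pick_port(port_queue_depths: dict) -> str:
    """Return the least congested port name (hypothetical model)."""
    min_depth = min(port_queue_depths.values())
    candidates = [p for p, d in port_queue_depths.items() if d == min_depth]
    return random.choice(candidates)

queues = {"port0": 12, "port1": 3, "port2": 3, "port3": 9}
chosen = pick_port(queues)
assert chosen in ("port1", "port2")  # congested port0/port3 are avoided
```

The reordering this causes is why the talk pairs it with BlueField-3 on the receiving end, which reassembles packets into order before handing data to the GPU.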
Nvidia AI Enterprise enables accelerated computing for business
The paragraph introduces Nvidia AI Enterprise which makes accelerated computing with GPUs enterprise-grade and secure for the first time. Integrated with major cloud platforms, it allows businesses to leverage AI and accelerate applications by 24x at 5% of the cost.
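The 24x-throughput-at-5%-of-the-cost claim implies a large combined gain in performance per dollar. A minimal sketch: the 31.8 images/min GPU rate and the 24x and 5% ratios come from the talk, while the implied CPU baseline is simply derived from them:

```python
# Deriving the implied numbers behind the Enterprise example.

gpu_rate = 31.8              # images per minute on GPU (from the talk)
speedup = 24                 # claimed throughput advantage over CPU
cost_fraction = 0.05         # claimed cost relative to the CPU run

cpu_rate = gpu_rate / speedup               # implied CPU baseline, ~1.3 img/min
perf_per_dollar = speedup / cost_fraction   # ~480x better performance per dollar

print(f"implied CPU rate: {cpu_rate:.2f} images/min")
print(f"performance per dollar: ~{perf_per_dollar:.0f}x")
```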
Keywords
Accelerated Computing
Generative AI
H100
Tensor Processing
Data Center as Computer
Grace Hopper
Exaflops
AI Factories
Nvidia AI
Digital Divide
Highlights
CPU scaling has ended; the ability to get 10x more performance every 5 years at the same cost is gone
Deep learning and accelerated computing came together, driving AI progress today
GPUs are optimized for tensor processing, enabling algorithms for data processing, training, optimization and deployment
Connected GPUs with NVLink to build one giant GPU, then connected GPUs using InfiniBand into larger scale computers
Software is no longer programmed just by engineers, it's co-created by engineers and AI supercomputers
AI supercomputers are a new type of factory that produce a company's intelligence
Accelerated 1,000x in 5 years versus Moore's Law at 2x; aiming for 1 million x in 10 years
This era understands multi-modality, has low programming barriers, upgrades old apps, and progresses rapidly
Announcing Grace Hopper, the world's first accelerated computing AI processor with almost 600GB of coherent memory
Connecting 256 Grace Hopper chips into one AI supercomputer delivers 1 exaflop of processing
Announcing NVIDIA MGX, an open, modular accelerated computing server architecture
Introducing new Ethernet with adaptive routing and congestion control for high performance computing
NVIDIA AI Enterprise makes accelerated computing enterprise-grade secure and supported
This era is accelerated computing, generative AI, full stack, data center scale, and domain specific
In production with H100, scaling with Grace Hopper, aiming to extend generative AI everywhere
Transcripts
this is the new computer industry
software is no longer programmed just by
computer Engineers software is
programmed by computer Engineers working
with AI supercomputers we have now
reached the Tipping Point of accelerated
Computing we have now reached the
Tipping Point of generative Ai and we
are so so so excited to be in full
volume production of the h100 this is
going to touch literally every single
industry let's take a look at how h100
is produced
[Music]
okay
[Music]
35,000 components on that system board
eight Hopper GPUs
let me show it to you
all right this
I would I would lift this but I I um I
still have the rest of the keynote I
would like to give this is 60 pounds 65
pounds it takes robots to lift it of
course and it takes robots to insert it
because the insertion pressure is so
high and has to be so perfect
this computer is two hundred thousand
dollars and as you know it replaces an
entire room of other computers it's the
world's single most expensive computer
that you can say the more you buy the
more you save
this is what a compute tray looks like
even this is incredibly heavy
see that
so this is the brand new h100 with the
world's first computer that has a
Transformer engine in it
the performance is utterly incredible
there are two fundamental transitions
happening in the computer industry today
all of you are deep within it and you
feel it there are two fundamental Trends
the first trend is because CPU scaling
has ended the ability to get 10 times
more performance every five years has
ended the ability to get 10 times more
performance every five years at the same
cost is the reason why computers are so
fast today
that trend has ended it happened at
exactly the time when a new way of doing
software was discovered deep learning
these two events came together and is
driving Computing today
accelerated Computing and generative AI
this new way of doing software this new way of doing
computation is a reinvention from the
ground up and it's not easy accelerated
Computing has taken us nearly three
decades to accomplish
well this is how accelerated Computing
works
this is accelerated Computing used for
large language models basically the core
of generative AI this example is a 10
million dollar server and so 10 million
dollars gets you nearly a thousand CPU
servers and to train to process this
large language model takes 11 gigawatt
hours 11 gigawatt hours okay and this is
what happens when you accelerate this
workload with accelerated Computing and
so with 10 million dollars for a 10
million dollar server you buy 48 GPU
servers it's the reason why people say
that GPU servers are so expensive
remember people say GPU servers are so
expensive however the GPU server is no
longer the computer the computer is the
data center
your goal is to build the most cost
effective data center not build the most
cost effective server
back in the old days when the computer
was the server that would be a
reasonable thing to do but today the
computer is the data center so for 10
million dollars you buy 48 GPU servers
it only consumes 3.2 gigawatt hours and
44 times the performance
let me just show it to you one more time
this is before and this is after and
this is
we want dense computers not big ones we
want dense computers fast computers not
big ones let me show you something else
this is my favorite
if your goal if your goal is to get the
work done
and this is the work you want to get
done ISO work
okay this is ISO work all right look at
this
look at this look at this before
after you've heard me talk about this
for so many years
in fact every single time you saw me
I've been talking to you about
accelerated computing
and now
why is it that finally it's the Tipping
Point because we have now addressed so
many different domains of science so
many Industries and in data processing
in deep learning classical machine
learning
so many different ways for us to deploy
software from the cloud to Enterprise to
Super Computing to the edge
so many different configurations of gpus
from our hgx versions to our Omniverse
versions to our Cloud GPU and Graphics
version so many different versions now
the utilization is incredibly High
the utilization of Nvidia GPU is so high
almost every single cloud is
overextended almost every single data
center is overextended there are so many
different applications using it so we
have now reached the Tipping Point of
accelerated Computing we have now
reached the Tipping Point of generative
AI
people thought that gpus would just be
gpus they were completely wrong we
dedicated ourselves to Reinventing the
GPU so that it's incredibly good at
tensor processing and then all of the
algorithms and engines that sit on top
of these computers we call Nvidia AI the
only AI operating system in the world
that takes you from data
processing to training to optimization
to deployment and inference
end to end deep learning processing it
is the engine of AI today
we connected gpus to other gpus called
NVLink to build one giant GPU and we
connected those gpus together using
infiniband into larger scale computers
the ability for us to drive the
processor and extend the scale of
computing
made it possible
for the AI research organization the
community to advance AI at an incredible
rate
so every two years we take giant leaps
forward and I'm expecting the next leap
to be giant as well
this is the new computer industry
software is no longer programmed just by
computer Engineers software is
programmed by computer Engineers working
with AI supercomputers these AI
supercomputers
are a new type of factory
it is very logical that a car industry
has factories they build things so you
can see cars
it is very logical that computer
industry has computer factories you
build things that you can see computers
in the future
every single major company will also
have ai factories
and you will build and produce your
company's intelligence
and it's a very sensible thing
we are intelligence producers already
it's just that the intelligence
producers the intelligence are people in
the future we will be intelligence
producers artificial intelligence
producers and every single company will
have factories and the factories will be
built this way
using accelerated Computing and
artificial intelligence we accelerated
computer Graphics by 1,000 times in five
years
Moore's Law is probably currently
running at about two times
a thousand times in five years a
thousand times in five years is one
million times in ten we're doing the
same thing in artificial intelligence
now question is what can you do when
your computer is one million times
faster
what would you do if your computer was
one million times faster well it turns
out that we can now apply the instrument
of our industry to so many different
fields that were impossible before
this is the reason why everybody is so
excited
there's no question that we're in a new
Computing era
there's just absolutely no question
about it every single Computing era you
could do different things that weren't
possible before and artificial
intelligence certainly qualifies this
particular Computing era is special in
several ways one
it is able to understand information of
more than just text and numbers it can
Now understand multi-modality which is
the reason why this Computing Revolution
can impact every industry
every industry two
because this computer
doesn't care how you program it
it will try to understand what you mean
because it has this incredible large
language model capability and so the
programming barrier is incredibly low we
have closed the digital divide
everyone is a programmer now you just
have to say something to the computer
third
this computer
not only is it able to do amazing things
for the for the future
it can do amazing things for every
single application of the previous era
which is the reason why all of these
apis are being connected into Windows
applications here and there in browsers
and PowerPoint and word every
application that exists will be better
because of AI
you don't have to just AI this
generation this Computing era does not
need
new applications it can succeed with old
applications and it's going to have new
applications
the rate of progress the rate of
progress because it's so easy to use
is the reason why it's growing so fast
this is going to touch literally every
single industry and at the core with
just as with every single Computing era
it needs a new Computing approach
the last several years I've been talking
to you about the new type of processor
we've been creating
and this is the reason we've been
creating it
ladies and gentlemen
Grace Hopper is now in full production
this is Grace Hopper
nearly 200 billion transistors in this
computer oh
look at this this is Grace Hopper
this this processor
this processor is really quite amazing
there are several characteristics about
it this is the world's first accelerated
processor
accelerated Computing processor that
also has a giant memory it has almost
600 gigabytes of memory that's coherent
between the CPU and the GPU and so the
GPU can reference the memory the CPU can
reference the memory and
any unnecessary copying back
and forth could be avoided
the amazing amount of high-speed memory
lets the GPU work on very very large
data sets this is a computer this is not
a chip practically the Entire Computer
is on here this uses
low power DDR memory just like your cell
phone except this has been optimized and
designed for high resilience data center
applications so let me show you what
we're going to do so the first thing is
of course we have the Grace Hopper
Superchip
put that into a computer the second
thing that we're going to do is we're
going to connect eight of these together
using NVLink this is an NVLink switch
so eight of these connect
through three switch trays into an eight
Grace Hopper pod
these eight Grace Hopper pods each one
of the grace Hoppers are connected to
the other Grace Hopper at 900 gigabytes
per second
eight of them connected together
as a pod and then we connect 32 of them
together
with another layer of switches
and in order to build in order to build
this
256 Grace Hopper Super Chips connected
into one exaflop you know
that countries and Nations have been
working on exaflops Computing and just
recently achieved it
256 Grace Hoppers for deep learning is
one exaflop Transformer engine and it
gives us
144 terabytes of memory that every GPU
can see
this is not 144 terabytes distributed
this is 144 terabytes connected
why don't we take a look at what it
really looks like play please
[Applause]
this
is
150 miles of cables
fiber optic cables
2,000 fans
70,000 cubic feet per minute
it probably
recycles the air in this entire room in
a couple of minutes
forty thousand pounds
four elephants
one GPU
if I can get up on here this is actual
size
so this is this is our brand new
Grace Hopper AI supercomputer it is one
giant GPU
utterly incredible we're building it now
and we're so we're so excited that
Google Cloud meta and Microsoft will be
the first companies in the world to have
access
and they will be doing
exploratory research on the pioneering
front the boundaries of artificial
intelligence with us so this is the dgx
gh200 it is one giant GPU
okay I just talked about how we are
going to extend the frontier of AI
data centers all over the world and all
of them over the next decade will be
recycled
and re-engineered into accelerated data
centers and generative AI capable data
centers but there are so many different
applications in so many different areas
scientific computing
data processing cloud and video and
Graphics generative AI for Enterprise
and of course the edge each one of these
applications have different
configurations of servers
different focus of applications
different deployment methods and so
security is different operating system
is different how it's managed it's
different
well this is just an enormous number of
configurations and so today we're
announcing in partnership with so many
companies here in Taiwan the Nvidia mgx
it's an open modular server design
specification and the design for
Accelerated Computing most of the
servers today are designed for general
purpose Computing the mechanical thermal
and electrical is insufficient for a
very highly dense Computing system
accelerated computers take as you know
many servers and compress it into one
you save a lot of money you save a lot
of floor space but the architecture is
different and we designed it so that
it's multi-generation standardized so
that once you make an investment our
next generation gpus and Next Generation
CPUs and next generation dpus will
continue to easily configure into it so
that we can best time to Market and best
preservation of our investment different
data centers have different requirements
and we've made this modular and flexible
so that it could address all of these
different domains now this is the basic
chassis let's take a look at some of the
other things you can do with it this is
the Omniverse ovx server
it has x86 four L40S Bluefield 3 two
CX-7s six PCI Express slots this is the
grace Omniverse server
Grace same four L40S Bluefield 3 and
2 cx-7s okay this is the grace Cloud
Graphics server
this is the hopper NV link generative AI
inference server
and of course Grace Hopper liquid cooled
okay for very dense servers and then
this one is our dense general purpose
Grace Superchip server this is just CPU
and has the ability to accommodate four
Grace CPUs or two Grace
Superchips enormous amounts of
performance in ISO performance Grace
only consumes 580 Watts for the whole
for the whole server versus the latest
generation CPU servers x86 servers 1090
Watts it's basically half the power at
the same performance or another way of
saying
you know at the same power if your data
center is power constrained you get
twice the performance most data centers
today are power limited and so this is
really a terrific capability
we're going to expand AI into a new
territory
if you look at the world's data centers
the data center is now the computer and
the network defines what that data
center does largely there are two types
of data centers today there's the data
center that's used for hyperscale where
you have application workloads of all
different kinds the number of CPUs you
the number of gpus you connect to it is
relatively low the number of tenants is
very high the workloads are Loosely
coupled
and you have another type of data center
they're like super Computing data
centers AI supercomputers where the
workloads are tightly coupled
the number of tenants far fewer and
sometimes just one
its purpose is high throughput on very
large Computing problems
and so super Computing centers and Ai
supercomputers and the world's cloud
hyperscale cloud are very different in
nature
the ability for ethernet to interconnect
components of almost from anywhere is
the reason why the world's internet was
created if it required too much
coordination how could we have built
today's internet so ethernet's profound
contribution it's this lossy capability
its resilient capability and because of that
it basically can connect almost anything
together
however a super Computing data center
can't afford that you can't interconnect
random things together because that
billion dollar supercomputer the
difference between 95 percent
networking throughput achieved versus 50
is effectively 500 million dollars
now it's really really important to
realize that in a high performance
Computing application every single GPU
must finish their job so that the
application can move on
in many cases where you do all
reductions you have to wait until the
results of every single one so if one
node takes too long everybody gets held
back
the question is how do we introduce
a new type of ethernet that's of course
backwards compatible with everything but
it's engineered in a way that achieves
the type of capabilities that we
can bring AI workloads to the world's
any data center first
adaptive routing adaptive routing
basically says based on the traffic that
is going through your data center
depending on which one of the ports of
that switch is over congested it will
tell Bluefield 3 to send and will send
it to another Port Bluefield 3 on the
other end would reassemble it and
present the data to the GPU without any
CPU intervention second congestion
control congestion control it is
possible for a certain different ports
to become heavily congested in which
case each switch will see how the
network is performing and communicate to
the senders please don't send any more
data right away
because you're congesting the network
that congestion control requires
basically an overarching system which
includes software the switch working
with all of the endpoints to overall
manage the congestion or the traffic and
the throughput of the data center this
capability is going to increase
ethernet's overall performance
dramatically
now one of the things that very few
people realize
is that today there's only one software
stack that is Enterprise secure and
Enterprise grade
that software stack is CPU
and the reason for that is because in
order to be Enterprise grade it has to
be Enterprise secure and has to be
Enterprise managed and Enterprise
supported over 4,000 software packages
is what it takes for people to use
accelerated Computing today in data
processing and training and optimization
all the way to inference so for the very
first time we are taking all of that
software
and we're going to maintain it and
manage it like red hat does for Linux
Nvidia AI Enterprise will do it for all
of nvidia's libraries now Enterprise can
finally have an Enterprise grade and
Enterprise secure software stack this is
such a big deal otherwise
even though the promise of accelerated
Computing is possible for many
researchers and scientists is not
available for Enterprise companies and
so let's take a look at the benefit for
them this is a simple image processing
application if you were to do it on a
CPU versus on a GPU running on
Enterprise Nvidia AI Enterprise you're
getting
31.8 images per minute or basically 24
times the throughput or you only pay
five percent of the cost
this is really quite amazing this is the
benefit of accelerated Computing in the
cloud but for many companies Enterprises
is simply not possible unless you have
this stack
Nvidia AI Enterprise is now fully
integrated into AWS Google cloud and
Microsoft Azure and Oracle Cloud it is
also integrated into the world's machine
learning operations pipeline as I
mentioned before AI is a different type
of workload and this type of new type of
software this new type of software has a
whole new software industry and this
software industry 100 of these companies we have
now connected with Nvidia AI Enterprise
I told you several things I told you
that we are going through two
simultaneous Computing industry
transitions accelerated Computing and
generative AI
two
this form of computing is not like the
traditional general purpose Computing it
is full stack
it is Data Center scale because the data
center is the computer and it is domain
specific for every domain that you want
to go into every industry you go into
you need to have the software stack and
if you have the software stack then the
utility the utilization of your machine
the utilization of your computer will be
high
so number two
it is full stack data center scale and
domain specific we are in full
production of the engine of generative
Ai and that is hgx h100 meanwhile
this engine that's going to be used for
AI factories will be scaled out using
Grace Hopper the engine that we created
for the era of generative AI we also
took Grace Hopper connected to 256 node
nvlink and created the largest GPU in
the world dgx
gh200
we're trying to extend generative Ai and
accelerated Computing in several
different directions at the same time
number one we would like to of course
extend it in the cloud
so that every cloud data center can be
an AI data center not just AI factories
and hyperscale but every hyperscale data
center can now be a generative AI Data
Center and the way we do that is the
Spectrum X it takes four components to
make Spectrum X possible the switch
the Bluefield 3 Nick the interconnects
themselves the cables are so important
in high speed high-speed Communications
and the software stack that goes on top
of it we would like to extend generative
AI to the world's Enterprise and there
are so many different configurations of
servers and the way we're doing that
with partnership with our Taiwanese
ecosystem the mgx modular accelerated
Computing systems we put Nvidia into
Cloud so that every Enterprise in the
world can engage us to create generative
AI models and deploy it in an Enterprise
grade Enterprise secure way in every
single Cloud I want to thank all of you
for your partnership over the years
thank you
[Applause]