NVIDIA'S HUGE AI Chip Breakthroughs Change Everything (Supercut)

Ticker Symbol: YOU
11 Jun 2023 · 26:07

Summary

TL;DR: Huang discusses the computing industry reaching a tipping point with accelerated computing and AI, enabled by NVIDIA's new Grace Hopper and H100 chips. He explains how NVIDIA's full stack of hardware and software will power the next generation of AI across industries and use cases, from cloud to enterprise. Key announcements include connecting 256 Grace Hopper chips into an exascale AI supercomputer, new modular server designs optimized for AI, and an enterprise-grade software stack to make AI more accessible.

Takeaways

  • 😲 Nvidia has reached a tipping point in accelerated computing and generative AI.
  • 👩‍💻 Software is now programmed by engineers working with AI supercomputers.
  • 💻 H100 is a new AI supercomputer touching every industry.
  • 🔋 Accelerated computing is reinventing software from the ground up.
  • 🚀 Nvidia AI is an AI operating system for end-to-end deep learning.
  • 🤖 Grace Hopper, the new AI superchip, has nearly 200 billion transistors.
  • 📈 The Hopper superchip scales to 256 nodes for 1 exaflop AI power.
  • 🌐 Spectrum-X extends accelerated computing and AI to data centers.
  • 💼 Nvidia AI Enterprise brings secure enterprise-grade AI stack.
  • 🤝 Nvidia partners to enable modular accelerated computing systems.

Q & A

  • What marks the tipping point in computing according to the transcript?

    -The tipping point in computing is marked by accelerated computing and generative AI, reflecting a fundamental shift in how software is developed and in the capabilities of computing systems.

  • What is the significance of the H100 mentioned in the transcript?

    -The H100, mentioned as being in full volume production, is significant because it represents a leap in computing technology with 35,000 components and eight Hopper GPUs, aimed at impacting every industry due to its advanced capabilities.

  • Why is the H100 described as the world's single most expensive computer?

    -The H100 is described as the world's single most expensive computer, priced at $200,000, because it replaces an entire room of computers with its advanced capabilities, making it a cost-effective solution despite its high price.

  • What are the two fundamental transitions happening in the computer industry as described?

    -The two fundamental transitions in the computer industry are the end of CPU scaling, which limits performance improvements from traditional methods, and the discovery of a new way of doing software through deep learning, driving today's computing.

  • How does accelerated computing transform the processing of large language models?

    -Accelerated computing transforms the processing of large language models by significantly reducing the resources needed, from 11 gigawatt hours and nearly a thousand CPU servers to 3.2 gigawatt hours and 48 GPU servers, increasing efficiency and performance.

  • What is the role of NVIDIA AI in the context of the transcript?

    -NVIDIA AI is described as the only AI operating system in the world that spans from data processing to training, optimization, and deployment, underpinning the development and application of AI technologies across various industries.

  • How does the Grace Hopper AI supercomputer differ from traditional computing systems?

    -The Grace Hopper AI supercomputer differs by integrating accelerated computing processors with large, coherent memory spaces, enabling efficient handling of very large datasets and reducing unnecessary data copying, signifying a major advancement in AI-driven computing.

  • What is the envisioned future role of AI factories according to the transcript?

    -AI factories are envisioned as a fundamental part of major companies, where they will build and produce their company's intelligence through accelerated computing and artificial intelligence, marking a shift towards widespread artificial intelligence production.

  • What does the comparison of Moore's Law with the advancements in computer graphics and AI imply?

    -The comparison implies that the advancements in computer graphics and AI, accelerated by a factor of a thousand times in five years, vastly outpace the progress predicted by Moore's Law, indicating a revolutionary pace of technological improvement in these areas.

  • How does the concept of the data center as the computer change the approach to building computing infrastructure?

    -The concept of the data center as the computer changes the approach by emphasizing the importance of building cost-effective, highly efficient data centers over individual servers, focusing on the collective power and efficiency of the data center infrastructure for computational tasks.
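The CPU-versus-GPU comparison quoted in the Q&A above can be sanity-checked with simple arithmetic. The figures (11 GWh and ~1,000 CPU servers versus 3.2 GWh and 48 GPU servers at 44x the performance) are as quoted in the keynote; the derived ratios below are our own arithmetic, not benchmark results:

```python
# Sanity-check the keynote's iso-budget ($10M) comparison for training
# a large language model. All input figures are as quoted in the talk;
# this is arithmetic only, not a measurement.

cpu = {"servers": 1000, "energy_gwh": 11.0, "relative_perf": 1.0}
gpu = {"servers": 48, "energy_gwh": 3.2, "relative_perf": 44.0}

# Energy reduction for the same budget (~3.4x less energy consumed).
energy_saving = cpu["energy_gwh"] / gpu["energy_gwh"]

# Work delivered per gigawatt-hour, GPU relative to CPU.
perf_per_energy = (gpu["relative_perf"] / gpu["energy_gwh"]) / (
    cpu["relative_perf"] / cpu["energy_gwh"]
)

print(f"energy reduction: {energy_saving:.1f}x")
print(f"performance per gigawatt-hour: {perf_per_energy:.0f}x")
```

The second ratio is why the keynote frames the data center, not the server, as the unit of cost-effectiveness: per unit of energy, the quoted GPU configuration does roughly 150x the work.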

Outlines

00:00

😮 Introducing the new h100 AI supercomputer

The paragraph introduces the H100, a new $200K AI supercomputer that replaces an entire room of computers. Its system board carries 35,000 components and eight Hopper GPUs and weighs 60-65 lbs, so robots are required to lift and insert it. As Huang quips about the world's most expensive computer: the more you buy, the more you save.

05:01

🚀 Reaching the tipping point of accelerated computing

The paragraph explains how accelerated computing with GPUs has reached a tipping point after decades of development across scientific domains, industries, and applications. Combined with the end of CPU scaling and emergence of deep learning, accelerated computing and generative AI represent fundamental industry transitions.

10:03

💡 AI supercomputers program software and touch every industry

The paragraph discusses how in the new computer industry, software is programmed by engineers working with AI supercomputers. These AI factories produce intelligence and will exist at every major company. The low programming barrier enables anyone to be a programmer. AI will improve all applications, succeeding without needing new apps, but also enabling new apps.

15:04

👩‍💻 Introducing Grace Hopper, the world's first accelerated processor

The paragraph introduces Grace Hopper, the world's first accelerated computing processor for AI, with an integrated GPU and almost 600GB of coherent memory. Connecting 256 Grace Hopper superchips (32 pods of eight) achieves one exaflop of Transformer-engine AI performance with 144TB of shared memory.
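The cluster arithmetic described in the keynote (eight superchips per NVLink pod, 32 pods, 144 TB of memory visible to every GPU) is internally consistent, as a quick check shows. The per-chip memory figure below is implied by the quoted totals rather than stated directly in the talk:

```python
# Check the DGX GH200 cluster arithmetic quoted in the keynote:
# 8 Grace Hopper superchips per NVLink pod, 32 pods, 144 TB shared memory.

chips_per_pod = 8
pods = 32
total_chips = chips_per_pod * pods  # 256 superchips, as announced

shared_memory_tb = 144
memory_per_chip_gb = shared_memory_tb * 1000 / total_chips

print(f"total superchips: {total_chips}")
print(f"memory per superchip: {memory_per_chip_gb} GB")
```

The result (562.5 GB per superchip) lines up with the "almost 600 gigabytes of coherent memory" figure quoted for each Grace Hopper.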

20:08

🌐 Extending AI with accelerated computing networks

The paragraph contrasts hyperscale vs supercomputing data centers and explains how Ethernet connectivity needs to be reengineered for adaptive routing and congestion control to support tightly coupled AI workloads without slowing down collective communications.

25:08

🏢 Nvidia AI Enterprise enables accelerated computing for business

The paragraph introduces Nvidia AI Enterprise which makes accelerated computing with GPUs enterprise-grade and secure for the first time. Integrated with major cloud platforms, it allows businesses to leverage AI and accelerate applications by 24x at 5% of the cost.
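Taken at face value, the "24x at 5% of the cost" claim above compounds into a single price-performance ratio. Both input figures are as quoted in the summary; the combined number is our own arithmetic:

```python
# Combine the quoted speedup and cost figures for Nvidia AI Enterprise
# into one price-performance ratio. Figures as quoted; arithmetic only.

speedup = 24          # "accelerate applications by 24x"
cost_fraction = 0.05  # "at 5% of the cost"

price_performance_gain = speedup / cost_fraction
print(f"~{price_performance_gain:.0f}x work per dollar vs. the baseline")
```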

Keywords

💡Accelerated Computing

Accelerated computing refers to the use of specialized hardware to perform computational tasks more efficiently than traditional CPU-based systems. In the video, accelerated computing is highlighted as a fundamental shift in computing, driven by the end of CPU scaling and the emergence of deep learning. This concept is crucial as it underpins the transition to using GPUs and other accelerators to achieve significant performance gains, particularly in AI and large-scale data processing. The speaker discusses how accelerated computing has evolved over three decades and its impact on enabling generative AI by drastically reducing computational time and energy consumption.

💡Generative AI

Generative AI involves algorithms that can generate new data instances (like images, text, or sounds) that resemble the training data. The video emphasizes the tipping point reached in generative AI, marking a new era where AI can produce highly accurate and diverse outputs. Generative AI's role in the video is tied to its applications in various industries and the development of AI supercomputers, showcasing its transformative potential across fields by leveraging large language models and other AI techniques.

💡H100

The H100, as mentioned in the video, is a cutting-edge GPU designed for accelerated computing and AI applications. It symbolizes a leap in computing power, capable of replacing rooms of computers and significantly impacting every industry. The discussion around the H100 includes its production process, the impressive number of components on its system board, and its role in powering AI supercomputers, illustrating the advancements in hardware that are driving the AI revolution.

💡Tensor Processing

Tensor processing is essential for performing complex calculations on multi-dimensional data arrays, which are common in machine learning and AI. The video highlights Nvidia's dedication to reinventing GPUs to excel at tensor processing, enabling more efficient AI computations. This focus on tensor processing is key to understanding the improvements in AI model training and inference, making AI applications more accessible and powerful.

💡Data Center as Computer

The concept of 'the data center as the computer' reflects a shift in computing architecture where the entire data center functions as a single, massive computing resource. This idea is central to the video's theme, showcasing how advances in accelerated computing and AI have transformed data centers from collections of individual servers to integrated computing powerhouses. This paradigm shift enables unprecedented computational capabilities, essential for processing the vast amounts of data generated by modern AI applications.

💡Grace Hopper

Grace Hopper refers to a superchip combining CPU and GPU architectures for high-performance computing tasks. In the video, it's described as a milestone in accelerated processor design, with a focus on enabling AI applications. The Grace Hopper superchip exemplifies the integration of different computing paradigms to achieve significant performance gains, illustrating the continuous innovation in hardware necessary to support the growing demands of AI and deep learning.

💡Exaflops

An exaflop is a measure of computing performance, equivalent to a quintillion (10^18) floating-point operations per second. The video mentions achieving exaflop computing power through the assembly of Grace Hopper Super Chips, marking a significant milestone in computational capabilities. This level of performance enables breakthroughs in AI, scientific research, and complex simulations, highlighting the rapid advancement and ambition of modern computing projects.

💡AI Factories

AI factories, as described in the video, are specialized facilities or computational frameworks designed to produce AI models and intelligence for companies. This concept represents a future where companies leverage their own AI capabilities to generate proprietary knowledge and solutions. The video ties this idea to the broader theme of the industrialization of AI, where accelerated computing and AI technologies enable the mass production of AI-driven insights and innovations.

💡Nvidia AI

Nvidia AI is referred to as the only AI operating system that encompasses the entire workflow of AI applications, from data processing and training to optimization and deployment. The video underscores Nvidia's role in providing a comprehensive ecosystem for AI development, highlighting how Nvidia AI facilitates the efficient and scalable use of accelerated computing resources for AI applications across industries.

💡Digital Divide

The digital divide traditionally refers to the gap between those who have access to modern information and communication technology and those who do not. In the context of the video, closing the digital divide involves making programming and AI technologies accessible to a broader audience, empowering more people to leverage AI without needing extensive technical expertise. The speaker emphasizes how advancements in AI and computing are democratizing access to technology, enabling more individuals and organizations to participate in the AI revolution.

Highlights

CPU scaling has ended, ending the ability to get 10x more performance every 5 years at the same cost

Deep learning and accelerated computing came together, driving AI progress today

GPUs are optimized for tensor processing, enabling algorithms for data processing, training, optimization and deployment

Connected GPUs with NVLink to build one giant GPU, then connected those GPUs using InfiniBand into larger-scale computers

Software is no longer programmed just by engineers, it's co-created by engineers and AI supercomputers

AI supercomputers are a new type of factory that produce a company's intelligence

Accelerated 1,000x in 5 years versus Moore's Law at 2x; aiming for 1 million x in 10 years

This era understands multi-modality, has low programming barriers, upgrades old apps, and progresses rapidly

Announcing Grace Hopper, the world's first accelerated computing AI processor with almost 600GB of coherent memory

Connecting 256 Grace Hopper chips into one AI supercomputer delivers 1 exaflop of processing

Announcing NVIDIA MGX, an open, modular accelerated computing server architecture

Introducing new Ethernet with adaptive routing and congestion control for high performance computing

NVIDIA AI Enterprise makes accelerated computing enterprise-grade secure and supported

This era is accelerated computing, generative AI, full stack, data center scale, and domain specific

In production with H100, scaling with Grace Hopper, aiming to extend generative AI everywhere
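The growth rates in the highlights above compound as follows. The 1,000x-in-5-years and 1,000,000x-in-10-years figures are as quoted; reading "Moore's Law at 2x" as roughly 2x every two years is our assumption for the comparison:

```python
# Compare the claimed accelerated-computing growth rate with a
# Moore's-Law baseline of ~2x every two years (our assumption).

accel_5yr = 1000.0
accel_per_year = accel_5yr ** (1 / 5)  # annualized rate, ~3.98x/yr
accel_10yr = accel_5yr ** 2            # two 5-year periods compound

moore_per_year = 2.0 ** (1 / 2)        # ~1.41x/yr under the assumption
moore_10yr = 2.0 ** (10 / 2)           # five doublings in ten years

print(f"annualized: ~{accel_per_year:.2f}x/yr vs ~{moore_per_year:.2f}x/yr")
print(f"ten years: {accel_10yr:.0f}x vs {moore_10yr:.0f}x")
```

The compounding is the whole point of the claim: 1,000x over five years squares to one million x over ten, while the Moore's-Law baseline reaches only 32x in the same decade.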

Transcripts

00:00

This is the new computer industry. Software is no longer programmed just by computer engineers; software is programmed by computer engineers working with AI supercomputers. We have now reached the tipping point of accelerated computing. We have now reached the tipping point of generative AI. And we are so, so, so excited to be in full volume production of the H100. This is going to touch literally every single industry. Let's take a look at how H100 is produced.

00:36

[Music]

01:13

[Music]

01:28

35,000 components on that system board. Eight Hopper GPUs. Let me show it to you. All right, this: I would lift this, but I still have the rest of the keynote I would like to give. This is 60 pounds, 65 pounds. It takes robots to lift it, of course, and it takes robots to insert it, because the insertion pressure is so high and has to be so perfect. This computer is two hundred thousand dollars, and as you know, it replaces an entire room of other computers. It's the world's single most expensive computer where you can say: the more you buy, the more you save.

02:18

This is what a compute tray looks like. Even this is incredibly heavy. See that? So this is the brand new H100, the world's first computer with a Transformer engine in it. The performance is utterly incredible.

02:37

There are two fundamental transitions happening in the computer industry today. All of you are deep within it and you feel it. There are two fundamental trends. The first trend is that CPU scaling has ended. The ability to get 10 times more performance every five years at the same cost is the reason why computers are so fast today. That trend has ended, and it happened at exactly the time when a new way of doing software was discovered: deep learning. These two events came together and are driving computing today: accelerated computing and generative AI. This new way of doing software, this new way of doing computation, is a reinvention from the ground up, and it's not easy. Accelerated computing has taken us nearly three decades to accomplish.

03:30

Well, this is how accelerated computing works.

03:33

This is accelerated computing used for large language models, basically the core of generative AI. This example is a ten-million-dollar server. Ten million dollars gets you nearly a thousand CPU servers, and to train, to process, this large language model takes 11 gigawatt-hours. 11 gigawatt-hours. And this is what happens when you accelerate this workload with accelerated computing: with the same ten million dollars, you buy 48 GPU servers. It's the reason why people say that GPU servers are so expensive. Remember, people say GPU servers are so expensive. However, the GPU server is no longer the computer; the computer is the data center. Your goal is to build the most cost-effective data center, not the most cost-effective server. Back in the old days, when the computer was the server, that would be a reasonable thing to do, but today the computer is the data center. So for ten million dollars you buy 48 GPU servers; it only consumes 3.2 gigawatt-hours, with 44 times the performance. Let me just show it to you one more time: this is before, and this is after.

04:53

We want dense computers, not big ones. We want dense, fast computers, not big ones. Let me show you something else; this is my favorite. If your goal is to get the work done, and this is the work you want to get done (iso-work, okay, this is iso-work), all right, look at this. Look at this: before, and after. You've heard me talk about this for so many years. In fact, every single time you saw me, I've been talking to you about accelerated computing. And now, why is it that it's finally the tipping point? Because we have now addressed so many different domains of science, so many industries, and data processing, deep learning, classical machine learning; so many different ways for us to deploy software, from the cloud to enterprise to supercomputing to the edge; so many different configurations of GPUs, from our HGX versions to our Omniverse versions to our cloud GPU and graphics versions. So many different versions. Now the utilization is incredibly high. The utilization of NVIDIA GPUs is so high that almost every single cloud is overextended, almost every single data center is overextended; there are so many different applications using them. So we have now reached the tipping point of accelerated computing. We have now reached the tipping point of generative AI.

06:30

People thought that GPUs would just be GPUs. They were completely wrong. We dedicated ourselves to reinventing the GPU so that it's incredibly good at tensor processing, and all of the algorithms and engines that sit on top of these computers we call NVIDIA AI: the only AI operating system in the world that goes from data processing to training to optimization to deployment and inference. End-to-end deep learning processing; it is the engine of AI today. We connected GPUs to other GPUs with NVLink to build one giant GPU, and we connected those GPUs together using InfiniBand into larger-scale computers. The ability for us to drive the processor and extend the scale of computing made it possible for the AI research organization, the community, to advance AI at an incredible rate. So every two years we take giant leaps forward, and I'm expecting the next leap to be giant as well.

07:29

This is the new computer industry. Software is no longer programmed just by computer engineers; software is programmed by computer engineers working with AI supercomputers. These AI supercomputers are a new type of factory. It is very logical that the car industry has factories: they build things you can see, cars. It is very logical that the computer industry has computer factories: you build things that you can see, computers. In the future, every single major company will also have AI factories, and you will build and produce your company's intelligence. And it's a very sensible thing. We are intelligence producers already; it's just that today the intelligence producers are people. In the future we will be intelligence producers, artificial intelligence producers, and every single company will have factories, and the factories will be built this way: using accelerated computing and artificial intelligence. We accelerated computer graphics by 1,000 times in five years. Moore's Law is probably currently running at about two times. A thousand times in five years; a thousand times in five years is one million times in ten. We're doing the same thing in artificial intelligence. Now the question is: what can you do when your computer is one million times faster?

08:57

What would you do if your computer was one million times faster? Well, it turns out that we can now apply the instrument of our industry to so many different fields that were impossible before. This is the reason why everybody is so excited. There's no question that we're in a new computing era; there's just absolutely no question about it. In every computing era you could do different things that weren't possible before, and artificial intelligence certainly qualifies. This particular computing era is special in several ways. One: it is able to understand information of more than just text and numbers. It can now understand multi-modality, which is the reason why this computing revolution can impact every industry. Two: because this computer doesn't care how you program it, it will try to understand what you mean, because it has this incredible large language model capability. And so the programming barrier is incredibly low. We have closed the digital divide. Everyone is a programmer now; you just have to say something to the computer.

10:08

Third: this computer is not only able to do amazing things for the future; it can do amazing things for every single application of the previous era, which is the reason why all of these APIs are being connected into Windows applications, into browsers, into PowerPoint and Word. Every application that exists will be better because of AI. This computing era does not need new applications; it can succeed with old applications, and it's going to have new applications. The rate of progress, because it's so easy to use, is the reason why it's growing so fast. This is going to touch literally every single industry, and at its core, just as with every computing era, it needs a new computing approach.

11:01

For the last several years I've been talking to you about the new type of processor we've been creating, and this is the reason we've been creating it. Ladies and gentlemen, Grace Hopper is now in full production. This is Grace Hopper: nearly 200 billion transistors in this computer. Look at this. This is Grace Hopper. This processor is really quite amazing. There are several characteristics about it. This is the world's first accelerated computing processor that also has a giant memory: it has almost 600 gigabytes of memory that's coherent between the CPU and the GPU, so the GPU can reference the memory, the CPU can reference the memory, and any unnecessary copying back and forth can be avoided.

12:02

The amazing amount of high-speed memory lets the GPU work on very, very large data sets. This is a computer; this is not a chip. Practically the entire computer is on here. This uses low-power DDR memory, just like your cell phone, except it has been optimized and designed for high-resilience data center applications. So let me show you what we're going to do. The first thing, of course, is that we have the Grace Hopper superchip; we put that into a computer. The second thing we're going to do is connect eight of these together using NVLink; this is an NVLink switch. So eight of these connect through three switch trays into an eight-Grace-Hopper pod. These eight Grace Hoppers are each connected to the others at 900 gigabytes per second, eight of them connected together as a pod. And then we connect 32 of those pods together with another layer of switches in order to build this: 256 Grace Hopper superchips connected into one exaflops. One exaflops. You know that countries and nations have been working on exaflops computing and only just recently achieved it. 256 Grace Hoppers for deep learning is one exaflop with the Transformer engine, and it gives us 144 terabytes of memory that every GPU can see. This is not 144 terabytes distributed; this is 144 terabytes connected. Why don't we take a look at what it really looks like? Play, please.

14:04

[Applause]

14:11

This is 150 miles of cables, fiber-optic cables. 2,000 fans. 70,000 cubic feet per minute; it probably recycles the air in this entire room in a couple of minutes. Forty thousand pounds. Four elephants. One GPU. If I can get up on here, this is actual size. So this is our brand new Grace Hopper AI supercomputer. It is one giant GPU. Utterly incredible. We're building it now, and we're so excited that Google Cloud, Meta, and Microsoft will be the first companies in the world to have access, and they will be doing exploratory research on the pioneering front, the boundaries of artificial intelligence, with us. So this is the DGX GH200. It is one giant GPU.

15:33

Okay, I just talked about how we are going to extend the frontier of AI. Data centers all over the world, all of them, over the next decade, will be recycled and re-engineered into accelerated data centers and generative-AI-capable data centers. But there are so many different applications in so many different areas: scientific computing, data processing, cloud, video and graphics, generative AI for enterprise, and of course the edge. Each one of these applications has different configurations of servers, a different focus of applications, and different deployment methods; the security is different, the operating system is different, how it's managed is different. This is just an enormous number of configurations. And so today we're announcing, in partnership with so many companies here in Taiwan, the NVIDIA MGX. It's an open, modular server design specification, designed for accelerated computing. Most of the servers today are designed for general-purpose computing; the mechanical, thermal, and electrical design is insufficient for a very highly dense computing system. Accelerated computers, as you know, take many servers and compress them into one. You save a lot of money and a lot of floor space, but the architecture is different. And we designed it to be standardized across multiple generations, so that once you make an investment, our next-generation GPUs, next-generation CPUs, and next-generation DPUs will continue to easily configure into it, for the best time to market and the best preservation of your investment. Different data centers have different requirements, and we've made this modular and flexible so that it can address all of these different domains. Now, this is the basic chassis. Let's take a look at some of the other things you can do with it. This is the Omniverse OVX server: it has x86, four L40s, BlueField-3, two