NVIDIA'S HUGE AI Chip Breakthroughs Change Everything (Supercut)

Ticker Symbol: YOU
11 Jun 2023 · 26:07

Summary

TL;DR: Jensen Huang discusses the computing industry reaching a tipping point in accelerated computing and AI, enabled by NVIDIA's new Grace Hopper superchip and the H100 now in full production. He explains how NVIDIA's full stack of hardware and software will power the next generation of AI across industries and use cases, from cloud to enterprise. Key announcements include connecting 256 Grace Hopper superchips into an exascale AI supercomputer (the DGX GH200), the MGX modular server designs optimized for AI, and an enterprise-grade software stack (NVIDIA AI Enterprise) to make AI more accessible.

Takeaways

  • 😲 Nvidia has reached a tipping point in accelerated computing and generative AI.
  • 👩‍💻 Software is now programmed by engineers working with AI supercomputers.
  • 💻 H100 is a new AI supercomputer touching every industry.
  • 🔋 Accelerated computing is reinventing software from the ground up.
  • 🚀 Nvidia AI is an AI operating system for end-to-end deep learning.
  • 🤖 Grace Hopper, the new AI superchip, has nearly 200 billion transistors.
  • 📈 256 Grace Hopper superchips connected deliver 1 exaflop of AI compute.
  • 🌐 Spectrum-X extends accelerated computing and AI to Ethernet data centers.
  • 💼 Nvidia AI Enterprise brings a secure, enterprise-grade AI software stack.
  • 🤝 Nvidia partners with system makers on MGX modular accelerated computing systems.

Q & A

  • What marks the tipping point in computing according to the transcript?

    -The tipping point is marked by the convergence of accelerated computing and generative AI, a significant shift both in how software is developed and in what computing systems can do.

  • What is the significance of the H100 mentioned in the transcript?

    -The H100, mentioned as being in full volume production, is significant because it represents a leap in computing technology, with 35,000 components and eight Hopper GPUs on its system board, and is expected to touch every industry.

  • Why is the H100 described as the world's single most expensive computer?

    -The H100 system is priced at $200,000 and called the world's single most expensive computer, but because it replaces an entire room of conventional servers it is positioned as the cost-effective choice: "the more you buy, the more you save."

  • What are the two fundamental transitions happening in the computer industry as described?

    -The two fundamental transitions are the end of CPU scaling, which ends the ability to get 10x more performance every five years at the same cost, and the discovery of a new way of doing software, deep learning; together they drive today's computing.

  • How does accelerated computing transform the processing of large language models?

    -Accelerated computing transforms the processing of large language models by significantly reducing the resources needed, from 11 gigawatt hours and nearly a thousand CPU servers to 3.2 gigawatt hours and 48 GPU servers, increasing efficiency and performance.

  • What is the role of NVIDIA AI in the context of the transcript?

    -NVIDIA AI is described as the only AI operating system in the world that spans from data processing to training, optimization, and deployment, underpinning the development and application of AI technologies across various industries.

  • How does the Grace Hopper AI supercomputer differ from traditional computing systems?

    -The Grace Hopper AI supercomputer differs by integrating accelerated computing processors with large, coherent memory spaces, enabling efficient handling of very large datasets and reducing unnecessary data copying, signifying a major advancement in AI-driven computing.

  • What is the envisioned future role of AI factories according to the transcript?

    -AI factories are envisioned as a fundamental part of major companies, where they will build and produce their company's intelligence through accelerated computing and artificial intelligence, marking a shift towards widespread artificial intelligence production.

  • What does the comparison of Moore's Law with the advancements in computer graphics and AI imply?

    -The comparison implies that the advancements in computer graphics and AI, accelerated by a factor of a thousand times in five years, vastly outpace the progress predicted by Moore's Law, indicating a revolutionary pace of technological improvement in these areas.

  • How does the concept of the data center as the computer change the approach to building computing infrastructure?

    -The concept of the data center as the computer changes the approach by emphasizing the importance of building cost-effective, highly efficient data centers over individual servers, focusing on the collective power and efficiency of the data center infrastructure for computational tasks.

Outlines

00:00

😮 Introducing the new h100 AI supercomputer

The paragraph introduces the H100, a new $200K AI system that replaces an entire room of computers. Its system board carries 35K components and eight Hopper GPUs, weighs 60-65 lbs, and requires robots to lift and insert it. Hence the quip about the world's most expensive computer: the more you buy, the more you save.

05:01

🚀 Reaching the tipping point of accelerated computing

The paragraph explains how accelerated computing with GPUs has reached a tipping point after decades of development across scientific domains, industries, and applications. Combined with the end of CPU scaling and emergence of deep learning, accelerated computing and generative AI represent fundamental industry transitions.

10:03

💡 AI supercomputers program software and touch every industry

The paragraph discusses how in the new computer industry, software is programmed by engineers working with AI supercomputers. These AI factories produce intelligence and will exist at every major company. The low programming barrier enables anyone to be a programmer. AI will improve all applications, succeeding without needing new apps, but also enabling new apps.

15:04

👩‍💻 Introducing Grace Hopper, the world's first accelerated processor

The paragraph introduces Grace Hopper, the world's first accelerated processor for AI, with an integrated GPU and almost 600GB of coherent CPU-GPU memory. Connecting 256 superchips (32 pods of eight) yields one exaflop of Transformer Engine compute and 144TB of shared memory.

20:08

🌐 Extending AI with accelerated computing networks

The paragraph contrasts hyperscale vs supercomputing data centers and explains how Ethernet connectivity needs to be reengineered for adaptive routing and congestion control to support tightly coupled AI workloads without slowing down collective communications.

25:08

🏢 Nvidia AI Enterprise enables accelerated computing for business

The paragraph introduces Nvidia AI Enterprise which makes accelerated computing with GPUs enterprise-grade and secure for the first time. Integrated with major cloud platforms, it allows businesses to leverage AI and accelerate applications by 24x at 5% of the cost.

Keywords

💡Accelerated Computing

Accelerated computing refers to the use of specialized hardware to perform computational tasks more efficiently than traditional CPU-based systems. In the video, accelerated computing is highlighted as a fundamental shift in computing, driven by the end of CPU scaling and the emergence of deep learning. This concept is crucial as it underpins the transition to using GPUs and other accelerators to achieve significant performance gains, particularly in AI and large-scale data processing. The speaker discusses how accelerated computing has evolved over three decades and its impact on enabling generative AI by drastically reducing computational time and energy consumption.

💡Generative AI

Generative AI involves algorithms that can generate new data instances (like images, text, or sounds) that resemble the training data. The video emphasizes the tipping point reached in generative AI, marking a new era where AI can produce highly accurate and diverse outputs. Generative AI's role in the video is tied to its applications in various industries and the development of AI supercomputers, showcasing its transformative potential across fields by leveraging large language models and other AI techniques.

💡H100

The H100, as mentioned in the video, is a cutting-edge GPU designed for accelerated computing and AI applications. It symbolizes a leap in computing power, capable of replacing rooms of computers and significantly impacting every industry. The discussion around the H100 includes its production process, the impressive number of components on its system board, and its role in powering AI supercomputers, illustrating the advancements in hardware that are driving the AI revolution.

💡Tensor Processing

Tensor processing is essential for performing complex calculations on multi-dimensional data arrays, which are common in machine learning and AI. The video highlights Nvidia's dedication to reinventing GPUs to excel at tensor processing, enabling more efficient AI computations. This focus on tensor processing is key to understanding the improvements in AI model training and inference, making AI applications more accessible and powerful.
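To make "tensor processing" concrete, here is a generic NumPy sketch, not NVIDIA-specific code; the shapes are arbitrary illustrative choices, but the batched matrix multiply is exactly the kind of operation tensor cores accelerate:

```python
import numpy as np

# Deep learning workloads reduce largely to dense multiplies over
# multi-dimensional arrays (tensors), e.g. one projection in a transformer:
batch, seq, d_model = 32, 1024, 768
x = np.random.randn(batch, seq, d_model).astype(np.float16)
w = np.random.randn(d_model, d_model).astype(np.float16)
y = x @ w    # (32, 1024, 768) @ (768, 768) -> (32, 1024, 768)

# About 2 * batch * seq * d_model**2 floating-point operations per call:
flops = 2 * batch * seq * d_model ** 2
print(f"{flops / 1e9:.1f} GFLOPs for a single projection")   # ~38.7
```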

💡Data Center as Computer

The concept of 'the data center as the computer' reflects a shift in computing architecture where the entire data center functions as a single, massive computing resource. This idea is central to the video's theme, showcasing how advances in accelerated computing and AI have transformed data centers from collections of individual servers to integrated computing powerhouses. This paradigm shift enables unprecedented computational capabilities, essential for processing the vast amounts of data generated by modern AI applications.

💡Grace Hopper

Grace Hopper refers to a superchip combining CPU and GPU architectures for high-performance computing tasks. In the video, it's described as a milestone in accelerated processor design, with a focus on enabling AI applications. The Grace Hopper superchip exemplifies the integration of different computing paradigms to achieve significant performance gains, illustrating the continuous innovation in hardware necessary to support the growing demands of AI and deep learning.

💡Exaflops

An exaflop is a measure of computing performance, equivalent to a quintillion (10^18) floating-point operations per second. The video mentions achieving exaflop computing power through the assembly of Grace Hopper Super Chips, marking a significant milestone in computational capabilities. This level of performance enables breakthroughs in AI, scientific research, and complex simulations, highlighting the rapid advancement and ambition of modern computing projects.
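For a sense of scale, a quick back-of-the-envelope comparison; the world-population figure is an approximation used purely for illustration:

```python
# One exaflop is 10**18 floating-point operations per second. If roughly
# 8 billion people each performed one operation per second, matching a
# single second of such a machine would take about four years.
ops = 10 ** 18
people = 8e9
seconds = ops / people
print(f"{seconds / (3600 * 24 * 365):.1f} years")   # ~4.0
```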

💡AI Factories

AI factories, as described in the video, are specialized facilities or computational frameworks designed to produce AI models and intelligence for companies. This concept represents a future where companies leverage their own AI capabilities to generate proprietary knowledge and solutions. The video ties this idea to the broader theme of the industrialization of AI, where accelerated computing and AI technologies enable the mass production of AI-driven insights and innovations.

💡Nvidia AI

Nvidia AI is referred to as the only AI operating system that encompasses the entire workflow of AI applications, from data processing and training to optimization and deployment. The video underscores Nvidia's role in providing a comprehensive ecosystem for AI development, highlighting how Nvidia AI facilitates the efficient and scalable use of accelerated computing resources for AI applications across industries.

💡Digital Divide

The digital divide traditionally refers to the gap between those who have access to modern information and communication technology and those who do not. In the context of the video, closing the digital divide involves making programming and AI technologies accessible to a broader audience, empowering more people to leverage AI without needing extensive technical expertise. The speaker emphasizes how advancements in AI and computing are democratizing access to technology, enabling more individuals and organizations to participate in the AI revolution.

Highlights

CPU scaling has ended, and with it the ability to get 10x more performance every 5 years at the same cost

Deep learning and accelerated computing came together, driving AI progress today

GPUs are optimized for tensor processing, enabling algorithms for data processing, training, optimization and deployment

Connected GPUs with NVLink to build one giant GPU, then connected those GPUs using InfiniBand into larger-scale computers

Software is no longer programmed just by engineers, it's co-created by engineers and AI supercomputers

AI supercomputers are a new type of factory that produce a company's intelligence

Accelerated 1,000x in 5 years versus Moore's Law at 2x; aiming for 1 million x in 10 years

This era understands multi-modality, has low programming barriers, upgrades old apps, and progresses rapidly

Announcing Grace Hopper, the world's first accelerated computing AI processor with almost 600GB of coherent memory

Connecting 256 Grace Hopper chips into one AI supercomputer delivers 1 exaflop of processing

Announcing NVIDIA MGX, an open, modular accelerated computing server architecture

Introducing new Ethernet with adaptive routing and congestion control for high performance computing

NVIDIA AI Enterprise makes accelerated computing enterprise-grade, secure, and supported

This era is accelerated computing, generative AI, full stack, data center scale, and domain specific

In production with H100, scaling with Grace Hopper, aiming to extend generative AI everywhere

Transcripts

00:00

This is the new computer industry. Software is no longer programmed just by computer engineers; software is programmed by computer engineers working with AI supercomputers. We have now reached the tipping point of accelerated computing. We have now reached the tipping point of generative AI. And we are so, so, so excited to be in full volume production of the H100. This is going to touch literally every single industry. Let's take a look at how H100 is produced.

[Music]

01:28

35,000 components on that system board. Eight Hopper GPUs. Let me show it to you. All right. I would lift this, but I still have the rest of the keynote I would like to give; this is 60, 65 pounds. It takes robots to lift it, of course, and it takes robots to insert it, because the insertion pressure is so high and has to be so perfect. This computer is two hundred thousand dollars, and as you know, it replaces an entire room of other computers. It's the world's single most expensive computer, of which you can say: the more you buy, the more you save.

02:18

This is what a compute tray looks like. Even this is incredibly heavy. See that? So this is the brand new H100, the world's first computer that has a Transformer Engine in it. The performance is utterly incredible.

02:37

There are two fundamental transitions happening in the computer industry today. All of you are deep within it, and you feel it. The first trend is that CPU scaling has ended. The ability to get 10 times more performance every five years at the same cost is the reason why computers are so fast today, and that trend has ended. It happened at exactly the time when a new way of doing software was discovered: deep learning. These two events came together and are driving computing today: accelerated computing and generative AI. A new way of doing software, a new way of doing computation, is a reinvention from the ground up, and it's not easy. Accelerated computing has taken us nearly three decades to accomplish.

03:30

Well, this is how accelerated computing works. This is accelerated computing used for large language models, basically the core of generative AI. This example is a ten-million-dollar server: ten million dollars gets you nearly a thousand CPU servers, and to train, to process, this large language model takes 11 gigawatt-hours. 11 gigawatt-hours. And this is what happens when you accelerate this workload with accelerated computing: for a ten-million-dollar server you buy 48 GPU servers. It's the reason why people say that GPU servers are so expensive. However, the GPU server is no longer the computer; the computer is the data center. Your goal is to build the most cost-effective data center, not the most cost-effective server. Back in the old days, when the computer was the server, that would be a reasonable thing to do, but today the computer is the data center. So for ten million dollars you buy 48 GPU servers, it only consumes 3.2 gigawatt-hours, and you get 44 times the performance. Let me just show it to you one more time: this is before, and this is after. We want dense computers, not big ones. We want dense, fast computers, not big ones. Let me show you something else.
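As a sanity check on the before-and-after slide, here is the quoted comparison worked through in a short Python sketch; all figures are the ones stated on stage ("nearly a thousand" is rounded to 1,000), not independent measurements:

```python
# Same $10M budget, same LLM training workload, figures as quoted on stage.
cpu_servers, cpu_energy_gwh = 1000, 11.0   # "nearly a thousand" CPU servers
gpu_servers, gpu_energy_gwh = 48, 3.2      # 48 GPU servers after acceleration

energy_saving = cpu_energy_gwh / gpu_energy_gwh   # ~3.4x less energy, ISO work
density = cpu_servers / gpu_servers               # ~21x fewer servers
perf_at_iso_cost = 44                             # throughput multiple, as quoted

print(f"{energy_saving:.1f}x less energy, {density:.0f}x fewer servers, "
      f"{perf_at_iso_cost}x the performance at the same cost")
```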

05:01

This is my favorite. If your goal is to get the work done, and this is the work you want to get done, ISO work, then look at this: before, and after. You've heard me talk about this for so many years; in fact, every single time you saw me, I've been talking to you about accelerated computing. And now, why is it that finally it's the tipping point? Because we have now addressed so many different domains of science, so many industries, in data processing, in deep learning, in classical machine learning; so many different ways for us to deploy software, from the cloud to enterprise to supercomputing to the edge; so many different configurations of GPUs, from our HGX versions to our Omniverse versions to our cloud GPU and graphics versions. So many different versions, and now the utilization is incredibly high. The utilization of NVIDIA GPUs is so high that almost every single cloud is overextended, almost every single data center is overextended; there are so many different applications using it. So we have now reached the tipping point of accelerated computing. We have now reached the tipping point of generative AI.

06:30

People thought that GPUs would just be GPUs. They were completely wrong. We dedicated ourselves to reinventing the GPU so that it's incredibly good at tensor processing. And all of the algorithms and engines that sit on top of these computers we call NVIDIA AI, the only AI operating system in the world that takes you from data processing to training to optimization to deployment and inference: end-to-end deep learning processing. It is the engine of AI today. We connected GPUs to other GPUs with NVLink to build one giant GPU, and we connected those GPUs together using InfiniBand into larger-scale computers. The ability for us to drive the processor and extend the scale of computing made it possible for the AI research community to advance AI at an incredible rate. So every two years we take giant leaps forward, and I'm expecting the next leap to be giant as well.

07:29

This is the new computer industry. Software is no longer programmed just by computer engineers; software is programmed by computer engineers working with AI supercomputers. These AI supercomputers are a new type of factory. It is very logical that the car industry has factories: they build things that you can see, cars. It is very logical that the computer industry has computer factories: you build things that you can see, computers. In the future, every single major company will also have AI factories, and you will build and produce your company's intelligence. And it's a very sensible thing. We are intelligence producers already; it's just that today the intelligence producers are people. In the future we will be artificial intelligence producers, and every single company will have factories, and the factories will be built this way: using accelerated computing and artificial intelligence.

08:31

We accelerated computer graphics by 1,000 times in five years. Moore's Law is probably currently running at about two times. A thousand times in five years is one million times in ten, and we're doing the same thing in artificial intelligence. Now the question is: what can you do when your computer is one million times faster? What would you do if your computer was one million times faster? Well, it turns out that we can now apply the instrument of our industry to so many different fields that were impossible before. This is the reason why everybody is so excited.
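The compounding arithmetic behind the million-x claim is simple to verify; the "about two times" figure for Moore's Law is treated here as a doubling roughly every two years, which is an assumption about what the quote means:

```python
# 1,000x every five years compounds to 1,000,000x over a decade.
five_year_gain = 1_000
decade_gain = five_year_gain ** 2     # two back-to-back five-year periods
print(decade_gain)                    # 1000000

# Moore's-Law-style doubling every ~2 years over the same decade:
moore_decade = 2 ** (10 / 2)
print(round(moore_decade))            # ~32x
```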

09:14

There's no question that we're in a new computing era; there's just absolutely no question about it. In every computing era you could do different things that weren't possible before, and artificial intelligence certainly qualifies. This particular computing era is special in several ways. One, it is able to understand information of more than just text and numbers: it can now understand multi-modality, which is the reason why this computing revolution can impact every industry. Every industry. Two, because this computer doesn't care how you program it, it will try to understand what you mean, because it has this incredible large language model capability. And so the programming barrier is incredibly low. We have closed the digital divide. Everyone is a programmer now; you just have to say something to the computer. Third, this computer not only is able to do amazing things for the future, it can do amazing things for every single application of the previous era, which is the reason why all of these APIs are being connected into Windows applications, into browsers, into PowerPoint and Word. Every application that exists will be better because of AI. This computing era does not need new applications; it can succeed with old applications, and it's going to have new applications too. The rate of progress, because it's so easy to use, is the reason why it's growing so fast. This is going to touch literally every single industry, and at its core, just as with every computing era, it needs a new computing approach.

11:01

For the last several years I've been talking to you about the new type of processor we've been creating, and this is the reason we've been creating it. Ladies and gentlemen, Grace Hopper is now in full production. This is Grace Hopper: nearly 200 billion transistors in this computer.

11:29

Look at this. This is Grace Hopper. This processor is really quite amazing. There are several characteristics about it. This is the world's first accelerated computing processor that also has a giant memory: it has almost 600 gigabytes of memory that's coherent between the CPU and the GPU, so the GPU can reference the memory, the CPU can reference the memory, and any unnecessary copying back and forth can be avoided. That amount of high-speed memory lets the GPU work on very, very large datasets. This is a computer; this is not a chip. Practically the entire computer is on here. It uses low-power DDR memory, just like your cell phone, except this has been optimized and designed for high-resilience data center applications.

12:24

So let me show you what we're going to do. First, of course, we have the Grace Hopper Superchip; put that into a computer. The second thing we're going to do is connect eight of these together using NVLink. This is an NVLink switch: eight superchips connect through three switch trays into an eight-Grace-Hopper pod, and within a pod each Grace Hopper is connected to every other Grace Hopper at 900 gigabytes per second. Eight of them connected together as a pod, and then we connect 32 of those pods together with another layer of switches, in order to build this: 256 Grace Hopper Superchips connected into one exaflops. One exaflops. You know that countries and nations have been working on exaflops computing and only just recently achieved it. 256 Grace Hoppers for deep learning is one exaflop of Transformer Engine compute, and it gives us 144 terabytes of memory that every GPU can see. This is not 144 terabytes distributed; this is 144 terabytes connected. Why don't we take a look at what it really looks like. Play, please.
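The quoted totals compose consistently; here is a quick sketch, where the derived per-chip figures are back-calculated from the keynote's numbers rather than separately announced specs:

```python
chips_per_pod, pods = 8, 32
chips = chips_per_pod * pods                  # 256 Grace Hopper Superchips

total_memory_tb = 144                         # shared memory every GPU can see
mem_per_chip_gb = total_memory_tb * 1000 / chips   # ~562 GB: consistent with
                                                   # "almost 600 GB" per superchip
total_exaflops = 1.0                          # deep-learning (Transformer Engine)
pflops_per_chip = total_exaflops * 1000 / chips    # ~3.9 PFLOPS per superchip

print(chips, round(mem_per_chip_gb), round(pflops_per_chip, 1))
```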

[Applause]

14:11

This is 150 miles of cables, fiber optic cables. 2,000 fans, 70,000 cubic feet per minute; it probably recycles the air in this entire room in a couple of minutes. Forty thousand pounds: four elephants. One GPU. If I can get up on here, this is actual size. So this is our brand new Grace Hopper AI supercomputer. It is one giant GPU. Utterly incredible. We're building it now, and we're so excited that Google Cloud, Meta, and Microsoft will be the first companies in the world to have access, and they will be doing exploratory research on the pioneering front, the boundaries of artificial intelligence, with us. So this is the DGX GH200. It is one giant GPU.

15:33

Okay, I just talked about how we are going to extend the frontier of AI. Data centers all over the world, all of them, will over the next decade be recycled and re-engineered into accelerated, generative-AI-capable data centers. But there are so many different applications in so many different areas: scientific computing, data processing, cloud, video and graphics, generative AI for enterprise, and of course the edge. Each of these applications has a different configuration of servers, a different application focus, different deployment methods; the security is different, the operating system is different, how it's managed is different. This is just an enormous number of configurations. And so today we're announcing, in partnership with so many companies here in Taiwan, the NVIDIA MGX: an open, modular server design specification for accelerated computing. Most servers today are designed for general-purpose computing; their mechanical, thermal, and electrical design is insufficient for a very highly dense computing system. Accelerated computers, as you know, take many servers and compress them into one; you save a lot of money and a lot of floor space, but the architecture is different. And we designed MGX to be multi-generation standardized, so that once you make an investment, our next-generation GPUs, CPUs, and DPUs will continue to configure easily into it, for the best time to market and the best preservation of your investment. Different data centers have different requirements, and we've made this modular and flexible so that it can address all of these different domains.

17:14

Now this is the basic chassis. Let's take a look at some of the other things you can do with it. This is the Omniverse OVX server: x86, four L40S GPUs, a BlueField-3, two CX-7s, six PCI Express slots. This is the Grace Omniverse server: Grace, the same four L40S, a BlueField-3, and two CX-7s. This is the Grace cloud graphics server. This is the Hopper NVLink generative AI inference server. And of course Grace Hopper liquid-cooled, for very dense servers. And then this one is our dense general-purpose Grace Superchip server: this is CPU only, and it can accommodate four Grace CPUs, or two Grace Superchips. Enormous amounts of performance: at ISO performance, Grace consumes only 580 watts for the whole server, versus 1,090 watts for the latest-generation x86 CPU servers. That's basically half the power at the same performance; or, put another way, at the same power, if your data center is power constrained, you get twice the performance. Most data centers today are power limited, and so this is really a terrific capability.
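A short sketch of the power claim; the wattages are the ones quoted on stage, while the 1 MW budget is a hypothetical chosen purely for illustration:

```python
grace_w, x86_w = 580, 1090               # whole-server power at ISO performance
ratio = x86_w / grace_w                  # ~1.88x: "basically half the power"

budget_w = 1_000_000                     # hypothetical 1 MW power-capped hall
print(budget_w // grace_w, "Grace servers vs", budget_w // x86_w, "x86 servers")
# -> 1724 vs 917: roughly twice the ISO-performance servers per megawatt
```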

18:32

We're going to expand AI into a new territory. If you look at the world's data centers, the data center is now the computer, and the network largely defines what that data center does. There are two types of data centers today. There's the hyperscale data center, where you have application workloads of all different kinds: the number of CPUs and GPUs you connect to any one workload is relatively low, the number of tenants is very high, and the workloads are loosely coupled. And you have another type of data center, the supercomputing data centers and AI supercomputers, where the workloads are tightly coupled, the tenants are far fewer, sometimes just one, and the purpose is high throughput on very large computing problems. And so supercomputing centers and AI supercomputers on the one hand, and the world's hyperscale clouds on the other, are very different in nature.

19:28

The ability for Ethernet to interconnect components from almost anywhere is the reason why the world's internet was created; if it required too much coordination, how could we have built today's internet? So Ethernet's profound contribution is this lossy, resilient capability: it can connect almost anything together. However, a supercomputing data center can't afford that. You can't interconnect random things together, because on a billion-dollar supercomputer the difference between achieving 95 percent networking throughput and 50 percent is effectively 500 million dollars. Now, it's really, really important to realize that in a high-performance computing application, every single GPU must finish its job so that the application can move on. In many cases, where you do all-reductions, you have to wait for the results from every single node, so if one node takes too long, everybody gets held back.
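The straggler effect described here is easy to model: a collective such as all-reduce completes only when its slowest participant does, so step time is a maximum, not an average. A minimal illustration with made-up timings:

```python
# 255 GPUs finish a step in 10 ms; one slow node takes 40 ms.
step_times_ms = [10.0] * 255 + [40.0]
all_reduce_ms = max(step_times_ms)                   # collective waits for max
avg_ms = sum(step_times_ms) / len(step_times_ms)     # ~10.1 ms
print(f"step gated at {all_reduce_ms} ms (average was {avg_ms:.1f} ms)")

# The keynote's dollar framing: dropping from 95% to 50% network throughput
# on a $1B machine idles roughly half a billion dollars of capability
# (the keynote rounds this to $500M).
print(f"${(0.95 - 0.50) * 1_000_000_000:,.0f} of effective capacity lost")
```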

20:29

the question is how do we introduce

20:33

a new type of ethernet that's of course

20:35

backwards compatible with everything but

20:37

it's engineered in a way that achieves

20:39

the type of capabilities that we that we

20:42

can bring AI workloads to the world's

20:45

any data center first

20:48

adaptive routing adaptive routing

20:50

basically says based on the traffic that

20:53

is going through your data center

20:54

depending on which one of the ports of

20:57

that switch is over congested it will

20:59

tell Bluefield 3 to send and will send

21:02

it to another Port Bluefield 3 on the

21:05

other end would reassemble it and

21:08

present the data to the GPU without any

21:12

CPU intervention second congestion

21:14

control congestion control it is

21:16

possible for a certain different ports

21:20

to become heavily congested in which

21:22

case each switch will see how the

21:25

network is performing and communicate to

21:27

the senders please don't send any more

21:30

data right away

21:32

because you're congesting the network

21:33

that congestion control requires

21:35

basically a overriding system which

21:38

includes software the switch working

21:40

with all of the endpoints to overall

21:43

manage the congestion or the traffic and

21:45

the throughput of the data center this

21:47

capability is going to increase

21:48

ethernet's overall performance

21:50

dramatically
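A toy model of the two mechanisms just described, as a minimal sketch; the queue-depth fields and pause threshold are illustrative inventions, not Spectrum-X internals:

```python
from dataclasses import dataclass, field

@dataclass
class Switch:
    # Outstanding bytes queued on each uplink port (3 ports, illustrative).
    port_queues: dict = field(default_factory=lambda: {0: 0, 1: 0, 2: 0})
    pause_threshold: int = 8_000

    def route(self, packet_bytes: int) -> int:
        """Adaptive routing: steer traffic onto the least-loaded port; the
        receiving endpoint is assumed to reassemble out-of-order data."""
        port = min(self.port_queues, key=self.port_queues.get)
        self.port_queues[port] += packet_bytes
        return port

    def senders_should_pause(self) -> bool:
        """Congestion control: when any queue exceeds the threshold, signal
        senders to back off before buffers overflow and packets drop."""
        return any(q > self.pause_threshold for q in self.port_queues.values())

sw = Switch()
for _ in range(6):
    sw.route(1500)                 # packets spread evenly across ports 0..2
print(sw.port_queues, sw.senders_should_pause())
```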

21:51

Now, one of the things that very few people realize is that today there's only one software stack that is enterprise secure and enterprise grade, and that software stack is CPU-based. The reason is that in order to be enterprise grade, it has to be enterprise secure, enterprise managed, and enterprise supported. Over 4,000 software packages is what it takes for people to use accelerated computing today, from data processing and training and optimization all the way to inference. So for the very first time we are taking all of that software and we're going to maintain and manage it, like Red Hat does for Linux. NVIDIA AI Enterprise will do it for all of NVIDIA's libraries. Now enterprises can finally have an enterprise-grade, enterprise-secure software stack. This is such a big deal, because otherwise, even though the promise of accelerated computing is real for many researchers and scientists, it is not available to enterprise companies.

22:55

So let's take a look at the benefit for them. This is a simple image processing application. If you run it on a CPU versus on a GPU running NVIDIA AI Enterprise, you're getting 31.8 images per minute, basically 24 times the throughput, or you pay only five percent of the cost. This is really quite amazing; this is the benefit of accelerated computing in the cloud. But for many enterprise companies it is simply not possible unless you have this stack. NVIDIA AI Enterprise is now fully integrated into AWS, Google Cloud, Microsoft Azure, and Oracle Cloud. It is also integrated into the world's machine learning operations pipelines. As I mentioned before, AI is a different type of workload, and this new type of software has a whole new software industry around it; a hundred of these companies we have now connected with NVIDIA AI Enterprise.
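The cost claim, worked through with the figures quoted on stage; only the 24x throughput multiple is from the talk, and the hourly rates below are hypothetical placeholders chosen to show how the two claims can be consistent:

```python
# If a GPU instance delivers 24x the throughput of a CPU instance, then
# even at a higher hourly rate the per-image cost collapses.
cpu_rate, gpu_rate = 1.00, 1.20     # $/hour, illustrative assumption
speedup = 24
cpu_cost_per_unit = cpu_rate / 1
gpu_cost_per_unit = gpu_rate / speedup
print(f"GPU cost is {gpu_cost_per_unit / cpu_cost_per_unit:.0%} of CPU cost")
# -> 5%, matching "pay five percent of the cost" when the GPU instance
#    costs ~20% more per hour than the CPU instance.
```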

23:52

I told you several things. I told you that we are going through two simultaneous computing industry transitions: accelerated computing and generative AI. Two: this form of computing is not like traditional general-purpose computing. It is full stack. It is data center scale, because the data center is the computer. And it is domain specific: for every domain you want to go into, every industry you go into, you need to have the software stack, and if you have the software stack, then the utility, the utilization of your computer, will be high. So, number two: it is full stack, data center scale, and domain specific. We are in full production of the engine of generative AI, and that is HGX H100. Meanwhile, this engine that's going to be used for AI factories will be scaled out using Grace Hopper, the engine we created for the era of generative AI. We also took Grace Hopper, connected 256 nodes with NVLink, and created the largest GPU in the world: the DGX GH200.

25:01

We're trying to extend generative AI and accelerated computing in several different directions at the same time. Number one, we would of course like to extend it in the cloud, so that every cloud data center can be an AI data center: not just AI factories and hyperscale, but every hyperscale data center can now be a generative AI data center. The way we do that is Spectrum-X. It takes four components to make Spectrum-X possible: the switch, the BlueField-3 NIC, the interconnects themselves (the cables are so important in high-speed communications), and the software stack that goes on top of it. We would like to extend generative AI to the world's enterprises, with their many different configurations of servers, and the way we're doing that is in partnership with our Taiwanese ecosystem: the MGX modular accelerated computing systems. And we put NVIDIA into the cloud, so that every enterprise in the world can engage us to create generative AI models and deploy them in an enterprise-grade, enterprise-secure way, in every single cloud. I want to thank all of you for your partnership over the years. Thank you.

[Applause]