NVIDIA'S HUGE AI Chip Breakthroughs Change Everything (Supercut)

Ticker Symbol: YOU
11 Jun 2023 · 26:07

Summary

TLDR: This video showcases a major transformation in the computing industry, in particular the rise of accelerated computing and generative AI. The talk emphasizes a new model of software development, in which computer engineers program together with AI supercomputers, and its far-reaching impact on every industry. It introduces the $200,000 H100 computing system and its production process, highlighting its potential in big-data processing and AI. It also explores the challenges, history, and future direction of accelerated computing, and how NVIDIA is redefining the data center, via the Grace Hopper Superchip and the AI factory concept, to meet explosive growth in computing demand. The video closes with a look at how AI will shape the future, reducing energy consumption and improving computing efficiency.

Takeaways

  • 🚀 Software development has entered a new era: it is no longer done by computer engineers alone, but in collaboration with AI supercomputers.
  • 🌐 We have reached the tipping point of accelerated computing and generative AI, a shift that is touching every industry.
  • 💡 Full production of the H100 makes this transformation concrete: the system board integrates 35,000 components and 8 Hopper GPUs, weighs 60-65 pounds, and requires robots to handle.
  • 💰 At $200,000, the H100 is the world's single most expensive computer, yet it replaces an entire room of machines, illustrating "the more you buy, the more you save."
  • 🔬 Accelerated computing and generative AI redefine how software is developed and how computation is done, a shift that took nearly three decades to achieve.
  • 🏭 NVIDIA's accelerated computing applied to large language models demonstrates the cost efficiency and performance gains of GPU servers.
  • 🌟 The Grace Hopper Superchip, integrating nearly 200 billion transistors, represents a new stage of accelerated processors, with enormous memory and very high speed.
  • 📈 NVIDIA connects 256 Grace Hopper Superchips into a single 1-exaFLOPS supercomputing system in which every GPU sees 144 TB of memory.
  • 🌍 Every major company will one day have an AI factory producing and refining the company's intelligence, a direct application of accelerated computing and AI.
  • 🔧 NVIDIA MGX is an open, modular server design specification built for the demands of accelerated computing, supporting smooth transitions across multiple hardware generations.

Q & A

  • How is the H100 produced?

    - Production integrates 35,000 components onto the system board, including 8 Hopper GPUs. Assembly relies heavily on robots: the finished board weighs 60 to 65 pounds, and the insertion pressure is so high that placement must be extremely precise.

  • Why is the H100 considered one of the most expensive computers in the world?

    - The H100 costs $200,000, but it is powerful enough to replace a room full of other computers, which is why it is described as the world's single most expensive computer.

  • What transition are accelerated computing and generative AI driving in the computer industry?

    - They mark two fundamental trends. CPU performance scaling has hit its ceiling just as deep learning emerged, and together accelerated computing and generative AI are redefining how software is developed and driving a leap in computing capability.

  • Why are GPU servers considered expensive?

    - A GPU server has a high upfront cost, but the accelerated computing it provides means much higher performance and efficiency, especially for large language models and other complex workloads.

  • What is the role of the NVIDIA AI operating system in the AI field?

    - NVIDIA AI is presented as the world's only AI operating system: it covers everything from data processing to training, optimization, deployment, and inference, providing end-to-end deep learning capability, and it is the core engine of today's AI.

  • Why will every major company have an AI factory in the future?

    - As accelerated computing and AI become pervasive, companies will need their own AI factories to produce and refine their corporate intelligence in order to stay competitive and innovative.

  • What are the characteristics of the Grace Hopper Superchip?

    - The Grace Hopper Superchip has nearly 200 billion transistors and almost 600 GB of memory shared coherently between the CPU and GPU, avoiding unnecessary data copies and making it well suited to very large data sets.

  • Why is this a new era of computing?

    - This era of computing can understand multimodal information, and large language models lower the programming barrier so that everyone can be a programmer. It also empowers both past and future applications, signaling enormous advances in technology and its uses.

  • How does NVIDIA scale accelerated computing and generative AI with Grace Hopper?

    - By connecting Grace Hopper Superchips to one another over NVLink, NVIDIA built the world's largest GPU, the DGX GH200, which contains 256 Grace Hopper Superchips and delivers unprecedented compute and memory.

  • What is innovative about NVIDIA MGX for accelerated computing system design?

    - NVIDIA MGX is an open, modular server design specification built for the special mechanical, thermal, and electrical requirements of accelerated computing, supporting high-density systems and guaranteeing compatibility with future GPU generations to preserve the investment.

Outlines

00:00

💻 A new era for the computer industry

The new era of the computer industry marks a fundamental change in software development: computer engineers now program software in collaboration with AI supercomputers. We have reached the tipping point of accelerated computing and generative AI, and full production of the H100, which will touch every industry, makes it concrete. The H100 system board carries 35,000 components and 8 Hopper GPUs, weighs 60-65 pounds, and requires robots for handling and precise insertion. The machine costs $200,000 but replaces a whole room of other computers: "the more you buy, the more you save." The H100 introduces a transformative way of computing, and its accelerated computing for large language models in particular shows astonishing performance.
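The cost and energy figures in this section can be sanity-checked with a few lines of arithmetic. A minimal sketch in Python; the input numbers are the ones quoted in the keynote ($10M for roughly 1,000 CPU servers consuming 11 GWh on the workload, versus $10M for 48 GPU servers consuming 3.2 GWh at 44x the throughput), and the derived ratios are illustrative, not NVIDIA's own figures:

```python
# Back-of-the-envelope check of the keynote's CPU-vs-GPU comparison.
# Input figures come from the talk; derived ratios are illustrative.

CPU_CLUSTER_COST_USD = 10_000_000   # ~1,000 CPU servers
CPU_ENERGY_GWH = 11.0               # energy to process the LLM workload
GPU_CLUSTER_COST_USD = 10_000_000   # 48 GPU servers at the same budget
GPU_ENERGY_GWH = 3.2
GPU_SPEEDUP = 44                    # throughput vs. the CPU cluster

# Same money, same workload: compare energy and work delivered.
energy_ratio = CPU_ENERGY_GWH / GPU_ENERGY_GWH   # ~3.4x less energy
work_per_gwh = GPU_SPEEDUP * energy_ratio        # ~151x more work per GWh

print(f"energy saved: {energy_ratio:.1f}x")
print(f"work per GWh: {work_per_gwh:.0f}x")
```

The point of the exercise matches the talk's framing: at equal spend, the accelerated cluster does far more work per unit of energy, which is why the comparison is made at the data center level rather than per server.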

05:01

🚀 The tipping point of accelerated computing and generative AI

As accelerated computing and generative AI reach their tipping point, utilization of NVIDIA GPUs is at unprecedented levels; nearly every cloud and data center is overextended. The GPU is no longer just a graphics processor; it has been reinvented to excel at tensor processing, and NVIDIA AI has become the only AI operating system, delivering end-to-end deep learning from data processing through training and optimization to deployment and inference. NVLink and InfiniBand interconnect GPUs into ever-larger computing systems, accelerating AI research with giant leaps every two years. AI supercomputers are a new kind of factory: in the future, every major company will have its own AI factory producing the company's intelligence.

10:03

🌐 Full production of the Grace Hopper Superchip

Production of the Grace Hopper Superchip marks a new milestone for accelerated processors: nearly 200 billion transistors and a huge pool of high-speed memory shared coherently between CPU and GPU, avoiding unnecessary data copies. The design lets the GPU work on much larger data sets; practically the whole computer is integrated on one module, using the same low-power DDR memory as a phone but optimized for high-reliability data center use. Via NVLink, eight Grace Hopper Superchips form a pod, and pods are connected into a supercomputing system of 256 Grace Hopper Superchips delivering 1 exaflops of deep learning compute and 144 TB of shared memory, an unprecedented scale.
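The pod arithmetic described above can be written out directly. A small sketch; the 576 GB per-superchip figure is an assumed value, chosen to be consistent with both the talk's "almost 600 GB" per chip and the 144 TB total (144 × 1024 GB ÷ 256 chips = 576 GB):

```python
# Sketch of the DGX GH200 build-out described in the keynote:
# 8 Grace Hopper Superchips per NVLink pod, 32 pods joined by a
# second switch layer, seen by software as one giant GPU.

CHIPS_PER_POD = 8
PODS = 32
MEMORY_PER_SUPERCHIP_GB = 576   # assumed; "almost 600 GB" in the talk

total_chips = CHIPS_PER_POD * PODS
total_memory_tb = total_chips * MEMORY_PER_SUPERCHIP_GB / 1024

print(total_chips)       # 256
print(total_memory_tb)   # 144.0
```

The key claim is the last one: the 144 TB is not 144 TB distributed across nodes but a single address space every GPU can reference, which is what lets the whole system behave as one GPU.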

15:04

🤖 AI data centers for the future

As AI and accelerated computing advance, the world's data centers will be transformed into efficient centers for accelerated computing and generative AI. NVIDIA introduced NVIDIA MGX, an open, modular server design specification built for accelerated computing and adaptable to different applications and configurations. Several server configurations were shown, including the Omniverse OVX server and a liquid-cooled Grace Hopper server, aimed at higher compute density and efficiency. At the same performance, a Grace Superchip server cuts power consumption dramatically compared with a traditional CPU server, pushing AI into new territory and marking the data center as the core of future computing.
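The power claim is easy to verify from the two numbers given in the keynote: 580 W for a whole Grace Superchip server versus 1,090 W for a latest-generation x86 server at the same performance. A minimal sketch; the servers-per-megawatt framing is an illustrative extrapolation, not a figure from the talk:

```python
# Iso-performance power comparison quoted in the keynote.
GRACE_SERVER_W = 580    # whole Grace Superchip server
X86_SERVER_W = 1090     # latest-generation x86 CPU server

power_ratio = GRACE_SERVER_W / X86_SERVER_W   # ~0.53: about half the power

# Equivalently, under a fixed power budget (most data centers are
# power-limited), the same megawatt runs roughly twice the servers:
grace_per_mw = 1_000_000 // GRACE_SERVER_W
x86_per_mw = 1_000_000 // X86_SERVER_W

print(f"{power_ratio:.2f}x power at iso-performance")
print(grace_per_mw, x86_per_mw)   # 1724 917
```

This is the "half the power at the same performance, or twice the performance at the same power" statement from the talk, stated numerically.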

20:08

🌟 The outlook for AI and accelerated computing

Looking ahead, the convergence of AI and accelerated computing will let every cloud data center become an AI data center, extending AI to enterprises across industries. With the Spectrum-X components and the NVIDIA MGX modular accelerated computing systems, NVIDIA aims to bring generative AI to enterprises worldwide, working with the Taiwan ecosystem on hardware and software innovation. Thanking partners for their collaboration over the years, NVIDIA is confident about the future of AI data centers, with accelerated computing and generative AI continuing to lead the industry toward broader applications and impact.

Keywords

💡accelerated computing

Using GPUs and other hardware to accelerate deep learning and AI computation

💡generative AI

AI that can automatically generate text, images, and other content

💡Grace Hopper

NVIDIA superchip combining a Grace CPU with a Hopper GPU

💡H100

NVIDIA's latest data center GPU

💡hyperscale

Very-large-scale cloud data centers

💡AI supercomputer

A supercomputer purpose-built for AI workloads

💡modular design

Modular design, improving scalability

💡NVLink

NVIDIA's high-speed GPU-to-GPU interconnect

💡BlueField

NVIDIA's data processing unit (DPU) for data center networking

💡full stack

Full stack: the complete hardware and software stack


Transcripts

00:00

This is the new computer industry. Software is no longer programmed just by computer engineers; software is programmed by computer engineers working with AI supercomputers. We have now reached the tipping point of accelerated computing. We have now reached the tipping point of generative AI, and we are so, so, so excited to be in full-volume production of the H100. This is going to touch literally every single industry. Let's take a look at how H100 is produced.

00:36

[Music]

01:28

35,000 components on that system board. Eight Hopper GPUs. Let me show it to you. All right, this. I would lift this, but I still have the rest of the keynote I would like to give. This is 60 pounds, 65 pounds. It takes robots to lift it, of course, and it takes robots to insert it, because the insertion pressure is so high and has to be so perfect. This computer is two hundred thousand dollars, and as you know, it replaces an entire room of other computers. It's the world's single most expensive computer, and one where you can say, "the more you buy, the more you save."

02:18

This is what a compute tray looks like. Even this is incredibly heavy. See that? So this is the brand-new H100, the world's first computer that has a Transformer Engine in it. The performance is utterly incredible.

02:37

There are two fundamental transitions happening in the computer industry today. All of you are deep within it and you feel it. There are two fundamental trends. The first trend is that CPU scaling has ended: the ability to get ten times more performance every five years has ended. The ability to get ten times more performance every five years at the same cost is the reason why computers are so fast today. That trend has ended, and it happened at exactly the time when a new way of doing software was discovered: deep learning. These two events came together and are driving computing today. Accelerated computing and generative AI are not just a new way of doing software, but a new way of doing computation; it is a reinvention from the ground up, and it's not easy. Accelerated computing has taken us nearly three decades to accomplish.

03:30

Well, this is how accelerated computing works. This is accelerated computing used for large language models, basically the core of generative AI. This example is a ten-million-dollar server, and ten million dollars gets you nearly a thousand CPU servers; to train, to process, this large language model takes 11 gigawatt-hours. 11 gigawatt-hours, okay? And this is what happens when you accelerate this workload with accelerated computing: with that same ten million dollars, for a ten-million-dollar server, you buy 48 GPU servers. It's the reason why people say that GPU servers are so expensive. Remember, people say GPU servers are so expensive. However, the GPU server is no longer the computer; the computer is the data center. Your goal is to build the most cost-effective data center, not to build the most cost-effective server. Back in the old days, when the computer was the server, that would be a reasonable thing to do, but today the computer is the data center. So for ten million dollars you buy 48 GPU servers, it only consumes 3.2 gigawatt-hours, and you get 44 times the performance. Let me just show it to you one more time: this is before, and this is after. We want dense computers, not big ones. We want dense, fast computers, not big ones. Let me show you something else.

05:01

This is my favorite. If your goal is to get the work done, and this is the work you want to get done, iso-work, okay, this is iso-work: all right, look at this. Look at this: before, after. You've heard me talk about this for so many years. In fact, every single time you saw me, I've been talking to you about accelerated computing. And now, why is it that finally it's the tipping point? Because we have now addressed so many different domains of science, so many industries, in data processing, in deep learning, classical machine learning; so many different ways for us to deploy software, from the cloud to enterprise to supercomputing to the edge; so many different configurations of GPUs, from our HGX versions to our Omniverse versions to our cloud GPU and graphics versions. So many different versions. Now the utilization is incredibly high. The utilization of NVIDIA GPUs is so high that almost every single cloud is overextended, almost every single data center is overextended; there are so many different applications using them. So we have now reached the tipping point of accelerated computing. We have now reached the tipping point of generative AI.

06:30

People thought that GPUs would just be GPUs. They were completely wrong. We dedicated ourselves to reinventing the GPU so that it's incredibly good at tensor processing, and then all of the algorithms and engines that sit on top of these computers we call NVIDIA AI: the only AI operating system in the world that takes you from data processing to training to optimization to deployment and inference. End-to-end deep learning processing. It is the engine of AI today.

06:59

We connected GPUs to other GPUs, called NVLink, to build one giant GPU, and we connected those GPUs together using InfiniBand into larger-scale computers. The ability for us to drive the processor and extend the scale of computing made it possible for the AI research organization, the community, to advance AI at an incredible rate. So every two years we take giant leaps forward, and I'm expecting the next leap to be giant as well.

07:29

This is the new computer industry. Software is no longer programmed just by computer engineers; software is programmed by computer engineers working with AI supercomputers. These AI supercomputers are a new type of factory. It is very logical that the car industry has factories: they build things you can see, cars. It is very logical that the computer industry has computer factories: you build things that you can see, computers. In the future, every single major company will also have AI factories, and you will build and produce your company's intelligence. And it's a very sensible thing. We are intelligence producers already; it's just that the producers of the intelligence are people. In the future, we will be intelligence producers, artificial intelligence producers, and every single company will have factories, and the factories will be built this way: using accelerated computing and artificial intelligence.

08:33

We accelerated computer graphics by 1,000 times in five years. Moore's Law is probably currently running at about two times. A thousand times in five years; a thousand times in five years is one million times in ten. We're doing the same thing in artificial intelligence. Now the question is: what can you do when your computer is one million times faster? What would you do if your computer was one million times faster? Well, it turns out that we can now apply the instrument of our industry to so many different fields that were impossible before. This is the reason why everybody is so excited.

09:14

There's no question that we're in a new computing era; there's just absolutely no question about it. In every single computing era you could do different things that weren't possible before, and artificial intelligence certainly qualifies. This particular computing era is special in several ways. One, it is able to understand information of more than just text and numbers; it can now understand multi-modality, which is the reason why this computing revolution can impact every industry. Every industry. Two, because this computer doesn't care how you program it, it will try to understand what you mean, because it has this incredible large language model capability; and so the programming barrier is incredibly low. We have closed the digital divide. Everyone is a programmer now; you just have to say something to the computer. Third, this computer not only is able to do amazing things for the future, it can do amazing things for every single application of the previous era, which is the reason why all of these APIs are being connected into Windows applications, here and there, in browsers and PowerPoint and Word. Every application that exists will be better because of AI. This computing era does not need new applications: it can succeed with old applications, and it's going to have new applications. The rate of progress, because it's so easy to use, is the reason why it's growing so fast. This is going to touch literally every single industry, and at the core, just as with every single computing era, it needs a new computing approach.

11:01

The last several years I've been talking to you about the new type of processor we've been creating, and this is the reason we've been creating it. Ladies and gentlemen, Grace Hopper is now in full production. This is Grace Hopper: nearly 200 billion transistors in this computer. Look at this. This is Grace Hopper. This processor is really quite amazing. There are several characteristics about it. This is the world's first accelerated computing processor that also has a giant memory: it has almost 600 gigabytes of memory that's coherent between the CPU and the GPU, and so the GPU can reference the memory, the CPU can reference the memory, and any unnecessary copying back and forth can be avoided. The amazing amount of high-speed memory lets the GPU work on very, very large data sets. This is a computer, this is not a chip; practically the entire computer is on here. This uses low-power DDR memory, just like your cell phone, except it has been optimized and designed for high-resilience data center applications.

12:22

So let me show you what we're going to do. The first thing, of course: we have the Grace Hopper Superchip; put that into a computer. The second thing we're going to do is connect eight of these together using NVLink. This is an NVLink switch. So eight of these connect into three switch trays, into an eight-Grace-Hopper pod. In these eight-Grace-Hopper pods, each one of the Grace Hoppers is connected to the other Grace Hoppers at 900 gigabytes per second; eight of them connected together as a pod. And then we connect 32 of them together with another layer of switches, in order to build this: 256 Grace Hopper Superchips connected into one exaflops. One exaflops. You know that countries and nations have been working on exaflops computing and just recently achieved it. 256 Grace Hoppers for deep learning is one exaflop of Transformer Engine compute, and it gives us 144 terabytes of memory that every GPU can see. This is not 144 terabytes distributed; this is 144 terabytes connected. Why don't we take a look at what it really looks like. Play, please.

14:04

[Applause]

14:11

This is 150 miles of cables, fiber optic cables; 2,000 fans; 70,000 cubic feet per minute. It probably recycles the air in this entire room in a couple of minutes. Forty thousand pounds. Four elephants. One GPU. If I can get up on here: this is actual size. So this is our brand-new Grace Hopper AI supercomputer. It is one giant GPU. Utterly incredible. We're building it now, and we're so excited that Google Cloud, Meta, and Microsoft will be the first companies in the world to have access, and they will be doing exploratory research on the pioneering front, the boundaries of artificial intelligence, with us. So this is the DGX GH200. It is one giant GPU.

15:33

Okay, I just talked about how we are going to extend the frontier of AI. Data centers all over the world, all of them, over the next decade, will be recycled and re-engineered into accelerated data centers and generative-AI-capable data centers. But there are so many different applications in so many different areas: scientific computing, data processing, cloud and video and graphics, generative AI for enterprise, and of course the edge. Each one of these applications has different configurations of servers, different application focuses, different deployment methods; and so security is different, the operating system is different, how it's managed is different. Well, this is just an enormous number of configurations. And so today we're announcing, in partnership with so many companies here in Taiwan, the NVIDIA MGX. It's an open, modular server design specification, designed for accelerated computing. Most of the servers today are designed for general-purpose computing; the mechanical, thermal, and electrical design is insufficient for a very highly dense computing system. Accelerated computers take, as you know, many servers and compress them into one. You save a lot of money, you save a lot of floor space, but the architecture is different. And we designed it to be multi-generation standardized, so that once you make an investment, our next-generation GPUs and next-generation CPUs and next-generation DPUs will continue to easily configure into it, for the best time to market and the best preservation of your investment. Different data centers have different requirements, and we've made this modular and flexible so that it can address all of these different domains.

17:12

Now, this is the basic chassis. Let's take a look at some of the other things you can do with it. This is the Omniverse OVX server: it has x86, four L40S, BlueField-3, two CX-7s, six PCI Express slots. This is the Grace Omniverse server: Grace, the same four L40S, BlueField-3, and two CX-7s. Okay, this is the Grace cloud graphics server. This is the Hopper NVLink generative AI inference server. And of course Grace Hopper liquid-cooled, okay, for very dense servers. And then this one is our dense general-purpose Grace Superchip server: this is just CPU, and it has the ability to accommodate four Grace CPUs, or two Grace Superchips. Enormous amounts of performance. At iso-performance, Grace only consumes 580 watts for the whole server, versus 1,090 watts for the latest-generation x86 CPU servers. It's basically half the power at the same performance; or, another way of saying it, at the same power, if your data center is power-constrained, you get twice the performance. Most data centers today are power-limited, and so this is really a terrific capability.

18:32

We're going to expand AI into a new territory. If you look at the world's data centers: the data center is now the computer, and the network defines what that data center does. Largely, there are two types of data centers today. There's the data center that's used for hyperscale, where you have application workloads of all different kinds; the number of GPUs you connect to it is relatively low, the number of tenants is very high, and the workloads are loosely coupled. And you have another type of data center, like supercomputing data centers, AI supercomputers, where the workloads are tightly coupled, the number of tenants is far fewer, and sometimes just one. Its purpose is high throughput on very large computing problems. And so supercomputing centers and AI supercomputers and the world's hyperscale cloud are very different in nature.

19:28

The ability of Ethernet to interconnect components from almost anywhere is the reason why the world's internet was created. If it required too much coordination, how could we have built today's internet? So Ethernet's profound contribution is its lossy capability, its resilient capability: because of it, Ethernet can basically connect almost anything together. However, a supercomputing data center can't afford that. You can't interconnect random things together, because on that billion-dollar supercomputer, the difference between achieving 95 percent networking throughput versus 50 percent is effectively 500 million dollars. Now, it's really, really important to realize that in a high-performance computing application, every single GPU must finish its job so that the application can move on. In many cases, where you do all-reductions, you have to wait for the results of every single one, so if one node takes too long, everybody gets held back.

20:29

The question is: how do we introduce a new type of Ethernet that is, of course, backwards compatible with everything, but engineered in a way that achieves the capabilities we need to bring AI workloads to any of the world's data centers? First, adaptive routing. Adaptive routing basically says that based on the traffic going through your data center, depending on which one of the ports of a switch is over-congested, it will tell BlueField-3 to send, and it will send it to another port; the BlueField-3 on the other end will reassemble it and present the data to the GPU without any CPU intervention. Second, congestion control. It is possible for certain ports to become heavily congested, in which case each switch will see how the network is performing and communicate to the senders: please don't send any more data right away, because you're congesting the network. That congestion control requires, basically, an overriding system, which includes software, the switch working with all of the endpoints, to manage the overall congestion, the traffic, and the throughput of the data center. This capability is going to increase Ethernet's overall performance dramatically.

21:51

Now, one of the things that very few people realize is that today there's only one software stack that is enterprise-secure and enterprise-grade, and that software stack is the CPU's. The reason for that is that in order to be enterprise-grade, it has to be enterprise-secure, enterprise-managed, and enterprise-supported. Over 4,000 software packages is what it takes for people to use accelerated computing today, from data processing and training and optimization all the way to inference. So for the very first time, we are taking all of that software, and we're going to maintain it and manage it like Red Hat does for Linux. NVIDIA AI Enterprise will do it for all of NVIDIA's libraries. Now enterprises can finally have an enterprise-grade and enterprise-secure software stack. This is such a big deal; otherwise, even though the promise of accelerated computing is within reach of many researchers and scientists, it is not available to enterprise companies.

22:55

And so let's take a look at the benefit for them. This is a simple image-processing application. If you were to do it on a CPU versus on a GPU running NVIDIA AI Enterprise, you're getting 31.8 images per minute, basically 24 times the throughput, or you only pay five percent of the cost. This is really quite amazing. This is the benefit of accelerated computing in the cloud, but for many enterprise companies it is simply not possible unless you have this stack. NVIDIA AI Enterprise is now fully integrated into AWS, Google Cloud, Microsoft Azure, and Oracle Cloud. It is also integrated into the world's machine learning operations pipelines. As I mentioned before, AI is a different type of workload, and this new type of software has a whole new software industry; a hundred of these software companies we have now connected with NVIDIA AI Enterprise.

23:52

I told you several things. I told you that we are going through two simultaneous computing industry transitions: accelerated computing and generative AI. Two, this form of computing is not like traditional general-purpose computing. It is full-stack; it is data-center-scale, because the data center is the computer; and it is domain-specific. For every domain you want to go into, every industry you go into, you need to have the software stack, and if you have the software stack, then the utility, the utilization of your machine, the utilization of your computer, will be high. So, number two: it is full-stack, data-center-scale, and domain-specific. We are in full production of the engine of generative AI, and that is HGX H100. Meanwhile, this engine that's going to be used for AI factories will be scaled out using Grace Hopper, the engine that we created for the era of generative AI. We also took Grace Hopper, connected 256 nodes with NVLink, and created the largest GPU in the world: the DGX GH200.

25:01

We're trying to extend generative AI and accelerated computing in several different directions at the same time. Number one, we would like, of course, to extend it in the cloud, so that every cloud data center can be an AI data center: not just AI factories and hyperscale, but every hyperscale data center can now be a generative AI data center. And the way we do that is Spectrum-X. It takes four components to make Spectrum-X possible: the switch; the BlueField-3 NIC; the interconnects themselves, because the cables are so important in high-speed communications; and the software stack that goes on top of it. We would like to extend generative AI to the world's enterprises, and there are so many different configurations of servers; the way we're doing that, in partnership with our Taiwanese ecosystem, is the MGX modular accelerated computing systems. We put NVIDIA into the cloud so that every enterprise in the world can engage us to create generative AI models and deploy them in an enterprise-grade, enterprise-secure way in every single cloud. I want to thank all of you for your partnership over the years. Thank you.

26:05

[Applause]