Google Keynote (Google I/O ‘24)

Google
14 May 2024 · 112:43

Summary

TLDR: At Google I/O, Google announced a series of innovations and advances in artificial intelligence. Google introduced Gemini, an advanced AI model that is natively multimodal and can understand and generate text, images, video, and code. The 1.5 Pro version of Gemini delivers a breakthrough in long context, handling up to 1 million tokens of text and opening new application possibilities for developers and users. Google also walked through use cases across Search, Photos, Workspace, and Android, showing how AI improves efficiency, enhances the user experience, and drives innovation. Google emphasized responsible AI development, including measures to improve model safety and prevent misuse. Finally, Google previewed upcoming features such as the LearnLM family of education models and plans to work with education experts to bring AI into learning tools.

Takeaways

  • 🚀 Google laid out its ambitions in artificial intelligence with Gemini, a generative AI that is fundamentally changing how we work.
  • 📈 The Gemini models are natively multimodal, handling text, images, video, code, and other data types.
  • 📱 Google is integrating Gemini into products such as Search, Photos, Workspace, and Android to deliver more powerful user experiences.
  • 🔍 Gemini in Google Search lets users search in entirely new ways, including searching with photos and asking more complex questions.
  • 📈 Gemini 1.5 Pro delivers a breakthrough in long context, handling up to 1 million tokens and opening new possibilities for developers and consumers.
  • 🌟 Google introduced the new Gemini 1.5 Flash model, designed to be faster and more cost-efficient while keeping multimodal and long-context capabilities.
  • 🎓 Google is developing LearnLM, a new family of learning models aimed at enhancing education through personalized, interactive learning experiences.
  • 🤖 Google described its vision for AI agents: intelligent systems that carry out tasks on the user's behalf, such as shopping, planning, and organizing information.
  • 📹 Google showed its latest progress in generative video with Veo, a model that creates high-definition video from text, image, and video prompts.
  • 💡 Gemini updates and new features will roll out later this year, including new tools and improvements for developers and new experiences for consumers.
  • 🌐 Google emphasized responsible AI development, including SynthID watermarking to prevent misuse of AI-generated content and partnerships to advance standards for digital media transparency.

Q & A

  • What are Google's latest developments in artificial intelligence?

    -Google introduced Gemini, a generative AI that is changing how we work by understanding inputs across text, images, video, code, and more, and turning them into any kind of output.

  • How has Gemini improved Google Search?

    -Powered by Gemini, Google Search offers a new search experience, including AI Overviews that can answer complex questions, perform multi-step reasoning, and deliver personalized, AI-organized results pages.

  • What are the key features of the Gemini 1.5 Pro model?

    -Gemini 1.5 Pro can handle up to 1 million tokens of context, more than any other large-scale foundation model to date, and runs consistently in production.

  • How is Google using AI to improve the user experience of its products?

    -By integrating Gemini into products such as Search, Photos, Workspace, and Android, Google delivers more personalized and powerful features, such as searching memories through photos, automated email summaries, and suggested replies.

  • How does Google ensure AI is used responsibly?

    -Google follows its AI Principles, using red teaming and AI-assisted red teaming, feedback from internal safety specialists and independent experts, and tools such as SynthID to improve model safety and prevent misuse.

  • How does Gemini help developers and startups?

    -Google offers a range of Gemini models, including 1.5 Pro and 1.5 Flash, along with an upcoming private preview of a 2-million-token context window, to help developers around the world build the next generation of AI applications.
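
    A minimal sketch of what this looks like in practice, using Google's google-generativeai Python SDK; the model name, prompt, and environment variable below are illustrative assumptions rather than details given in the keynote:

        # Sketch: calling a Gemini 1.5 model from Python.
        # Assumes the google-generativeai package is installed and an API key
        # is exported as GOOGLE_API_KEY; model choice and prompt are examples.
        import os
        import google.generativeai as genai

        genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

        # 1.5 Flash targets lower latency and cost; 1.5 Pro targets higher capability.
        model = genai.GenerativeModel("gemini-1.5-flash")

        response = model.generate_content(
            "Summarize the key announcements from Google I/O 2024 in three bullet points."
        )
        print(response.text)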

  • How is Google using AI to drive innovation in education?

    -Google introduced the LearnLM family of models, fine-tuned for learning, to deliver more personal and engaging learning experiences, and is integrating them into products such as Search, Android, Gemini, and YouTube.

  • What infrastructure investments has Google made for AI?

    -Google has invested in world-leading technical infrastructure, including custom Tensor Processing Units (TPUs), performance-optimized hardware, and open software to support advances in AI.

  • How does Google's AI help users plan trips more effectively?

    -With the trip-planning experience in Gemini Advanced, users can upload flight and hotel details, and Gemini uses that data to build a dynamic graph of travel options and generate a personalized vacation plan.

  • What progress has Google made on accessibility with AI?

    -Google is improving TalkBack with Gemini Nano's multimodal capabilities, giving blind and low-vision users richer, clearer descriptions of photos to help with online shopping and everyday navigation.

  • How does Google's AI protect users from scams?

    -On Android, the Gemini Nano model can detect suspicious activity during a call, such as a "bank" asking you to move money to keep it safe, and alert the user on the device in real time, protecting the user's privacy and security.

Outlines

00:00

🚀 A new era for Google AI: the launch and impact of Gemini

Google laid out its ambitions in artificial intelligence with Gemini, a generative AI that is fundamentally changing how we work. From new beginnings to new solutions for age-old problems, a lot has happened in a year. Google CEO Sundar Pichai welcomed developers to Google I/O and highlighted innovation at every layer of the stack: research, products, and infrastructure. Gemini, a natively multimodal model that can reason across text, images, video, and code, is a big step for AI. Today, more than 1.5 million developers use Gemini models to debug code, get new insights, and build the next generation of AI applications.

05:02

🔍 Gemini's transformative role in Google Search

Powered by Gemini, Google Search delivers a generative search experience: users can search in entirely new ways, including asking more complex questions and searching with photos. Google has been testing the experience and plans to bring it to more countries. Google Photos is also improving with Gemini, making it easier to search and organize photos.

10:05

📚 Gemini's multimodal and long-context capabilities

Gemini's multimodality lets it understand different types of input and find connections between them. Long context allows far more information to be brought in, such as hundreds of pages of text, hours of audio, or full-length videos. Developers have already used these capabilities in creative ways, for example turning a video of a bookshelf into a searchable database.

15:08

🌐 Global rollout and new capabilities of Gemini 1.5 Pro

Google is rolling out an improved Gemini 1.5 Pro to developers globally and making it available to consumers in Gemini Advanced across 35 languages. Google also announced that the context window is expanding to 2 million tokens, available to developers in private preview, the next step toward the ultimate goal of infinite context.

20:12

🤖 Gemini in Google Workspace

With Gemini integrated into Google Workspace, email search becomes far more powerful. For example, a parent can ask Gemini to summarize recent emails from school; it identifies the relevant emails, analyzes attachments, and returns a summary of key points and action items. Gemini can also pull highlights from an hour-long meeting recording and help draft replies.

25:15

🎓 LearnLM: a family of education models built on Gemini

Google announced LearnLM, a family of models based on Gemini and fine-tuned for learning. Grounded in educational research, LearnLM aims to make learning experiences more personal and engaging. The models will be integrated into the products people use every day, such as Search, Android, Gemini, and YouTube. LearnLM will also be tested and refined with educational institutions, and Google will work with teachers to build more helpful generative AI tools.

30:16

📱 Android meets Gemini: a new era for smartphones

Google is putting AI at the core of the Android experience, using it to improve the entire smartphone. Android is the first mobile operating system with a built-in on-device foundation model, which makes experiences faster while protecting user privacy. Starting later this year, Pixel phones will expand Gemini Nano's capabilities, including multimodality. Google also plans to build more Gemini-powered AI features directly into Android, such as improved TalkBack accessibility and scam-call warnings.

35:18

🤖 Gemini as a system-level AI assistant on Android

On Android, Gemini is more than an app; it is becoming a foundational part of the Android experience. Google is making Gemini context-aware so it can anticipate what users need and help in the moment. For example, users can create images with Gemini directly in a messaging app or ask about the content of a video. These improvements will roll out to hundreds of millions of devices over the coming months.

40:19

📈 The Gemini 1.5 series: Pro and Flash, available globally

Google announced two Gemini 1.5 models, 1.5 Pro and the new 1.5 Flash, both multimodal and available in more than 200 countries and territories. They can be tried directly in AI Studio or Vertex AI. Google also introduced new developer features such as video frame extraction, parallel function calling, and context caching to make long-context workloads more efficient and affordable.
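
As a concrete illustration of the multimodal, long-context workflow described above, here is a hedged sketch that sends a video to a Gemini 1.5 model through the File API in the google-generativeai Python SDK; the file name and prompt are hypothetical, and the exact upload and processing flow may differ between SDK versions.

    # Sketch: a multimodal, long-context request with a video input.
    # Assumes google-generativeai is installed and GOOGLE_API_KEY is set;
    # "bookshelf.mp4" is a hypothetical local file.
    import os
    import time
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    video = genai.upload_file(path="bookshelf.mp4")

    # Uploaded videos are processed asynchronously before they can be referenced.
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = genai.get_file(video.name)

    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content(
        [video, "List every book title visible in this video."]
    )
    print(response.text)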

45:21

🌟 Gemma: an open model family for AI innovation and responsibility

Gemma is Google's family of open models, built from the same research and technology as Gemini, with high-performing 2B and 7B parameter models. Since launch, Gemma has been downloaded millions of times and used by developers and researchers for a wide range of custom applications. The newest member, PaliGemma, is the family's first vision-language model, suited to tasks such as image captioning and visual question answering. Gemma 2, arriving in June, adds a 27-billion-parameter model optimized for next-generation GPUs and TPUs.
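
Because the Gemma weights are openly available, they can be run locally; the sketch below loads an instruction-tuned Gemma checkpoint with Hugging Face Transformers. The model ID and generation settings are assumptions (access to the weights requires accepting Google's terms on Hugging Face), not details given in the keynote.

    # Sketch: running an open Gemma model locally with Hugging Face Transformers.
    # Assumes transformers and torch are installed and the Gemma license has been
    # accepted on Hugging Face; the model ID below is an assumption.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-2b-it"  # assumed ID for the 2B instruction-tuned checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer("Explain in two sentences what a vision-language model is.", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))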

50:23

🧪 Responsible AI: how Google works to keep AI safe and beneficial

Google is working to make AI safe and beneficial in several ways. It is improving its models with red teaming and AI-assisted red teaming to find weaknesses and prevent misuse, and it works with internal safety specialists and independent experts to identify emerging risks. Google is also building new tools such as SynthID to keep AI-generated content from spreading misinformation, and it supports the wider ecosystem through open source and partnerships so that others can build AI responsibly.

55:32

🌐 Google AI's global impact and what comes next

Google's AI is helping scientists, educators, and individuals around the world. AlphaFold is helping 1.8 million scientists in 190 countries study neglected diseases, and Data Commons is helping organizations track the UN's 17 Sustainable Development Goals. Google's AI is also opening new possibilities in education, such as a personal AI tutor for every student. Google is working with educational institutions to integrate LearnLM into products like Google Classroom, simplifying lesson planning and meeting students' individual needs.

Keywords

💡Artificial intelligence

Artificial intelligence (AI) refers to intelligent behavior exhibited by engineered systems. AI is the core theme of the video: through products like Gemini, Google shows how AI can improve productivity, personalize learning, and generate content. For example, Google I/O demonstrated how AI helps answer complex questions, deliver personalized experiences, and enrich the user experience through multimodal interaction.

💡Multimodal

Multimodal means a system can process and understand several different kinds of input, such as text, images, audio, and video. In the video, Google's Gemini model is built to be natively multimodal, understanding information in different formats and finding connections between them to deliver richer, deeper interactions.

💡Long context

Long context means a system can take in and understand very long inputs. The video notes that Gemini 1.5 Pro can handle up to 1 million tokens, letting it understand and work with more complex material such as long articles, reports, or entire code repositories.

💡Personalization

Personalization means tailoring a service or product to an individual's needs and preferences. The video shows how Google uses AI to deliver personalized experiences, for example letting Gemini Advanced users upload their own files so the model can give feedback and suggestions grounded in that content.

💡Real-time information

Real-time information means a system can fetch and process current events and data as they happen. In the video, Google Search uses real-time information to give users up-to-date results and overviews, backed by Google's large-scale real-time data processing.

💡Safety

Safety covers protecting data and systems from unauthorized access and misuse. The video notes that Google builds safety into its AI models, for example using SynthID watermarking to keep AI-generated content from being used to mislead or cause harm.

💡Accessibility

Accessibility means technology and products can be used by people of different abilities and backgrounds. The video describes how AI improves accessibility, such as the TalkBack feature giving blind and low-vision users a better experience on their phones.

💡Natural language processing

Natural language processing (NLP) is the branch of AI that enables computers to understand, interpret, and generate human language. In the video, Google's AI understands and generates natural language, enabling more natural, human-like interactions.

💡Machine learning

Machine learning is an approach to AI in which systems learn from data and improve their performance. The Gemini models shown in the video are trained with machine learning and can carry out complex tasks such as image and video generation.

💡Education

Education is one of the AI application areas discussed in the video. Google introduced LearnLM, a family of models tailored for education and learning, designed to deliver personalized learning experiences and help students and educators teach and learn more effectively.

💡Privacy

Privacy protection means keeping personal data from being exposed when it is processed. The video emphasizes Google's attention to privacy in AI development, for example running AI models on-device so user data stays private.

Highlights

Google laid out its AI ambitions with Gemini, a new generative AI that is fundamentally changing how we work.

Google recapped a year of progress in AI, including new beginnings, fresh thinking, and new solutions to age-old problems.

Google CEO Sundar Pichai welcomed developers from around the world to Google I/O and highlighted AI innovation across products and infrastructure.

The Gemini models are natively multimodal, able to understand text, images, video, code, and other inputs and turn them into any output.

Gemini 1.5 Pro delivers a breakthrough in long context, handling up to 1 million tokens, more than any other large-scale foundation model to date.

More than 1.5 million developers are using Gemini models to debug code, get new insights, and build the next generation of AI applications.

With Gemini, Google Search offers a new generative search experience: users can search with photos and get back the best the web has to offer.

Google Photos uses Gemini to make searching personal photos much easier, for example recognizing the cars that appear often and surfacing details such as a license plate number.

Google Workspace uses Gemini to make email search more powerful, summarizing school emails and analyzing attachments to surface key points and action items.

Google demoed Audio Overviews in NotebookLM, built with Gemini 1.5 Pro, giving students a personalized way to learn.

Google showed how AI agents can simplify complex tasks such as shopping returns and moving to a new city, with intelligent systems carrying out multi-step actions on the user's behalf.

Google DeepMind's Demis Hassabis discussed the company's progress toward universal AI assistants that can understand and respond to our complex, dynamic world.

Google announced Imagen 3, a new image generation model that produces high-resolution, photorealistic images from text prompts.

Veo is Google's newest video generation model, creating 1080p high-definition video from text, image, and video prompts.

Google showed its latest advances in natural language understanding and computer vision, which are being built into the next generation of Google Search.

Google is developing LearnLM, a family of models based on Gemini and fine-tuned for learning, to make learning more personal and engaging.

Transcripts

00:00

[Cheers and Applause]. >>WOMAN: Google’s ambitions in 

00:01

artificial intelligence. >>MAN: Google launches Gemini, 

00:03

the generative AI. >> And it's completely changing 

00:06

the way we work. >> You know, a lot has happened 

00:09

in a year. There have been new beginnings. 

00:15

We found new ways to find new ideas. 

00:20

And new solutions to age-old problems. >> Sorry about your shirt. 

00:27

We dreamt of things -- >> Never too old for a 

00:30

treehouse. >> We trained for things. 

00:32

>> All right! Let’s go go go!

00:34

>> And learned about this thing. We found new paths, took the 

00:41

next step, and made the big leap. Cannon ball! 

00:52

We filled days like they were weeks. 

00:54

And more happened in months, than has happened in years. 

00:59

>> Hey, free eggs. >> Things got bigger,  

01:08

like waaay bigger. 

01:12

And it wasn’t all just for him, or for her. 

01:18

It was for everyone.

01:24

And you know what? 

01:27

We’re just getting started.

01:47

>>SUNDAR PICHAI:  Hi, everyone. Good morning. 

01:56

[Cheers and Applause]. Welcome to Google I/O. 

01:57

It's great to have all of you with us. We have a few thousand 

02:00

developers with us here today at Shoreline. 

02:03

Millions more are joining virtually around the world. 

02:06

Thanks to everyone for being here. 

02:09

For those of you who haven’t seen I/O before, it’s basically 

02:13

Google’s version of the Eras Tour, but with fewer costume 

02:18

changes. [Laughter]. 

02:20

At Google, though, we are fully in our Gemini era. Before we get into it, I want to 

02:28

reflect on this moment we’re in. We’ve been investing in AI for 

02:33

more than a decade, and innovating  at every layer of the stack: 

02:38

Research, product, infrastructure. We’re going to talk about it all today. 

02:43

Still, we are in the early  days of the AI platform shift. 

02:47

We see so much opportunity ahead for creators,  for developers, for startups, for everyone. 

02:56

Helping to drive those opportunities  is what our Gemini era is all about. 

03:01

So let’s get started. 

03:03

A year ago on this stage, we first shared our plans for 

03:06

Gemini, a frontier model built to be natively multimodal from 

03:11

the very beginning, that could reason across text, images, 

03:16

video, code, and more. It’s a big step in turning any 

03:20

input into any output. An I/O for a new generation. 

03:26

Since then we introduced the first Gemini models, our most 

03:29

capable yet. They demonstrated 

03:32

state-of-the-art performance on every multimodal benchmark. 

03:35

And that was just the beginning. Two months later, we introduced 

03:40

Gemini 1.5 Pro, delivering a big breakthrough in long context. 

03:46

It can run 1 million tokens in production, consistently. 

03:49

More than any other large-scale foundation model yet. 

03:53

We want everyone to benefit from what Gemini can do, so we’ve 

03:57

worked quickly to share these advances with all of you. 

04:01

Today, more than 1.5 million developers use Gemini models 

04:06

across our tools. You’re using it to debug code, 

04:10

get new insights, and build the next generation of AI 

04:13

applications. We’ve also been bringing 

04:17

Gemini’s breakthrough capabilities across our products 

04:20

in powerful ways. We’ll show examples today across 

04:24

Search, Photos, Workspace, Android and more. 

04:28

Today, all of our 2-billion user products use Gemini. 

04:32

And we’ve introduced new experiences, too, including on 

04:36

Mobile, where people can interact with Gemini directly 

04:39

through the app. Now available on Android and 

04:43

iOS. And through Gemini Advanced, 

04:46

which provides access to our most capable models. 

04:49

Over 1 million people have signed up to try it, in just 

04:52

three months. And it continues to show strong 

04:55

momentum. One of the most exciting 

04:58

transformations with Gemini has been in Google Search. 

05:02

In the past year, we’ve answered billions of queries as part of 

05:06

our Search Generative Experience. 

05:08

People are using it to Search in entirely new ways. 

05:12

And asking new types of questions, longer and more 

05:15

complex queries, even searching with photos, and getting back 

05:20

the best the web has to offer. We’ve been testing this 

05:24

experience outside of Labs, and we’re encouraged to see not only 

05:28

an increase in Search usage, but also an increase in user 

05:32

satisfaction. I’m excited to announce that 

05:35

we’ll begin launching this fully revamped experience, AI 

05:39

Overviews, to everyone in the U.S. this week. 

05:42

And we’ll bring it to more countries soon.

05:51

[Cheers and Applause]. There’s so much innovation 

05:53

happening in Search. Thanks to Gemini we can create 

05:57

much more powerful search experiences, including within 

06:00

our products. Let me show you an example in 

06:03

Google Photos. We launched Google Photos almost 

06:06

nine years ago. Since then, people have used it 

06:09

to organize their most important memories. 

06:12

Today that amounts to more than 6 billion photos and videos 

06:16

uploaded every single day. And people love using Photos to 

06:21

search across their life. With Gemini, we’re making that a 

06:24

whole lot easier. Say you’re at a parking station 

06:28

ready to pay, but you can’t recall your license plate 

06:31

number. Before, you could search Photos 

06:33

for keywords and then scroll through years’ worth of photos, 

06:37

looking for the right one. Now, you can simply ask Photos. 

06:43

It knows the cars that appear often, it triangulates which one 

06:46

is yours, and just tells you  the license plate number.

06:55

[Cheers and Applause]. And Ask Photos can help you 

06:57

search your memories in a deeper way. 

07:00

For example, you might be reminiscing about your daughter 

07:03

Lucia’s early milestones. You can ask photos, when did Lucia learn to swim? 

07:09

And you can follow up with something more complex. 

07:13

Show me how Lucia's swimming has progressed. Here, Gemini goes beyond a 

07:19

simple search, recognizing different contexts from doing 

07:23

laps in the pool, to snorkeling in the ocean, to the text and 

07:27

dates on her swimming certificates. 

07:29

And Photos packages it all up together in a summary, so you 

07:33

can really take it all in, and relive amazing memories all over 

07:37

again. We’re rolling out Ask Photos 

07:40

this summer, with more capabilities to come.

07:50

[Cheers and Applause]. Unlocking knowledge across 

07:51

formats is why we built Gemini to be multimodal from the ground 

07:54

up. It’s one model, with all the 

07:57

modalities built in. So not only does it understand 

08:00

each type of input, it finds connections between them. 

08:04

Multimodality radically expands the questions we can ask, and 

08:08

the answers we will get back. Long context takes this a step 

08:12

further, enabling us to bring in even more information, hundreds 

08:17

of pages of text, hours of audio, a full hour of video, or 

08:21

entire code repos. Or, if you want, roughly 96 

08:26

Cheesecake Factory menus. [Laughter]. 

08:29

For that many menus, you’d need a one million token context 

08:32

window, now possible with Gemini 1.5 Pro. 

08:36

Developers have been using it in super interesting ways. 

08:39

Let’s take a look. >> I remember the announcement, 

08:53

the 1 million token context window, and my first reaction 

08:57

was there's no way they were able to achieve this. 

08:59

>> I wanted to test its technical skills, so I uploaded 

09:04

a line chart. It was temperatures between like 

09:09

Tokyo and Berlin and how they were across the 12 months of the 

09:11

year. >> So  

09:12

I got in there and I threw in the Python library that I was 

09:16

really struggling with and I just asked it a simple question. 

09:21

And it nailed it. It could find specific 

09:26

references to comments in the code and specific requests that 

09:30

people had made and other issues that people had had, but then 

09:34

suggest a fix for it that related to what I was working 

09:38

on. >> I immediately tried to kind 

09:41

of crash it. So I took, you know, four or 

09:44

five research papers I had on my desktop, and it's a mind-blowing 

09:48

experience when you add so much text, and then you see the kind 

09:52

of amount of tokens you add is not even at half the capacity. 

09:55

>> It felt a little bit like Christmas because you saw things 

09:59

kind of peppered up to the top of your feed about, like, oh, 

10:01

wow, I built this thing, or oh, it's doing this, and I would 

10:05

have never expected. >> Can I shoot a video of my 

10:07

possessions and turn that into a searchable database? 

10:11

So I ran to my bookshelf, and I shot video just panning my 

10:14

camera along the bookshelf and I fed the video into the model. 

10:18

It gave me the titles and authors of the books, even 

10:21

though the authors weren't visible on those book spines, 

10:24

and on the bookshelf there was a squirrel nut cracker sat in 

10:27

front of the book, truncating the title. 

10:29

You could just see the word "sightsee", and it still guessed 

10:32

the correct book. The range of things you can do 

10:33

with that is almost unlimited. >> So at that point for me was 

10:36

just like a click, like, this is it. 

10:39

I thought, like, I had like  a super power in my hands. 

10:41

>> It was poetry. It was beautiful. 

10:43

I was so happy! This is going to be amazing! 

10:48

This is going to help people! >> This is kind of where the 

10:50

future of language models are going. 

10:52

Personalized to you, not because you trained it to be personal to 

10:58

you, but personal to you because you can give it such a vast 

11:02

understanding of who you are. [Applause]. 

11:11

>>SUNDAR PICHAI: We’ve been rolling out Gemini 1.5 Pro with 

11:14

long context in preview over the last few months. 

11:17

We’ve made a series of quality improvements across translation, 

11:21

coding, and reasoning. You’ll see these updates 

11:24

reflected in the model starting today. 

11:27

I'm excited to announce that we’re bringing this improved 

11:29

version of Gemini 1.5 Pro to all developers globally.

11:41

[Cheers and Applause]. In addition, today Gemini 1.5 

11:44

Pro with 1 million context is now directly  available for consumers in Gemini Advanced,  

11:50

and can be used across 35 languages. One million tokens is opening up 

11:56

entirely new possibilities. It’s exciting, but I think we 

12:01

can push ourselves even further. So today, we are expanding the 

12:05

context window to 2 million Tokens.

12:15

[Cheers and Applause]. We are making it available  

12:16

for developers in private preview. It's amazing to look back and 

12:20

see just how much progress we've made in a few months. 

12:24

This represents the next step on our journey  towards the ultimate goal of infinite context. 

12:30

Okay. So far,  

12:31

we’ve talked about two technical advances: 

12:33

multimodality and long context. Each is powerful on its own. 

12:39

But together, they unlock deeper capabilities, and more 

12:42

intelligence. Let’s see how this comes to life 

12:46

with Google Workspace. People are always searching 

12:49

their emails in Gmail. We are working to make it much 

12:52

more powerful with Gemini. Let’s look at how. 

12:56

As a parent, you want to know everything that’s going on with 

13:00

your child’s school. Okay, maybe not everything, but 

13:04

you want to stay informed. Gemini can help you keep up. 

13:08

Now we can ask Gemini to summarize all recent emails from 

13:12

the school. In the background, it’s 

13:15

identifying relevant emails, and even analyzing attachments, like 

13:19

PDFs. And you get a summary of  

13:21

the key points and action items. So helpful. 

13:25

Maybe you were traveling this week and couldn’t make the PTA 

13:28

meeting. The recording of the meeting is 

13:31

an hour long. If it’s from Google Meet, you 

13:34

can ask Gemini to give you the highlights.

13:43

[Cheers and Applause]. There’s a parents group looking 

13:44

for volunteers, and you’re free that day. 

13:47

So of course, Gemini can draft a reply. 

13:50

There are countless other examples of how this can make 

13:52

life easier. Gemini 1.5 Pro is available 

13:56

today in Workspace Labs. Aparna will share more later on.

14:06

[Applause]. We just looked at an example with text outputs. 

14:14

But with a multimodal model, we can do so much more. 

14:17

To show you an early demo of an audio output in NotebookLM, 

14:22

here’s Josh. >>JOSH WOODWARD: Hi, everyone! 

14:32

Last year, at I/O, we introduced Notebook LM, a research and 

14:37

writing tool grounded in the information you give it. 

14:40

Since then, we've seen a lot of momentum with students and 

14:44

teachers using it. And today, Gemini 1.5 Pro is 

14:48

coming to Notebook LM, and it's great. 

14:51

Let me show you. So here we are in Notebook LM. 

14:55

You can load it up with all of the materials here on the left. 

14:59

In this notebook, I've been using it with my younger son,  

15:02

and I've added some of his science worksheets,  a few slide decks from the teacher, and even an  

15:08

open source textbook full of charts and diagrams. With 1.5 Pro, it instantly creates this notebook  

15:16

guide with a helpful summary and can generate  a study guide, an FAQ, or even quizzes. 

15:23

But for my son Jimmy, he really learns  best when he can listen to something. 

15:29

So we've prototyped a new feature with  Gemini, and it's called Audio Overviews. 

15:35

Notebook LM is going to use all of the materials on the left  

15:39

as input and output them into a lively  science discussion, personalized for him. 

15:46

Let's take a listen. >> So let's dive into physics. 

15:49

What's on deck for today? >> Well, we're starting with the 

15:52

basics. Force in motion. 

15:53

>> Okay. >> And that, of course, means we 

15:55

have to talk about Sir Isaac Newton and his three laws of 

15:57

motion. >> Ah,  

15:58

yes, the foundation -- For understanding how objects move and interact.

16:02

>>JOSH WOODWARD: Oh, yes, this is  where multimodal really shines. 

16:06

Now, it generated this audio  discussion based on that text material. 

16:11

And what's amazing is that my son  and I can join into the conversation. 

16:17

And steer it whichever direction we want. 

16:20

When I tap "join." >> Hold on. 

16:23

We have a question. What's up, Josh? 

16:26

>>JOSH WOODWARD: Yeah, can you give my son Jimmy a basketball 

16:29

example? >> Hey, Jimmy! 

16:35

That's a fantastic idea! Basketball is actually a great 

16:38

way to visualize force in motion. 

16:40

Let's break it down. >> Okay. 

16:41

So first, imagine a basketball just sitting there on the court. 

16:45

It's not moving, right? 

16:46

That's because all the forces acting on it are balanced. 

16:49

The downward pull of gravity -- >>JOSH WOODWARD: Pretty cool, right?

16:58

[Cheers and Applause]. I gotta say, the first  

17:00

time my son heard this, you should  have seen how big his eyes got! 

17:04

Because he was gripped. They were talking to him. 

17:07

He was learning science through  the example of basketball. 

17:11

His favorite sport. Now, what's interesting  

17:13

is under the hood, you saw that Gemini had used  some of the concepts of gravity, Sir Isaac Newton,  

17:19

but nothing in there was about basketball. It connected the dots and created that  

17:24

age-appropriate example for him. And this is what's becoming  

17:29

possible with the power of Gemini. You can give it lots of information in  

17:34

any format, and it can be transformed in a way  that's personalized and interactive for you. 

17:42

Back to you, Sundar. [Applause]. 

17:50

>>SUNDAR PICHAI: Thanks, Josh. The demo shows the real 

17:52

opportunity with multimodality. Soon you’ll be able to mix and 

17:56

match inputs and outputs. This is what we mean when we say 

17:59

it’s an I/O for a new generation. 

18:02

And I can see you all out there thinking about the 

18:05

possibilities. But what if we could go even 

18:07

further? That’s one of the opportunities 

18:10

we see with AI agents. Let me take a step back and 

18:13

explain what I mean by that. I think about them as 

18:17

intelligent systems that show reasoning, planning, and memory. 

18:21

Are able to “think” multiple steps ahead, work across 

18:25

software and systems, all to get something done on your behalf, 

18:30

and most importantly, under your supervision. 

18:33

We are still in the early days, and you’ll see glimpses of our 

18:37

approach throughout the day, but let me show you the kinds of use 

18:41

cases we are working hard to solve. Let’s talk about shopping. 

18:46

It’s pretty fun to shop for shoes, and a lot less fun to 

18:50

return them when they don’t fit. Imagine if Gemini could do all 

18:54

the steps for you: Searching your inbox for the receipt, 

18:59

locating the order number from your email, filling out a return 

19:03

form, and even scheduling a pickup. That's much easier, right?

19:10

[Applause]. Let’s take another example 

19:14

that’s a bit more complex. Say you just moved to Chicago. 

19:18

You can imagine Gemini and Chrome working together to help 

19:22

you do a number of things to get ready: Organizing, reasoning, 

19:27

synthesizing on your behalf. For example, you’ll want to 

19:30

explore the city and find services nearby, from 

19:33

dry-cleaners to dog-walkers. You will have to update your new  

19:37

address across dozens of Web sites. Gemini can work across these 

19:42

tasks and will prompt you for more information when needed, so 

19:46

you are always in control. That part is really important 

19:49

as we prototype these experiences. We are thinking hard about how to do it in a way  

19:55

that's private, secure and works for everyone. These are simple-use cases, but 

20:01

they give you a good sense of the types of problems we want to 

20:04

solve, by building intelligent systems that think ahead, 

20:08

reason, and plan, all on your behalf. 

20:11

The power of Gemini, with multimodality, long context and 

20:16

agents, brings us closer to our ultimate goal: Making AI helpful 

20:22

for everyone. We see this as how we will make  

20:25

the most progress against our mission. Organizing the world’s 

20:29

information across every input, making it accessible via any 

20:34

output, and combining the world’s information with the 

20:37

information in your world in a way that’s truly useful for you. 

20:42

To fully realize the benefits of AI, we will continue to break 

20:46

new ground. Google DeepMind is hard at work 

20:50

on this. To share more, please welcome, 

20:52

for the first time on the I/O stage, Sir Demis.

20:58

[Applause]. >>DEMIS HASSABIS:  

21:10

Thanks, Sundar. 

21:11

It's so great to be here. Ever since I was a kid, playing 

21:16

chess for the England Junior Team, I’ve been thinking about 

21:19

the nature of intelligence. I was captivated by the idea of 

21:23

a computer that could think like a person. 

21:26

It’s ultimately why I became a programmer and studied 

21:29

neuroscience. I co-founded DeepMind in 2010 

21:33

with the goal of one day building AGI: Artificial general 

21:37

intelligence, a system that has human-level cognitive 

21:41

capabilities. I’ve always believed that if we 

21:44

could build this technology responsibly, its impact would be 

21:48

truly profound and it could benefit humanity in incredible 

21:51

ways. Last year,  

21:54

we reached a milestone on that path when we  formed Google DeepMind, combining AI talent  

21:58

from across the company in to one super unit. Since then, we've built AI systems that can  

22:04

do an amazing range of things, from turning  language and vision into action for robots,  

22:10

navigating complex virtual environments, solving Olympiad level math problems, and even discovering  

22:18

thousands of new materials. Just last week, we announced  

22:22

our next generation AlphaFold model. It can predict the structure and interactions  

22:27

of nearly all of life's molecules, including how  proteins interact with strands of DNA and RNA. 

22:34

This will accelerate vitally important  biological and medical research from  

22:38

disease understanding to drug discovery. And all of this was made possible with the  

22:44

best infrastructure for the AI era, including  our highly optimized tensor processing units. 

22:51

At the center of our efforts is our Gemini model. It's built up from the ground up to be natively  

22:57

multimodal because that's how we interact  with and understand the world around us. 

23:02

We've built a variety of  models for different use cases. 

23:05

We've seen how powerful Gemini 1.5 Pro is,  but we also know from user feedback that some 

23:11

applications need lower latency and a lower cost to serve. 

23:16

So today we’re introducing Gemini 1.5 Flash.

23:21

[Cheers and Applause]. Flash is a lighter-weight model 

23:30

compared to Pro. It’s designed to be fast and 

23:33

cost-efficient to serve at scale, while still featuring 

23:36

multimodal reasoning capabilities and breakthrough 

23:38

long context. Flash is optimized for tasks 

23:42

where low latency and efficiency matter most. 

23:45

Starting today, you can use 1.5 Flash and 1.5 Pro with up to one 

23:50

million tokens in Google AI Studio and Vertex AI. 

23:54

And developers can sign up to try two million tokens. 

23:58

We’re so excited to see what all of you will create with it. 

24:02

And you'll hear a little more  about Flash later on from Josh. 

24:07

We’re very excited by the progress we’ve made so far with 

24:09

our family of Gemini models. But we’re always striving to 

24:12

push the state-of-the-art even further. 

24:16

At any one time we have many different models in training. 

24:19

And we use our very large and powerful ones to help teach and 

24:22

train our production-ready models. 

24:26

Together with user feedback, this cutting-edge research will 

24:28

help us to build amazing new products for billions of people. 

24:33

For example, in December, we shared a glimpse into the future 

24:37

of how people would interact with multimodal AI, and how this 

24:41

would end up powering a new set of transformative experiences. 

24:46

Today, we have some exciting new progress to share about the 

24:49

future of AI assistants that we’re calling Project Astra.

24:58

[Cheers and Applause]. For a long time, we’ve wanted to 

25:00

build a universal AI agent that can be truly helpful in everyday 

25:04

life. Our work making this vision a 

25:06

reality goes back many years. It's why we made Gemini multimodal  

25:10

from the very beginning. An agent like this has to 

25:14

understand and respond to our complex and dynamic world just 

25:17

like we do. It would need to take in and 

25:20

remember what it sees so it can understand context and take 

25:23

action. And it would have to be 

25:25

proactive, teachable and personal, so you can talk to it 

25:28

naturally, without lag or delay. While we’ve made great strides 

25:33

in developing AI systems that can understand multimodal 

25:36

information, getting response time down to something 

25:39

conversational is a difficult engineering challenge. 

25:42

Building on our Gemini model, we’ve developed agents that can 

25:45

process information faster by continuously encoding video 

25:49

frames, combining the video and speech input into a timeline of 

25:53

events, and caching this for efficient recall. 

25:56

We’ve also enhanced how they sound, with a wider range of 

26:00

intonations. These agents better understand 

26:03

the context you’re in, and can respond quickly in conversation, 

26:06

making the pace and quality of interaction feel much more 

26:09

natural. Here’s a video of our prototype, 

26:13

which you’ll see has two parts. Each part was captured in a 

26:17

single take, in real time. >> Okay. Let's do some tests. 

26:24

Tell me when you see something that makes sound. 

26:28

>> I see a speaker, which makes sound. 

26:31

>> What is that part of the speaker called? 

26:36

>> That is the tweeter. It produces high frequency 

26:40

sounds. >> Give me a creative 

26:45

alliteration about these. >> Creative crayons color 

26:50

cheerfully. They certainly craft colorful 

26:53

creations. >> What does that part of the 

27:00

code do? >> This code defines encryption 

27:05

and decryption functions. It seems to use AES-CBC 

27:10

encryption to encode and decode data based on a key and an 

27:14

initialization vector (IV). >> That's right. 

27:21

What neighborhood do you think I'm in? 

27:24

>> This appears to be the Kings Cross area of London. 

27:28

It is known for its railway station and transportation 

27:31

connections. >> Do you remember where you saw 

27:35

my glasses? >> Yes, I do. 

27:39

Your glasses were on the desk near a red apple.

27:54

[Applause]. >> What can I add here to make 

27:56

this system faster? >>  

28:00

Adding a cache between the server and database could 

28:02

improve speed. >> What does this remind you of? 

28:09

>> Schroedinger's cat. >> All right. 

28:15

Give me a band name for this duo. 

28:19

>> Golden Stripes. >> Nice. Thanks, Gemini.

28:28

[Applause]. >>DEMIS HASSABIS:  

28:36

I think you'll agree it's amazing to see how 

28:38

far AI has come, especially when it comes to spatial 

28:42

understanding, video processing and memory. 

28:45

It’s easy to envisage a future where you can have an expert 

28:49

assistant by your side through your phone or new exciting form 

28:52

factors like glasses. Some of these agent capabilities 

28:56

will come to Google products like the Gemini app later this 

28:59

year. For those of you onsite today, 

29:02

you can try out a live demo version of this experience in 

29:05

the AI Sandbox area. [Cheers and Applause]. 

29:14

Next, let’s take a look at how our innovations are helping 

29:17

people bring new creative ideas to life. 

29:20

Today, we’re introducing a series of updates across our 

29:23

generative media tools with new models covering image, music and 

29:28

video. Over the past year, we’ve been 

29:31

enhancing quality, improving  safety and increasing access. 

29:35

To help tell this story, here’s Doug. 

29:49

[Applause]. >>DOUG ECK: Thanks, Demis. 

29:51

Over the past few months, we’ve been working hard 

29:54

to build a new image generation model from the ground up, with 

29:58

stronger evaluations, extensive red teaming, and 

30:01

state-of-the-art watermarking with SynthID. 

30:05

Today, I’m so excited to introduce Imagen 3. 

30:09

It’s our most capable image generation model yet. 

30:13

Imagen 3 is more photorealistic. 

30:15

You can literally count the whiskers on its snout. 

30:19

With richer details, like the incredible sunlight in this 

30:22

shot, and fewer visual artifacts or distorted images. 

30:27

It understands prompts written the way people write. 

30:30

The more creative and detailed you are, the better. 

30:33

And Imagen 3 remembers to incorporate small details like 

30:37

the ‘wildflowers’ or ‘a small blue bird’ in this longer 

30:40

prompt. Plus, this is our best model yet 

30:43

for rendering text, which has been a challenge for image 

30:46

generation models. In side-by-side comparisons, 

30:50

independent evaluators preferred Imagen 3 over other  

30:54

popular image generation models. In sum, Imagen 3 is our 

30:58

highest-quality image generation model so far. 

31:01

You can sign up today to try Imagen 3 in ImageFX, part of our 

31:05

suite of AI tools at labs.google, and it will be 

31:08

coming soon to developers and enterprise customers in Vertex 

31:11

AI. Another area, full of creative 

31:15

possibility, is generative music. 

31:19

I’ve been working in this space for over 20 years and this has 

31:22

been by far the most exciting year of my career. We’re exploring ways of working 

31:26

with artists to expand their creativity with AI. 

31:30

Together with YouTube, we’ve been building Music AI Sandbox, 

31:34

a suite of professional music AI tools that can create new  

31:37

instrumental sections from scratch,  transfer styles between tracks, and more. 

31:43

To help us design and test them, we’ve been working closely with 

31:45

incredible musicians, songwriters and producers. 

31:50

Some of them even made entirely new songs in ways that would not have been  

31:53

possible without these tools. Let’s hear from some of the 

31:57

artists we’ve been working with. >>  

32:04

I'm going to put this right back into the Music AI tool. 

32:07

The same Boom, boom, bam, boom, boom. 

32:10

What happens if Haiti meets Brazil? 

32:13

Dude, I have no clue what's about to be spat out. 

32:16

This is what excites me. Da da See see see. 

32:23

As a hip hop producer, we dug in the crates. 

32:26

We playin’ these vinyls, and the part where there's no vocal, we 

32:29

pull it, we sample it, and we create an entire song around 

32:33

that. So right now we digging in the 

32:35

infinite crate. It’s endless. 

32:37

Where I found the AI really useful for me, this  way to like fill in the sparser sort of elements  

32:43

of my loops. Okay. 

32:44

Let's try bongos. We're going to put viola. 

32:47

We're going to put rhythmic clapping, and we're going to see 

32:51

what happens there. Oh, and it makes it sound, 

32:55

ironically, at the end of the day, a little more human. 

32:57

So then this is entirely Google's loops right here. 

33:01

These are Gloops. So it's like having, like, this 

33:07

weird friend that's just like, 

33:09

try this, try that. And then you're like, Oh, okay. 

33:12

Yeah. No, that's pretty dope. 

33:20

(indistinct noises) >> The tools are capable of 

33:22

speeding up the process of what's in my head, getting it 

33:25

out. You're able to move lightspeed 

33:28

with your creativity. This is amazing. 

33:31

That right there. [Applause]. 

33:40

>>DEMIS HASSABIS: I think this really shows what’s possible 

33:42

when we work with the artist community on the future of 

33:45

music. You can find some brand new 

33:48

songs from these acclaimed artists and songwriters on their 

33:50

YouTube channels now. There's one more area I'm  

33:54

really excited to share with you. Our teams have made some 

33:57

incredible progress in generative video. 

34:01

Today, I’m excited to announce our newest, most capable 

34:04

generative video model, called Veo. 

34:12

[Cheers and Applause]. Veo creates high-quality, 1080P 

34:14

videos from text, image and video prompts. 

34:18

It can capture the details of your instructions in different 

34:20

visual and cinematic styles. You can prompt for things like 

34:24

aerial shots of a landscape or a time lapse, and further edit 

34:27

your videos using additional prompts. 

34:30

You can use Veo in our new experimental tool called 

34:32

VideoFX. We’re exploring features like 

34:36

storyboarding and generating longer scenes. 

34:39

Veo gives you unprecedented creative control. 

34:44

Techniques for generating static images have come a long way. 

34:47

But generating video is a different challenge altogether. 

34:51

Not only is it important to understand where an object or 

34:54

subject should be in space, it needs to maintain this 

34:57

consistency over time, just like the car in this video. 

35:02

Veo builds upon years of our pioneering generative video 

35:05

model work, including GQN, Phenaki, Walt, VideoPoet, 

35:10

Lumiere and much more. We combined the best of these 

35:14

architectures and techniques to improve consistency, quality and 

35:18

output resolution. To see what Veo can do, we put 

35:22

it in the hands of an amazing filmmaker. 

35:25

Let’s take a look. >>DONALD GLOVER: Well, I've been 

35:30

interested in AI for a couple of years now. 

35:33

We got in contact with some of the people at Google, and they 

35:35

had been working on something of their own. 

35:38

So we're all meeting here at Gilga Farms to make a short 

35:42

film. >>KORY MATHEWSON: The core 

35:43

technology is Google DeepMind’s 

35:45

generative video model that has been trained to convert input 

35:49

text into output video. [Laughter]. 

35:53

>>DONALD GLOVER: It looks good. >>KORY MATHEWSON: We are able to 

35:55

bring ideas to life that were otherwise not possible. 

35:58

We can visualize things of time scale that’s 10 or 100 times 

36:02

faster than before. >>MATTHIEU KIM LORRAIN: When 

36:03

you're shooting, you can't really iterate as much as you 

36:05

wish. And so we've been hearing the 

36:07

feedback that it allows for more optionality, more iteration, 

36:12

more improvisation. >>DONALD GLOVER: But that's 

36:14

what's cool about it. It's like you can make a mistake 

36:16

faster. That's all you really want at 

36:17

the end of the day, at least in art, is just to make mistakes 

36:20

fast. >>KORY MATHEWSON: So, using 

36:21

Gemini’s multimodal capabilities to optimize the model’s training 

36:25

process, Veo is better able to capture the nuance from prompts. 

36:29

So this includes cinematic techniques and visual effects, 

36:32

giving you total creative 

36:34

control. >>DONALD GLOVER: Everybody's 

36:36

going to become a director and everybody should be a director. 

36:39

Because at the heart of all of this is just storytelling. 

36:42

The closer we are to being able to tell each other our stories, 

36:46

the more we will understand each other. 

36:48

>>KORY MATHEWSON: These models are really enabling us to be 

36:50

more creative and to share that creativity with each other.

36:57

[Cheers and Applause]. >>DEMIS HASSABIS:  

37:09

Over the coming weeks some of these 

37:11

features will be available to select creators through VideoFX 

37:15

at labs.google, and the waitlist is open now. 

37:19

Of course, these advances in generative video go beyond the 

37:22

beautiful visuals you’ve seen today. 

37:24

By teaching future AI models how to solve problems creatively, or 

37:29

in effect simulate the physics of our world, we can build more 

37:32

useful systems that help people communicate in new ways, and 

37:36

thereby advance the frontiers of AI. 

37:40

When we first began this journey  to build AI more than 15 years ago,  

37:44

we knew that one day it would change everything. 

37:47

Now that time is here. And we continue to be amazed by 

37:51

the progress we see and inspired by the advances still to come, 

37:55

on the path to AGI. Thanks, and back to you Sundar.

38:07

[Applause]. >>SUNDAR PICHAI: Thanks, Demis. 

38:09

A huge amount of innovation is happening at Google DeepMind. 

38:12

it’s amazing how much progress we have made in a year. 

38:15

Training state-of-the-art models requires a lot of computing 

38:19

power. Industry demand for ML compute 

38:22

has grown by a factor of 1 million in the last six years. 

38:27

And every year, it increases tenfold. 

38:30

Google was built for this. For 25 years, we’ve invested in 

38:34

world-class technical infrastructure, from the 

38:38

cutting-edge hardware that powers Search, to our custom 

38:41

tensor processing units that power our AI advances. 

38:45

Gemini was trained and served entirely on our fourth and fifth 

38:49

generation TPUs. And other leading AI companies, 

38:52

like Anthropic, have trained their models on TPUs as well. 

38:56

Today, we are excited to announce the  sixth generation of TPUs called Trillium.

39:07

[Cheers and Applause]. Trillium delivers a 4.7x 

39:10

improvement in compute performance per chip over the 

39:13

previous generation. So our most efficient and performant TPU to date. 

39:18

We will make trillium available to  our cloud customers in late 2024. 

39:24

Alongside our TPUs, we are proud to offer  CPUs and GPUs to support any workload. 

39:30

That includes the new Axion processors we announced last month, our first custom 

39:35

Arm-based CPU with industry-leading performance and energy efficiency. 

39:44

We are also proud to be one of the first  cloud providers to offer Nvidia's cutting edge 

39:50

Blackwell GPUs, available in early 2025.

39:57

[Applause]. We’re fortunate to have a 

39:59

longstanding partnership with Nvidia, and are excited to bring 

40:03

Blackwell's capabilities to our customers. Chips are a foundational part of 

40:07

our integrated end-to-end system, from 

40:10

performance-optimized hardware and open software to flexible 

40:15

consumption models. This all comes together in our 

40:18

AI Hypercomputer, a groundbreaking supercomputer 

40:22

architecture. Businesses and developers are 

40:25

using it to tackle more complex challenges, with more than twice 

40:29

the efficiency relative to just buying the raw hardware and 

40:33

chips. Our AI Hypercomputer 

40:36

advancements are made possible in part because of our approach 

40:40

to liquid cooling in our data centers. 

40:43

We’ve been doing this for nearly a decade, long before it became 

40:46

state of the art for the industry. 

40:48

And today, our total deployed fleet capacity for liquid 

40:52

cooling systems is nearly 1 gigawatt, and growing. 

40:56

That’s close to 70 times the capacity of any other fleet.

41:02

[Applause]. Underlying this is the sheer 

41:07

scale of our network, which connects our infrastructure 

41:10

globally. Our network spans more than 2 

41:13

million miles of terrestrial and subsea fiber: Over 10 times the 

41:18

reach of the next leading cloud provider. 

41:20

We will keep making the investments necessary to advance 

41:24

AI innovation and deliver state-of-the-art capabilities. 

41:28

And one of our greatest areas of investment and innovation is in 

41:32

our founding product, Search. 25 years ago we created Search 

41:37

to help people make sense of the waves of information moving 

41:41

online. With each platform shift, we’ve 

41:44

delivered breakthroughs to help answer your questions better. 

41:48

On mobile, we unlocked new types of questions and answers, using 

41:52

better context, location awareness, and real-time 

41:55

information. With advances in natural 

41:58

language understanding and computer vision, we enabled new 

42:01

ways to search with your voice, or a hum to find your new 

42:06

favorite song, or an image of that flower you saw on your 

42:10

walk. And now you can even circle to 

42:13

Search those cool new shoes you might want to buy. 

42:17

Go for it, you can always return them later! 

42:23

Of course, Search in the Gemini Era will take this to a whole 

42:26

new level. Combining our infrastructure 

42:28

strengths, the latest AI capabilities, our high bar for 

42:33

information quality, and our decades of experience connecting 

42:36

you to the richness of the web. The result is a product that 

42:40

does the work for you. Google Search is generative AI 

42:45

at the scale of human curiosity. And it’s our most exciting 

42:49

chapter of Search yet. To tell you more, here’s Liz.

42:57

[Applause]. >>LIZ REID:  

43:05

Thanks, Sundar! With each of these platform 

43:08

shifts, we haven’t just adapted, we’ve expanded what’s possible 

43:14

with Google Search. And now, with generative AI, 

43:18

Search will do more for you than you ever imagined. 

43:21

So whatever’s on your mind, and whatever you need to get done, 

43:26

just ask. And Google will do the Googling 

43:29

for you. All the advancements you’ll see 

43:32

today are made possible by a new Gemini model, customized for 

43:36

Google Search. What really sets this apart is 

43:40

our three unique strengths. First, our real-time information 

43:45

with over a trillion facts about people, places, and things. 

43:50

Second, our unparalleled ranking and quality systems, trusted for 

43:55

decades to get you the very best of the web. 

43:58

And third, the power of Gemini, which unlocks new agentive 

44:02

capabilities, right in Search. By bringing these three things 

44:07

all together, we are able to dramatically expand  what's possible with Google Search, yet again. 

44:13

This is Search in the Gemini era. 

44:16

So let's dig in. You've heard today about AI 

44:20

Overviews, and how helpful people are finding them. 

44:23

With AI Overviews, Google does the work for you. 

44:27

Instead of piecing together all the information yourself, you 

44:30

can ask your question, and as you see  here, you can get an answer instantly. 

44:36

Complete with a range of perspectives and links to dive 

44:40

deeper. As Sundar shared, AI Overviews 

44:44

will begin rolling out to everyone in the U.S. starting 

44:46

today, with more countries soon. By the end of the year, AI 

44:51

Overviews will come to over a billion people in Google Search. 

44:56

But this is just the first step. We’re making AI Overviews even 

45:01

more helpful for your most complex questions, the type that 

45:04

are really more like ten questions in one! 

45:07

You can ask your entire question, with all its 

45:10

sub-questions, and get an AI overview in just seconds. 

45:15

To make this possible, we’re introducing multi-step reasoning 

45:18

in Google Search. So Google can do the researching 

45:21

for you. For example, let’s say you’ve 

45:25

been trying to get into yoga and Pilates. 

45:28

Finding the right studio can take a lot of research. 

45:31

There are so many factors to consider! 

45:34

Soon you’ll be able to ask Search to: Find the best yoga or 

45:37

Pilates studios in Boston. And show you details on their 

45:40

intro offers, and walking time from Beacon Hill. 

45:45

As you can see here, Google gets to work for you, finding the 

45:48

most relevant information and bringing it together in your AI 

45:52

Overview. You get some studios with great 

45:55

ratings and their intro offers. You can see the distance for 

45:58

each, like this one is just a ten-minute walk away! 

46:03

Right below, you see where they're  located, laid out visually. 

46:07

And you've got all this from just a single search! Under the hood, our custom 

46:13

Gemini model acts as your AI agent, using what we call 

46:17

multi-step reasoning. It breaks your bigger question 

46:21

down into all its parts, and it figures out which problems it  

46:24

needs to solve and in what order. And thanks to our real-time info 

46:29

and ranking expertise, it reasons using the 

46:33

highest-quality information out there. 

46:37

So since you're asking about places, it taps into Google's 

46:40

index of information about the real world, with over 250 

46:44

million places, and updated in real-time. Including their ratings, 

46:49

reviews, business hours, and more. 

46:54

Research that might have taken you minutes or even hours, 

46:58

Google can now do on your behalf in just seconds. Next, let me show you another 

47:04

way multi-step reasoning in Google Search can make your life 

47:07

that much easier. Take planning, for example. 

47:10

Dreaming up trips and meal plans can be fun, but doing the work 

47:14

of actually figuring it all out, no, thank you. 

47:18

With Gemini in Search, Google does the planning with you. 

47:22

Planning is really hard for AI to get right. 

47:25

It's the type of problem that takes advanced reasoning and 

47:27

logic. After all, if you're meal 

47:30

planning, you probably don’t want mac'n cheese for breakfast, 

47:33

lunch and dinner. Okay, my kids might. 

47:38

But say you’re looking for a bit more variety. 

47:42

Now, you can ask Search to: Create a three-day meal plan for 

47:45

a group that’s easy to prepare. And here you get a plan with a 

47:49

wide range of recipes from across the web. 

47:52

This one for overnight oats looks particularly interesting. 

47:56

And you can easily head over to the Web site to learn how to prepare them. 

48:01

If you want to get more veggies in, you can simply ask Search to 

48:04

swap in a vegetarian dish. And just like that, Search  

48:08

customizes your meal plan. And you can export your meal 

48:11

plan or get the ingredients as a list, just by tapping here. 

48:16

Looking ahead, you could imagine asking Google to add everything 

48:20

to your preferred shopping cart. Then, we’re really cooking! 

48:25

These planning capabilities mean Search will be able to help plan 

48:28

everything from meals and trips to parties, dates, workout 

48:32

routines and more. So you can get all the fun of  

48:35

planning without any of the hassle. You’ve seen how Google Search 

48:41

can help with increasingly complex questions and planning. 

48:45

But what about all those times when you  don't know exactly what to ask and you  

48:49

need some help brain storming? When you come to Search for 

48:52

ideas, you’ll get more than an AI-generated answer. 

48:55

You’ll get an entire AI-organized page, custom-built 

48:59

for you and your question. Say you’re heading to Dallas to 

49:04

celebrate your anniversary and you're looking for the perfect restaurant. 

49:09

What you get here breaks AI out of the box and it brings it to the whole page. 

49:14

Our Gemini model uncovers the most interesting angles for you 

49:17

to explore and organizes these results into these helpful clusters. 

49:23

Like, you might have never considered restaurants with live 

49:25

music. Or ones with historic charm! 

49:29

Our model even uses contextual factors, like the time of year. 

49:34

So since it’s warm in Dallas, you can get rooftop patios as an idea. 

49:39

And it pulls everything together into a dynamic, whole-page 

49:42

experience. You’ll start to see this new 

49:45

AI-organized search results page when you look for inspiration, 

49:50

starting with dining and recipes, and coming to movies, 

49:53

music, books, hotels, shopping, and more.

50:05

[Applause]. 

50:06

Today, you’ve seen how you can bring any question to Search, 

50:09

and Google takes the work out of searching. 

50:12

But your questions aren’t limited to words in a text box, 

50:15

and sometimes, even a picture can’t tell the whole story. 

50:19

Earlier, Demis showed you our latest advancements in video 

50:22

understanding. 

50:24

And I'm really excited to share that soon  you'll be able to ask questions with video,  

50:28

right in Google Search. Let me introduce Rose to show 

50:32

you this in a live demo. [Applause].

50:41

>>ROSE YAO: Thank you, Liz! I have always wanted a record player,  

50:46

and I got this one, and some  vinyls at a yard sale recently. 

50:50

But, umm, when I go to play it, this thing keeps sliding off. 

50:54

I have no idea how to fix  it or where to even start! 

50:58

Before, I would have pieced together a bunch of searches to 

51:02

try to figure this out, like, what make is this record player? 

51:06

What’s the model? And, what is this thing actually 

51:08

called? But now I can just ask with a 

51:12

video. So let's try it. 

51:14

Let's do a live demo. I'm going to take a video and ask Google,  

51:20

why will this not stay in place? And in a near instant,  

51:26

Google gives me an AI overview. I get some reasons this might be 

51:30

happening, and steps I can take to troubleshoot. 

51:33

So it looks like first, this is called a tone arm. Very helpful. 

51:38

And it looks like it may be unbalanced,  and there's some really helpful steps here. 

51:42

And I love that because I'm new to all this. I can check out this helpful link from Audio  

51:47

Technica to learn even more. So that was pretty quick!

51:53

[Applause]. 

51:58

Let me walk you through what just happened. Thanks to a combination of our 

52:02

state-of-the-art speech models, our deep visual understanding, 

52:06

and our custom Gemini model, Search was able to understand 

52:10

the question I asked out loud and break down the video 

52:12

frame-by-frame. Each frame was fed into Gemini’s 

52:16

long context window that you heard about earlier today. 

52:19

Search could then pinpoint the exact make and model of my 

52:23

record player. And make sense of the motion 

52:26

across frames to identify that the tonearm was drifting. 

52:29

Search fanned out and combed the web to find relevant insights 

52:33

from articles, forums, videos, and more. 

52:36

And it stitched all of this together into my AI Overview. 

52:41

The result was music to my ears! Back to you, Liz. 

52:52

[Applause]. >>LIZ REID: Everything you saw 

52:53

today is just a glimpse of how we're reimagining Google Search 

52:57

in the Gemini era. We’re taking the very best of 

53:01

what makes Google, Google. All the reasons why billions of 

53:05

people turn to Google Search, and have relied on us for 

53:08

decades. And we’re bringing in the power 

53:11

of Gemini’s agentive capabilities. 

53:14

So Google will do the searching, the Researching. 

53:17

The planning. The brainstorming. 

53:20

And so much more. All you need to do, is ask. 

53:26

You'll start to see these features rolling out in Search 

53:29

in the coming weeks. Opt in to Search Labs to be 

53:32

among the first to try them out. Now let's take a look at how 

53:36

this all comes together in Google Search this year. 

53:43

>> Why is the lever not moving all the way?

54:48

[Applause].

54:51

>>APARNA PAPPU:  

55:31

Since last May, we've been hard at work making Gemini for Workspace 

55:36

even more helpful for businesses  and consumers across the world. 

55:42

Tens of thousands of customers have been using help me write, 

55:46

help me visualize and help me organize since we launched. 

55:50

And now, we're really excited that the new Gemini powered side 

55:55

panel will be generally available next month.

56:04

[Cheers and Applause]. One of our customers is a local 

56:06

favorite right here in California, Sports Basement. 

56:11

They rolled out Gemini for Workspace to the organization. 

56:14

And this has helped improve the productivity of  their customer support team by more than 30%. 

56:22

Customers love how Gemini grows  participation in meetings with  

56:27

automatic language detection and real-time  captions now expanding to 68 languages.

56:36

[Applause]. We are really excited about what 

56:42

Gemini 1.5 Pro unlocks for Workspace and AI Premium 

56:48

customers. Let me start by showing you 

56:51

three new capabilities coming to Gmail mobile. 

56:57

This is my Gmail account. Okay. 

57:00

So there's an E-mail up top from my husband. Help me sort out the roof repair thing, please. 

57:06

Now, we've been trying to find a  contractor to fix our roof, and with  

57:10

work travel, I have clearly dropped the ball. It looks like there's an E-mail thread on this  

57:16

with lots of E-mails that I haven't read. And luckily for me, I can simply tap the  

57:22

summarize option up top and skip  reading this long back and forth. 

57:28

Now, Gemini pulls up this helpful  mobile card as an overlay. 

57:32

And this is where I can read a nice summary of  all the salient information that I need to know. 

57:40

So here I see that we have a quote from Jeff at Green 

57:43

Roofing, and he's ready to start. Now, I know we had other bids  

57:48

and I don't remember the details. Previously, I would have had to do  

57:52

a number of searches in Gmail and then remember and compare information across different E-mails. 

57:59

Now, I can simply type out my question right here  in the mobile card and say something like, compare  

58:05

my roof repair bids by price and availability. This new Q&A feature makes it so easy to get  

58:12

quick answers on anything in my inbox. For example, when are my shoes arriving,  

58:16

or what time do doors open for the  Knicks game, without having to first  

58:20

search Gmail and open an E-mail and look for the specific information in attachments and so on. 

58:26

Anyway, back to my roof. It looks like Gemini has found details that I got  

58:30

from two other contractors in completely different  E-mail threads, and I have this really nicely  

58:36

organized summary and I can do a quick comparison. So it seems like Jeff's quote was right  

58:42

in the middle and he can start  immediately, so Green Roofing it is. 

58:46

I'll open that last E-mail from  Jeff and confirm the project. 

58:51

And look at that. I see some suggested replies from Gemini. 

58:56

Now, what is really, really neat about this evolution of Smart Reply is that it's contextual. 

59:02

Gemini understood the back-and-forth in that  thread and that Jeff was ready to start. 

59:07

So it offers me a few customized options based on that context. 

59:11

So, you know, here I see I have options like decline the service or suggest a new time. 

59:16

I'll choose proceed and confirm time. I can even see a preview of the  

59:20

full reply simply by long pressing. This looks reasonable, so I'll hit send. 

59:28

These new capabilities in Gemini and Gmail will start rolling out this month to Labs users.

59:38

[Applause]. Okay. 

59:42

So one of the really neat things about Workspace apps like 

59:45

Gmail, Drive, Docs, Calendar, is how well they work together, 

59:50

and in our daily lives we often have information that flows from 

59:53

one app to another. Like, say, adding a calendar entry from Gmail. 

59:58

Or creating reminders from a spreadsheet tracker. 

60:02

But what if Gemini could make these journeys totally seamless? 

60:07

Perhaps even automate them for you entirely. 

60:12

Let me show you what I mean with a real life example. 

60:17

My sister is a self-employed photographer, and her inbox is 

60:22

full of appointment bookings, receipts,  client feedback on photos and so much more. 

60:28

Now, if you're a freelancer or a small  business, you really want to focus on your  

60:32

craft and not on bookkeeping and logistics. So let's go to her inbox and take a look. 

60:40

Lots of unread E-mails. Let's click on the first one. 

60:45

It's got a PDF attachment. From a hotel, there's a receipt. 

60:49

And I see a suggestion in the side panel. Help me organize and track my receipts. 

60:54

Let's click on this prompt. The side panel now will show  

60:58

me more details about what that really means,  and as you can see, there's two steps here. 

61:03

Step one, create a Drive folder and put this  receipt and 37 others it's found into that folder. 

61:10

Makes sense. Step 2,  

61:13

extract the relevant information from those  receipts in that folder into a new spreadsheet. 

61:18

Now, this sounds useful. Why not? 

61:21

I also have the option to edit  these actions or just hit okay. 

61:25

So let's hit okay. Gemini will now complete the two 

61:30

steps described above, and this  is where it gets even better. 

61:34

Gemini offers the option to automate this  so that this particular work flow is run on  

61:40

all future E-mails, keeping your Drive folder and  expense sheet up to date with no effort from you.

61:51

[Applause]. Now, we know that creating 

61:56

complex spreadsheets can be daunting for most people. 

61:59

But with this automation, Gemini does the hard work of extracting 

62:03

all the right information from all the files in that folder and generates the sheet for you. 
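As a rough sketch of the two steps Gemini just ran, here is a local-filesystem analogue built on the public Gemini API; the folder name, field names, and prompt are assumptions, and the real feature does all of this inside Gmail and Drive for you.

```python
# Local analogue of the receipts workflow described above: step 1 gathers the
# receipt PDFs (standing in for the Drive folder), step 2 extracts fields into
# a "sheet" (here a CSV). Folder, field names, and prompt wording are assumptions.
import csv
import json
import pathlib
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

receipt_dir = pathlib.Path("receipts")   # step 1: the folder of receipt attachments
rows = []
for pdf in receipt_dir.glob("*.pdf"):
    uploaded = genai.upload_file(path=str(pdf))   # File API upload
    resp = model.generate_content(
        [uploaded,
         "Extract vendor, date, amount, and expense_type from this receipt. "
         "Answer with a single JSON object using exactly those keys."],
        generation_config={"response_mime_type": "application/json"},  # JSON mode
    )
    rows.append(json.loads(resp.text))

# Step 2: the generated "sheet", organized with a category for expense type.
with open("expenses.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["vendor", "date", "amount", "expense_type"], extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)
```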

62:08

So let's take a look. Okay. 

62:10

It's super well organized, and it  even has a category for expense type. 

62:16

Now that we have this sheet, things can get even more fun. 

62:20

We can ask Gemini questions. Questions like, show me where the money is spent. 

62:26

Gemini not only analyzes the data from the  sheet, but also creates a nice visual to  

62:33

help me see the complete breakdown by category. And you can imagine how this extends to all sorts  

62:39

of use cases in your inbox, like travel expenses, shopping, remodeling projects, you name it. 

62:46

All of that information in Gmail can be put to good use and help you work, plan and play better. 

62:54

Now, this particular -- [Applause]. 

62:57

I know! 

63:02

This particular ability to organize your  attachments in Drive and generate a sheet  

63:06

and do data analysis via Q&A will be  rolling out to Labs users this September. 

63:12

And it's just one of the many automations that we're working on in Workspace. 

63:18

Workspace in the Gemini era will continue to unlock new ways of 

63:22

getting things done. We’re building advanced agentive 

63:26

experiences, including customizing how you use Gemini. 

63:32

Now, as we look to 2025 and beyond, we're exploring  

63:36

entirely new ways of working with AI. Now, with Gemini, you have an AI-powered  

63:42

assistant always at your side. But what if you could expand how 

63:46

you interact with AI? For example, when we work with 

63:50

other people, we mention them in comments and docs, we send them E-mails. 

63:55

We have group chats with them, et cetera. And it's not just how we collaborate with  

64:00

each other, but we each have a  specific role to play in the team. 

64:04

And as the team works together, we build a  set of collective experiences and contexts  

64:09

to learn from each other. We have the combined set of 

64:14

skills to draw from when we need help. So how could we introduce AI into this mix  

64:21

and build on this shared expertise? Well, here’s one way. 

64:26

We are prototyping a virtual Gemini powered teammate. 

64:32

This teammate has an identity and a Workspace account, along 

64:36

with a specific role and objective. 

64:40

Let me bring Tony up to show you what I mean. Hey, Tony! 

64:44

>>TONY VINCENT: Hi, Aparna! Hey, everyone. 

64:50

Okay. So let me start by showing you 

64:52

how we set up this virtual teammate. 

64:55

As you can see, the teammate has its very own account. 

64:58

And we can go ahead and give it a name. We'll do something fun like Chip. 

65:04

Chip’s been given a specific set of descriptions on how to be helpful 

65:08

for the team, you can see that here, and some  of the jobs are to monitor and track projects,  

65:13

we've listed a few out, to organize information and provide context, and a few more things. 
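There is no public API for this prototype, but as a sketch, the setup Tony walks through amounts to a small configuration like the one below; the account address and exact wording are hypothetical, based only on what is described in the demo.

```python
# Hypothetical configuration for a virtual teammate like Chip; no public API
# exists for this prototype, so this is just plain data mirroring the demo.
virtual_teammate = {
    "name": "Chip",
    "account": "chip@example-team.com",   # its own Workspace identity (made-up address)
    "role": "Planning coordinator for the team's launches and events",
    "jobs": [
        "Monitor and track projects across chat rooms, docs, and email threads",
        "Organize information and provide context on request",
        "Flag potential issues and summarize status when asked",
    ],
    "access": ["group chats", "shared files", "email threads"],   # what Chip can read
}
```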

65:19

Now that we've configured our virtual teammate,  let's go ahead and see Chip in action. 

65:22

To do that I'll switch us  over here to Google chat. 

65:26

First, when planning for an event like I/O, we  have a ton of chat rooms for various purposes. 

65:31

Luckily for me, Chip is in all of them. To quickly catch up, I might ask a question like, 

65:38

anyone know if our I/O storyboards are approved? Because we’ve instructed Chip to 

65:49

track this project, Chip searches across all the conversations  

65:53

and knows to respond with an answer. There it is. 

65:56

Simple, but very helpful. Now, as the team adds Chip to more  

66:01

group chats, more files, more E-mail threads, Chip  builds a collective memory of our work together. 

66:07

Let's look at an example. To show you I'll switch over to a different room. 

66:10

How about Project Sapphire over here and here we  are discussing a product release coming up and  

66:16

as usual, many pieces are still in flight, so I  can go ahead and ask, are we on track for launch? 

66:27

Chip gets to work not only searching  through everything it has access to,  

66:31

but also synthesizing what's found and  coming back with an up-to-date response. 

66:37

There it is. A clear time line, a nice summary and  

66:40

notice even in this first message here, Chip flags  a potential issue the team should be aware of. 

66:46

Because we're in a group space, everyone can  follow along, anyone can jump in at any time,  

66:52

as you see someone just did. Asking Chip to help create a  

66:55

doc to help address the issue. A task like this could take me 

67:00

hours, dozens of hours. Chip can get it all done in just a few minutes,  

67:04

sending the doc over right when it's ready. And so much of this practical helpfulness  

67:09

comes from how we've customized Chip to our team's  needs, and how seamlessly this AI is integrated  

67:15

directly into where we're already working. Back to you, Aparna.

67:18

>>APARNA PAPPU: Thank you, Tony! I can imagine a number of 

67:31

different types of virtual teammates configured by 

67:34

businesses to help them do what they need. Now, we have a lot of work to do to figure out how  

67:39

to bring these agentive experiences like virtual teammates into Workspace, including enabling third 

67:46

parties to make their very own versions of Chip. We're excited about where this is headed,  

67:52

so stay tuned. And as Gemini and its capabilities continue  

67:56

to evolve, we're diligently bringing that power directly into Workspace to make all our users more 

68:03

productive and creative, both at home and at work. And now, over to Sissie to tell you more about  

68:11

the Gemini app. [Applause]. 

68:25

>>SISSIE HSIAO: Our vision for the Gemini app is to be the most 

68:29

helpful, personal AI assistant by giving you direct access to 

68:33

Google’s latest AI models. Gemini can help you learn, 

68:38

create, code, and anything else you can imagine. 

68:43

And over the past year, Gemini has put Google’s AI in the hands 

68:47

of millions of people, with experiences designed for your 

68:51

phone and the web. We also launched Gemini 

68:55

Advanced, our premium subscription for access to the 

68:58

latest AI innovations from Google. 

69:01

Today, we’ll show you how Gemini is delivering our most 

69:04

intelligent AI experience. Let’s start with the Gemini app, 

69:09

which is redefining how we interact with AI. 

69:13

It’s natively multimodal, so you can use text, voice or your 

69:18

phone’s camera to express yourself naturally. 

69:21

And this summer, you can have an in-depth conversation with 

69:25

Gemini using your voice. We’re calling this new 

69:28

experience "Live". Using Google’s latest speech 

69:32

models, Gemini can better understand you and answer 

69:36

naturally. You can even interrupt while 

69:39

Gemini is responding, and it will adapt to your speech 

69:42

patterns. And this is just the beginning. 

69:45

We're excited to bring the speed gains and video understanding capabilities 

69:50

from Project Astra to the Gemini app. When you go live, you'll be able to  

69:56

open your camera so Gemini can see what you see  and respond to your surroundings in real-time. 

70:04

Now, the way I use Gemini  isn't the way you use Gemini. 

70:08

So we're rolling out a new feature that  lets you customize it for your own needs. 

70:12

And create personal experts on any topic you want. We're calling these "Gems."

70:28

[Applause]. 

70:29

They're really simple to set up. Just tap to create a gem, write your instructions  

70:33

once, and come back whenever you need it. For example, here's a gem that I created  

70:39

that acts as a personal writing coach. It specializes in short stories with  

70:44

mysterious twists, and it even builds on the story drafts in my Google Drive. 

70:50

I call it the Cliffhanger Curator. Now, Gems are a great time saver when 

70:54

you have specific ways that you want to  interact with Gemini again and again. 
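Gems are a consumer feature rather than a developer API, but the closest public analogue is a reusable system instruction in the Gemini API; here is a minimal sketch using the Cliffhanger Curator wording from the demo.

```python
# A sketch of a "Gem"-like setup using a reusable system instruction with the
# google-generativeai SDK; Gems themselves live in the Gemini app, not this API.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

cliffhanger_curator = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=(
        "You are the Cliffhanger Curator, a personal writing coach. "
        "You specialize in short stories with mysterious twists and you build "
        "on whatever draft the user provides."
    ),
)

chat = cliffhanger_curator.start_chat()
reply = chat.send_message("Here's my draft opening: ... How should the twist land?")
print(reply.text)
```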

71:00

Gems will roll out in the coming months, and our trusted testers 

71:03

are already finding so many creative ways to put them to 

71:07

use. They can act as your yoga 

71:09

bestie, your personal sous chef, a brainy calculus tutor, a peer 

71:14

reviewer for your code, and so much more. 

71:18

Next, I’ll show you how Gemini is taking a step closer to being 

71:22

a true AI assistant by planning and taking action for you. 

71:27

We all know chatbots can give you ideas for your next 

71:31

vacation. But there’s a lot more that goes 

71:33

into planning a great trip. It requires reasoning that 

71:37

considers space-time logistics, and the intelligence to 

71:41

prioritize and make decisions. That reasoning and intelligence 

71:46

all comes together in the new trip planning experience in 

71:49

Gemini Advanced. Now, it all starts with a prompt. 

71:53

Okay. So here we go. 

71:55

We’re going to Miami. My son loves art, my husband 

71:59

loves seafood, and our flight and hotel details are already in 

72:03

my Gmail inbox. Now there’s a lot going on in 

72:07

that prompt. Everyone has their own things 

72:09

that they want to do. To make sense of those 

72:12

variables, Gemini starts by gathering all kinds of 

72:16

information from Search, and helpful extensions like Maps and 

72:20

Gmail. It uses that data to create a 

72:23

dynamic graph of possible travel options, taking into account all 

72:28

my priorities and constraints. The end result is a personalized 

72:33

vacation plan, presented in Gemini’s new dynamic UI. 
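The planner itself is internal to Gemini Advanced, but as a toy illustration of the "graph of options filtered by constraints" idea, the first-evening decision could be sketched like this; every activity, time, and rule below is invented for the example.

```python
# Toy sketch of constraint-based itinerary planning: candidate activities are
# nodes, and constraints (late arrival, interests, proximity to the hotel)
# prune the options. All data here is made up for illustration.
from datetime import time

constraints = {
    "arrival": time(17, 0),            # flight lands in the late afternoon (from Gmail)
    "hotel": "Downtown Miami",
    "interests": ["art", "seafood"],   # son loves art, husband loves seafood
}

# Candidate nodes: (name, interest tags, near the hotel?, needs a full time slot?)
options = [
    ("Street art museum tour", {"art"}, False, True),
    ("Walking tour", {"art"}, False, True),
    ("Seafood restaurant", {"seafood"}, True, False),
]

def plan_first_evening(options, constraints):
    """Skip full-slot activities after a late arrival; prefer spots near the hotel."""
    late_arrival = constraints["arrival"] >= time(16, 0)
    picks = [
        name
        for name, tags, near_hotel, full_slot in options
        if tags & set(constraints["interests"])
        and near_hotel
        and not (late_arrival and full_slot)
    ]
    return picks or ["Rest at the hotel"]

print(plan_first_evening(options, constraints))   # ['Seafood restaurant']
```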

72:38

Now, based on my flight information, Gemini knows that I 

72:41

need a two and a half day itinerary. 

72:44

And you can see how Gemini uses spatial data to make decisions. 

72:49

Our flight lands in the late afternoon, so Gemini skips a big 

72:53

activity that day, and finds a highly rated seafood restaurant 

72:57

close to our hotel. Now, on Sunday, we have a jam-packed day. 

73:02

I like these recommendations, but my family likes to sleep in. 

73:06

So I tap to change the start time,  and just like that, Gemini adjusted my  

73:13

itinerary for the rest of the trip. It moved our walking tour to the 

73:18

next day and added lunch options near the street art museum to 

73:22

make the most of our Sunday afternoon. 

73:24

This looks great! It would have taken me hours of 

73:28

work, checking multiple sources, figuring out schedules, and 

73:32

Gemini did this in a fraction of the time. 

73:36

This new trip-planning experience will be rolling out 

73:38

to Gemini Advanced this summer, just in time to help you plan your  

73:42

own Labor Day weekend. [Applause]. 

73:51

All right. We saved the best for last. 

73:55

You heard Sundar say earlier that starting today, Gemini 

73:58

Advanced subscribers get access to Gemini 1.5 Pro, with one 

74:03

million tokens. That is the longest context 

74:07

window of any chatbot in the world.

74:16

[Cheers and Applause]. It unlocks incredible new 

74:18

potential in AI, so you can tackle complex problems that 

74:22

were previously unimaginable. You can upload a PDF up to 1,500 

74:28

pages long, or multiple files to get insights across a project. 

74:34

And soon, you can upload as much as 30,000  lines of code or even an hour-long video. 

74:41

Gemini Advanced is the only chatbot that lets you process 

74:44

this amount of information. Now, just imagine how useful 

74:48

this will be for students. Let’s say you’ve spent months on 

74:52

your thesis, and you could  really use a fresh perspective. 

74:56

You can upload your entire thesis, your sources, notes, 

75:00

your research, and soon, interview audio recordings and videos, too. 

75:05

So Gemini has all this context to give you actionable advice. 

75:09

It can dissect your main points, identify improvements, and even 

75:14

role play as your professor. So you can feel confident in 

75:18

your work. And check out what Gemini 

75:21

Advanced can do with your spreadsheets, with the new data 

75:24

analysis feature launching in the coming weeks. 

75:27

Maybe you have a side hustle selling handcrafted products. 

75:31

But you’re a better artist than accountant, and it's really hard to understand  

75:35

which products are worth your time. Simply upload all of your 

75:39

spreadsheets and ask Gemini to visualize your earnings and help 

75:42

you understand your profit. Gemini goes to work calculating 

75:47

your returns and pulling its analysis together into a single 

75:51

chart, so you can easily understand which products are 

75:54

really paying off. Now, behind the scenes, Gemini writes  

75:59

custom Python code to crunch these numbers. And of course, your files are 

76:03

not used to train our models. 
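As an illustration of the kind of code that behind-the-scenes step might produce, here is a hedged sketch; the file and column names are invented, since the real feature works from whatever your uploaded spreadsheets contain.

```python
# The kind of analysis code Gemini might generate behind the scenes for this
# question. File and column names ("product", "revenue", "cost") are invented.
import pandas as pd
import matplotlib.pyplot as plt

sales = pd.read_csv("handcrafted_sales.csv")      # your uploaded spreadsheet
sales["profit"] = sales["revenue"] - sales["cost"]

# Total profit per product, sorted so the best performers come first.
by_product = sales.groupby("product")["profit"].sum().sort_values(ascending=False)

by_product.plot(kind="bar", title="Profit by product")
plt.ylabel("Profit ($)")
plt.tight_layout()
plt.savefig("profit_by_product.png")              # the single chart pulled together
```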

76:09

Oh, and just one more thing: later this year, we'll be doubling the long context window to 2 million tokens.

76:22

[Cheers and Applause]. We absolutely can't wait for  

76:23

you to try all of this for yourself. Gemini is continuing to evolve 

76:28

and improve at a breakthrough pace. 

76:30

We’re making Gemini more multimodal, more agentive, and 

76:33

more intelligent, with the capacity to process the most 

76:37

information of any chatbot in the world. 

76:40

And as you heard earlier, we're also expanding Gemini Advanced 

76:43

to over 35 supported languages, available today.

76:50

[Applause]. But, of course, what makes 

76:55

Gemini so compelling is how easy 

76:58

it is to do just about anything you want, with a simple prompt. 

77:02

Let's take a look. >> Enter prompt here. 

77:08

Okay. Can't be that hard. 

77:10

How about generate an image of a cat playing guitar? 

77:14

Is that how it works? Am I doing AI? 

77:17

Yeah. Just does whatever you type. 

77:20

What are last minute gift ideas you can make with arts and 

77:22

crafts? Plan a workout routine to get 

77:25

bigger calves. Help me think of titles to my 

77:28

tell-all memoir. What's something smart I can say 

77:31

about Renoir? Generate another image of a cat 

77:34

playing guitar. If a girl calls me a snack, how 

77:38

do I reply? Yeah, that's how it works. 

77:42

You're doing AI. Make this email sound more 

77:44

professional before I hit send. What's a good excuse to cancel 

77:49

dinner with my friends? We're literally sitting right 

77:52

here. There's no wrong way to prompt. 

77:56

Yeah, you're doing AI. There's no wrong way to prompt. 

78:01

It does whatever you type. Just prompt your prompt in the 

78:03

prompt bar. Or just generate an image of a 

78:05

cat playing guitar. You know it can do other stuff, 

78:11

right? [Applause]. 

78:24

>>SAMEER SAMAT: Hi, everyone. It’s great to be back at Google 

78:27

I/O. Today, you’ve seen how AI is 

78:31

transforming our products across Gemini, Search, Workspace and 

78:35

more. We're bringing all of these 

78:37

innovations right onto your Android phone. 

78:40

And we're going even further, to make Android the best place to 

78:45

experience Google AI. This new era of AI is a profound 

78:50

opportunity to make smartphones truly smart. 

78:54

Our phones have come a long way in a short time, but if you 

78:58

think about it, it’s been years since the user experience has 

79:01

fundamentally transformed. This is a once-in-a-generation 

79:05

moment to reinvent what phones can do. 

79:09

So we’ve embarked on a multi-year journey to reimagine 

79:12

Android, with AI at the core. And it starts with three 

79:18

breakthroughs you’ll see this year. 

79:21

First, we're putting AI-powered search right at your fingertips, 

79:26

creating entirely new ways to get the answers you need. 

79:30

Second, Gemini is becoming your new AI assistant on Android, 

79:35

there to help you any time. And third, we’re harnessing 

79:40

on-device AI to unlock new experiences that work as fast as 

79:44

you do, while keeping your sensitive data private. 

79:49

Let's start with AI-powered search. 

79:52

Earlier this year, we took an important first step at Samsung 

79:55

Unpacked, by introducing Circle to Search. 

79:59

It brings the best of Search directly into the user 

80:02

experience. So you can go deeper on anything 

80:05

you see on your phone, without switching apps. 

80:08

Fashionistas are finding the perfect shoes, home chefs are 

80:12

discovering new ingredients, and with our latest update, it’s 

80:16

never been easier to translate whatever’s on your screen, like 

80:20

a social post in another language. 

80:23

And there are even more ways Circle to Search can help. 

80:27

One thing we’ve heard from students is that they are doing 

80:30

more of their schoolwork directly on their phones and 

80:33

tablets. So, we thought: Could Circle to 

80:37

Search be your perfect study buddy? 

80:40

Let’s say my son needs help with a tricky physics word problem, 

80:44

like this one. My first thought is, oh boy, 

80:48

it’s been a while since I’ve thought about kinematics. 

80:51

If he’s stumped on this question, instead of putting me 

80:54

on the spot, he can circle the exact part he’s stuck on and get 

80:58

step-by-step instructions. Right where he’s already doing 

81:02

the work. Ah, of course, final velocity 

81:06

equals initial velocity plus acceleration times elapsed time. 

81:11

Right. I was just about to say that. 
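For reference, the kinematics relation he is reading off, in standard notation:

```latex
% final velocity = initial velocity + acceleration times elapsed time
v = v_0 + a\,\Delta t
```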

81:15

Seriously, though, I love that it shows how to solve the 

81:18

problem, not just the answer. This new capability is available 

81:24

today! And later this year, Circle to 

81:28

Search will be able to tackle more complex problems involving 

81:31

symbolic formulas, diagrams, graphs and more. 

81:36

Circle to Search is only on Android. 

81:40

It’s available on more than 100 million devices today, and we’re 

81:44

on track to double that by the end of the year.

81:54

[Cheers and Applause]. You’ve already heard from Sissie 

81:56

about the incredible updates coming to the Gemini app. 

82:00

On Android, Gemini is so much more. 

82:04

It’s becoming a foundational part of the Android experience. 

82:08

Here’s Dave to share more. [Applause]. 

82:18

>>DAVE BURKE: Hey, everyone. A couple months ago we launched 

82:22

Gemini on Android. Like Circle to Search, Gemini 

82:26

works at the system level. So instead of going to a 

82:29

separate app, I can bring Gemini right to what I’m doing. 

82:34

Now, we're making Gemini context aware, so it can 

82:38

anticipate what you're trying to do and provide more helpful  

82:41

suggestions in the moment. In other words, to be a more 

82:45

helpful assistant. So let me show you  

82:48

how this works. 

82:48

And I've got my shiny new Pixel 8a here to help me.

82:55

[Applause]. 

82:56

So my friend Pete is asking me if I want to play pickleball 

82:59

this weekend. And I know how to play tennis, sort of. 

83:03

I have to say that for the demo. But I'm new to this pickleball thing,  

83:06

so I'm going to reply and try to be funny and say, is that like tennis but with pickles? 

83:14

This would be actually a lot funnier with a meme,  so let me bring up Gemini to help with that,  

83:19

and I'll say create image of tennis with pickles. Now, one new thing you'll notice 

83:26

is that the Gemini window hovers in place above the app so that I 

83:29

stay in the flow. Okay. 

83:32

So I generated some pretty good images. What's nice is I can drag and drop any of  

83:36

these directly into the images below. So cool, let me send that.

83:44

[Applause]. All right. 

83:47

So Pete's typing, and he says -- he's  sending me a video on how to play pickleball. 

83:51

All right. Thanks, Pete. 

83:52

Let's tap on that. And that launches YouTube but, you know, I only  

83:56

have one or two burning questions about the game. I could bring up Gemini to help with that,  

84:01

and because it's context-aware, Gemini knows I'm  looking at a video, so it proactively shows me  

84:08

an ask this video chip. So let me tap on that. 

84:12

And now, I can ask specific  questions about the video. 

84:15

So, for example, what is the 2 bounce rule? Because that's something that I've heard about but  

84:24

don't quite understand in the game. By the way, this uses signals like  

84:28

YouTube's captions, which means you  can use it on billions of videos. 

84:32

So give it a moment, and, there. I get a nice, succinct answer. 

84:37

The ball must bounce once on each  side of the court after a serve. 

84:40

Okay. Cool. 

84:41

Let me go back to messages and  Pete's followed up, and he says,  

84:45

you're an engineer, so here's the  official rule book for pickleball. 

84:50

Thanks, Pete. Pete is very helpful, by the way. 

84:52

Okay. So we tap on that. 

84:53

It launches a PDF, now, that's an 84-page PDF. I don't know how much time Pete thinks I have. 

84:59

Anyway, us engineers, as you all know,  like to work smarter, not harder,  

85:03

so instead of trawling through this entire document, I can pull up Gemini to help. 

85:08

And again, Gemini anticipates what I need,  and offers me an ask this PDF option. 

85:14

So if I tap on that, Gemini now ingests all  of the rules to become a pickleball expert,  

85:20

and that means I can ask very esoteric questions,  like, for example, are spin serves allowed? 

85:31

And let's hit that, because I've  heard that rule may be changing. 

85:34

Now, because I'm a Gemini Advanced user, this works on any PDF and takes full advantage 

85:39

of the long context window and there's  just lots of times where that's useful. 

85:43

For example, let's say you're looking for  a quick answer in an appliance user manual. 

85:48

And there you have it. It turns out, no, spin serves are not allowed. 

85:53

So Gemini not only gives me a clear answer to my  question, it also shows me exactly where in the  

85:59

PDF to learn more. Awesome. 

86:02

Okay. So that’s a few of the ways 

86:09

that we're enhancing Gemini to be more  context aware and helpful in the moment. 

86:14

And what you've seen here are the first of  really many new ways that Gemini will unlock  

86:20

new experiences at the system level,  and they're only available on Android. 

86:25

You’ll see these, and more, coming to hundreds of millions of  

86:28

devices over the next couple of months. Now, building Google AI directly 

86:34

into the OS elevates the entire smartphone experience. 

86:38

Android is the first mobile operating system to include a 

86:41

built-in, on-device foundation model. 

86:44

This lets us bring Gemini goodness from the data center 

86:47

right into your pocket. So the experience is faster,  

86:51

while also protecting your privacy. Starting with Pixel later this 

86:55

year, we’ll be expanding what’s possible with our latest model, 

86:58

Gemini Nano with Multimodality. This means your phone can 

87:03

understand the world the way you understand it. 

87:06

So not just through text input, but also  through sights, sounds, and spoken language. 

87:12

Let me give you an example. 2.2 billion people experience 

87:16

blindness or low vision. So several years ago, we 

87:19

developed TalkBack, an accessibility feature that helps 

87:23

people navigate their phone through touch and spoken feedback. 

87:27

Helping with images is especially important. 

87:30

In fact, my colleague Karo, who uses TalkBack, will typically 

87:34

come across 90 unlabeled images per day. 

87:37

Thankfully, TalkBack makes them accessible, and now we’re taking 

87:41

that to the next level with the multimodal capabilities of 

87:44

Gemini Nano. So when someone sends Karo a 

87:47

photo, she’ll get a richer and clearer description of what’s 

87:51

happening. Or, let’s say Karo is shopping 

87:53

online for an outfit. Now she can get a crystal clear 

87:56

description of the style and cut to find the perfect look. 

88:00

Running Gemini Nano on-device helps minimize latency, and the 

88:05

model even works when there's  no network connection. 

88:08

These improvements to TalkBack are coming later this year. 

88:13

Let me show you another example of what on-device AI can unlock. 

88:17

People lost more than one trillion dollars to fraud last 

88:20

year. And as scams continue to evolve 

88:23

across texts, phone calls, and even videos, Android can help 

88:27

protect you from the bad guys, no matter how they try to reach 

88:30

you. So let’s say I get rudely 

88:33

interrupted by an unknown caller right in the middle of my 

88:36

presentation. [Phone ringing]. 

88:40

>> Hello! >> Hi. 

88:42

I'm calling from Save More Bank Security Department. 

88:44

Am I speaking to Dave? >>DAVE BURKE: Yes, this is Dave. 

88:47

I’m kinda in the middle of something. 

88:48

>> We've detected some suspicious activity on your 

88:50

account. It appears someone is trying to 

88:52

make unauthorized charges. >>DAVE BURKE: Oh, yeah? 

88:56

What kind of charges? >> I can't give you specifics 

88:58

over the phone, but to protect your account, I’m going to help 

89:01

you transfer your money to a secure account we’ve set up for 

89:04

you. [Laughter]. 

89:08

>>DAVE BURKE: And look at this, 

89:09

my phone gives me a warning that this call might be a scam!

89:20

[Applause]. Gemini Nano alerts me the second 

89:22

it detects suspicious activity, like a bank asking me to move my 

89:25

money to keep it safe. And everything happens right on 

89:29

my phone, so the audio processing stays completely 

89:32

private to me and on my device. We’re currently testing this 

89:35

feature, and we’ll have more updates to share later in the 

89:38

summer. And we’re really just scratching 

89:41

the surface on the kinds of fast, private experiences that 

89:45

on-device AI unlocks. Later this year, Gemini will be 

89:49

able to more deeply understand the content of your screen, 

89:53

without any information leaving your phone, thanks to the 

89:55

on-device model. So, remember that pickleball 

89:59

example earlier? Gemini on Android will be able 

90:02

to automatically understand the conversation and provide 

90:05

relevant suggestions, like where to find pickleball clubs near 

90:09

me. 

90:10

And this is a powerful concept that will  work across many apps on your phone. 

90:15

In fact, later today at the developer keynote, you’ll hear 

90:18

about how we’re empowering our developer community with our 

90:21

latest AI models and tools like Gemini Nano and Gemini in 

90:25

Android Studio. Also, stay tuned tomorrow for 

90:29

our upcoming Android 15 updates, which we can’t wait to share. 

90:35

As we said at the outset, we’re reimagining Android with Gemini 

90:38

at the core. From your favorite apps, to the 

90:41

OS itself, we’re bringing the power of AI to every aspect of 

90:45

the smartphone experience. And with that, let me hand over 

90:49

to Josh to share more on our latest news for developers. 

90:53

Thank you. [Applause]. 

91:04

>>JOSH WOODWARD: It’s amazing to see Gemini Nano do all of that 

91:08

directly on Android. That was our plan all along, to 

91:12

create a natively multimodal Gemini in a range of sizes so 

91:17

you all, as developers, can choose  the one that works best for you. 

91:22

Throughout the morning, you’ve heard a lot about our Gemini 1.5 

91:25

series, and I want to talk about the two models you can access today. 

91:30

1.5 Pro, which is getting a 

91:32

series of quality improvements that go out, right about now,  

91:36

and the brand new 1.5 Flash. Both are available globally in 

91:41

over 200 countries and territories.

91:49

[Cheers and Applause]. You can go over to AI Studio  

91:51

or Vertex AI if you're a Google cloud  customer and you can give them a try. 

91:55

Now, both models are also natively multimodal. 

91:58

That means you can interleave text, images, audio, video as 

92:03

inputs, and pack that massive 1 million token context window. 

92:07

And if you go to ai.google.dev today, you can sign up to try 

92:12

the 2 million token context window for 1.5 Pro. 

92:17

We're also adding a bunch of new developer  features, starting with video frame extraction. 

92:23

That's going to be in the Gemini  API, parallel function calling,  

92:26

so you can return more than one function call  at a time, and my favorite, context caching, so  

92:33

you can send all of your files to the model once  and not have to re-send them over and over again. 

92:40

That should make the long  context even more useful,  

92:43

and more affordable. It ships next month.
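The exact API surface for these just-announced features may differ once they ship, but here is a minimal sketch of the function-calling pattern they build on, using the google-generativeai Python SDK; the two tool functions are hypothetical placeholders.

```python
# Minimal function-calling sketch with the google-generativeai SDK. The two tools
# are hypothetical placeholders; with parallel function calling, the model can
# return both calls in a single turn instead of one at a time.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Hypothetical helper; replace with a real weather lookup."""
    return f"Sunny in {city}"

def get_flight_status(flight_number: str) -> str:
    """Hypothetical helper; replace with a real flight-status lookup."""
    return f"Flight {flight_number} is on time"

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    tools=[get_weather, get_flight_status],
)
chat = model.start_chat(enable_automatic_function_calling=True)

# A question that touches both tools, so more than one function call is needed.
print(chat.send_message("What's the weather in Dallas, and is flight UA123 on time?").text)
```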

92:49

[Applause]. Now, we're using Google's 

92:54

infrastructure to serve these 

92:56

models, so developers like all  of you can get great prices. 

93:01

1.5 Pro is $7 per 1 million tokens, and I'm excited to share 

93:07

that for prompts up to 128K, it will be 50% less, for $3.50. 

93:15

And 1.5 Flash will start at $0.35 per 1 million tokens.
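As a rough worked example at the rates just quoted, treating them as prompt-token prices and keeping the prompt under the 128K threshold:

```latex
% Cost of a 100{,}000-token (0.1\,\text{M}) prompt at the quoted per-million rates:
0.1 \times \$3.50 = \$0.35 \quad \text{(1.5 Pro, under the 128K threshold)}
0.1 \times \$0.35 = \$0.035 \approx \$0.04 \quad \text{(1.5 Flash)}
```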

93:23

[Cheers and Applause]. 

93:26

Now, one thing you might be wondering is  which model is best for your use case? 

93:30

Here’s how I think about it. We use 1.5 Pro for complex tasks, where you  

93:36

really want the highest quality response, and it's  okay if it takes a little bit longer to come back. 

93:42

We're using 1.5 Flash for quick tasks, where  the speed of the model is what matters the most. 

93:49

And as a developer, you can go try them both  out today and see what works best for you. 

93:55

Now, I'm going to show you how it works here in  AI Studio, the fastest way to build with Gemini. 

94:00

And we'll pull it up here, and you can see this is AI Studio. 

94:05

It's free to use. You don't have to configure anything to get going. 

94:09

You just go to aistudio.google.com, log in with your Google account, and you can just pick the 

94:14

model here on the right that works best for you. So one of the ways we've been using 1.5  

94:20

Flash is to actually learn from customer  feedback about some of our labs products. 

94:26

Flash makes this possible with its low latency. So what we did here is we just took a bunch of  

94:32

different feedback from our customer forums. You can put it in to Flash, load up  

94:37

a prompt, and hit run. Now, in the background,  

94:40

what it's going to do is it's going to go through  that 93,000 token pile of information and you  

94:46

can see here it starts streaming the response back. Now, this is really helpful because 

94:50

it pulls out the themes for us. It gives us all the right places  

94:53

where we can start to look. We can see this is from some of the  

94:56

benefits from NotebookLM, like we showed earlier. Now, what's great about this is that you can take 

95:03

something like this in AI Studio, prototyped  here in ten seconds, and with one click in  

95:09

the upper left, get an API key, or over here in  the upper right, just tap get code, and you've  

95:15

got all the model configurations, the safety  settings, ready to go, straight into your IDE. 
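For reference, the exported Python snippet looks roughly like the one below; the exact fields and defaults depend on the model and settings chosen in AI Studio, so treat this as a sketch rather than the canonical output.

```python
# Roughly the shape of what AI Studio's "Get code" exports for Python; exact
# values mirror whatever model configuration and safety settings you picked.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")   # the API key from the upper left

generation_config = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 8192,
}

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
    # safety_settings=...   # carried over from your AI Studio choices
)

response = model.generate_content("Summarize the main themes in this customer feedback: ...")
print(response.text)
```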

95:22

Now, over time, if you find that you  need more enterprise-grade features  

95:26

you can use the same Gemini 1.5 models and  the same configurations right in Vertex AI. 

95:33

That way, you can scale up with Google  Cloud as your enterprise needs grow. 

95:38

So that's our newly updated Gemini 1.5 Pro and the  new 1.5 Flash, both of which are available today  

95:46

globally, and you'll hear a lot more about  them in the developer keynote later today.

95:58

[Applause]. 

95:59

Now, let's shift gears and talk  about Gemma, our family of open 

96:03

models, which are crucial for driving AI innovation and 

96:07

responsibility. Gemma is being built from the  

96:10

same research and technology as Gemini. It offers top performance and comes in  

96:15

lightweight 7B and 2B sizes. Since it launched less than 

96:20

three months ago, it’s been downloaded millions of times 

96:23

across all the major model hubs. Developers and researchers have 

96:28

been using it and customizing the base Gemma  model and using some of our pre-trained variants,  

96:34

like RecurrentGemma, and CodeGemma,  and today's newest member, PaliGemma,  

96:42

our first vision-language model,  and it's available right now.

96:49

[Applause]. It's optimized for  

96:53

a range of image captioning, visual Q&A and  other image labeling tasks, so go give it a try. 

97:01

I'm also excited to announce  that we have Gemma 2 coming. 

97:06

It's the next generation of Gemma,  and it will be available in June. 

97:11

One of the top requests we've heard from  developers is for a bigger Gemma model,  

97:16

but it's still going to fit in a size that's easy for all of you to use. 

97:20

So in a few weeks, we'll  be adding a new 27 billion  

97:23

parameter model to Gemma 2, and  here's what's great about it. 

97:28

This size is optimized by Nvidia to run  on next-gen GPUs and can run efficiently  

97:35

on a single TPU host in Vertex AI. So this quality to size ratio is  

97:42

amazing because it will outperform  models more than twice its size. 

97:47

We can't wait to see what  you're going to build with it.
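As a minimal sketch of trying a Gemma checkpoint from one of those model hubs, assuming Hugging Face transformers and the instruction-tuned 2B variant (accepting the model license on the hub is a prerequisite):

```python
# Load a small Gemma checkpoint locally with Hugging Face transformers.
# Assumes the gemma-2b-it weights have been accepted on the hub and that
# accelerate is installed for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"   # the lightweight instruction-tuned variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Write a haiku about open models.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```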

97:55

[Applause]. 

97:56

To wrap up, I want to share this inspiring  story from India, where developers have been  

98:01

using Gemma and its unique tokenization to create Navarasa, a set of instruction-tuned 

98:08

models to expand access to 15 Indic languages. This builds on our efforts to make information  

98:14

accessible in more than 7,000 languages around the world. 

98:19

Take a look. >>AASHI:  

98:29

Language is an interesting problem to solve, 

98:33

actually, and given India has a huge variety of languages and it 

98:38

changes every five kilometers. >>HARSH: When technology is 

98:43

developed for a particular culture, it won't be able to 

98:47

solve and understand the nuances of a country like India. 

98:52

One of Gemma’s features is an incredibly powerful tokenizer 

98:56

which enables the model to use hundreds of thousands of words, 

98:59

symbols, and characters across so many alphabets and language 

99:03

systems. 

99:04

This large vocabulary is critical to adapting Gemma to 

99:08

power projects like Navarasa. >>RAMSRI: Navarasa is a model 

99:13

that’s trained for Indic languages. 

99:15

It's a fine-tuned model based on Google’s Gemma. 

99:17

We built Navarasa to make large language models culturally 

99:21

rooted where people can talk in their native language and get 

99:25

the responses in their native language. 

99:28

Our biggest dream is to build a model to include everyone from 

99:33

all corners of India. >>GAURAV: We need a technology 

99:41

that will harness AI so that everyone can use it and no one 

99:41

is left behind. >>HARSH: Today the language that 

99:42

you speak in could be the tool and the technology that you use 

99:46

for solving your real-world problems. 

99:49

And that's the power of generative AI that we want to 

99:52

bring to every corner of India and the entire world.

100:12

[Applause].

100:12

[Cheers and Applause]. >>JAMES MANYIKA: Listening to 

100:13

everything that’s been announced today, it’s clear that AI is 

100:17

already helping people, from their everyday tasks to their 

100:22

most ambitious, productive, and imaginative endeavors. 

100:27

Our AI innovations, like multimodality, long context and 

100:31

agents, are at the cutting-edge of what this technology can do, 

100:36

taking its capacity to help people to a whole new level. 

100:40

Yet, as with any emerging technology, there are still 

100:45

risks and new questions that will arise as AI advances and 

100:49

its uses evolve. In navigating these 

100:53

complexities, we are guided by our AI Principles, and we’re 

100:57

learning from our users, partners, and our own research. 

101:03

To us, building AI responsibly means both addressing the risks 

101:08

and maximizing the benefits for people and society. 

101:12

Let me begin with what we’re doing to address risks. 

101:15

Here, I want to focus on how we are improving our models and 

101:18

protecting against their misuse. Beyond what Demis shared 

101:23

earlier, we are improving our models with an industry-standard 

101:27

practice called red-teaming, in which we test our own models and try  

101:31

to break them to identify weaknesses. Adding to this work, we’re 

101:36

developing a cutting-edge technique we call AI-assisted 

101:40

red teaming. This draws on Google DeepMind's 

101:44

gaming breakthroughs like AlphaGo, where we train AI 

101:48

agents to compete against each other and improve and expand the 

101:51

scope of their red teaming capabilities. 

101:54

We are developing AI models with these capabilities to help 

101:58

address adversarial prompting and limit problematic outputs. 

102:03

We’re also improving our models with feedback from two important 

102:07

groups: Thousands of internal safety experts with a range of  

102:11

disciplines, and a range of independent  experts from academia to civil society. 

102:18

Both groups help us identify emerging risks, from 

102:22

cybersecurity threats to potentially dangerous 

102:25

capabilities in areas like Chem-Bio. 

102:30

Combining human insight with our safety testing methods will help 

102:35

make our models and products more accurate, reliable and safer. 

102:41

This is particularly important as technical advances like better  

102:45

intonation make interactions with  AI feel and sound more human-like. 

102:51

We're doing a lot of research in this area,  including the potential for harm and misuse. 

102:58

We're also developing new tools to  help prevent the misuse of our models. 

103:03

For example, as Imagen 3 and Veo create more realistic imagery 

103:08

and videos, we must also consider how they might be 

103:12

misused to spread misinformation. 

103:15

To help, last year we introduced SynthID, a tool that adds 

103:20

imperceptible watermarks to our AI-generated images and audio so 

103:25

that they’re easier to identify. Today, we’re expanding SynthID 

103:30

to two new modalities: Text and video. 

103:35

These launches build on our efforts to deploy 

103:37

state-of-the-art watermarking capabilities across modalities. 

103:42

Moving forward, we will keep integrating advances like 

103:45

watermarking and other emerging techniques, to secure our latest 

103:50

generations of Gemini, Imagen, Lyria, and Veo models. 

103:55

We’re also committed to working with the ecosystem with all of you  

103:59

to help others build on the advances we're making. And in the coming months, we'll be open-sourcing  

104:05

SynthID text watermarking. This will be available in our 

104:09

updated Responsible Generative AI Toolkit, which we created to 

104:14

make it easier for developers to build AI responsibly. 

104:18

We're also supporting C2PA, 

104:23

collaborating with Adobe, Microsoft, startups, and many 

104:26

others, to build and implement a standard that improves the 

104:30

transparency of digital media. Now, let’s turn to the second 

104:36

and equally important part of our responsible AI approach: 

104:40

How we’re building AI to benefit people and society. 

104:44

Today, our AI advances are helping to solve real-world 

104:48

problems, like accelerating the work of 1.8 million scientists 

104:53

in 190 countries who are using AlphaFold to work on issues like 

104:58

neglected diseases. Helping to predict floods in 

105:02

more than 80 countries. And helping organizations, like 

105:06

the United Nations track progress on  the world's 17 sustainable development  

105:11

goals with Data Commons. And now, generative AI is 

105:16

unlocking new ways for us to make the world’s information, 

105:20

and knowledge, universally accessible and useful for 

105:23

learning. Billions of people already use 

105:27

Google products to learn every day, and generative  AI is opening up new possibilities, allowing us to  

105:34

ask questions like, what if everyone everywhere could have their own 

105:39

personal AI tutor, on any topic? Or, what if every educator could 

105:45

have their own assistant in the classroom? 

105:48

Today marks a new chapter for learning and education at 

105:51

Google. I am excited to introduce 

105:54

LearnLM, our new family of models, based on Gemini, and 

106:00

fine-tuned for learning. LearnLM is grounded in 

106:04

educational research, making learning experiences more 

106:08

personal and engaging. And it’s coming to the products 

106:12

you use every day. Like Search, Android, Gemini and YouTube. 

106:17

In fact, you've already seen LearnLM  on stage today when it helped Sameer  

106:23

with his son's homework on Android. Now, let's see how it works in 

106:27

the Gemini app. Earlier, Sissie introduced Gems, 

106:31

custom versions of Gemini that can act as personal assistive 

106:36

experts on any topic. We are developing some pre-made 

106:41

Gems, which will be available in the Gemini App and web 

106:44

experience, including one called Learning Coach. 

106:48

With Learning Coach, you can get step-by-step study guidance, 

106:52

along with helpful practice and memory techniques, designed to 

106:57

build understanding rather than just give you the answer. 

107:01

Let’s say you’re a college student studying for an upcoming 

107:04

biology exam. If you need a tip to remember 

107:07

the formula for photosynthesis, Learning Coach can help. 

107:12

Learning Coach, along with other pre-made gems, will launch in 

107:15

Gemini in the coming months. And you can imagine what 

107:19

features like Gemini Live can unlock for learning. 

107:24

Another example is a new feature in YouTube that uses LearnLM to 

107:29

make educational videos more interactive, allowing you to ask 

107:33

a clarifying question, get a helpful explanation, or take a 

107:37

quiz. This even works  

107:39

for those long lectures or seminars, thanks  to Gemini model's long context capabilities. 

107:46

This feature in YouTube is already rolling out to select 

107:50

Android users. As we work to extend LearnLM 

107:54

beyond our own products, we are partnering with experts and 

107:58

institutions like Columbia Teachers College, Arizona State 

108:02

University and Khan Academy to test and improve the new 

108:07

capabilities in our models for learning. 

108:10

And we’ve collaborated with MIT RAISE to develop an online 

108:14

course to help educators better understand and use generative 

108:18

AI. We’re also working directly with 

108:21

educators to build more helpful generative AI tools with LearnLM. 

108:25

For example, in Google 

108:27

Classroom, we’re drawing on the advances you’ve heard about 

108:30

today to develop new ways to simplify  and improve lesson planning, and enable  

108:37

teachers to tailor lessons and content to  meet the individual needs of their students. 

108:44

Standing here today makes me think back to my own time as an 

108:47

undergraduate. Then, AI was considered 

108:51

speculative, far from any real world uses. 

108:55

Today, we can see how much is already real, how much it is 

109:00

already helping people, from their everyday tasks to their 

109:03

most ambitious, productive and imaginative endeavors, and how 

109:08

much more is still to come. This is what motivates us. 

109:12

I’m excited about what’s ahead and what we’ll build with all of 

109:16

you. Back to you, Sundar. 

109:17

[Applause]. >>SUNDAR PICHAI:  

109:30

Thanks, James. All of this shows the important 

109:34

progress we’ve made, as we take a bold and responsible approach 

109:37

to making AI helpful for everyone. 

109:40

Before we wrap, I have a feeling that someone out there might be 

109:44

counting how many times we’ve mentioned AI today. 

109:52

[Laughter]. And since a big theme today has 

109:54

been letting Google do the work for you, we went ahead and 

109:58

counted, so that you don’t have to.

110:12

[Cheers and Applause]. That might be a record in how 

110:14

many times someone has said AI. I’m tempted to say it a few more 

110:21

times. But I won't. 

110:23

Anyhow, this tally is more than just a punchline. 

110:26

It reflects something much deeper. 

110:29

We’ve been AI-first in our approach for a long time. 

110:32

Our decades of research leadership have pioneered many 

110:36

of the modern breakthroughs that power AI progress, for us and 

110:39

for the industry. On top of that, we have 

110:42

world-leading infrastructure built for the AI Era, 

110:45

cutting-edge innovation in Search, now powered by Gemini, 

110:49

products that help at extraordinary scale, including 

110:52

fifteen products with over half a billion users, and platforms 

110:57

that enable everyone, partners, customers, creators, and all of 

111:02

you, to invent the future. This progress is only possible 

111:06

because of our incredible developer community. 

111:08

You are making it real, through the experiences you build every 

111:12

day. So, to everyone here in 

111:14

Shoreline and the millions more watching around the world, 

111:18

here’s to the possibilities ahead and creating them 

111:20

together. Thank you.

111:28

[Cheers and Applause]. >> What does this remind you of? 

111:45

>> Cat. >> Wow. 

111:50

>> Wow! >> Okay! 

111:54

>> When all of these tools come together, it's a powerful 

111:58

combination. >> It's amazing. 

111:59

>> It's amazing. It's an entire suite of different  

112:03

kinds of possibilities. >> Hi. 

112:09

I'm Gemini. >> What neighborhood do you 

112:14

think I'm in? >> This appears to be the Kings 

112:16

Cross area of London. >> Together we're creating a new 

112:19

era.