Google Keynote (Google I/O ’24)

Google
14 May 2024 · 112:43

Summary

TLDR: The Google I/O 2024 event showcased a multitude of AI innovations, highlighting Google's commitment to integrating artificial intelligence into every aspect of technology. Sundar Pichai, CEO of Google, introduced Gemini, a generative AI model, as a cornerstone of Google's AI strategy. Gemini is designed to be multimodal, capable of processing text, images, video, and code, and is set to revolutionize the way people work and interact with technology. The event covered advancements in Google Search, Workspace, Android, and the introduction of new AI models like Gemini 1.5 Pro and 1.5 Flash. The updates aim to make AI more accessible and beneficial for creators, developers, and users worldwide. The narrative emphasized the potential for AI to personalize and enhance various facets of life, from education to daily tasks, while also addressing the importance of responsible AI development and deployment.

Takeaways

  • 🚀 Google has launched Gemini, a generative AI model, aiming to revolutionize the way we work by being natively multimodal and capable of reasoning across various forms of data like text, images, video, and code.
  • 📈 Over 1.5 million developers are already using Gemini models for tasks such as debugging code, gaining insights, and building AI applications, highlighting its rapid adoption and impact on the developer community.
  • 🔍 Google Search has been transformed with Gemini, enabling new ways of searching, including complex queries, and photo searches, leading to an increase in user satisfaction and search usage.
  • 📱 Gemini's capabilities are being integrated into Google's products across Mobile, Search, Photos, Workspace, and Android, providing a seamless AI experience for users.
  • 🎉 Sundar Pichai announced the expansion of Gemini's context window to 2 million tokens, a significant leap towards the goal of infinite context, allowing for even more detailed and long-range reasoning.
  • 📈 Google Workspace is set to benefit from Gemini's multimodality and long context features, streamlining tasks like email summarization and meeting highlight generation, enhancing productivity.
  • 🎓 LearnLM, a new family of models based on Gemini, is designed to enhance learning experiences and is being integrated into everyday products like Search, Android, Gemini, and YouTube.
  • 🤖 The concept of AI agents was introduced, which are intelligent systems capable of reasoning, planning, and memory, designed to perform tasks on behalf of users while ensuring user supervision and control.
  • 💬 Gemini's real-time speech models enable a more natural conversational experience with AI, allowing users to interrupt and receive immediate responses, making interactions feel more human-like.
  • 🌐 Google is committed to responsible AI development, focusing on improving model safety, preventing misuse, and expanding AI's benefits to society, including education and accessibility.
  • 📊 Google's investment in AI infrastructure, such as Tensor Processing Units (TPUs), is pivotal in training and serving state-of-the-art models like Gemini, reinforcing Google's position at the forefront of AI innovation.

Q & A

  • What is Google's new generative AI model called?

    -Google's new generative AI model is called Gemini.

  • How does Gemini redefine the way we work with AI?

    -Gemini redefines the way we work with AI by being natively multimodal, allowing users to interact with it through text, voice, or the phone's camera, and by providing more natural and context-aware responses.

  • What is the significance of the 1 million token context window in Gemini 1.5 Pro?

    -The 1 million token context window in Gemini 1.5 Pro is significant because it is the longest context window of any chatbot in the world, allowing it to process complex problems and large amounts of information that were previously unimaginable.

  • How does Gemini Advanced's trip planning feature work?

    -Gemini Advanced's trip planning feature works by gathering information from various sources like Search, Maps, and Gmail. It uses this data to create a dynamic graph of possible travel options, taking into account the user's priorities and constraints, and then presents a personalized vacation plan.

  • What is the role of Gemini in the future of Google Search?

    -In the future of Google Search, Gemini plays the role of an AI agent that uses multi-step reasoning to break down complex questions, figure out the problems that need to be solved, and in what order. It taps into Google's index of information about the real world to provide comprehensive and customized search results.

  • How does Google ensure the responsible use of its AI technology?

    -Google ensures the responsible use of its AI technology by adhering to its AI Principles, red-teaming to identify weaknesses, involving internal safety experts and independent experts for feedback, and developing tools like SynthID to watermark AI-generated content, making it easier to identify.

  • What is the new feature called 'Live' in the Gemini app?

    -'Live' is a new feature in the Gemini app that allows users to have in-depth conversations with Gemini using their voice. It utilizes Google's latest speech models to better understand users and provide more natural responses.

  • How does Gemini's 'Gems' feature help users customize their AI experience?

    -Gemini's 'Gems' feature allows users to create personalized AI assistants, or 'Gems,' tailored to specific topics or tasks. Users can set up these Gems once with their instructions and then use them whenever needed for a customized AI experience.

  • What is the purpose of the 'AI-organized search results page' in Google Search?

    -The 'AI-organized search results page' in Google Search is designed to provide users with a whole page of AI-generated and AI-organized content that is custom-built for their query. It uncovers the most interesting angles for the user to explore and organizes the results into helpful clusters.

  • How does the Gemini app integrate with Android to enhance the smartphone experience?

    -The Gemini app integrates with Android by becoming a foundational part of the Android experience, working at the system level. It provides context-aware assistance, allowing users to bring Gemini to their current activity without switching apps, and offers features like video understanding and on-device processing for faster and more private experiences.

  • What is the 'SynthID' tool, and how does it contribute to responsible AI?

    -SynthID is a tool developed by Google that adds imperceptible watermarks to AI-generated images, audio, text, and video. This makes the synthetic media easier to identify and helps prevent the misuse of AI-generated content, such as spreading misinformation.

Outlines

00:00

🚀 Google's Gemini AI: A Leap Forward in Technology

The first paragraph introduces Google's ambitious strides in artificial intelligence with the launch of Gemini, a generative AI model. Sundar Pichai, CEO of Google, welcomes the audience to Google I/O and emphasizes the transformative impact of Gemini on various sectors, including the way we work, find solutions, and interact with technology. The paragraph highlights Google's commitment to AI innovation across research, product development, and infrastructure, and the potential of Gemini to drive opportunities for creators and developers.

05:02

🔍 Google Search Transformation with Generative AI

The second paragraph discusses the transformation of Google Search with the integration of Gemini's capabilities. It talks about the Search Generative Experience that has led to new ways of searching, including complex queries and photo-based searches. The paragraph also mentions the user satisfaction increase and the launch of AI Overviews, which will be available to users in the U.S. with plans for global expansion.

10:05

📸 Google Photos Enhancement with Gemini

The third paragraph showcases how Gemini is enhancing Google Photos by making the search process more intuitive and context-aware. It describes a scenario where a user can find their car's license plate number by simply asking Photos, thanks to Gemini's ability to recognize and understand the context. The paragraph also teases the upcoming 'Ask Photos' feature, which will allow for deeper memory search capabilities.

15:08

🧠 Multimodality and Long Context in Gemini

The fourth paragraph delves into the technical aspects of Gemini, focusing on its multimodality and long context capabilities. It discusses how Gemini's design allows it to understand different types of inputs and find connections between them. The paragraph also highlights developers' excitement about the 1 million token context window and how it has been used to improve tasks such as coding and data analysis.

20:12

📚 Innovative Applications of Gemini

The fifth paragraph presents real-world applications of Gemini, where developers have used its advanced features to perform tasks like turning a video of a bookshelf into a searchable database. It illustrates the potential of Gemini to understand and process vast amounts of data, providing innovative solutions to complex problems.

25:15

🌐 Expanding Gemini's Reach and Capabilities

The sixth paragraph discusses the expansion of Gemini's capabilities with the introduction of Gemini 1.5 Pro, which offers long context support and is now available globally. It also announces the expansion of the context window to 2 million tokens for developers and highlights new updates in translation, coding, and reasoning.

30:16

🤖 AI Agents and the Future of Intelligent Systems

The seventh paragraph explores the concept of AI agents, which are intelligent systems capable of reasoning, planning, and memory. It provides examples of how these agents can simplify tasks like shopping and moving to a new city by automating multiple steps on behalf of the user. The paragraph emphasizes the importance of privacy, security, and user control in the development of these intelligent systems.

35:18

🧑‍🤝‍🧑 Personalized AI for Everyone

The eighth paragraph focuses on the ultimate goal of making AI helpful and accessible to everyone. It discusses the combination of multimodality, long context, and AI agents as a means to organize the world's information and make it useful for individuals. The paragraph also introduces the concept of an AI-first approach and the role of Google's infrastructure in supporting AI advancements.

40:19

🎓 LearnLM: Advancing Education with AI

The ninth paragraph introduces LearnLM, a new family of models based on Gemini and fine-tuned for educational purposes. It highlights the potential of LearnLM to provide personalized and engaging learning experiences through products like Search, Android, Gemini, and YouTube. The paragraph also mentions partnerships with educational institutions to enhance the capabilities of these models for learning.

45:21

🌟 The Impact of AI on Society and Future Innovations

The tenth paragraph emphasizes the real-world impact of AI, its role in solving global issues, and the ethical considerations guiding its development. It discusses Google's AI principles, the use of red-teaming and AI-assisted red teaming to improve model safety, and the expansion of the SynthID watermarking tool. The paragraph concludes with a forward-looking statement on the potential of AI to enhance learning and education.

50:23

🤝 Collaboration and the Era of AI Innovation

The eleventh paragraph celebrates the developer community's role in bringing AI innovations to life. It acknowledges the collaborative efforts in creating AI technologies and the ongoing journey to explore and build the future of AI. The paragraph ends with a tribute to the possibilities ahead and the commitment to creating them together.

55:32

📈 The Significance of AI in Google's Ecosystem

The twelfth paragraph reflects on the frequency of mentioning AI throughout the discussion, symbolizing the integral role of AI in Google's approach and offerings. It underscores Google's AI-first mindset, research leadership, and the infrastructure built for the AI era. The paragraph concludes with a nod to the developer community's contributions to realizing AI's potential.

Keywords

💡Artificial Intelligence (AI)

Artificial Intelligence refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. In the video, AI is the central theme, with Google discussing its advancements in AI technology through projects like Gemini, which aims to revolutionize the way we work and interact with technology.

💡Gemini

Gemini is a generative AI model introduced by Google that is designed to be natively multimodal, capable of reasoning across various forms of input like text, images, video, and code. It is highlighted in the video as a significant step towards turning any input into any output, thus enabling a new generation of AI applications.

💡Multimodal

Multimodal refers to the ability of a system to process and understand multiple forms of input, such as text, speech, images, and video. In the context of the video, Google's Gemini model is described as multimodal, allowing it to function effectively across various types of data and providing a more integrated and human-like interaction experience.

💡Long Context

Long context denotes the capacity of an AI model to process and understand extensive amounts of information, such as lengthy texts or long-duration audio and video. The video emphasizes Gemini 1.5 Pro's ability to handle up to 1 million tokens in production, a significant breakthrough that allows the model to manage more complex and detailed tasks.
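
To make the scale concrete, here is a minimal Python sketch of context-window budgeting. The 1 million and 2 million token figures and the "roughly 96 Cheesecake Factory menus" comparison come from the keynote; the four-characters-per-token ratio and all function names are assumptions for illustration only and do not reflect Gemini's actual tokenizer or API.

```python
# Rough context-window budgeting sketch. The window sizes come from the
# keynote; the characters-per-token ratio is an assumed rule of thumb,
# not a property of Gemini's actual tokenizer.

CONTEXT_WINDOW_1_5_PRO = 1_000_000   # tokens, as stated in the keynote
CONTEXT_WINDOW_PREVIEW = 2_000_000   # tokens, private preview for developers
ASSUMED_CHARS_PER_TOKEN = 4          # hypothetical average; varies by language and content


def estimate_tokens(text: str, chars_per_token: int = ASSUMED_CHARS_PER_TOKEN) -> int:
    """Estimate the token count of text by ceiling-dividing its length."""
    return -(-len(text) // chars_per_token)


def fits_in_window(documents: list[str], window: int) -> bool:
    """Return True if the estimated total tokens of all documents fit in the window."""
    return sum(estimate_tokens(doc) for doc in documents) <= window


def tokens_per_item(window: int, item_count: int) -> int:
    """Average token budget per item when item_count items fill the window."""
    return window // item_count


if __name__ == "__main__":
    # The keynote's "roughly 96 Cheesecake Factory menus" implies about
    # this many tokens per menu in a 1 million token window.
    print(tokens_per_item(CONTEXT_WINDOW_1_5_PRO, 96))
```

For real workloads, an exact count from the model provider's own token-counting facility would be preferable to a character-based estimate like this one.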

💡AI Overviews

AI Overviews is a feature that utilizes Google's AI capabilities to provide users with summarized and contextual answers to their queries. As mentioned in the video, this feature is part of the revamped Google Search experience, aiming to increase user satisfaction by offering comprehensive and insightful responses to search queries.

💡Google Workspace

Google Workspace is a suite of productivity and collaboration tools developed by Google, which includes Gmail, Docs, Drive, and Calendar, among others. In the video, it is discussed how Gemini's integration with Google Workspace can streamline tasks like email summarization and meeting highlights, enhancing productivity and efficiency for users.

💡AI Agents

AI Agents are intelligent systems that can perform tasks on behalf of users by reasoning, planning, and remembering steps. The video describes how Google is working on AI agents that can execute complex tasks like online shopping or moving to a new city, making the process more convenient and less time-consuming for users.

💡Project Astra

Project Astra is an initiative by Google that aims to develop advanced AI assistants with capabilities for faster processing, better understanding of context, and more natural conversational responses. The video showcases a prototype of these AI agents, demonstrating their potential to provide personalized and interactive experiences.

💡Tensor Processing Units (TPUs)

TPUs are specialized hardware accelerators designed to speed up machine learning tasks. In the video, Google announces the sixth generation of TPUs called Trillium, which offers significant improvements in compute performance, essential for training state-of-the-art models like Gemini.

💡AI-Generated Media

AI-Generated Media refers to the creation of content such as images, music, and video through artificial intelligence. The video highlights Google's advancements in this area with the introduction of models like Imagen 3 for image generation and Veo for generative video, which are set to transform creative industries by providing new tools for artists and developers.

💡AI Principles

AI Principles are a set of ethical guidelines that companies like Google follow to ensure the responsible development and use of AI technology. The video touches on Google's commitment to these principles, emphasizing the importance of safety, privacy, and the beneficial use of AI in society, as well as the continuous evaluation and improvement of AI models.

Highlights

Google has launched Gemini, a generative AI, which is transforming the way we work.

Google I/O introduced new beginnings and innovative solutions to age-old problems through advancements in AI.

Sundar Pichai emphasized that Google is in the early days of the AI platform shift with significant opportunities ahead.

Gemini models have demonstrated state-of-the-art performance on every multimodal benchmark.

Over 1.5 million developers are using Gemini models for various applications such as debugging code and building AI apps.

Google Search has been revolutionized by Gemini, allowing for more complex queries and photo-based searches.

Google Photos integration with Gemini makes searching through personal memories more accessible and efficient.

Google Workspace is set to enhance productivity with Gemini's capabilities, including summarizing emails and automating tasks.

Google is expanding the context window to 2 million tokens for developers, a significant step towards infinite context.

The introduction of Gemini 1.5 Flash, a lighter-weight model optimized for low latency and cost-efficient tasks at scale.

Project Astra represents the future of AI assistants, aiming to build a universal AI agent for everyday use.

Imagen 3, Google's most capable image generation model yet, offers more photorealistic and detailed results.

Google's Music AI Sandbox is a suite of tools that can create new instrumental sections and transfer styles between tracks.

Veo, the new generative video model, creates high-quality 1080p videos from various prompts, offering creative control.

Google's AI innovations are enabling more natural and interactive experiences with AI, with real-world applications.

Google is committed to responsible AI development, focusing on safety, privacy, and ethical considerations in AI advancements.

The introduction of LearnLM, a new family of models designed to enhance learning experiences across Google products.

Transcripts

00:00

[Cheers and Applause]. >>WOMAN: Google’s ambitions in 

00:01

artificial intelligence. >>MAN: Google launches Gemini, 

00:03

the generative AI. >> And it's completely changing 

00:06

the way we work. >> You know, a lot has happened 

00:09

in a year. There have been new beginnings. 

00:15

We found new ways to find new ideas. 

00:20

And new solutions to age-old problems. >> Sorry about your shirt. 

00:27

We dreamt of things -- >> Never too old for a 

00:30

treehouse. >> We trained for things. 

00:32

>> All right! Let’s go go go!

00:34

>> And learned about this thing. We found new paths, took the 

00:41

next step, and made the big leap. Cannon ball! 

00:52

We filled days like they were weeks. 

00:54

And more happened in months, than has happened in years. 

00:59

>> Hey, free eggs. >> Things got bigger,  

01:08

like waaay bigger. 

01:12

And it wasn’t all just for him, or for her. 

01:18

It was for everyone.

01:24

And you know what? 

01:27

We’re just getting started.

01:47

>>SUNDAR PICHAI:  Hi, everyone. Good morning. 

01:56

[Cheers and Applause]. Welcome to Google I/O. 

01:57

It's great to have all of you with us. We have a few thousand 

02:00

developers with us here today at Shoreline. 

02:03

Millions more are joining virtually around the world. 

02:06

Thanks to everyone for being here. 

02:09

For those of you who haven’t seen I/O before, it’s basically 

02:13

Google’s version of the Eras Tour, but with fewer costume 

02:18

changes. [Laughter]. 

02:20

At Google, though, we are fully in our Gemini era. Before we get into it, I want to 

02:28

reflect on this moment we’re in. We’ve been investing in AI for 

02:33

more than a decade, and innovating  at every layer of the stack: 

02:38

Research, product, infrastructure. We’re going to talk about it all today. 

02:43

Still, we are in the early  days of the AI platform shift. 

02:47

We see so much opportunity ahead for creators,  for developers, for startups, for everyone. 

02:56

Helping to drive those opportunities  is what our Gemini era is all about. 

03:01

So let’s get started. 

03:03

A year ago on this stage, we first shared our plans for 

03:06

Gemini, a frontier model built to be natively multimodal from 

03:11

the very beginning, that could reason across text, images, 

03:16

video, code, and more. It’s a big step in turning any 

03:20

input into any output. An I/O for a new generation. 

03:26

Since then we introduced the first Gemini models, our most 

03:29

capable yet. They demonstrated 

03:32

state-of-the-art performance on every multimodal benchmark. 

03:35

And that was just the beginning. Two months later, we introduced 

03:40

Gemini 1.5 Pro, delivering a big breakthrough in long context. 

03:46

It can run 1 million tokens in production, consistently. 

03:49

More than any other large-scale foundation model yet. 

03:53

We want everyone to benefit from what Gemini can do, so we’ve 

03:57

worked quickly to share these advances with all of you. 

04:01

Today, more than 1.5 million developers use Gemini models 

04:06

across our tools. You’re using it to debug code, 

04:10

get new insights, and build the next generation of AI 

04:13

applications. We’ve also been bringing 

04:17

Gemini’s breakthrough capabilities across our products 

04:20

in powerful ways. We’ll show examples today across 

04:24

Search, Photos, Workspace, Android and more. 

04:28

Today, all of our 2-billion user products use Gemini. 

04:32

And we’ve introduced new experiences, too, including on 

04:36

Mobile, where people can interact with Gemini directly 

04:39

through the app. Now available on Android and 

04:43

iOS. And through Gemini Advanced, 

04:46

which provides access to our most capable models. 

04:49

Over 1 million people have signed up to try it, in just 

04:52

three months. And it continues to show strong 

04:55

momentum. One of the most exciting 

04:58

transformations with Gemini has been in Google Search. 

05:02

In the past year, we’ve answered billions of queries as part of 

05:06

our Search Generative Experience. 

05:08

People are using it to Search in entirely new ways. 

05:12

And asking new types of questions, longer and more 

05:15

complex queries, even searching with photos, and getting back 

05:20

the best the web has to offer. We’ve been testing this 

05:24

experience outside of Labs, and we’re encouraged to see not only 

05:28

an increase in Search usage, but also an increase in user 

05:32

satisfaction. I’m excited to announce that 

05:35

we’ll begin launching this fully revamped experience, AI 

05:39

Overviews, to everyone in the U.S. this week. 

05:42

And we’ll bring it to more countries soon.

05:51

[Cheers and Applause]. There’s so much innovation 

05:53

happening in Search. Thanks to Gemini we can create 

05:57

much more powerful search experiences, including within 

06:00

our products. Let me show you an example in 

06:03

Google Photos. We launched Google Photos almost 

06:06

nine years ago. Since then, people have used it 

06:09

to organize their most important memories. 

06:12

Today that amounts to more than 6 billion photos and videos 

06:16

uploaded every single day. And people love using Photos to 

06:21

search across their life. With Gemini, we’re making that a 

06:24

whole lot easier. Say you’re at a parking station 

06:28

ready to pay, but you can’t recall your license plate 

06:31

number. Before, you could search Photos 

06:33

for keywords and then scroll through years’ worth of photos, 

06:37

looking for the right one. Now, you can simply ask Photos. 

06:43

It knows the cars that appear often, it triangulates which one 

06:46

is yours, and just tells you  the license plate number.

06:55

[Cheers and Applause]. And Ask Photos can help you 

06:57

search your memories in a deeper way. 

07:00

For example, you might be reminiscing about your daughter 

07:03

Lucia’s early milestones. You can ask Photos, when did Lucia learn to swim? 

07:09

And you can follow up with something more complex. 

07:13

Show me how Lucia's swimming has progressed. Here, Gemini goes beyond a 

07:19

simple search, recognizing different contexts from doing 

07:23

laps in the pool, to snorkeling in the ocean, to the text and 

07:27

dates on her swimming certificates. 

07:29

And Photos packages it all up together in a summary, so you 

07:33

can really take it all in, and relive amazing memories all over 

07:37

again. We’re rolling out Ask Photos 

07:40

this summer, with more capabilities to come.

07:50

[Cheers and Applause]. Unlocking knowledge across 

07:51

formats is why we built Gemini to be multimodal from the ground 

07:54

up. It’s one model, with all the 

07:57

modalities built in. So not only does it understand 

08:00

each type of input, it finds connections between them. 

08:04

Multimodality radically expands the questions we can ask, and 

08:08

the answers we will get back. Long context takes this a step 

08:12

further, enabling us to bring in even more information, hundreds 

08:17

of pages of text, hours of audio, a full hour of video, or 

08:21

entire code repos. Or, if you want, roughly 96 

08:26

Cheesecake Factory menus. [Laughter]. 

08:29

For that many menus, you’d need a one million token context 

08:32

window, now possible with Gemini 1.5 Pro. 

08:36

Developers have been using it in super interesting ways. 

08:39

Let’s take a look. >> I remember the announcement, 

08:53

the 1 million token context window, and my first reaction 

08:57

was there's no way they were able to achieve this. 

08:59

>> I wanted to test its technical skills, so I uploaded 

09:04

a line chart. It was temperatures between like 

09:09

Tokyo and Berlin and how they were across the 12 months of the 

09:11

year. >> So  

09:12

I got in there and I threw in the Python library that I was 

09:16

really struggling with and I just asked it a simple question. 

09:21

And it nailed it. It could find specific 

09:26

references to comments in the code and specific requests that 

09:30

people had made and other issues that people had had, but then 

09:34

suggest a fix for it that related to what I was working 

09:38

on. >> I immediately tried to kind 

09:41

of crash it. So I took, you know, four or 

09:44

five research papers I had on my desktop, and it's a mind-blowing 

09:48

experience when you add so much text, and then you see the kind 

09:52

of amount of tokens you add is not even at half the capacity. 

09:55

>> It felt a little bit like Christmas because you saw things 

09:59

kind of peppered up to the top of your feed about, like, oh, 

10:01

wow, I built this thing, or oh, it's doing this, and I would 

10:05

have never expected. >> Can I shoot a video of my 

10:07

possessions and turn that into a searchable database? 

10:11

So I ran to my bookshelf, and I shot video just panning my 

10:14

camera along the bookshelf and I fed the video into the model. 

10:18

It gave me the titles and authors of the books, even 

10:21

though the authors weren't visible on those book spines, 

10:24

and on the bookshelf there was a squirrel nutcracker sat in 

10:27

front of the book, truncating the title. 

10:29

You could just see the word "sightsee", and it still guessed 

10:32

the correct book. The range of things you can do 

10:33

with that is almost unlimited. >> So at that point for me was 

10:36

just like a click, like, this is it. 

10:39

I thought, like, I had like  a super power in my hands. 

10:41

>> It was poetry. It was beautiful. 

10:43

I was so happy! This is going to be amazing! 

10:48

This is going to help people! >> This is kind of where the 

10:50

future of language models are going. 

10:52

Personalized to you, not because you trained it to be personal to 

10:58

you, but personal to you because you can give it such a vast 

11:02

understanding of who you are. [Applause]. 

11:11

>>SUNDAR PICHAI: We’ve been rolling out Gemini 1.5 Pro with 

11:14

long context in preview over the last few months. 

11:17

We’ve made a series of quality improvements across translation, 

11:21

coding, and reasoning. You’ll see these updates 

11:24

reflected in the model starting today. 

11:27

I'm excited to announce that we’re bringing this improved 

11:29

version of Gemini 1.5 Pro to all developers globally.

11:41

[Cheers and Applause]. In addition, today Gemini 1.5 

11:44

Pro with 1 million context is now directly  available for consumers in Gemini Advanced,  

11:50

and can be used across 35 languages. One million tokens is opening up 

11:56

entirely new possibilities. It’s exciting, but I think we 

12:01

can push ourselves even further. So today, we are expanding the 

12:05

context window to 2 million tokens.

12:15

[Cheers and Applause]. We are making it available  

12:16

for developers in private preview. It's amazing to look back and 

12:20

see just how much progress we've made in a few months. 

12:24

This represents the next step on our journey  towards the ultimate goal of infinite context. 

12:30

Okay. So far,  

12:31

we’ve talked about two technical advances: 

12:33

multimodality and long context. Each is powerful on its own. 

12:39

But together, they unlock deeper capabilities, and more 

12:42

intelligence. Let’s see how this comes to life 

12:46

with Google Workspace. People are always searching 

12:49

their emails in Gmail. We are working to make it much 

12:52

more powerful with Gemini. Let’s look at how. 

12:56

As a parent, you want to know everything that’s going on with 

13:00

your child’s school. Okay, maybe not everything, but 

13:04

you want to stay informed. Gemini can help you keep up. 

13:08

Now we can ask Gemini to summarize all recent emails from 

13:12

the school. In the background, it’s 

13:15

identifying relevant emails, and even analyzing attachments, like 

13:19

PDFs. And you get a summary of  

13:21

the key points and action items. So helpful. 

13:25

Maybe you were traveling this week and couldn’t make the PTA 

13:28

meeting. The recording of the meeting is 

13:31

an hour long. If it’s from Google Meet, you 

13:34

can ask Gemini to give you the highlights.

13:43

[Cheers and Applause]. There’s a parents group looking 

13:44

for volunteers, and you’re free that day. 

13:47

So of course, Gemini can draft a reply. 

13:50

There are countless other examples of how this can make 

13:52

life easier. Gemini 1.5 Pro is available 

13:56

today in Workspace Labs. Aparna will share more later on.

14:06

[Applause]. We just looked at an example with text outputs. 

14:14

But with a multimodal model, we can do so much more. 

14:17

To show you an early demo of an audio output in NotebookLM, 

14:22

here’s Josh. >>JOSH WOODWARD: Hi, everyone! 

14:32

Last year, at I/O, we introduced NotebookLM, a research and 

14:37

writing tool grounded in the information you give it. 

14:40

Since then, we've seen a lot of momentum with students and 

14:44

teachers using it. And today, Gemini 1.5 Pro is 

14:48

coming to NotebookLM, and it's great. 

14:51

Let me show you. So here we are in NotebookLM. 

14:55

You can load it up with all of the materials here on the left. 

14:59

In this notebook, I've been using it with my younger son,  

15:02

and I've added some of his science worksheets,  a few slide decks from the teacher, and even an  

15:08

open source textbook full of charts and diagrams. With 1.5 Pro, it instantly creates this notebook  

15:16

guide with a helpful summary and can generate  a study guide, an FAQ, or even quizzes. 

15:23

But for my son Jimmy, he really learns  best when he can listen to something. 

15:29

So we've prototyped a new feature with  Gemini, and it's called Audio Overviews. 

15:35

NotebookLM is going to use all of the materials on the left  

15:39

as input and output them into a lively  science discussion, personalized for him. 

15:46

Let's take a listen. >> So let's dive into physics. 

15:49

What's on deck for today? >> Well, we're starting with the 

15:52

basics. Force and motion. 

15:53

>> Okay. >> And that, of course, means we 

15:55

have to talk about Sir Isaac Newton and his three laws of 

15:57

motion. >> Ah,  

15:58

yes, the foundation for understanding how objects move and interact.

16:02

>>JOSH WOODWARD: Oh, yes, this is  where multimodal really shines. 

16:06

Now, it generated this audio  discussion based on that text material. 

16:11

And what's amazing is that my son  and I can join into the conversation. 

16:17

And steer it whichever direction we want. 

16:20

When I tap "join." >> Hold on. 

16:23

We have a question. What's up, Josh? 

16:26

>>JOSH WOODWARD: Yeah, can you give my son Jimmy a basketball 

16:29

example? >> Hey, Jimmy! 

16:35

That's a fantastic idea! Basketball is actually a great 

16:38

way to visualize force and motion. 

16:40

Let's break it down. >> Okay. 

16:41

So first, imagine a basketball just sitting there on the court. 

16:45

It's not moving, right? 

16:46

That's because all the forces acting on it are balanced. 

16:49

The downward pull of gravity -- >>JOSH WOODWARD: Pretty cool, right?

16:58

[Cheers and Applause]. I gotta say, the first  

17:00

time my son heard this, you should  have seen how big his eyes got! 

17:04

Because he was gripped. They were talking to him. 

17:07

He was learning science through  the example of basketball. 

17:11

His favorite sport. Now, what's interesting  

17:13

is under the hood, you saw that Gemini had used  some of the concepts of gravity, Sir Isaac Newton,  

17:19

but nothing in there was about basketball. It connected the dots and created that  

17:24

age-appropriate example for him. And this is what's becoming  

17:29

possible with the power of Gemini. You can give it lots of information in  

17:34

any format, and it can be transformed in a way  that's personalized and interactive for you. 

17:42

Back to you, Sundar. [Applause]. 

17:50

>>SUNDAR PICHAI: Thanks, Josh. The demo shows the real 

17:52

opportunity with multimodality. Soon you’ll be able to mix and 

17:56

match inputs and outputs. This is what we mean when we say 

17:59

it’s an I/O for a new generation. 

18:02

And I can see you all out there thinking about the 

18:05

possibilities. But what if we could go even 

18:07

further? That’s one of the opportunities 

18:10

we see with AI agents. Let me take a step back and 

18:13

explain what I mean by that. I think about them as 

18:17

intelligent systems that show reasoning, planning, and memory. 

18:21

Are able to “think” multiple steps ahead, work across 

18:25

software and systems, all to get something done on your behalf, 

18:30

and most importantly, under your supervision. 

18:33

We are still in the early days, and you’ll see glimpses of our 

18:37

approach throughout the day, but let me show you the kinds of use 

18:41

cases we are working hard to solve. Let’s talk about shopping. 

18:46

It’s pretty fun to shop for shoes, and a lot less fun to 

18:50

return them when they don’t fit. Imagine if Gemini could do all 

18:54

the steps for you: Searching your inbox for the receipt, 

18:59

locating the order number from your email, filling out a return 

19:03

form, and even scheduling a pickup. That's much easier, right?

19:10

[Applause]. Let’s take another example 

19:14

that’s a bit more complex. Say you just moved to Chicago. 

19:18

You can imagine Gemini and Chrome working together to help 

19:22

you do a number of things to get ready: Organizing, reasoning, 

19:27

synthesizing on your behalf. For example, you’ll want to 

19:30

explore the city and find services nearby, from 

19:33

dry-cleaners to dog-walkers. You will have to update your new  

19:37

address across dozens of Web sites. Gemini can work across these 

19:42

tasks and will prompt you for more information when needed, so 

19:46

you are always in control. That part is really important 

19:49

as we prototype these experiences. We are thinking hard about how to do it in a way  

19:55

that's private, secure and works for everyone. These are simple use cases, but 

20:01

they give you a good sense of the types of problems we want to 

20:04

solve, by building intelligent systems that think ahead, 

20:08

reason, and plan, all on your behalf. 

20:11

The power of Gemini, with multimodality, long context and 

20:16

agents, brings us closer to our ultimate goal: Making AI helpful 

20:22

for everyone. We see this as how we will make  

20:25

the most progress against our mission. Organizing the world’s 

20:29

information across every input, making it accessible via any 

20:34

output, and combining the world’s information with the 

20:37

information in your world in a way that’s truly useful for you. 

20:42

To fully realize the benefits of AI, we will continue to break 

20:46

new ground. Google DeepMind is hard at work 

20:50

on this. To share more, please welcome, 

20:52

for the first time on the I/O stage, Sir Demis.

20:58

[Applause]. >>DEMIS HASSABIS:  

21:10

Thanks, Sundar. 

21:11

It's so great to be here. Ever since I was a kid, playing 

21:16

chess for the England Junior Team, I’ve been thinking about 

21:19

the nature of intelligence. I was captivated by the idea of 

21:23

a computer that could think like a person. 

21:26

It’s ultimately why I became a programmer and studied 

21:29

neuroscience. I co-founded DeepMind in 2010 

21:33

with the goal of one day building AGI: Artificial general 

21:37

intelligence, a system that has human-level cognitive 

21:41

capabilities. I’ve always believed that if we 

21:44

could build this technology responsibly, its impact would be 

21:48

truly profound and it could benefit humanity in incredible 

21:51

ways. Last year,  

21:54

we reached a milestone on that path when we  formed Google DeepMind, combining AI talent  

21:58

from across the company into one super unit. Since then, we've built AI systems that can  

22:04

do an amazing range of things, from turning  language and vision into action for robots,  

22:10

navigating complex virtual environments, solving Olympiad-level math problems, and even discovering  

22:18

thousands of new materials. Just last week, we announced  

22:22

our next generation AlphaFold model. It can predict the structure and interactions  

22:27

of nearly all of life's molecules, including how  proteins interact with strands of DNA and RNA. 

22:34

This will accelerate vitally important  biological and medical research from  

22:38

disease understanding to drug discovery. And all of this was made possible with the  

22:44

best infrastructure for the AI era, including  our highly optimized tensor processing units. 

22:51

At the center of our efforts is our Gemini model. It's built from the ground up to be natively  

22:57

multimodal because that's how we interact  with and understand the world around us. 

23:02

We've built a variety of  models for different use cases. 

23:05

We've seen how powerful Gemini 1.5 Pro is,  but we also know from user feedback that some 

23:11

applications need lower latency and a lower cost to serve. 

23:16

So today we’re introducing Gemini 1.5 Flash.

23:21

[Cheers and Applause]. Flash is a lighter-weight model 

23:30

compared to Pro. It’s designed to be fast and 

23:33

cost-efficient to serve at scale, while still featuring 

23:36

multimodal reasoning capabilities and breakthrough 

23:38

long context. Flash is optimized for tasks 

23:42

where low latency and efficiency matter most. 

23:45

Starting today, you can use 1.5 Flash and 1.5 Pro with up to one 

23:50

million tokens in Google AI Studio and Vertex AI. 

23:54

And developers can sign up to try two million tokens. 

23:58

We’re so excited to see what all of you will create with it. 

24:02

And you'll hear a little more  about Flash later on from Josh. 

24:07

We’re very excited by the progress we’ve made so far with 

24:09

our family of Gemini models. But we’re always striving to 

24:12

push the state-of-the-art even further. 

24:16

At any one time we have many different models in training. 

24:19

And we use our very large and powerful ones to help teach and 

24:22

train our production-ready models. 

24:26

Together with user feedback, this cutting-edge research will 

24:28

help us to build amazing new products for billions of people. 

24:33

For example, in December, we shared a glimpse into the future 

24:37

of how people would interact with multimodal AI, and how this 

24:41

would end up powering a new set of transformative experiences. 

24:46

Today, we have some exciting new progress to share about the 

24:49

future of AI assistants that we’re calling Project Astra.

24:58

[Cheers and Applause]. For a long time, we’ve wanted to 

25:00

build a universal AI agent that can be truly helpful in everyday 

25:04

life. Our work making this vision a 

25:06

reality goes back many years. It's why we made Gemini multimodal  

25:10

from the very beginning. An agent like this has to 

25:14

understand and respond to our complex and dynamic world just 

25:17

like we do. It would need to take in and 

25:20

remember what it sees so it can understand context and take 

25:23

action. And it would have to be 

25:25

proactive, teachable and personal, so you can talk to it 

25:28

naturally, without lag or delay. While we’ve made great strides 

25:33

in developing AI systems that can understand multimodal 

25:36

information, getting response time down to something 

25:39

conversational is a difficult engineering challenge. 

25:42

Building on our Gemini model, we’ve developed agents that can 

25:45

process information faster by continuously encoding video 

25:49

frames, combining the video and speech input into a timeline of 

25:53

events, and caching this for efficient recall. 

25:56

We’ve also enhanced how they sound, with a wider range of 

26:00

intonations. These agents better understand 

26:03

the context you’re in, and can respond quickly in conversation, 

26:06

making the pace and quality of interaction feel much more 

26:09

natural. Here’s a video of our prototype, 

26:13

which you’ll see has two parts. Each part was captured in a 

26:17

single take, in real time. >> Okay. Let's do some tests. 

26:24

Tell me when you see something that makes sound. 

26:28

>> I see a speaker, which makes sound. 

26:31

>> What is that part of the speaker called? 

26:36

>> That is the tweeter. It produces high frequency 

26:40

sounds. >> Give me a creative 

26:45

alliteration about these. >> Creative crayons color 

26:50

cheerfully. They certainly craft colorful 

26:53

creations. >> What does that part of the 

27:00

code do? >> This code defines encryption 

27:05

and decryption functions. It seems to use AES-CBC 

27:10

encryption to encode and decode data based on a key and an 

27:14

initialization vector (IV). >> That's right. 

27:21

What neighborhood do you think I'm in? 

27:24

>> This appears to be the Kings Cross area of London. 

27:28

It is known for its railway station and transportation 

27:31

connections. >> Do you remember where you saw 

27:35

my glasses? >> Yes, I do. 

27:39

Your glasses were on the desk near a red apple.

27:54

[Applause]. >> What can I add here to make 

27:56

this system faster? >>  

28:00

Adding a cache between the server and database could 

28:02

improve speed. >> What does this remind you of? 

28:09

>> Schrödinger's cat. >> All right. 

28:15

Give me a band name for this duo. 

28:19

>> Golden Stripes. >> Nice. Thanks, Gemini.

28:28

[Applause]. >>DEMIS HASSABIS:  

28:36

I think you'll agree it's amazing to see how 

28:38

far AI has come, especially when it comes to spatial 

28:42

understanding, video processing and memory. 

28:45

It’s easy to envisage a future where you can have an expert 

28:49

assistant by your side through your phone or new exciting form 

28:52

factors like glasses. Some of these agent capabilities 

28:56

will come to Google products like the Gemini app later this 

28:59

year. For those of you onsite today, 

29:02

you can try out a live demo version of this experience in 

29:05

the AI Sandbox area. [Cheers and Applause]. 

29:14

Next, let’s take a look at how our innovations are helping 

29:17

people bring new creative ideas to life. 

29:20

Today, we’re introducing a series of updates across our 

29:23

generative media tools with new models covering image, music and 

29:28

video. Over the past year, we’ve been 

29:31

enhancing quality, improving  safety and increasing access. 

29:35

To help tell this story, here’s Doug. 

29:49

[Applause]. >>DOUG ECK: Thanks, Demis. 

29:51

Over the past few months, we’ve been working hard 

29:54

to build a new image generation model from the ground up, with 

29:58

stronger evaluations, extensive red teaming, and 

30:01

state-of-the-art watermarking with SynthID. 

30:05

Today, I’m so excited to introduce Imagen 3. 

30:09

It’s our most capable image generation model yet. 

30:13

Imagen 3 is more photorealistic. 

30:15

You can literally count the whiskers on its snout. 

30:19

With richer details, like the incredible sunlight in this 

30:22

shot, and fewer visual artifacts or distorted images. 

30:27

It understands prompts written the way people write. 

30:30

The more creative and detailed you are, the better. 

30:33

And Imagen 3 remembers to incorporate small details like 

30:37

the ‘wildflowers’ or ‘a small blue bird’ in this longer 

30:40

prompt. Plus, this is our best model yet 

30:43

for rendering text, which has been a challenge for image 

30:46

generation models. In side-by-side comparisons, 

30:50

independent evaluators preferred Imagen 3 over other  

30:54

popular image generation models. In sum, Imagen 3 is our 

30:58

highest-quality image generation model so far. 

31:01

You can sign up today to try Imagen 3 in ImageFX, part of our 

31:05

suite of AI tools at labs.google, and it will be 

31:08

coming soon to developers and enterprise customers in Vertex 

31:11

AI. Another area, full of creative 

31:15

possibility, is generative music. 

31:19

I’ve been working in this space for over 20 years and this has been 

31:22

by far the most exciting year of my career. We’re exploring ways of working 

31:26

with artists to expand their creativity with AI. 

31:30

Together with YouTube, we’ve been building Music AI Sandbox, 

31:34

a suite of professional music AI tools that can create new  

31:37

instrumental sections from scratch,  transfer styles between tracks, and more. 

31:43

To help us design and test them, we’ve been working closely with 

31:45

incredible musicians, songwriters and producers. 

31:50

Some of them even made entirely new songs in ways that would not have been  

31:53

possible without these tools. Let’s hear from some of the 

31:57

artists we’ve been working with. >>  

32:04

I'm going to put this right back into the Music AI tool. 

32:07

The same Boom, boom, bam, boom, boom. 

32:10

What happens if Haiti meets Brazil? 

32:13

Dude, I have no clue what's about to be spat out. 

32:16

This is what excites me. Da da See see see. 

32:23

As a hip hop producer, we dug in the crates. 

32:26

We playin’ these vinyls, and the part where there's no vocal, we 

32:29

pull it, we sample it, and we create an entire song around 

32:33

that. So right now we digging in the 

32:35

infinite crate. It’s endless. 

32:37

Where I found the AI really useful for me, this  way to like fill in the sparser sort of elements  

32:43

of my loops. Okay. 

32:44

Let's try bongos. We're going to put viola. 

32:47

We're going to put rhythmic clapping, and we're going to see 

32:51

what happens there. Oh, and it makes it sound, 

32:55

ironically, at the end of the day, a little more human. 

32:57

So then this is entirely Google's loops right here. 

33:01

These are Gloops. So it's like having, like, this 

33:07

weird friend that's just like, 

33:09

try this, try that. And then you're like, Oh, okay. 

33:12

Yeah. No, that's pretty dope. 

33:20

(indistinct noises) >> The tools are capable of 

33:22

speeding up the process of what's in my head, getting it 

33:25

out. You're able to move lightspeed 

33:28

with your creativity. This is amazing. 

33:31

That right there. [Applause]. 

33:40

>>DEMIS HASSABIS: I think this really shows what’s possible 

33:42

when we work with the artist community on the future of 

33:45

music. You can find some brand new 

33:48

songs from these acclaimed artists and songwriters on their 

33:50

YouTube channels now. There's one more area I'm  

33:54

really excited to share with you. Our teams have made some 

33:57

incredible progress in generative video. 

34:01

Today, I’m excited to announce our newest, most capable 

34:04

generative video model, called Veo. 

34:12

[Cheers and Applause]. Veo creates high-quality, 1080p 

34:14

videos from text, image and video prompts. 

34:18

It can capture the details of your instructions in different 

34:20

visual and cinematic styles. You can prompt for things like 

34:24

aerial shots of a landscape or a time lapse, and further edit 

34:27

your videos using additional prompts. 

34:30

You can use Veo in our new experimental tool called 

34:32

VideoFX. We’re exploring features like 

34:36

storyboarding and generating longer scenes. 

34:39

Veo gives you unprecedented creative control. 

34:44

Techniques for generating static images have come a long way. 

34:47

But generating video is a different challenge altogether. 

34:51

Not only is it important to understand where an object or 

34:54

subject should be in space, it needs to maintain this 

34:57

consistency over time, just like the car in this video. 

35:02

Veo builds upon years of our pioneering generative video 

35:05

model work, including GQN, Phenaki, Walt, VideoPoet, 

35:10

Lumiere and much more. We combined the best of these 

35:14

architectures and techniques to improve consistency, quality and 

35:18

output resolution. To see what Veo can do, we put 

35:22

it in the hands of an amazing filmmaker. 

35:25

Let’s take a look. >>DONALD GLOVER: Well, I've been 

35:30

interested in AI for a couple of years now. 

35:33

We got in contact with some of the people at Google, and they 

35:35

had been working on something of their own. 

35:38

So we're all meeting here at Gilga Farms to make a short 

35:42

film. >>KORY MATHEWSON: The core 

35:43

technology is Google DeepMind’s 

35:45

generative video model that has been trained to convert input 

35:49

text into output video. [Laughter]. 

35:53

>>DONALD GLOVER: It looks good. >>KORY MATHEWSON: We are able to 

35:55

bring ideas to life that were otherwise not possible. 

35:58

We can visualize things on a time scale that’s 10 or 100 times 

36:02

faster than before. >>MATTHIEU KIM LORRAIN: When 

36:03

you're shooting, you can't really iterate as much as you 

36:05

wish. And so we've been hearing the 

36:07

feedback that it allows for more optionality, more iteration, 

36:12

more improvisation. >>DONALD GLOVER: But that's 

36:14

what's cool about it. It's like you can make a mistake 

36:16

faster. That's all you really want at 

36:17

the end of the day, at least in art, is just to make mistakes 

36:20

fast. >>KORY MATHEWSON: So, using 

36:21

Gemini’s multimodal capabilities to optimize the model’s training 

36:25

process, Veo is better able to capture the nuance from prompts. 

36:29

So this includes cinematic techniques and visual effects, 

36:32

giving you total creative 

36:34

control. >>DONALD GLOVER: Everybody's 

36:36

going to become a director and everybody should be a director. 

36:39

Because at the heart of all of this is just storytelling. 

36:42

The closer we are to being able to tell each other our stories, 

36:46

the more we will understand each other. 

36:48

>>KORY MATHEWSON: These models are really enabling us to be 

36:50

more creative and to share that creativity with each other.

36:57

[Cheers and Applause]. >>DEMIS HASSABIS:  

37:09

Over the coming weeks some of these 

37:11

features will be available to select creators through VideoFX 

37:15

at labs.google, and the waitlist is open now. 

37:19

Of course, these advances in generative video go beyond the 

37:22

beautiful visuals you’ve seen today. 

37:24

By teaching future AI models how to solve problems creatively, or 

37:29

in effect simulate the physics of our world, we can build more 

37:32

useful systems that help people communicate in new ways, and 

37:36

thereby advance the frontiers of AI. 

37:40

When we first began this journey  to build AI more than 15 years ago,  

37:44

we knew that one day it would change everything. 

37:47

Now that time is here. And we continue to be amazed by 

37:51

the progress we see and inspired by the advances still to come, 

37:55

on the path to AGI. Thanks, and back to you Sundar.

38:07

[Applause]. >>SUNDAR PICHAI: Thanks, Demis. 

38:09

A huge amount of innovation is happening at Google DeepMind. 

38:12

It’s amazing how much progress we have made in a year. 

38:15

Training state-of-the-art models requires a lot of computing 

38:19

power. Industry demand for ML compute 

38:22

has grown by a factor of 1 million in the last six years. 

38:27

And every year, it increases tenfold. 

38:30

Google was built for this. For 25 years, we’ve invested in 

38:34

world-class technical infrastructure, from the 

38:38

cutting-edge hardware that powers Search, to our custom 

38:41

tensor processing units that power our AI advances. 

38:45

Gemini was trained and served entirely on our fourth and fifth 

38:49

generation TPUs. And other leading AI companies, 

38:52

like Anthropic, have trained their models on TPUs as well. 

38:56

Today, we are excited to announce the  sixth generation of TPUs called Trillium.

39:07

[Cheers and Applause]. Trillium delivers a 4.7x 

39:10

improvement in compute performance per chip over the 

39:13

previous generation. It's our most efficient and performant TPU to date. 

39:18

We will make Trillium available to our cloud customers in late 2024. 

39:24

Alongside our TPUs, we are proud to offer  CPUs and GPUs to support any workload. 

39:30

That includes the new Axion processors we announced last month, our first custom  

39:35

Arm-based CPU with industry-leading performance and energy efficiency. 

39:44

We are also proud to be one of the first cloud providers to offer Nvidia's cutting-edge 

39:50

Blackwell GPUs, available in early 2025.

39:57

[Applause]. We’re fortunate to have a 

39:59

longstanding partnership with Nvidia, and are excited to bring 

40:03

Blackwell's capabilities to our customers. Chips are a foundational part of 

40:07

our integrated end-to-end system, from 

40:10

performance-optimized hardware and open software to flexible 

40:15

consumption models. This all comes together in our 

40:18

AI Hypercomputer, a groundbreaking supercomputer 

40:22

architecture. Businesses and developers are 

40:25

using it to tackle more complex challenges, with more than twice 

40:29

the efficiency relative to just buying the raw hardware and 

40:33

chips. Our AI Hypercomputer 

40:36

advancements are made possible in part because of our approach 

40:40

to liquid cooling in our data centers. 

40:43

We’ve been doing this for nearly a decade, long before it became 

40:46

state of the art for the industry. 

40:48

And today, our total deployed fleet capacity for liquid 

40:52

cooling systems is nearly 1 gigawatt, and growing. 

40:56

That’s close to 70 times the capacity of any other fleet.

41:02

[Applause]. Underlying this is the sheer 

41:07

scale of our network, which connects our infrastructure 

41:10

globally. Our network spans more than 2 

41:13

million miles of terrestrial and subsea fiber: Over 10 times the 

41:18

reach of the next leading cloud provider. 

41:20

We will keep making the investments necessary to advance 

41:24

AI innovation and deliver state-of-the-art capabilities. 

41:28

And one of our greatest areas of investment and innovation is in 

41:32

our founding product, Search. 25 years ago we created Search 

41:37

to help people make sense of the waves of information moving 

41:41

online. With each platform shift, we’ve 

41:44

delivered breakthroughs to help answer your questions better. 

41:48

On mobile, we unlocked new types of questions and answers, using 

41:52

better context, location awareness, and real-time 

41:55

information. With advances in natural 

41:58

language understanding and computer vision, we enabled new 

42:01

ways to search with your voice, or a hum to find your new 

42:06

favorite song, or an image of that flower you saw on your 

42:10

walk. And now you can even circle to 

42:13

Search those cool new shoes you might want to buy. 

42:17

Go for it, you can always return them later! 

42:23

Of course, Search in the Gemini Era will take this to a whole 

42:26

new level. Combining our infrastructure 

42:28

strengths, the latest AI capabilities, our high bar for 

42:33

information quality, and our decades of experience connecting 

42:36

you to the richness of the web. The result is a product that 

42:40

does the work for you. Google Search is generative AI 

42:45

at the scale of human curiosity. And it’s our most exciting 

42:49

chapter of Search yet. To tell you more, here’s Liz.

42:57

[Applause]. >>LIZ REID:  

43:05

Thanks, Sundar! With each of these platform 

43:08

shifts, we haven’t just adapted, we’ve expanded what’s possible 

43:14

with Google Search. And now, with generative AI, 

43:18

Search will do more for you than you ever imagined. 

43:21

So whatever’s on your mind, and whatever you need to get done, 

43:26

just ask. And Google will do the Googling 

43:29

for you. All the advancements you’ll see 

43:32

today are made possible by a new Gemini model, customized for 

43:36

Google Search. What really sets this apart is 

43:40

our three unique strengths. First, our real-time information 

43:45

with over a trillion facts about people, places, and things. 

43:50

Second, our unparalleled ranking and quality systems, trusted for 

43:55

decades to get you the very best of the web. 

43:58

And third, the power of Gemini, which unlocks new agentive 

44:02

capabilities, right in Search. By bringing these three things 

44:07

all together, we are able to dramatically expand  what's possible with Google Search, yet again. 

44:13

This is Search in the Gemini era. 

44:16

So let's dig in. You've heard today about AI 

44:20

Overviews, and how helpful people are finding them. 

44:23

With AI Overviews, Google does the work for you. 

44:27

Instead of piecing together all the information yourself, you 

44:30

can ask your question, and as you see  here, you can get an answer instantly. 

44:36

Complete with a range of perspectives and links to dive 

44:40

deeper. As Sundar shared, AI Overviews 

44:44

will begin rolling out to everyone in the U.S. starting 

44:46

today, with more countries soon. By the end of the year, AI 

44:51

Overviews will come to over a billion people in Google Search. 

44:56

But this is just the first step. We’re making AI Overviews even 

45:01

more helpful for your most complex questions, the type that 

45:04

are really more like ten questions in one! 

45:07

You can ask your entire question, with all its 

45:10

sub-questions, and get an AI Overview in just seconds. 

45:15

To make this possible, we’re introducing multi-step reasoning 

45:18

in Google Search. So Google can do the researching 

45:21

for you. For example, let’s say you’ve 

45:25

been trying to get into yoga and Pilates. 

45:28

Finding the right studio can take a lot of research. 

45:31

There are so many factors to consider! 

45:34

Soon you’ll be able to ask Search to: Find the best yoga or 

45:37

Pilates studios in Boston. And show you details on their 

45:40

intro offers, and walking time from Beacon Hill. 

45:45

As you can see here, Google gets to work for you, finding the 

45:48

most relevant information and bringing it together in your AI 

45:52

Overview. You get some studios with great 

45:55

ratings and their intro offers. You can see the distance for 

45:58

each, like this one is just a ten-minute walk away! 

46:03

Right below, you see where they're  located, laid out visually. 

46:07

And you've got all this from just a single search! Under the hood, our custom 

46:13

Gemini model acts as your AI agent, using what we call 

46:17

multi-step reasoning. It breaks your bigger question 

46:21

down into all its parts, and it figures out which problems it  

46:24

needs to solve and in what order. And thanks to our real-time info 

46:29

and ranking expertise, it reasons using the 

46:33

highest-quality information out there. 

46:37

So since you're asking about places, it taps into Google's 

46:40

index of information about the real world, with over 250 

46:44

million places, and updated in real-time. Including their ratings, 

46:49

reviews, business hours, and more. 

46:54

Research that might have taken you minutes or even hours, 

46:58

Google can now do on your behalf in just seconds. Next, let me show you another 

47:04

way multi-step reasoning in Google Search can make your life 

47:07

that much easier. Take planning, for example. 

47:10

Dreaming up trips and meal plans can be fun, but doing the work 

47:14

of actually figuring it all out, no, thank you. 

47:18

With Gemini in Search, Google does the planning with you. 

47:22

Planning is really hard for AI to get right. 

47:25

It's the type of problem that takes advanced reasoning and 

47:27

logic. After all, if you're meal 

47:30

planning, you probably don’t want mac 'n' cheese for breakfast, 

47:33

lunch and dinner. Okay, my kids might. 

47:38

But say you’re looking for a bit more variety. 

47:42

Now, you can ask Search to: Create a three-day meal plan for 

47:45

a group that’s easy to prepare. And here you get a plan with a 

47:49

wide range of recipes from across the web. 

47:52

This one for overnight oats looks particularly interesting. 

47:56

And you can easily head over to the Web site to learn how to prepare them. 

48:01

If you want to get more veggies in, you can simply ask Search to 

48:04

swap in a vegetarian dish. And just like that, Search  

48:08

customizes your meal plan. And you can export your meal 

48:11

plan or get the ingredients as a list, just by tapping here. 

48:16

Looking ahead, you could imagine asking Google to add everything 

48:20

to your preferred shopping cart. Then, we’re really cooking! 

48:25

These planning capabilities mean Search will be able to help plan 

48:28

everything from meals and trips to parties, dates, workout 

48:32

routines and more. So you can get all the fun of  

48:35

planning without any of the hassle. You’ve seen how Google Search 

48:41

can help with increasingly complex questions and planning. 

48:45

But what about all those times when you  don't know exactly what to ask and you  

48:49

need some help brainstorming? When you come to Search for 

48:52

ideas, you’ll get more than an AI-generated answer. 

48:55

You’ll get an entire AI-organized page, custom-built 

48:59

for you and your question. Say you’re heading to Dallas to 

49:04

celebrate your anniversary and you're looking for the perfect restaurant. 

49:09

What you get here breaks AI out of the box and it brings it to the whole page. 

49:14

Our Gemini model uncovers the most interesting angles for you 

49:17

to explore and organizes these results into helpful clusters. 

49:23

Like, you might have never considered restaurants with live 

49:25

music. Or ones with historic charm! 

49:29

Our model even uses contextual factors, like the time of year. 

49:34

So since it’s warm in Dallas, you can get rooftop patios as an idea. 

49:39

And it pulls everything together into a dynamic, whole-page 

49:42

experience. You’ll start to see this new 

49:45

AI-organized search results page when you look for inspiration, 

49:50

starting with dining and recipes, and coming to movies, 

49:53

music, books, hotels, shopping, and more.

50:05

[Applause]. 

50:06

Today, you’ve seen how you can bring any question to Search, 

50:09

and Google takes the work out of searching. 

50:12

But your questions aren’t limited to words in a text box, 

50:15

and sometimes, even a picture can’t tell the whole story. 

50:19

Earlier, Demis showed you our latest advancements in video 

50:22

understanding. 

50:24

And I'm really excited to share that soon  you'll be able to ask questions with video,  

50:28

right in Google Search. Let me introduce Rose to show 

50:32

you this in a live demo. [Applause].

50:41

>>ROSE YAO: Thank you, Liz! I have always wanted a record player,  

50:46

and I got this one, and some  vinyls at a yard sale recently. 

50:50

But, umm, when I go to play it, this thing keeps sliding off. 

50:54

I have no idea how to fix  it or where to even start! 

50:58

Before, I would have pieced together a bunch of searches to 

51:02

try to figure this out, like, what make is this record player? 

51:06

What’s the model? And, what is this thing actually 

51:08

called? But now I can just ask with a 

51:12

video. So let's try it. 

51:14

Let's do a live demo. I'm going to take a video and ask Google,  

51:20

why will this not stay in place? And in a near instant,  

51:26

Google gives me an AI overview. I get some reasons this might be 

51:30

happening, and steps I can take to troubleshoot. 

51:33

So it looks like first, this is called a tonearm. Very helpful. 

51:38

And it looks like it may be unbalanced,  and there's some really helpful steps here. 

51:42

And I love that because I'm new to all this. I can check out this helpful link from Audio-Technica

51:47

to learn even more. So that was pretty quick!

51:53

[Applause]. 

51:58

Let me walk you through what just happened. Thanks to a combination of our 

52:02

state-of-the-art speech models, our deep visual understanding, 

52:06

and our custom Gemini model, Search was able to understand 

52:10

the question I asked out loud and break down the video 

52:12

frame-by-frame. Each frame was fed into Gemini’s 

52:16

long context window that you heard about earlier today. 

52:19

Search could then pinpoint the exact make and model of my 

52:23

record player. And make sense of the motion 

52:26

across frames to identify that the tonearm was drifting. 

52:29

Search fanned out and combed the web to find relevant insights 

52:33

from articles, forums, videos, and more. 

52:36

And it stitched all of this together into my AI Overview. 

52:41

The result was music to my ears! Back to you, Liz. 
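
The walkthrough above describes Search breaking the video down frame by frame before passing the frames, together with the spoken question, to the model. A minimal sketch of that first step, assuming a fixed sampling interval, might look like the following. The names `sample_frame_times`, `VideoQuery`, and `build_query` are hypothetical and chosen only for illustration; the talk does not say how frames are actually selected.

```python
import math
from dataclasses import dataclass


def sample_frame_times(duration_s: float, interval_s: float = 1.0) -> list[float]:
    """Return timestamps (seconds) of frames sampled at a fixed interval.

    Assumption: uniform sampling. The keynote only says the video is
    broken down frame by frame, not how frames are chosen.
    """
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    count = math.ceil(duration_s / interval_s)
    return [round(i * interval_s, 3) for i in range(count)]


@dataclass
class VideoQuery:
    """Hypothetical container pairing the spoken question with frame timestamps."""
    question: str
    frame_times: list[float]


def build_query(question: str, duration_s: float, interval_s: float = 1.0) -> VideoQuery:
    # Bundle the question with the sampled frame timestamps so a later
    # stage (not shown) could load each frame and send it with the question.
    return VideoQuery(question, sample_frame_times(duration_s, interval_s))


query = build_query("Why will this not stay in place?", duration_s=10, interval_s=0.5)
print(len(query.frame_times), "frames sampled")
```

The later stages in the walkthrough (identifying the make and model, searching the web, composing the AI Overview) are deliberately left out, since the talk gives no implementation detail for them.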

52:52

[Applause]. >>LIZ REID: Everything you saw 

52:53

today is just a glimpse of how we're reimagining Google Search 

52:57

in the Gemini era. We’re taking the very best of 

53:01

what makes Google, Google. All the reasons why billions of 

53:05

people turn to Google Search, and have relied on us for 

53:08

decades. And we’re bringing in the power 

53:11

of Gemini’s agentive capabilities. 

53:14

So Google will do the searching, the researching. 

53:17

The planning. The brainstorming. 

53:20

And so much more. All you need to do, is ask. 

53:26

You'll start to see these features rolling out in Search 

53:29

in the coming weeks. Opt in to Search Labs to be 

53:32

among the first to try them out. Now let's take a look at how 

53:36

this all comes together in Google Search this year. 

53:43

>> Why is the lever not moving all the way?

54:48

[Applause].

54:51

>>APARNA PAPPU:  

55:31

Since last May, we've been hard at work making Gemini for Workspace 

55:36

even more helpful for businesses  and consumers across the world. 

55:42

Tens of thousands of customers have been using Help me write, 

55:46

Help me visualize and Help me organize since we launched. 

55:50

And now, we're really excited that the new Gemini-powered side 

55:55

panel will be generally available next month.

56:04

[Cheers and Applause]. One of our customers is a local 

56:06

favorite right here in California, Sports Basement. 

56:11

They rolled out Gemini for Workspace to the organization. 

56:14

And this has helped improve the productivity of  their customer support team by more than 30%. 

56:22

Customers love how Gemini grows  participation in meetings with  

56:27

automatic language detection and real-time  captions now expanding to 68 languages.

56:36

[Applause]. We are really excited about what 

56:42

Gemini 1.5 Pro unlocks for Workspace and AI Premium 

56:48

customers. Let me start by showing you 

56:51

three new capabilities coming to Gmail mobile. 

56:57

This is my Gmail account. Okay. 

57:00

So there's an E-mail up top from my husband. Help me sort out the roof repair thing, please. 

57:06

Now, we've been trying to find a  contractor to fix our roof, and with  

57:10

work travel, I have clearly dropped the ball. It looks like there's an E-mail thread on this  

57:16

with lots of E-mails that I haven't read. And luckily for me, I can simply tap the  

57:22

summarize option up top and skip  reading this long back and forth. 

57:28

Now, Gemini pulls up this helpful  mobile card as an overlay. 

57:32

And this is where I can read a nice summary of  all the salient information that I need to know. 

57:40

So here I see that we have a quote from Jeff at Green 

57:43

Roofing, and he's ready to start. Now, I know we had other bids  

57:48

and I don't remember the details. Previously, I would have had to do  

57:52

a number of searches in Gmail and then remember and compare information across different E-mails. 

57:59

Now, I can simply type out my question right here  in the mobile card and say something like, compare  

58:05

my roof repair bids by price and availability. This new Q&A feature makes it so easy to get  

58:12

quick answers on anything in my inbox. For example, when are my shoes arriving,  

58:16

or what time do doors open for the  Knicks game, without having to first  

58:20

search Gmail and open an E-mail and look for the specific information in attachments and so on. 

58:26

Anyway, back to my roof. It looks like Gemini has found details that I got  

58:30

from two other contractors in completely different  E-mail threads, and I have this really nicely  

58:36

organized summary and I can do a quick comparison. So it seems like Jeff's quote was right  

58:42

in the middle and he can start  immediately, so Green Roofing it is. 

58:46

I'll open that last E-mail from  Jeff and confirm the project. 

58:51

And look at that. I see some suggested replies from Gemini. 

58:56

Now, what is really, really neat about this  evolution of smart reply is that it's contextual. 

59:02

Gemini understood the back-and-forth in that  thread and that Jeff was ready to start. 

59:07

So it offers me a few customized options based on that context. 

59:11

So, you know, here I see I have decline  the service, suggest a new time. 

59:16

I'll choose proceed and confirm time. I can even see a preview of the  

59:20

full reply simply by long pressing. This looks reasonable, so I'll hit send. 

59:28

These new capabilities in Gemini and Gmail will start rolling out this month to Labs users.

59:38

[Applause]. Okay. 

59:42

So one of the really neat things about Workspace apps like 

59:45

Gmail, Drive, Docs, Calendar, is how well they work together, 

59:50

and in our daily lives we often have information that flows from 

59:53

one app to another. Like, say, adding a calendar entry from Gmail. 

59:58

Or creating reminders from a spreadsheet tracker. 

60:02

But what if Gemini could make these journeys totally seamless? 

60:07

Perhaps even automate them for you entirely. 

60:12

Let me show you what I mean with a real life example. 

60:17

My sister is a self-employed photographer, and her inbox is 

60:22

full of appointment bookings, receipts,  client feedback on photos and so much more. 

60:28

Now, if you're a freelancer or a small  business, you really want to focus on your  

60:32

craft and not on bookkeeping and logistics. So let's go to her inbox and take a look. 

60:40

Lots of unread E-mails. Let's click on the first one. 

60:45

It's got a PDF attachment. From a hotel, there's a receipt. 

60:49

And I see a suggestion in the side panel. Help me organize and track my receipts. 

60:54

Let's click on this prompt. The side panel now will show  

60:58

me more details about what that really means,  and as you can see, there's two steps here. 

61:03

Step one, create a Drive folder and put this  receipt and 37 others it's found into that folder. 

61:10

Makes sense. Step 2,  

61:13

extract the relevant information from those  receipts in that folder into a new spreadsheet. 

61:18

Now, this sounds useful. Why not? 

61:21

I also have the option to edit  these actions or just hit okay. 

61:25

So let's hit okay. Gemini will now complete the two 

61:30

steps described above, and this  is where it gets even better. 

61:34

Gemini offers the option to automate this so that this particular workflow is run on 

61:40

all future E-mails, keeping your Drive folder and  expense sheet up to date with no effort from you.

61:51

[Applause]. Now, we know that creating 

61:56

complex spreadsheets can be daunting for most people. 

61:59

But with this automation, Gemini does the hard work of extracting 

62:03

all the right information from all the files in that folder and generates the sheet for you. 

62:08

So let's take a look. Okay. 

62:10

It's super well organized, and it  even has a category for expense type. 

62:16

Now that we have this sheet, things can get even more fun. 

62:20

We can ask Gemini questions. Questions like, show me where the money is spent. 

62:26

Gemini not only analyzes the data from the  sheet, but also creates a nice visual to  

62:33

help me see the complete breakdown by category. And you can imagine how this extends to all sorts  

62:39

of use cases in your inbox, like travel expenses, shopping, remodeling projects, you name it. 

62:46

All of that information in Gmail can be put to good use and help you work, plan and play better. 
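
The receipt workflow described above has two visible outputs: a sheet with one row per receipt, and a breakdown of spend by category. A rough sketch of those two steps, assuming the receipt fields have already been extracted, could look like this. The `Receipt` fields, the function names, and the sample amounts are all invented for illustration; the demo does not show the sheet's actual columns or how Gemini performs the extraction.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Receipt:
    # Hypothetical fields; the real sheet's columns are not specified in the talk.
    vendor: str
    category: str
    amount: float


def receipts_to_csv(receipts: list[Receipt]) -> str:
    """Write one row per receipt, mimicking the generated expense sheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["vendor", "category", "amount"])
    for receipt in receipts:
        writer.writerow([receipt.vendor, receipt.category, f"{receipt.amount:.2f}"])
    return buffer.getvalue()


def spend_by_category(receipts: list[Receipt]) -> dict[str, float]:
    """Total the amounts per category, the data behind a breakdown chart."""
    totals: dict[str, float] = defaultdict(float)
    for receipt in receipts:
        totals[receipt.category] += receipt.amount
    return dict(totals)


# Made-up sample data, not taken from the demo.
sample_receipts = [
    Receipt("Hotel", "Travel", 200.0),
    Receipt("Taxi", "Travel", 30.0),
    Receipt("Camera lens", "Equipment", 450.0),
]

print(receipts_to_csv(sample_receipts))
print(spend_by_category(sample_receipts))
```

Running the same two functions on every new receipt is the kind of step the automation option in the demo would repeat for future E-mails.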

62:54

Now, this particular -- [Applause]. 

62:57

I know! 

63:02

This particular ability to organize your  attachments in Drive and generate a sheet  

63:06

and do data analysis via Q&A will be  rolling out to Labs users this September. 

63:12

And it's just one of the many automations that we're working on in Workspace. 

63:18

Workspace in the Gemini era will continue to unlock new ways of 

63:22

getting things done. We’re building advanced agentive 

63:26

experiences, including customizing how you use Gemini. 

63:32

Now, as we look to 2025 and beyond, we're exploring  

63:36

entirely new ways of working with AI. Now, with Gemini, you have an AI-powered  

63:42

assistant always at your side. But what if you could expand how 

63:46

you interact with AI? For example, when we work with 

63:50

other people, we mention them in comments and docs, we send them E-mail. 

63:55

We have group chats with them, et cetera. And it's not just how we collaborate with  

64:00

each other, but we each have a  specific role to play in the team. 

64:04

And as the team works together, we build a  set of collective experiences and contexts  

64:09

to learn from each other. We have the combined set of 

64:14

skills to draw from when we need help. So how could we introduce AI into this mix  

64:21

and build on this shared expertise? Well, here’s one way. 

64:26

We are prototyping a virtual Gemini-powered teammate. 

64:32

This teammate has an identity and a Workspace account, along 

64:36

with a specific role and objective. 

64:40

Let me bring Tony up to show you what I mean. Hey, Tony! 

64:44

>>TONY VINCENT: Hi, Aparna! Hey, everyone. 

64:50

Okay. So let me start by showing you 

64:52

how we set up this virtual teammate. 

64:55

As you can see, the teammate has its very own account. 

64:58

And we can go ahead and give it a name. We'll do something fun like Chip. 

65:04

Chip’s been given a specific role and a set of descriptions on how to be helpful 

65:08

for the team, you can see that here, and some  of the jobs are to monitor and track projects,  

65:13

we've listed a few out, to organize information  and provide contexts, and a few more things. 

65:19

Now that we've configured our virtual teammate,  let's go ahead and see Chip in action. 

65:22

To do that I'll switch us over here to Google Chat. 

65:26

First, when planning for an event like I/O, we  have a ton of chat rooms for various purposes. 

65:31

Luckily for me, Chip is in all of them. To quickly catch up, I might ask a question like, 

65:38

anyone know if our I/O storyboards are approved? Because we’ve instructed Chip to 

65:49

track this project, Chip searches across all the conversations  

65:53

and knows to respond with an answer. There it is. 

65:56

Simple, but very helpful. Now, as the team adds Chip to more  

66:01

group chats, more files, more E-mail threads, Chip  builds a collective memory of our work together. 

66:07

Let's look at an example. To show you I'll switch over to a different room. 

66:10

How about Project Sapphire over here and here we  are discussing a product release coming up and  

66:16

as usual, many pieces are still in flight, so I  can go ahead and ask, are we on track for launch? 

66:27

Chip gets to work not only searching  through everything it has access to,  

66:31

but also synthesizing what's found and  coming back with an up-to-date response. 

66:37

There it is. A clear timeline, a nice summary and 

66:40

notice even in this first message here, Chip flags  a potential issue the team should be aware of. 

66:46

Because we're in a group space, everyone can  follow along, anyone can jump in at any time,  

66:52

as you see someone just did. Asking Chip to help create a  

66:55

doc to help address the issue. A task like this could take me 

67:00

hours, dozens of hours. Chip can get it all done in just a few minutes,  

67:04

sending the doc over right when it's ready. And so much of this practical helpfulness  

67:09

comes from how we've customized Chip to our team's  needs, and how seamlessly this AI is integrated  

67:15

directly into where we're already working. Back to you, Aparna.

67:18

>>APARNA PAPPU: Thank you, Tony! I can imagine a number of 

67:31

different types of virtual teammates configured by 

67:34

businesses to help them do what they need. Now, we have a lot of work to do to figure out how  

67:39

to bring these agentive experiences like virtual teammates into Workspace, including enabling third 

67:46

parties to make their very own versions of Chip. We're excited about where this is headed,  

67:52

so stay tuned. And as Gemini and its capabilities continue  

67:56

to evolve, we're diligently bringing that power directly into Workspace to make all our users more 

68:03

productive and creative, both at home and at work. And now, over to Sissie to tell you more about  

68:11

the Gemini app. [Applause]. 

68:25

>>SISSIE HSIAO: Our vision for the Gemini app is to be the most 

68:29

helpful, personal AI assistant by giving you direct access to 

68:33

Google’s latest AI models. Gemini can help you learn, 

68:38

create, code, and anything else you can imagine. 

68:43

And over the past year, Gemini has put Google’s AI in the hands 

68:47

of millions of people, with experiences designed for your 

68:51

phone and the web. We also launched Gemini 

68:55

Advanced, our premium subscription for access to the 

68:58

latest AI innovations from Google. 

69:01

Today, we’ll show you how Gemini is delivering our most 

69:04

intelligent AI experience. Let’s start with the Gemini app, 

69:09

which is redefining how we interact with AI. 

69:13

It’s natively multimodal, so you can use text, voice or your 

69:18

phone’s camera to express yourself naturally. 

69:21

And this summer, you can have an in-depth conversation with 

69:25

Gemini using your voice. We’re calling this new 

69:28

experience "Live". Using Google’s latest speech 

69:32

models, Gemini can better understand you and answer 

69:36

naturally. You can even interrupt while 

69:39

Gemini is responding, and it will adapt to your speech 

69:42

patterns. And this is just the beginning. 

69:45

We're excited to bring the speed gains and video understanding capabilities 

69:50

from Project Astra to the Gemini app. When you go Live, you'll be able to 

69:56

open your camera so Gemini can see what you see  and respond to your surroundings in real-time. 

70:04

Now, the way I use Gemini  isn't the way you use Gemini. 

70:08

So we're rolling out a new feature that  lets you customize it for your own needs. 

70:12

And create personal experts on any topic you want. We're calling these "Gems."

70:28

[Applause]. 

70:29

They're really simple to set up. Just tap to create a Gem, write your instructions 

70:33

once, and come back whenever you need it. For example, here's a Gem that I created 

70:39

that acts as a personal writing coach. It specializes in short stories with  

70:44

mysterious twists, and it even builds on the story drafts in my Google Drive. 

70:50

I call it the Cliffhanger Curator. Now, Gems are a great time saver when 

70:54

you have specific ways that you want to  interact with Gemini again and again. 

71:00

Gems will roll out in the coming months, and our trusted testers 

71:03

are already finding so many creative ways to put them to 

71:07

use. They can act as your yoga 

71:09

bestie, your personal sous chef, a brainy calculus tutor, a peer 

71:14

reviewer for your code, and so much more. 

71:18

Next, I’ll show you how Gemini is taking a step closer to being 

71:22

a true AI assistant by planning and taking action for you. 

71:27

We all know chatbots can give you ideas for your next 

71:31

vacation. But there’s a lot more that goes 

71:33

into planning a great trip. It requires reasoning that 

71:37

considers space-time logistics, and the intelligence to 

71:41

prioritize and make decisions. That reasoning and intelligence 

71:46

all comes together in the new trip planning experience in 

71:49

Gemini Advanced. Now, it all starts with a prompt. 

71:53

Okay. So here we go. 

71:55

We’re going to Miami. My son loves art, my husband 

71:59

loves seafood, and our flight and hotel details are already in 

72:03

my Gmail inbox. Now there’s a lot going on in 

72:07

that prompt. Everyone has their own things 

72:09

that they want to do. To make sense of those 

72:12

variables, Gemini starts by gathering all kinds of 

72:16

information from Search, and helpful extensions like Maps and 

72:20

Gmail. It uses that data to create a 

72:23

dynamic graph of possible travel options, taking into account all 

72:28

my priorities and constraints. The end result is a personalized 

72:33

vacation plan, presented in Gemini’s new dynamic UI. 

72:38

Now, based on my flight information, Gemini knows that I 

72:41

need a two and a half day itinerary. 

72:44

And you can see how Gemini uses spatial data to make decisions. 

72:49

Our flight lands in the late afternoon, so Gemini skips a big 

72:53

activity that day, and finds a highly rated seafood restaurant 

72:57

close to our hotel. Now, on Sunday, we have a jam-packed day. 

73:02

I like these recommendations, but my family likes to sleep in. 

73:06

So I tap to change the start time,  and just like that, Gemini adjusted my  

73:13

itinerary for the rest of the trip. It moved our walking tour to the 

73:18

next day and added lunch options near the street art museum to 

73:22

make the most of our Sunday afternoon. 
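
The itinerary change described above (a later start time pushing the walking tour to the next day) can be illustrated with a simple greedy scheduler. This is only a sketch of the general idea, not Gemini's planning method, which the talk describes as a dynamic graph of travel options. The function name `plan_days`, the activity durations, and the day boundaries are assumptions made up for this example.

```python
def plan_days(activities, day_start, day_end, num_days):
    """Greedily pack activities into days, in the order given.

    activities: list of (name, duration_hours) tuples.
    day_start, day_end: hours on a 24-hour clock, the same for every day.
    Returns one list per day of (name, start_hour, end_hour) tuples.
    Activities that do not fit in the remaining days are dropped.
    """
    days = [[] for _ in range(num_days)]
    day, clock = 0, day_start
    for name, hours in activities:
        # Roll over to the next day while the activity would end too late.
        while day < num_days and clock + hours > day_end:
            day += 1
            clock = day_start
        if day == num_days:
            break  # no days left
        days[day].append((name, clock, clock + hours))
        clock += hours
    return days


# Illustrative durations only; the demo does not state them.
activities = [("street art museum", 3), ("lunch", 1), ("walking tour", 3)]

early = plan_days(activities, day_start=9, day_end=17, num_days=2)
late = plan_days(activities, day_start=11, day_end=17, num_days=2)
print(early)
print(late)
```

With the 9:00 start everything fits in one day; with the 11:00 start the walking tour no longer fits and moves to the second day, which mirrors the adjustment shown on stage.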

73:24

This looks great! It would have taken me hours of 

73:28

work, checking multiple sources, figuring out schedules, and 

73:32

Gemini did this in a fraction of the time. 

73:36

This new trip-planning experience will be rolling out 

73:38

to Gemini Advanced this summer, just in time to help you plan your  

73:42

own Labor Day weekend. [Applause]. 

73:51

All right. We saved the best for last. 

73:55

You heard Sundar say earlier that starting today, Gemini 

73:58

Advanced subscribers get access to Gemini 1.5 Pro, with one 

74:03

million tokens. That is the longest context 

74:07

window of any chatbot in the world.

74:16

[Cheers and Applause]. It unlocks incredible new 

74:18

potential in AI, so you can tackle complex problems that 

74:22

were previously unimaginable. You can upload a PDF up to 1,500 

74:28

pages long, or multiple files to get insights across a project. 

74:34

And soon, you can upload as much as 30,000  lines of code or even an hour-long video. 
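
To get a feel for these figures, one can divide the one-million-token window by each stated limit. The sketch below does that arithmetic under a loud assumption: that each limit (1,500 pages, 30,000 lines of code, one hour of video) corresponds to the entire window. The talk does not say this, so the results are rough per-unit budgets, not published tokenization rates. The names `CONTEXT_TOKENS` and `tokens_per_unit` are hypothetical.

```python
CONTEXT_TOKENS = 1_000_000  # context window size quoted in the talk


def tokens_per_unit(units: int, context_tokens: int = CONTEXT_TOKENS) -> float:
    """Average token budget per unit if `units` items fill the whole window."""
    if units <= 0:
        raise ValueError("units must be positive")
    return context_tokens / units


# Limits as quoted in the talk; the one-limit-equals-full-window mapping is an assumption.
budget = {
    "pdf_page": tokens_per_unit(1_500),
    "code_line": tokens_per_unit(30_000),
    "video_second": tokens_per_unit(60 * 60),
}

for unit, tokens in budget.items():
    print(f"{unit}: about {tokens:.0f} tokens")
```

Under that assumption the budgets come out to roughly 667 tokens per page, 33 per line of code, and 278 per second of video.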

74:41

Gemini Advanced is the only chatbot that lets you process 

74:44

this amount of information. Now, just imagine how useful 

74:48

this will be for students. Let’s say you’ve spent months on 

74:52

your thesis, and you could  really use a fresh perspective. 

74:56

You can upload your entire thesis, your sources, notes, 

75:00

your research, and soon, interview audio recordings and videos, too, 

75:05

so Gemini has all this context to give you actionable advice. 

75:09

It can dissect your main points, identify improvements, and even 

75:14

role play as your professor. So you can feel confident in 

75:18

your work. And check out what Gemini 

75:21

Advanced can do with your spreadsheets, with the new data 

75:24

analysis feature launching in the coming weeks. 

75:27

Maybe you have a side hustle selling handcrafted products. 

75:31

But you’re a better artist than accountant, and it's really hard to understand  

75:35

which products are worth your time. Simply upload all of your 

75:39

spreadsheets and ask Gemini to visualize your earnings and help 

75:42

you understand your profit. Gemini goes to work calculating 

75:47

your returns and pulling its analysis together into a single 

75:51

chart, so you can easily understand which products are 

75:54

really paying off. Now, behind the scenes, Gemini writes  

75:59

custom Python code to crunch these numbers. And of course, your files are 

76:03

not used to train our models. Oh, and just one more thing. 
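
As a hedged illustration of the kind of Python that could sit behind such an analysis, the sketch below totals profit per product and ranks the results. It is not the code Gemini generates; the function name `profit_by_product`, the row layout, and the sample numbers are all assumptions for this example.

```python
def profit_by_product(sales):
    """Sum profit per product and rank products from most to least profitable.

    sales: iterable of (product, units_sold, unit_price, unit_cost) rows,
    a hypothetical layout for an uploaded spreadsheet.
    """
    profit = {}
    for product, units, price, cost in sales:
        profit[product] = profit.get(product, 0.0) + units * (price - cost)
    return sorted(profit.items(), key=lambda item: item[1], reverse=True)


# Made-up rows for a handcrafted-goods side hustle.
sales_rows = [
    ("mugs", 10, 20.0, 12.0),
    ("prints", 40, 5.0, 1.0),
    ("mugs", 5, 20.0, 12.0),
]

ranking = profit_by_product(sales_rows)
for product, amount in ranking:
    print(f"{product}: {amount:.2f}")
```

A ranked list like this is what a single summary chart would plot, making it easy to see which products are paying off.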

76:09

Later this year, we'll be doubling the  long context window to 2 million tokens.

76:22

[Cheers and Applause]. We absolutely can't wait for  

76:23

you to try all of this for yourself. Gemini is continuing to evolve 

76:28

and improve at a breakthrough pace. 

76:30

We’re making Gemini more multimodal, more agentive, and 

76:33

more intelligent, with the capacity to process the most 

76:37

information of any chatbot in the world. 

76:40

And as you heard earlier, we're also expanding Gemini Advanced 

76:43

to over 35 supported languages, available today.

76:50

[Applause]. But, of course, what makes 

76:55

Gemini so compelling is how easy 

76:58

it is to do just about anything you want, with a simple prompt. 

77:02

Let's take a look. >> Enter prompt here. 

77:08

Okay. Can't be that hard. 

77:10

How about generate an image of a cat playing guitar? 

77:14

Is that how it works? Am I doing AI? 

77:17

Yeah. Just does whatever you type. 

77:20

What are last minute gift ideas you can make with arts and 

77:22

crafts? Plan a workout routine to get 

77:25

bigger calves. Help me think of titles to my 

77:28

tell-all memoir. What's something smart I can say 

77:31

about Renoir? Generate another image of a cat 

77:34

playing guitar. If a girl calls me a snack, how 

77:38

do I reply? Yeah, that's how it works. 

77:42

You're doing AI. Make this email sound more 

77:44

professional before I hit send. What's a good excuse to cancel 

77:49

dinner with my friends? We're literally sitting right 

77:52

here. There's no wrong way to prompt. 

77:56

Yeah, you're doing AI. There's no wrong way to prompt. 

78:01

It does whatever you type. Just prompt your prompt in the 

78:03

prompt bar. Or just generate an image of a 

78:05

cat playing guitar. You know it can do other stuff, 

78:11

right? [Applause]. 

78:24

>>SAMEER SAMAT: Hi, everyone. It’s great to be back at Google 

78:27

I/O. Today, you’ve seen how AI is 

78:31

transforming our products across Gemini, Search, Workspace and 

78:35

more. We're bringing all of these 

78:37

innovations right onto your Android phone. 

78:40

And we're going even further, to make Android the best place to 

78:45

experience Google AI. This new era of AI is a profound 

78:50

opportunity to make smartphones truly smart. 

78:54

Our phones have come a long way in a short time, but if you 

78:58

think about it, it’s been years since the user experience has 

79:01

fundamentally transformed. This is a once-in-a-generation 

79:05

moment to reinvent what phones can do. 

79:09

So we’ve embarked on a multi-year journey to reimagine 

79:12

Android, with AI at the core. And it starts with three 

79:18

breakthroughs you’ll see this year. 

79:21

First, we're putting AI-powered search right at your fingertips, 

79:26

creating entirely new ways to get the answers you need. 

79:30

Second, Gemini is becoming your new AI assistant on Android, 

79:35

there to help you any time. And third, we’re harnessing 

79:40

on-device AI to unlock new experiences that work as fast as 

79:44

you do, while keeping your sensitive data private. 

79:49

Let's start with AI-powered search. 

79:52

Earlier this year, we took an important first step at Samsung 

79:55

Unpacked, by introducing Circle to Search. 

79:59

It brings the best of Search directly into the user 

80:02

experience. So you can go deeper on anything 

80:05

you see on your phone, without switching apps. 

80:08

Fashionistas are finding the perfect shoes, home chefs are 

80:12

discovering new ingredients, and with our latest update, it’s 

80:16

never been easier to translate whatever’s on your screen, like 

80:20

a social post in another language. 

80:23

And there are even more ways Circle to Search can help. 

80:27

One thing we’ve heard from students is that they are doing 

80:30

more of their schoolwork directly on their phones and 

80:33

tablets. So, we thought: Could Circle to 

80:37

Search be your perfect study buddy? 

80:40

Let’s say my son needs help with a tricky physics word problem, 

80:44

like this one. My first thought is, oh boy, 

80:48

it’s been a while since I’ve thought about kinematics. 

80:51

If he’s stumped on this question, instead of putting me 

80:54

on the spot, he can circle the exact part he’s stuck on and get 

80:58

step-by-step instructions. Right where he’s already doing 

81:02

the work. Ah, of course, final velocity 

81:06

equals initial velocity plus acceleration times elapsed time. 
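
The relation quoted here is the constant-acceleration formula v = v0 + a·t. A tiny worked example, with numbers invented for illustration (the on-screen problem's values are not given in the transcript), might look like this; the function name `final_velocity` is likewise just a label for this sketch.

```python
def final_velocity(initial_velocity: float, acceleration: float, elapsed_time: float) -> float:
    """Constant-acceleration kinematics: v = v0 + a * t."""
    return initial_velocity + acceleration * elapsed_time


# Illustrative values: start at 2 m/s, accelerate at 3 m/s^2 for 4 s.
v = final_velocity(initial_velocity=2.0, acceleration=3.0, elapsed_time=4.0)
print(v)  # 14.0 (m/s)
```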

81:11

Right. I was just about to say that. 

81:15

Seriously, though, I love that it shows how to solve the 

81:18

problem, not just the answer. This new capability is available 

81:24

today! And later this year, Circle to 

81:28

Search will be able to tackle more complex problems involving 

81:31

symbolic formulas, diagrams, graphs and more. 

81:36

Circle to Search is only on Android. 

81:40

It’s available on more than 100 million devices today, and we’re 

81:44

on track to double that by the end of the year.

81:54

[Cheers and Applause]. You’ve already heard from Sissie 

81:56

about the incredible updates coming to the Gemini app. 

82:00

On Android, Gemini is so much more. 

82:04

It’s becoming a foundational part of the Android experience. 

82:08

Here’s Dave to share more. [Applause]. 

82:18

>>DAVE BURKE: Hey, everyone. A couple months ago we launched 

82:22

Gemini on Android. Like Circle to Search, Gemini 

82:26

works at the system level. So instead of going to a 

82:29

separate app, I can bring Gemini right to what I’m doing. 

82:34

Now, we're making Gemini context aware, so it can 

82:38

anticipate what you're trying to do and provide more helpful  

82:41

suggestions in the moment. In other words, to be a more 

82:45

helpful assistant. So let me show you  

82:48

how this works. 

82:48

And I've got my shiny new Pixel 8a here to help me.

82:55

[Applause]. 

82:56

So my friend Pete is asking me if I want to play pickleball 

82:59

this weekend. And I know how to play tennis, sort of. 

83:03

I have to say that for the demo. But I'm new to this pickleball thing,  

83:06

so I'm going to reply and try to be funny and say, is that like tennis but with pickles? 

83:14

This would be actually a lot funnier with a meme,  so let me bring up Gemini to help with that,  

83:19

and I'll say create image of tennis with pickles. Now, one new thing you'll notice 

83:26

is that the Gemini window hovers in place above the app so that I 

83:29

stay in the flow. Okay. 

83:32

So I generated some pretty good images. What's nice is I can drag and drop any of  

83:36

these directly into the messages below. So cool, let me send that.

83:44

[Applause]. All right. 

83:47

So Pete's typing, and he says -- he's  sending me a video on how to play pickleball. 

83:51

All right. Thanks, Pete. 

83:52

Let's tap on that. And that launches YouTube but, you know, I only  

83:56

have one or two burning questions about the game. I could bring up Gemini to help with that,  

84:01

and because it's context-aware, Gemini knows I'm  looking at a video, so it proactively shows me  

84:08

an "Ask this video" chip. So let me tap on that. 

84:12

And now, I can ask specific  questions about the video. 

84:15

So, for example, what is the two-bounce rule? Because that's something that I've heard about but 

84:24

don't quite understand in the game. By the way, this uses signals like  

84:28

YouTube's captions, which means you  can use it on billions of videos. 

84:32

So give it a moment, and, there. I get a nice, succinct answer. 

84:37

The ball must bounce once on each  side of the court after a serve. 

84:40

Okay. Cool. 

84:41

Let me go back to messages and  Pete's followed up, and he says,  

84:45

you're an engineer, so here's the  official rule book for pickleball. 

84:50

Thanks, Pete. Pete is very helpful, by the way. 

84:52

Okay. So we tap on that. 

84:53

It launches a PDF, now, that's an 84-page PDF. I don't know how much time Pete thinks I have. 

84:59

Anyway, us engineers, as you all know,  like to work smarter, not harder,  

85:03

so instead of trawling through this entire document, I can pull up Gemini to help. 

85:08

And again, Gemini anticipates what I need, and offers me an "Ask this PDF" option. 

85:14

So if I tap on that, Gemini now ingests all  of the rules to become a pickleball expert,  

85:20

and that means I can ask very esoteric questions,  like, for example, are spin serves allowed? 

85:31

And let's hit that, because I've  heard that rule may be changing. 

85:34

Now, because I'm a Gemini Advanced user, this works on any PDF and takes full advantage  

85:39

of the long context window and there's  just lots of times where that's useful. 

85:43

For example, let's say you're looking for  a quick answer in an appliance user manual. 

85:48

And there you have it. It turns out, no, spin serves are not allowed. 

85:53

So Gemini not only gives me a clear answer to my  question, it also shows me exactly where in the  

85:59

PDF to learn more. Awesome. 

86:02

Okay. So that’s a few of the ways 

86:09

that we're enhancing Gemini to be more  context aware and helpful in the moment. 

86:14

And what you've seen here are the first of  really many new ways that Gemini will unlock  

86:20

new experiences at the system level,  and they're only available on Android. 

86:25

You’ll see these, and more, coming to hundreds of millions of  

86:28

devices over the next couple of months. Now, building Google AI directly 

86:34

into the OS elevates the entire smartphone experience. 

86:38

Android is the first mobile operating system to include a 

86:41

built-in, on-device foundation model. 

86:44

This lets us bring Gemini goodness from the data center 

86:47

right into your pocket. So the experience is faster,  

86:51

while also protecting your privacy. Starting with Pixel later this 

86:55

year, we’ll be expanding what’s possible with our latest model, 

86:58

Gemini Nano with Multimodality. This means your phone can 

87:03

understand the world the way you understand it. 

87:06

So not just through text input, but also  through sights, sounds, and spoken language. 

87:12

Let me give you an example. 2.2 billion people experience 

87:16

blindness or low vision. So several years ago, we 

87:19

developed TalkBack, an accessibility feature that helps 

87:23

people navigate their phone through touch and spoken feedback. 

87:27

Helping with images is especially important. 

87:30

In fact, my colleague Karo, who uses TalkBack, will typically 

87:34

come across 90 unlabeled images per day. 

87:37

Thankfully, TalkBack makes them accessible, and now we’re taking 

87:41

that to the next level with the multimodal capabilities of 

87:44

Gemini Nano. So when someone sends Karo a 

87:47

photo, she’ll get a richer and clearer description of what’s 

87:51

happening. Or, let’s say Karo is shopping 

87:53

online for an outfit. Now she can get a crystal clear 

87:56

description of the style and cut to find the perfect look. 

88:00

Running Gemini Nano on-device helps minimize latency, and the 

88:05

model even works when there's  no network connection. 

88:08

These improvements to TalkBack are coming later this year. 

88:13

Let me show you another example of what on-device AI can unlock. 

88:17

People lost more than one trillion dollars to fraud last 

88:20

year. And as scams continue to evolve 

88:23

across texts, phone calls, and even videos, Android can help 

88:27

protect you from the bad guys, no matter how they try to reach 

88:30

you. So let’s say I get rudely 

88:33

interrupted by an unknown caller right in the middle of my 

88:36

presentation. [Phone ringing]. 

88:40

>> Hello! >> Hi. 

88:42

I'm calling from Save More Bank Security Department. 

88:44

Am I speaking to Dave? >>DAVE BURKE: Yes, this is Dave. 

88:47

I’m kinda in the middle of something. 

88:48

>> We've detected some suspicious activity on your 

88:50

account. It appears someone is trying to 

88:52

make unauthorized charges. >>DAVE BURKE: Oh, yeah? 

88:56

What kind of charges? >> I can't give you specifics 

88:58

over the phone, but to protect your account, I’m going to help 

89:01

you transfer your money to a secure account we’ve set up for 

89:04

you. [Laughter]. 

89:08

>>DAVE BURKE: And look at this, 

89:09

my phone gives me a warning that this call might be a scam!

89:20

[Applause]. Gemini Nano alerts me the second 

89:22

it detects suspicious activity, like a bank asking me to move my 

89:25

money to keep it safe. And everything happens right on 

89:29

my phone, so the audio processing stays completely 

89:32

private to me and on my device. We’re currently testing this 

89:35

feature, and we’ll have more updates to share later in the 

89:38

summer. And we’re really just scratching 

89:41

the surface on the kinds of fast, private experiences that 

89:45

on-device AI unlocks. Later this year, Gemini will be 

89:49

able to more deeply understand the content of your screen, 

89:53

without any information leaving your phone, thanks to the 

89:55

on-device model. So, remember that pickleball 

89:59

example earlier? Gemini on Android will be able 

90:02

to automatically understand the conversation and provide 

90:05

relevant suggestions, like where to find pickleball clubs near 

90:09

me. 

90:10

And this is a powerful concept that will  work across many apps on your phone. 

90:15

In fact, later today at the developer keynote, you’ll hear 

90:18

about how we’re empowering our developer community with our 

90:21

latest AI models and tools like Gemini Nano and Gemini in 

90:25

Android Studio. Also, stay tuned tomorrow for 

90:29

our upcoming Android 15 updates, which we can’t wait to share. 

90:35

As we said at the outset, we’re reimagining Android with Gemini 

90:38

at the core. From your favorite apps, to the 

90:41

OS itself, we’re bringing the power of AI to every aspect of 

90:45

the smartphone experience. And with that, let me hand over 

90:49

to Josh to share more on our latest news for developers. 

90:53

Thank you. [Applause]. 

91:04

>>JOSH WOODWARD: It’s amazing to see Gemini Nano do all of that 

91:08

directly on Android. That was our plan all along, to 

91:12

create a natively multimodal Gemini in a range of sizes so 

91:17

you all, as developers, can choose  the one that works best for you. 

91:22

Throughout the morning, you’ve heard a lot about our Gemini 1.5 

91:25

series, and I want to talk about the two models you can access today. 

91:30

1.5 Pro, which is getting a 

91:32

series of quality improvements that go out, right about now,  

91:36

and the brand new 1.5 Flash. Both are available globally in 

91:41

over 200 countries and territories.

91:49

[Cheers and Applause]. You can go over to AI Studio  

91:51

or Vertex AI if you're a Google Cloud customer and you can give them a try. 

91:55

Now, both models are also natively multimodal. 

91:58

That means you can interleave text, images, audio, video as 

92:03

inputs, and pack that massive 1 million token context window. 

92:07

And if you go to ai.google.dev today, you can sign up to try 

92:12

the 2 million token context window for 1.5 Pro. 

92:17

We're also adding a bunch of new developer  features, starting with video frame extraction. 

92:23

That's going to be in the Gemini  API, parallel function calling,  

92:26

so you can return more than one function call  at a time, and my favorite, context caching, so  

92:33

you can send all of your files to the model once  and not have to re-send them over and over again. 

92:40

That should make the long  context even more useful,  

92:43

and more affordable. It ships next month.
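
To make parallel function calling a little more concrete, here is a minimal Python sketch of the application side: the model returns several function calls in one turn, and the app runs them together and sends back all of the results. It deliberately does not use the Gemini SDK. The call format (dictionaries with `name` and `args`) and the two tools, `get_weather` and `get_time`, are hypothetical stand-ins invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical local tools the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Weather lookup for {city}"


def get_time(city: str) -> str:
    return f"Time lookup for {city}"


TOOLS = {"get_weather": get_weather, "get_time": get_time}


def run_function_calls(calls: list[dict]) -> list[dict]:
    """Run every requested call concurrently and keep the request order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(TOOLS[c["name"]], **c["args"]) for c in calls]
        return [
            {"name": c["name"], "result": f.result()}
            for c, f in zip(calls, futures)
        ]


# Simplified stand-in for one model turn that contains two function calls.
model_turn = [
    {"name": "get_weather", "args": {"city": "Mountain View"}},
    {"name": "get_time", "args": {"city": "Mountain View"}},
]
results = run_function_calls(model_turn)
```

Results come back in the same order as the requests, so each one can be matched to the call that produced it before the whole batch is returned to the model.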

92:49

[Applause]. Now, we're using Google's 

92:54

infrastructure to serve these 

92:56

models, so developers like all  of you can get great prices. 

93:01

1.5 Pro is $7 per 1 million tokens, and I'm excited to share 

93:07

that for prompts up to 128K, it will be 50% less, for $3.50. 

93:15

And 1.5 Flash will start at 35 cents for 1 million tokens.
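
The arithmetic behind those prices can be sketched in a few lines of Python. This is an estimate under stated assumptions, not an official price calculator: it reads the Flash price as $0.35 per 1 million tokens, treats "128K" as 128,000 tokens, assumes the quoted rates apply to the whole prompt, and ignores output tokens. The function name `estimate_prompt_cost` is hypothetical.

```python
from decimal import Decimal

# Prices in USD per 1 million tokens, as quoted on stage.
PRO_PRICE = Decimal("7.00")
PRO_PRICE_SHORT_PROMPT = Decimal("3.50")  # prompts up to 128K tokens
FLASH_PRICE = Decimal("0.35")             # Flash starting price
SHORT_PROMPT_LIMIT = 128_000              # assumption: 128K read as 128,000
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_prompt_cost(model: str, prompt_tokens: int) -> Decimal:
    """Estimate the cost of one prompt under the assumptions stated above."""
    if model == "pro":
        short = prompt_tokens <= SHORT_PROMPT_LIMIT
        price = PRO_PRICE_SHORT_PROMPT if short else PRO_PRICE
    elif model == "flash":
        price = FLASH_PRICE
    else:
        raise ValueError(f"unknown model: {model}")
    return price * prompt_tokens / TOKENS_PER_UNIT
```

With these numbers, a 100,000-token prompt to 1.5 Pro comes to $0.35, and a full 1 million token prompt comes to $7.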

93:23

[Cheers and Applause]. 

93:26

Now, one thing you might be wondering is  which model is best for your use case? 

93:30

Here’s how I think about it. We use 1.5 Pro for complex tasks, where you  

93:36

really want the highest quality response, and it's  okay if it takes a little bit longer to come back. 

93:42

We're using 1.5 Flash for quick tasks, where  the speed of the model is what matters the most. 

93:49

And as a developer, you can go try them both  out today and see what works best for you. 

93:55

Now, I'm going to show you how it works here in  AI Studio, the fastest way to build with Gemini. 

94:00

And we'll pull it up here, and you can see this is AI Studio. 

94:05

It's free to use. You don't have to configure anything to get going. 

94:09

You just go to aistudio.google.com, log in with your Google account, and you can just pick the  

94:14

model here on the right that works best for you. So one of the ways we've been using 1.5  

94:20

Flash is to actually learn from customer feedback about some of our Labs products. 

94:26

Flash makes this possible with its low latency. So what we did here is we just took a bunch of  

94:32

different feedback from our customer forums. You can put it in to Flash, load up  

94:37

a prompt, and hit run. Now, in the background,  

94:40

what it's going to do is it's going to go through  that 93,000 token pile of information and you  

94:46

can see here start streaming it back. Now, this is really helpful because  

94:50

it pulls out the themes for us. It gives us all the right places  

94:53

where we can start to look. We can see this is from some of the  

94:56

benefits from NotebookLM, like we showed earlier. Now, what's great about this is that you can take  

95:03

something like this in AI Studio, prototyped  here in ten seconds, and with one click in  

95:09

the upper left, get an API key, or over here in  the upper right, just tap get code, and you've  

95:15

got all the model configurations, the safety  settings, ready to go, straight into your IDE. 

95:22

Now, over time, if you find that you  need more enterprise-grade features  

95:26

you can use the same Gemini 1.5 models and  the same configurations right in Vertex AI. 

95:33

That way, you can scale up with Google  Cloud as your enterprise needs grow. 

95:38

So that's our newly updated Gemini 1.5 Pro and the  new 1.5 Flash, both of which are available today  

95:46

globally, and you'll hear a lot more about  them in the developer keynote later today.

95:58

[Applause]. 

95:59

Now, let's shift gears and talk  about Gemma, our family of open 

96:03

models, which are crucial for driving AI innovation and 

96:07

responsibility. Gemma is being built from the  

96:10

same research and technology as Gemini. It offers top performance and comes in  

96:15

lightweight 7B and 2B sizes. Since it launched less than 

96:20

three months ago, it’s been downloaded millions of times 

96:23

across all the major model hubs. Developers and researchers have 

96:28

been using it and customizing the base Gemma  model and using some of our pre-trained variants,  

96:34

like RecurrentGemma, and CodeGemma,  and today's newest member, PaliGemma,  

96:42

our first vision-language model,  and it's available right now.

96:49

[Applause]. It's optimized for  

96:53

a range of image captioning, visual Q&A and  other image labeling tasks, so go give it a try. 

97:01

I'm also excited to announce  that we have Gemma 2 coming. 

97:06

It's the next generation of Gemma,  and it will be available in June. 

97:11

One of the top requests we've heard from  developers is for a bigger Gemma model,  

97:16

but it's still going to fit in the  size that's easy for all of you to use. 

97:20

So in a few weeks, we'll  be adding a new 27 billion  

97:23

parameter model to Gemma 2, and  here's what's great about it. 

97:28

This size is optimized by NVIDIA to run on next-gen GPUs and can run efficiently  

97:35

on a single TPU host in Vertex AI. So this quality to size ratio is  

97:42

amazing because it will outperform  models more than twice its size. 

97:47

We can't wait to see what  you're going to build with it.

97:55

[Applause]. 

97:56

To wrap up, I want to share this inspiring  story from India, where developers have been  

98:01

using Gemma and its unique tokenization to create Navarasa, a set of instruction-tuned  

98:08

models to expand access to 15 Indic languages. This builds on our efforts to make information  

98:14

accessible in more than 7,000 languages around the world. 

98:19

Take a look. >>AASHI:  

98:29

Language is an interesting problem to solve, 

98:33

actually, and given India has a huge variety of languages and it 

98:38

changes every five kilometers. >>HARSH: When technology is 

98:43

developed for a particular culture, it won't be able to 

98:47

solve and understand the nuances of a country like India. 

98:52

One of Gemma’s features is an incredibly powerful tokenizer 

98:56

which enables the model to use hundreds of thousands of words, 

98:59

symbols, and characters across so many alphabets and language 

99:03

systems. 
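
As a rough illustration of why vocabulary size matters, the toy Python sketch below tokenizes a Hindi greeting with a greedy longest-match rule. This is not Gemma's tokenizer; the function `tokenize` and both vocabularies are hypothetical. It only shows that when whole words or word pieces are in the vocabulary, the same text takes fewer tokens.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenizer; unknown text falls back to one code point."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single code point always matches.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


word = "नमस्ते"  # "namaste"
small_vocab: set[str] = set()   # nothing known: every code point is a token
large_vocab = {word}            # the whole word is a single vocabulary entry

small_tokens = tokenize(word, small_vocab)
large_tokens = tokenize(word, large_vocab)
```

With the empty vocabulary the word splits into one token per code point, while the larger vocabulary covers it with a single token, which leaves more of the context window for actual content.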

99:04

This large vocabulary is critical to adapting Gemma to 

99:08

power projects like Navarasa. >>RAMSRI: Navarasa is a model 

99:13

that’s trained for Indic languages. 

99:15

It's a fine-tuned model based on Google’s Gemma. 

99:17

We built Navarasa to make large language models culturally 

99:21

rooted where people can talk in their native language and get 

99:25

the responses in their native language. 

99:28

Our biggest dream is to build a model to include everyone from 

99:33

all corners of India. >>GAURAV: We need a technology 

99:41

that will harness AI so that everyone can use it and no one 

99:41

is left behind. >>HARSH: Today the language that 

99:42

you speak in could be the tool and the technology that you use 

99:46

for solving your real-world problems. 

99:49

And that's the power of generative AI that we want to 

99:52

bring to every corner of India and the entire world.

100:12

[Applause].

100:12

[Cheers and Applause]. >>JAMES MANYIKA: Listening to 

100:13

everything that’s been announced today, it’s clear that AI is 

100:17

already helping people, from their everyday tasks to their 

100:22

most ambitious, productive, and imaginative endeavors. 

100:27

Our AI innovations, like multimodality, long context and 

100:31

agents, are at the cutting-edge of what this technology can do, 

100:36

taking to a whole new level its capacity to help people. 

100:40

Yet, as with any emerging technology, there are still 

100:45

risks and new questions that will arise as AI advances and 

100:49

its uses evolve. In navigating these 

100:53

complexities, we are guided by our AI Principles, and we’re 

100:57

learning from our users, partners, and our own research. 

101:03

To us, building AI responsibly means both addressing the risks 

101:08

and maximizing the benefits for people and society. 

101:12

Let me begin with what we’re doing to address risks. 

101:15

Here, I want to focus on how we are improving our models and 

101:18

protecting against their misuse. Beyond what Demis shared 

101:23

earlier, we are improving our models with an industry-standard 

101:27

practice called red-teaming, in which we test our own models and try  

101:31

to break them to identify weaknesses. Adding to this work, we’re 

101:36

developing a cutting-edge technique we call AI-assisted 

101:40

red teaming. This draws on Google DeepMind's 

101:44

gaming breakthroughs like AlphaGo, where we train AI 

101:48

agents to compete against each other and improve and expand the 

101:51

scope of their red teaming capabilities. 

101:54

We are developing AI models with these capabilities to help 

101:58

address adversarial prompting and limit problematic outputs. 

102:03

We’re also improving our models with feedback from two important 

102:07

groups: Thousands of internal safety experts with a range of  

102:11

disciplines, and a range of independent  experts from academia to civil society. 

102:18

Both groups help us identify emerging risks, from 

102:22

cybersecurity threats to potentially dangerous 

102:25

capabilities in areas like Chem-Bio. 

102:30

Combining human insight with our safety testing methods will help 

102:35

make our models and products more accurate, reliable and safer. 

102:41

This is particularly important as technical advances like better  

102:45

intonation make interactions with  AI feel and sound more human-like. 

102:51

We're doing a lot of research in this area,  including the potential for harm and misuse. 

102:58

We're also developing new tools to  help prevent the misuse of our models. 

103:03

For example, as Imagen 3 and Veo create more realistic imagery 

103:08

and videos, we must also consider how they might be 

103:12

misused to spread misinformation. 

103:15

To help, last year we introduced SynthID, a tool that adds 

103:20

imperceptible watermarks to our AI-generated images and audio so 

103:25

that they’re easier to identify. Today, we’re expanding SynthID 

103:30

to two new modalities: Text and video. 

103:35

These launches build on our efforts to deploy 

103:37

state-of-the-art watermarking capabilities across modalities. 

103:42

Moving forward, we will keep integrating advances like 

103:45

watermarking and other emerging techniques, to secure our latest 

103:50

generations of Gemini, Imagen, Lyria, and Veo models. 

103:55

We’re also committed to working with the ecosystem with all of you  

103:59

to help others build on the advances we're making. And in the coming months, we'll be open-sourcing  

104:05

SynthID text watermarking. This will be available in our 

104:09

updated Responsible Generative AI Toolkit, which we created to 

104:14

make it easier for developers to build AI responsibly. 

104:18

We're also collaborating on  C2PA, and we support C2PA, 

104:23

collaborating with Adobe, Microsoft, startups, and many 

104:26

others, to build and implement a standard that improves the 

104:30

transparency of digital media. Now, let’s turn to the second 

104:36

and equally important part of our responsible AI approach: 

104:40

How we’re building AI to benefit people and society. 

104:44

Today, our AI advances are helping to solve real-world 

104:48

problems, like accelerating the work of 1.8 million scientists 

104:53

in 190 countries who are using AlphaFold to work on issues like 

104:58

neglected diseases. Helping to predict floods in 

105:02

more than 80 countries. And helping organizations, like 

105:06

the United Nations track progress on  the world's 17 sustainable development  

105:11

goals with Data Commons. And now, generative AI is 

105:16

unlocking new ways for us to make the world’s information, 

105:20

and knowledge, universally accessible and useful for 

105:23

learning. Billions of people already use 

105:27

Google products to learn every day, and generative  AI is opening up new possibilities, allowing us to  

105:34

ask questions like, what if everyone everywhere could have their own 

105:39

personal AI tutor, on any topic? Or, what if every educator could 

105:45

have their own assistant in the classroom? 

105:48

Today marks a new chapter for learning and education at 

105:51

Google. I am excited to introduce 

105:54

LearnLM, our new family of models, based on Gemini, and 

106:00

fine-tuned for learning. LearnLM is grounded in 

106:04

educational research, making learning experiences more 

106:08

personal and engaging. And it’s coming to the products 

106:12

you use every day, like Search, Android, Gemini and YouTube. 

106:17

In fact, you've already seen LearnLM  on stage today when it helped Sameer  

106:23

with his son's homework on Android. Now, let's see how it works in 

106:27

the Gemini app. Earlier, Sissie introduced Gems, 

106:31

custom versions of Gemini that can act as personal assistive 

106:36

experts on any topic. We are developing some pre-made 

106:41

Gems, which will be available in the Gemini App and web 

106:44

experience, including one called Learning Coach. 

106:48

With Learning Coach, you can get step-by-step study guidance, 

106:52

along with helpful practice and memory techniques, designed to 

106:57

build understanding rather than just give you the answer. 

107:01

Let’s say you’re a college student studying for an upcoming 

107:04

biology exam. If you need a tip to remember 

107:07

the formula for photosynthesis, Learning Coach can help. 

107:12

Learning Coach, along with other pre-made gems, will launch in 

107:15

Gemini in the coming months. And you can imagine what 

107:19

features like Gemini Live can unlock for learning. 

107:24

Another example is a new feature in YouTube that uses LearnLM to 

107:29

make educational videos more interactive, allowing you to ask 

107:33

a clarifying question, get a helpful explanation, or take a 

107:37

quiz. This even works  

107:39

for those long lectures or seminars, thanks to the Gemini models' long context capabilities. 

107:46

This feature in YouTube is already rolling out to select 

107:50

Android users. As we work to extend LearnLM 

107:54

beyond our own products, we are partnering with experts and 

107:58

institutions like Columbia Teachers College, Arizona State 

108:02

University and Khan Academy to test and improve the new 

108:07

capabilities in our models for learning. 

108:10

And we’ve collaborated with MIT RAISE to develop an online 

108:14

course to help educators better understand and use generative 

108:18

AI. We’re also working directly with 

108:21

educators to build more helpful generative AI tools with LearnLM.

108:25

For example, in Google 

108:27

Classroom, we’re drawing on the advances you’ve heard about 

108:30

today to develop new ways to simplify  and improve lesson planning, and enable  

108:37

teachers to tailor lessons and content to  meet the individual needs of their students. 

108:44

Standing here today makes me think back to my own time as an 

108:47

undergraduate. Then, AI was considered 

108:51

speculative, far from any real world uses. 

108:55

Today, we can see how much is already real, how much it is 

109:00

already helping people, from their everyday tasks to their 

109:03

most ambitious, productive and imaginative endeavors, and how 

109:08

much more is still to come. This is what motivates us. 

109:12

I’m excited about what’s ahead and what we’ll build with all of 

109:16

you. Back to you, Sundar. 

109:17

[Applause]. >>SUNDAR PICHAI:  

109:30

Thanks, James. All of this shows the important 

109:34

progress we’ve made, as we take a bold and responsible approach 

109:37

to making AI helpful for everyone. 

109:40

Before we wrap, I have a feeling that someone out there might be 

109:44

counting how many times we’ve mentioned AI today. 

109:52

[Laughter]. And since a big theme today has 

109:54

been letting Google do the work for you, we went ahead and 

109:58

counted, so that you don’t have to.

110:12

[Cheers and Applause]. That might be a record in how 

110:14

many times someone has said AI. I’m tempted to say it a few more 

110:21

times. But I won't. 

110:23

Anyhow, this tally is more than just a punchline. 

110:26

It reflects something much deeper. 

110:29

We’ve been AI-first in our approach for a long time. 

110:32

Our decades of research leadership have pioneered many 

110:36

of the modern breakthroughs that power AI progress, for us and 

110:39

for the industry. On top of that, we have 

110:42

world-leading infrastructure built for the AI Era, 

110:45

cutting-edge innovation in Search, now powered by Gemini, 

110:49

products that help at extraordinary scale, including 

110:52

fifteen products with over half a billion users, and platforms 

110:57

that enable everyone, partners, customers, creators, and all of 

111:02

you, to invent the future. This progress is only possible 

111:06

because of our incredible developer community. 

111:08

You are making it real, through the experiences you build every 

111:12

day. So, to everyone here in 

111:14

Shoreline and the millions more watching around the world, 

111:18

here’s to the possibilities ahead and creating them 

111:20

together. Thank you.

111:28

[Cheers and Applause]. >> What does this remind you of? 

111:45

>> Cat. >> Wow. 

111:50

>> Wow! >> Okay! 

111:54

>> When all of these tools come together, it's a powerful 

111:58

combination. >> It's amazing. 

111:59

>> It's amazing. It's an entire suite of different  

112:03

kinds of possibilities. >> Hi. 

112:09

I'm Gemini. >> What neighborhood do you 

112:14

think I'm in? >> This appears to be the Kings 

112:16

Cross area of London. >> Together we're creating a new 

112:19

era.
