Google Keynote (Google I/O ‘24)

Google
14 May 2024 · 112:43

Summary

TLDR: The Google I/O 2024 keynote showcased a wide range of AI announcements, underscoring Google's push to integrate artificial intelligence across its technology. Sundar Pichai, CEO of Google, presented Gemini, the company's generative AI model, as the cornerstone of its AI strategy. Gemini is natively multimodal, able to process text, images, video, and code, and is positioned to change how people work and interact with technology. The event covered advances in Google Search, Workspace, and Android, and introduced new models such as Gemini 1.5 Pro and 1.5 Flash. The updates aim to make AI more accessible and useful for creators, developers, and users worldwide, with an emphasis on personalizing everything from education to daily tasks and on responsible AI development and deployment.

Takeaways

  • 🚀 Google has launched Gemini, a generative AI model, aiming to revolutionize the way we work by being natively multimodal and capable of reasoning across various forms of data like text, images, video, and code.
  • 📈 Over 1.5 million developers are already using Gemini models for tasks such as debugging code, gaining insights, and building AI applications, highlighting its rapid adoption and impact on the developer community.
  • 🔍 Google Search has been transformed with Gemini, enabling new ways of searching, including complex queries, and photo searches, leading to an increase in user satisfaction and search usage.
  • 📱 Gemini's capabilities are being integrated into Google's products across Mobile, Search, Photos, Workspace, and Android, providing a seamless AI experience for users.
  • 🎉 Sundar Pichai announced the expansion of Gemini's context window to 2 million tokens, a significant leap towards the goal of infinite context, allowing for even more detailed and long-range reasoning.
  • 📈 Google Workspace is set to benefit from Gemini's multimodality and long context features, streamlining tasks like email summarization and meeting highlight generation, enhancing productivity.
  • 🎓 LearnLM, a new family of models based on Gemini, is designed to enhance learning experiences and is being integrated into everyday products like Search, Android, Gemini, and YouTube.
  • 🤖 The concept of AI agents was introduced, which are intelligent systems capable of reasoning, planning, and memory, designed to perform tasks on behalf of users while ensuring user supervision and control.
  • 💬 Gemini's real-time speech models enable a more natural conversational experience with AI, allowing users to interrupt and receive immediate responses, making interactions feel more human-like.
  • 🌐 Google is committed to responsible AI development, focusing on improving model safety, preventing misuse, and expanding AI's benefits to society, including education and accessibility.
  • 📊 Google's investment in AI infrastructure, such as Tensor Processing Units (TPUs), is pivotal in training and serving state-of-the-art models like Gemini, reinforcing Google's position at the forefront of AI innovation.

Q & A

  • What is Google's new generative AI model called?

    -Google's new generative AI model is called Gemini.

  • How does Gemini redefine the way we work with AI?

    -Gemini redefines the way we work with AI by being natively multimodal, allowing users to interact with it through text, voice, or the phone's camera, and by providing more natural and context-aware responses.

  • What is the significance of the 1 million token context window in Gemini 1.5 Pro?

    -The 1 million token context window in Gemini 1.5 Pro is significant because it is the longest context window of any chatbot in the world, allowing it to process complex problems and large amounts of information that were previously unimaginable.

  • How does Gemini Advanced's trip planning feature work?

    -Gemini Advanced's trip planning feature works by gathering information from various sources like Search, Maps, and Gmail. It uses this data to create a dynamic graph of possible travel options, taking into account the user's priorities and constraints, and then presents a personalized vacation plan.

  • What is the role of Gemini in the future of Google Search?

    -In the future of Google Search, Gemini plays the role of an AI agent that uses multi-step reasoning to break down complex questions, figure out the problems that need to be solved, and in what order. It taps into Google's index of information about the real world to provide comprehensive and customized search results.

  • How does Google ensure the responsible use of its AI technology?

    -Google ensures the responsible use of its AI technology by adhering to its AI Principles, red-teaming to identify weaknesses, involving internal safety experts and independent experts for feedback, and developing tools like SynthID to watermark AI-generated content, making it easier to identify.

  • What is the new feature called 'Live' in the Gemini app?

    -'Live' is a new feature in the Gemini app that allows users to have in-depth conversations with Gemini using their voice. It utilizes Google's latest speech models to better understand users and provide more natural responses.

  • How does Gemini's 'Gems' feature help users customize their AI experience?

    -Gemini's 'Gems' feature allows users to create personalized AI assistants, or 'Gems,' tailored to specific topics or tasks. Users can set up these Gems once with their instructions and then use them whenever needed for a customized AI experience.

  • What is the purpose of the 'AI-organized search results page' in Google Search?

    -The 'AI-organized search results page' in Google Search is designed to provide users with a whole page of AI-generated and AI-organized content that is custom-built for their query. It uncovers the most interesting angles for the user to explore and organizes the results into helpful clusters.

  • How does the Gemini app integrate with Android to enhance the smartphone experience?

    -The Gemini app integrates with Android by becoming a foundational part of the Android experience, working at the system level. It provides context-aware assistance, allowing users to bring Gemini to their current activity without switching apps, and offers features like video understanding and on-device processing for faster and more private experiences.

  • What is the 'SynthID' tool, and how does it contribute to responsible AI?

    -SynthID is a tool developed by Google that adds imperceptible watermarks to AI-generated images, audio, text, and video. This makes the synthetic media easier to identify and helps prevent the misuse of AI-generated content, such as spreading misinformation.

Outlines

00:00

🚀 Google's Gemini AI: A Leap Forward in Technology

The first paragraph introduces Google's ambitious strides in artificial intelligence with the launch of Gemini, a generative AI model. Sundar Pichai, CEO of Google, welcomes the audience to Google I/O and emphasizes the transformative impact of Gemini on various sectors, including the way we work, find solutions, and interact with technology. The paragraph highlights Google's commitment to AI innovation across research, product development, and infrastructure, and the potential of Gemini to drive opportunities for creators and developers.

05:02

🔍 Google Search Transformation with Generative AI

The second paragraph discusses the transformation of Google Search with the integration of Gemini's capabilities. It talks about the Search Generative Experience that has led to new ways of searching, including complex queries and photo-based searches. The paragraph also mentions the user satisfaction increase and the launch of AI Overviews, which will be available to users in the U.S. with plans for global expansion.

10:05

📸 Google Photos Enhancement with Gemini

The third paragraph showcases how Gemini is enhancing Google Photos by making the search process more intuitive and context-aware. It describes a scenario where a user can find their car's license plate number by simply asking Photos, thanks to Gemini's ability to recognize and understand the context. The paragraph also teases the upcoming 'Ask Photos' feature, which will allow for deeper memory search capabilities.

15:08

🧠 Multimodality and Long Context in Gemini

The fourth paragraph delves into the technical aspects of Gemini, focusing on its multimodality and long context capabilities. It discusses how Gemini's design allows it to understand different types of inputs and find connections between them. The paragraph also highlights developers' excitement about the 1 million token context window and how it has been used to improve tasks such as coding and data analysis.

20:12

📚 Innovative Applications of Gemini

The fifth paragraph presents real-world applications of Gemini, where developers have used its advanced features to perform tasks like turning a video of a bookshelf into a searchable database. It illustrates the potential of Gemini to understand and process vast amounts of data, providing innovative solutions to complex problems.

25:15

🌐 Expanding Gemini's Reach and Capabilities

The sixth paragraph discusses the expansion of Gemini's capabilities with the introduction of Gemini 1.5 Pro, which offers long context support and is now available globally. It also announces the expansion of the context window to 2 million tokens for developers and highlights new updates in translation, coding, and reasoning.

30:16

🤖 AI Agents and the Future of Intelligent Systems

The seventh paragraph explores the concept of AI agents, which are intelligent systems capable of reasoning, planning, and memory. It provides examples of how these agents can simplify tasks like shopping and moving to a new city by automating multiple steps on behalf of the user. The paragraph emphasizes the importance of privacy, security, and user control in the development of these intelligent systems.

35:18

🧑‍🤝‍🧑 Personalized AI for Everyone

The eighth paragraph focuses on the ultimate goal of making AI helpful and accessible to everyone. It discusses the combination of multimodality, long context, and AI agents as a means to organize the world's information and make it useful for individuals. The paragraph also introduces the concept of an AI-first approach and the role of Google's infrastructure in supporting AI advancements.

40:19

🎓 LearnLM: Advancing Education with AI

The ninth paragraph introduces LearnLM, a new family of models based on Gemini and fine-tuned for educational purposes. It highlights the potential of LearnLM to provide personalized and engaging learning experiences through products like Search, Android, Gemini, and YouTube. The paragraph also mentions partnerships with educational institutions to enhance the capabilities of these models for learning.

45:21

🌟 The Impact of AI on Society and Future Innovations

The tenth paragraph emphasizes the real-world impact of AI, its role in solving global issues, and the ethical considerations guiding its development. It discusses Google's AI principles, the use of red-teaming and AI-assisted red-teaming to improve model safety, and the expansion of the SynthID watermarking tool. The paragraph concludes with a forward-looking statement on the potential of AI to enhance learning and education.

50:23

🤝 Collaboration and the Era of AI Innovation

The eleventh paragraph celebrates the developer community's role in bringing AI innovations to life. It acknowledges the collaborative efforts in creating AI technologies and the ongoing journey to explore and build the future of AI. The paragraph ends with a tribute to the possibilities ahead and the commitment to creating them together.

55:32

📈 The Significance of AI in Google's Ecosystem

The twelfth paragraph reflects on the frequency of mentioning AI throughout the discussion, symbolizing the integral role of AI in Google's approach and offerings. It underscores Google's AI-first mindset, research leadership, and the infrastructure built for the AI era. The paragraph concludes with a nod to the developer community's contributions to realizing AI's potential.

Keywords

💡Artificial Intelligence (AI)

Artificial Intelligence refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. In the video, AI is the central theme, with Google discussing its advancements in AI technology through projects like Gemini, which aims to revolutionize the way we work and interact with technology.

💡Gemini

Gemini is a generative AI model introduced by Google that is designed to be natively multimodal, capable of reasoning across various forms of input like text, images, video, and code. It is highlighted in the video as a significant step towards turning any input into any output, thus enabling a new generation of AI applications.

💡Multimodal

Multimodal refers to the ability of a system to process and understand multiple forms of input, such as text, speech, images, and video. In the context of the video, Google's Gemini model is described as multimodal, allowing it to function effectively across various types of data and providing a more integrated and human-like interaction experience.
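
As a concrete illustration (not shown in the keynote), here is a minimal sketch of a multimodal request, assuming the google-generativeai Python SDK; the API key and image file below are placeholders.

    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key

    # Gemini 1.5 Pro accepts text and images in a single prompt.
    model = genai.GenerativeModel("gemini-1.5-pro")

    image = Image.open("bookshelf.jpg")  # hypothetical photo
    response = model.generate_content(
        ["List the book titles and authors visible in this photo.", image]
    )
    print(response.text)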

💡Long Context

Long context denotes the capacity of an AI model to process and understand extensive amounts of information, such as lengthy texts or long-duration audio and video. The video emphasizes Gemini 1.5 Pro's ability to handle up to 1 million tokens in production, a significant breakthrough that allows the model to manage more complex and detailed tasks.
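
To make the long-context idea concrete, here is a minimal sketch, again assuming the google-generativeai Python SDK; the uploaded file is hypothetical and actual token counts depend on its contents.

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key

    model = genai.GenerativeModel("gemini-1.5-pro")

    # Upload a large source document once, then reference it in the prompt.
    report = genai.upload_file(path="annual_report.pdf")  # hypothetical file

    # Check how much of the roughly 1M-token window the request uses.
    print(model.count_tokens([report, "Summarize the key findings."]))

    response = model.generate_content([report, "Summarize the key findings."])
    print(response.text)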

💡AI Overviews

AI Overviews is a feature that utilizes Google's AI capabilities to provide users with summarized and contextual answers to their queries. As mentioned in the video, this feature is part of the revamped Google Search experience, aiming to increase user satisfaction by offering comprehensive and insightful responses to search queries.

💡Google Workspace

Google Workspace is a suite of productivity and collaboration tools developed by Google, which includes Gmail, Docs, Drive, and Calendar, among others. In the video, it is discussed how Gemini's integration with Google Workspace can streamline tasks like email summarization and meeting highlights, enhancing productivity and efficiency for users.

💡AI Agents

AI Agents are intelligent systems that can perform tasks on behalf of users by reasoning, planning, and remembering steps. The video describes how Google is working on AI agents that can execute complex tasks like online shopping or moving to a new city, making the process more convenient and less time-consuming for users.

💡Project Astra

Project Astra is an initiative by Google that aims to develop advanced AI assistants with capabilities for faster processing, better understanding of context, and more natural conversational responses. The video showcases a prototype of these AI agents, demonstrating their potential to provide personalized and interactive experiences.

💡Tensor Processing Units (TPUs)

TPUs are specialized hardware accelerators designed to speed up machine learning tasks. In the video, Google announces the sixth generation of TPUs called Trillium, which offers significant improvements in compute performance, essential for training state-of-the-art models like Gemini.

💡AI-Generated Media

AI-Generated Media refers to the creation of content such as images, music, and video through artificial intelligence. The video highlights Google's advancements in this area with the introduction of models like Imagen 3 for image generation and Veo for generative video, which are set to transform creative industries by providing new tools for artists and developers.

💡AI Principles

AI Principles are a set of ethical guidelines that companies like Google follow to ensure the responsible development and use of AI technology. The video touches on Google's commitment to these principles, emphasizing the importance of safety, privacy, and the beneficial use of AI in society, as well as the continuous evaluation and improvement of AI models.

Highlights

Google has launched Gemini, a generative AI model that is transforming the way we work.

The I/O opening film highlighted a year of new beginnings and new solutions to age-old problems, made possible by advances in AI.

Sundar Pichai emphasized that Google is in the early days of the AI platform shift with significant opportunities ahead.

Gemini models have demonstrated state-of-the-art performance on every multimodal benchmark.

Over 1.5 million developers are using Gemini models for various applications such as debugging code and building AI apps.

Google Search has been revolutionized by Gemini, allowing for more complex queries and photo-based searches.

Google Photos integration with Gemini makes searching through personal memories more accessible and efficient.

Google Workspace is set to enhance productivity with Gemini's capabilities, including summarizing emails and automating tasks.

Google is expanding the context window to 2 million tokens for developers, a significant step towards infinite context.

The introduction of Gemini 1.5 Flash, a lighter-weight model optimized for low latency and cost-efficient tasks at scale.

Project Astra represents the future of AI assistants, aiming to build a universal AI agent for everyday use.

Imagen 3, Google's most capable image generation model yet, offers more photorealistic and detailed results.

Google's Music AI Sandbox is a suite of tools that can create new instrumental sections and transfer styles between tracks.

Veo, the new generative video model, creates high-quality 1080p videos from various prompts, offering creative control.

Google's AI innovations are enabling more natural and interactive experiences with AI, with real-world applications.

Google is committed to responsible AI development, focusing on safety, privacy, and ethical considerations in AI advancements.

The introduction of LearnLM, a new family of models designed to enhance learning experiences across Google products.

Transcripts

00:00

[Cheers and Applause]. >>WOMAN: Google’s ambitions in 

00:01

artificial intelligence. >>MAN: Google launches Gemini, 

00:03

the generative AI. >> And it's completely changing 

00:06

the way we work. >> You know, a lot has happened 

00:09

in a year. There have been new beginnings. 

00:15

We found new ways to find new ideas. 

00:20

And new solutions to age-old problems. >> Sorry about your shirt. 

00:27

We dreamt of things -- >> Never too old for a 

00:30

treehouse. >> We trained for things. 

00:32

>> All right! Let’s go go go!

00:34

>> And learned about this thing. We found new paths, took the 

00:41

next step, and made the big leap. Cannon ball! 

00:52

We filled days like they were weeks. 

00:54

And more happened in months, than has happened in years. 

00:59

>> Hey, free eggs. >> Things got bigger,  

01:08

like waaay bigger. 

01:12

And it wasn’t all just for him, or for her. 

01:18

It was for everyone.

01:24

And you know what? 

01:27

We’re just getting started.

01:47

>>SUNDAR PICHAI:  Hi, everyone. Good morning. 

01:56

[Cheers and Applause]. Welcome to Google I/O. 

01:57

It's great to have all of you with us. We have a few thousand 

02:00

developers with us here today at Shoreline. 

02:03

Millions more are joining virtually around the world. 

02:06

Thanks to everyone for being here. 

02:09

For those of you who haven’t seen I/O before, it’s basically 

02:13

Google’s version of the Eras Tour, but with fewer costume 

02:18

changes. [Laughter]. 

02:20

At Google, though, we are fully in our Gemini era. Before we get into it, I want to 

02:28

reflect on this moment we’re in. We’ve been investing in AI for 

02:33

more than a decade, and innovating  at every layer of the stack: 

02:38

Research, product, infrastructure. We’re going to talk about it all today. 

02:43

Still, we are in the early  days of the AI platform shift. 

02:47

We see so much opportunity ahead for creators,  for developers, for startups, for everyone. 

02:56

Helping to drive those opportunities  is what our Gemini era is all about. 

03:01

So let’s get started. 

03:03

A year ago on this stage, we first shared our plans for 

03:06

Gemini, a frontier model built to be natively multimodal from 

03:11

the very beginning, that could reason across text, images, 

03:16

video, code, and more. It’s a big step in turning any 

03:20

input into any output. An I/O for a new generation. 

03:26

Since then we introduced the first Gemini models, our most 

03:29

capable yet. They demonstrated 

03:32

state-of-the-art performance on every multimodal benchmark. 

03:35

And that was just the beginning. Two months later, we introduced 

03:40

Gemini 1.5 Pro, delivering a big breakthrough in long context. 

03:46

It can run 1 million tokens in production, consistently. 

03:49

More than any other large-scale foundation model yet. 

03:53

We want everyone to benefit from what Gemini can do, so we’ve 

03:57

worked quickly to share these advances with all of you. 

04:01

Today, more than 1.5 million developers use Gemini models 

04:06

across our tools. You’re using it to debug code, 

04:10

get new insights, and build the next generation of AI 

04:13

applications. We’ve also been bringing 

04:17

Gemini’s breakthrough capabilities across our products 

04:20

in powerful ways. We’ll show examples today across 

04:24

Search, Photos, Workspace, Android and more. 

04:28

Today, all of our 2-billion user products use Gemini. 

04:32

And we’ve introduced new experiences, too, including on 

04:36

Mobile, where people can interact with Gemini directly 

04:39

through the app. Now available on Android and 

04:43

iOS. And through Gemini Advanced, 

04:46

which provides access to our most capable models. 

04:49

Over 1 million people have signed up to try it, in just 

04:52

three months. And it continues to show strong 

04:55

momentum. One of the most exciting 

04:58

transformations with Gemini has been in Google Search. 

05:02

In the past year, we’ve answered billions of queries as part of 

05:06

our Search Generative Experience. 

05:08

People are using it to Search in entirely new ways. 

05:12

And asking new types of questions, longer and more 

05:15

complex queries, even searching with photos, and getting back 

05:20

the best the web has to offer. We’ve been testing this 

05:24

experience outside of Labs, and we’re encouraged to see not only 

05:28

an increase in Search usage, but also an increase in user 

05:32

satisfaction. I’m excited to announce that 

05:35

we’ll begin launching this fully revamped experience, AI 

05:39

Overviews, to everyone in the U.S. this week. 

05:42

And we’ll bring it to more countries soon.

05:51

[Cheers and Applause]. There’s so much innovation 

05:53

happening in Search. Thanks to Gemini we can create 

05:57

much more powerful search experiences, including within 

06:00

our products. Let me show you an example in 

06:03

Google Photos. We launched Google Photos almost 

06:06

nine years ago. Since then, people have used it 

06:09

to organize their most important memories. 

06:12

Today that amounts to more than 6 billion photos and videos 

06:16

uploaded every single day. And people love using Photos to 

06:21

search across their life. With Gemini, we’re making that a 

06:24

whole lot easier. Say you’re at a parking station 

06:28

ready to pay, but you can’t recall your license plate 

06:31

number. Before, you could search Photos 

06:33

for keywords and then scroll through years’ worth of photos, 

06:37

looking for the right one. Now, you can simply ask Photos. 

06:43

It knows the cars that appear often, it triangulates which one 

06:46

is yours, and just tells you  the license plate number.

06:55

[Cheers and Applause]. And Ask Photos can help you 

06:57

search your memories in a deeper way. 

07:00

For example, you might be reminiscing about your daughter 

07:03

Lucia’s early milestones. You can ask photos, when did Lucia learn to swim? 

07:09

And you can follow up with something more complex. 

07:13

Show me how Lucia's swimming has progressed. Here, Gemini goes beyond a 

07:19

simple search, recognizing different contexts from doing 

07:23

laps in the pool, to snorkeling in the ocean, to the text and 

07:27

dates on her swimming certificates. 

07:29

And Photos packages it all up together in a summary, so you 

07:33

can really take it all in, and relive amazing memories all over 

07:37

again. We’re rolling out Ask Photos 

07:40

this summer, with more capabilities to come.

07:50

[Cheers and Applause]. Unlocking knowledge across 

07:51

formats is why we built Gemini to be multimodal from the ground 

07:54

up. It’s one model, with all the 

07:57

modalities built in. So not only does it understand 

08:00

each type of input, it finds connections between them. 

08:04

Multimodality radically expands the questions we can ask, and 

08:08

the answers we will get back. Long context takes this a step 

08:12

further, enabling us to bring in even more information, hundreds 

08:17

of pages of text, hours of audio, a full hour of video, or 

08:21

entire code repos. Or, if you want, roughly 96 

08:26

Cheesecake Factory menus. [Laughter]. 

08:29

For that many menus, you’d need a one million token context 

08:32

window, now possible with Gemini 1.5 Pro. 

08:36

Developers have been using it in super interesting ways. 

08:39

Let’s take a look. >> I remember the announcement, 

08:53

the 1 million token context window, and my first reaction 

08:57

was there's no way they were able to achieve this. 

08:59

>> I wanted to test its technical skills, so I uploaded 

09:04

a line chart. It was temperatures between like 

09:09

Tokyo and Berlin and how they were across the 12 months of the 

09:11

year. >> So  

09:12

I got in there and I threw in the Python library that I was 

09:16

really struggling with and I just asked it a simple question. 

09:21

And it nailed it. It could find specific 

09:26

references to comments in the code and specific requests that 

09:30

people had made and other issues that people had had, but then 

09:34

suggest a fix for it that related to what I was working 

09:38

on. >> I immediately tried to kind 

09:41

of crash it. So I took, you know, four or 

09:44

five research papers I had on my desktop, and it's a mind-blowing 

09:48

experience when you add so much text, and then you see the kind 

09:52

of amount of tokens you add is not even at half the capacity. 

09:55

>> It felt a little bit like Christmas because you saw things 

09:59

kind of peppered up to the top of your feed about, like, oh, 

10:01

wow, I built this thing, or oh, it's doing this, and I would 

10:05

have never expected. >> Can I shoot a video of my 

10:07

possessions and turn that into a searchable database? 

10:11

So I ran to my bookshelf, and I shot video just panning my 

10:14

camera along the bookshelf and I fed the video into the model. 

10:18

It gave me the titles and authors of the books, even 

10:21

though the authors weren't visible on those book spines, 

10:24

and on the bookshelf there was a squirrel nutcracker sat in 

10:27

front of the book, truncating the title. 

10:29

You could just see the word "sightsee", and it still guessed 

10:32

the correct book. The range of things you can do 

10:33

with that is almost unlimited. >> So at that point for me was 

10:36

just like a click, like, this is it. 

10:39

I thought, like, I had like  a super power in my hands. 

10:41

>> It was poetry. It was beautiful. 

10:43

I was so happy! This is going to be amazing! 

10:48

This is going to help people! >> This is kind of where the 

10:50

future of language models are going. 

10:52

Personalized to you, not because you trained it to be personal to 

10:58

you, but personal to you because you can give it such a vast 

11:02

understanding of who you are. [Applause]. 

11:11

>>SUNDAR PICHAI: We’ve been rolling out Gemini 1.5 Pro with 

11:14

long context in preview over the last few months. 

11:17

We’ve made a series of quality improvements across translation, 

11:21

coding, and reasoning. You’ll see these updates 

11:24

reflected in the model starting today. 

11:27

I'm excited to announce that we’re bringing this improved 

11:29

version of Gemini 1.5 Pro to all developers globally.

11:41

[Cheers and Applause]. In addition, today Gemini 1.5 

11:44

Pro with 1 million context is now directly  available for consumers in Gemini Advanced,  

11:50

and can be used across 35 languages. One million tokens is opening up 

11:56

entirely new possibilities. It’s exciting, but I think we 

12:01

can push ourselves even further. So today, we are expanding the 

12:05

context window to 2 million tokens.

12:15

[Cheers and Applause]. We are making it available  

12:16

for developers in private preview. It's amazing to look back and 

12:20

see just how much progress we've made in a few months. 

12:24

This represents the next step on our journey  towards the ultimate goal of infinite context. 

12:30

Okay. So far,  

12:31

we’ve talked about two technical advances: 

12:33

multimodality and long context. Each is powerful on its own. 

12:39

But together, they unlock deeper capabilities, and more 

12:42

intelligence. Let’s see how this comes to life 

12:46

with Google Workspace. People are always searching 

12:49

their emails in Gmail. We are working to make it much 

12:52

more powerful with Gemini. Let’s look at how. 

12:56

As a parent, you want to know everything that’s going on with 

13:00

your child’s school. Okay, maybe not everything, but 

13:04

you want to stay informed. Gemini can help you keep up. 

13:08

Now we can ask Gemini to summarize all recent emails from 

13:12

the school. In the background, it’s 

13:15

identifying relevant emails, and even analyzing attachments, like 

13:19

PDFs. And you get a summary of  

13:21

the key points and action items. So helpful. 

13:25

Maybe you were traveling this week and couldn’t make the PTA 

13:28

meeting. The recording of the meeting is 

13:31

an hour long. If it’s from Google Meet, you 

13:34

can ask Gemini to give you the highlights.

13:43

[Cheers and Applause]. There’s a parents group looking 

13:44

for volunteers, and you’re free that day. 

13:47

So of course, Gemini can draft a reply. 

13:50

There are countless other examples of how this can make 

13:52

life easier. Gemini 1.5 Pro is available 

13:56

today in Workspace Labs. Aparna will share more later on.

14:06

[Applause]. We just looked at an example with text outputs. 

14:14

But with a multimodal model, we can do so much more. 

14:17

To show you an early demo of an audio output in NotebookLM, 

14:22

here’s Josh. >>JOSH WOODWARD: Hi, everyone! 

14:32

Last year, at I/O, we introduced Notebook LM, a research and 

14:37

writing tool grounded in the information you give it. 

14:40

Since then, we've seen a lot of momentum with students and 

14:44

teachers using it. And today, Gemini 1.5 Pro is 

14:48

coming to Notebook LM, and it's great. 

14:51

Let me show you. So here we are in Notebook LM. 

14:55

You can load it up with all of the materials here on the left. 

14:59

In this notebook, I've been using it with my younger son,  

15:02

and I've added some of his science worksheets,  a few slide decks from the teacher, and even an  

15:08

open source textbook full of charts and diagrams. With 1.5 Pro, it instantly creates this notebook  

15:16

guide with a helpful summary and can generate  a study guide, an FAQ, or even quizzes. 

15:23

But for my son Jimmy, he really learns  best when he can listen to something. 

15:29

So we've prototyped a new feature with  Gemini, and it's called Audio Overviews. 

15:35

Notebook LM is going to use all of the materials on the left  

15:39

as input and output them into a lively  science discussion, personalized for him. 

15:46

Let's take a listen. >> So let's dive into physics. 

15:49

What's on deck for today? >> Well, we're starting with the 

15:52

basics. Force in motion. 

15:53

>> Okay. >> And that, of course, means we 

15:55

have to talk about Sir Isaac Newton and his three laws of 

15:57

motion. >> Ah,  

15:58

yes, the foundation -- for understanding how objects move and interact.

16:02

>>JOSH WOODWARD: Oh, yes, this is  where multimodal really shines. 

16:06

Now, it generated this audio  discussion based on that text material. 

16:11

And what's amazing is that my son  and I can join into the conversation. 

16:17

And steer it whichever direction we want. 

16:20

When I tap "join." >> Hold on. 

16:23

We have a question. What's up, Josh? 

16:26

>>JOSH WOODWARD: Yeah, can you give my son Jimmy a basketball 

16:29

example? >> Hey, Jimmy! 

16:35

That's a fantastic idea! Basketball is actually a great 

16:38

way to visualize force in motion. 

16:40

Let's break it down. >> Okay. 

16:41

So first, imagine a basketball just sitting there on the court. 

16:45

It's not moving, right? 

16:46

That's because all the forces acting on it are balanced. 

16:49

The downward pull of gravity -- >>JOSH WOODWARD: Pretty cool, right?

16:58

[Cheers and Applause]. I gotta say, the first  

17:00

time my son heard this, you should  have seen how big his eyes got! 

17:04

Because he was gripped. They were talking to him. 

17:07

He was learning science through  the example of basketball. 

17:11

His favorite sport. Now, what's interesting  

17:13

is under the hood, you saw that Gemini had used  some of the concepts of gravity, Sir Isaac Newton,  

17:19

but nothing in there was about basketball. It connected the dots and created that  

17:24

age-appropriate example for him. And this is what's becoming  

17:29

possible with the power of Gemini. You can give it lots of information in  

17:34

any format, and it can be transformed in a way  that's personalized and interactive for you. 

17:42

Back to you, Sundar. [Applause]. 

17:50

>>SUNDAR PICHAI: Thanks, Josh. The demo shows the real 

17:52

opportunity with multimodality. Soon you’ll be able to mix and 

17:56

match inputs and outputs. This is what we mean when we say 

17:59

it’s an I/O for a new generation. 

18:02

And I can see you all out there thinking about the 

18:05

possibilities. But what if we could go even 

18:07

further? That’s one of the opportunities 

18:10

we see with AI agents. Let me take a step back and 

18:13

explain what I mean by that. I think about them as 

18:17

intelligent systems that show reasoning, planning, and memory. 

18:21

Are able to “think” multiple steps ahead, work across 

18:25

software and systems, all to get something done on your behalf, 

18:30

and most importantly, under your supervision. 

18:33

We are still in the early days, and you’ll see glimpses of our 

18:37

approach throughout the day, but let me show you the kinds of use 

18:41

cases we are working hard to solve. Let’s talk about shopping. 

18:46

It’s pretty fun to shop for shoes, and a lot less fun to 

18:50

return them when they don’t fit. Imagine if Gemini could do all 

18:54

the steps for you: Searching your inbox for the receipt, 

18:59

locating the order number from your email, filling out a return 

19:03

form, and even scheduling a pickup. That's much easier, right?

19:10

[Applause]. Let’s take another example 

19:14

that’s a bit more complex. Say you just moved to Chicago. 

19:18

You can imagine Gemini and Chrome working together to help 

19:22

you do a number of things to get ready: Organizing, reasoning, 

19:27

synthesizing on your behalf. For example, you’ll want to 

19:30

explore the city and find services nearby, from 

19:33

dry-cleaners to dog-walkers. You will have to update your new  

19:37

address across dozens of Web sites. Gemini can work across these 

19:42

tasks and will prompt you for more information when needed, so 

19:46

you are always in control. That part is really important. 

19:49

as we prototype these experiences. We are thinking hard about how to do it in a way  

19:55

that's private, secure and works for everyone. These are simple-use cases, but 

20:01

they give you a good sense of the types of problems we want to 

20:04

solve, by building intelligent systems that think ahead, 

20:08

reason, and plan, all on your behalf. 

20:11

The power of Gemini, with multimodality, long context and 

20:16

agents, brings us closer to our ultimate goal: Making AI helpful 

20:22

for everyone. We see this as how we will make  

20:25

the most progress against our mission. Organizing the world’s 

20:29

information across every input, making it accessible via any 

20:34

output, and combining the world’s information with the 

20:37

information in your world in a way that’s truly useful for you. 

20:42

To fully realize the benefits of AI, we will continue to break 

20:46

new ground. Google DeepMind is hard at work 

20:50

on this. To share more, please welcome, 

20:52

for the first time on the I/O stage, Sir Demis.

20:58

[Applause]. >>DEMIS HASSABIS:  

21:10

Thanks, Sundar. 

21:11

It's so great to be here. Ever since I was a kid, playing 

21:16

chess for the England Junior Team, I’ve been thinking about 

21:19

the nature of intelligence. I was captivated by the idea of 

21:23

a computer that could think like a person. 

21:26

It’s ultimately why I became a programmer and studied 

21:29

neuroscience. I co-founded DeepMind in 2010 

21:33

with the goal of one day building AGI: Artificial general 

21:37

intelligence, a system that has human-level cognitive 

21:41

capabilities. I’ve always believed that if we 

21:44

could build this technology responsibly, its impact would be 

21:48

truly profound and it could benefit humanity in incredible 

21:51

ways. Last year,  

21:54

we reached a milestone on that path when we  formed Google DeepMind, combining AI talent  

21:58

from across the company into one super unit. Since then, we've built AI systems that can  

22:04

do an amazing range of things, from turning  language and vision into action for robots,  

22:10

navigating complex virtual environments, solving Olympiad-level math problems, and even discovering  

22:18

thousands of new materials. Just last week, we announced  

22:22

our next generation AlphaFold model. It can predict the structure and interactions  

22:27

of nearly all of life's molecules, including how  proteins interact with strands of DNA and RNA. 

22:34

This will accelerate vitally important  biological and medical research from  

22:38

disease understanding to drug discovery. And all of this was made possible with the  

22:44

best infrastructure for the AI era, including  our highly optimized tensor processing units. 

22:51

At the center of our efforts is our Gemini model. It's built from the ground up to be natively  

22:57

multimodal because that's how we interact  with and understand the world around us. 

23:02

We've built a variety of  models for different use cases. 

23:05

We've seen how powerful Gemini 1.5 Pro is,  but we also know from user feedback that some 

23:11

applications need lower latency and a lower cost to serve. 

23:16

So today we’re introducing Gemini 1.5 Flash.

23:21

[Cheers and Applause]. Flash is a lighter-weight model 

23:30

compared to Pro. It’s designed to be fast and 

23:33

cost-efficient to serve at scale, while still featuring 

23:36

multimodal reasoning capabilities and breakthrough 

23:38

long context. Flash is optimized for tasks 

23:42

where low latency and efficiency matter most. 

23:45

Starting today, you can use 1.5 Flash and 1.5 Pro with up to one 

23:50

million tokens in Google AI Studio and Vertex AI. 

23:54

And developers can sign up to try two million tokens. 

23:58

We’re so excited to see what all of you will create with it. 

24:02

And you'll hear a little more  about Flash later on from Josh. 

24:07

We’re very excited by the progress we’ve made so far with 

24:09

our family of Gemini models. But we’re always striving to 

24:12

push the state-of-the-art even further. 

24:16

At any one time we have many different models in training. 

24:19

And we use our very large and powerful ones to help teach and 

24:22

train our production-ready models. 

24:26

Together with user feedback, this cutting-edge research will 

24:28

help us to build amazing new products for billions of people. 

24:33

For example, in December, we shared a glimpse into the future 

24:37

of how people would interact with multimodal AI, and how this 

24:41

would end up powering a new set of transformative experiences. 

24:46

Today, we have some exciting new progress to share about the 

24:49

future of AI assistants that we’re calling Project Astra.

24:58

[Cheers and Applause]. For a long time, we’ve wanted to 

25:00

build a universal AI agent that can be truly helpful in everyday 

25:04

life. Our work making this vision a 

25:06

reality goes back many years. It's why we made Gemini multimodal  

25:10

from the very beginning. An agent like this has to 

25:14

understand and respond to our complex and dynamic world just 

25:17

like we do. It would need to take in and 

25:20

remember what it sees so it can understand context and take 

25:23

action. And it would have to be 

25:25

proactive, teachable and personal, so you can talk to it 

25:28

naturally, without lag or delay. While we’ve made great strides 

25:33

in developing AI systems that can understand multimodal 

25:36

information, getting response time down to something 

25:39

conversational is a difficult engineering challenge. 

25:42

Building on our Gemini model, we’ve developed agents that can 

25:45

process information faster by continuously encoding video 

25:49

frames, combining the video and speech input into a timeline of 

25:53

events, and caching this for efficient recall. 

25:56

We’ve also enhanced how they sound, with a wider range of 

26:00

intonations. These agents better understand 

26:03

the context you’re in, and can respond quickly in conversation, 

26:06

making the pace and quality of interaction feel much more 

26:09

natural. Here’s a video of our prototype, 

26:13

which you’ll see has two parts. Each part was captured in a 

26:17

single take, in real time. >> Okay. Let's do some tests. 

26:24

Tell me when you see something that makes sound. 

26:28

>> I see a speaker, which makes sound. 

26:31

>> What is that part of the speaker called? 

26:36

>> That is the tweeter. It produces high frequency 

26:40

sounds. >> Give me a creative 

26:45

alliteration about these. >> Creative crayons color 

26:50

cheerfully. They certainly craft colorful 

26:53

creations. >> What does that part of the