Google Keynote (Google I/O ‘24)
Summary
TLDR: At Google I/O, Google announced a series of innovations and advances in artificial intelligence. Google introduced Gemini, an advanced AI model that is natively multimodal and can understand and generate text, images, video, and code. Gemini 1.5 Pro delivered a breakthrough in long context, handling up to 1 million tokens and opening new application possibilities for developers and users. Google also showcased use cases across Search, Photos, Workspace, and Android, showing how AI can improve efficiency, enhance user experiences, and drive innovation. Google emphasized responsible AI development, including measures to improve model safety and prevent misuse. Finally, Google previewed upcoming work such as the LearnLM education models and plans to partner with education experts to bring AI into learning tools.
Takeaways
- 🚀 Google laid out its ambitions in AI with Gemini, a generative AI that is fundamentally changing the way we work.
- 📈 The Gemini model is natively multimodal, able to handle text, images, video, code, and other data types.
- 📱 Google is integrating Gemini into products such as Search, Photos, Workspace, and Android to deliver more powerful user experiences.
- 🔍 Gemini in Google Search lets users search in entirely new ways, including searching with photos and asking more complex questions.
- 📈 Gemini 1.5 Pro achieved a breakthrough in long context, handling up to 1 million tokens and opening new possibilities for developers and consumers.
- 🌟 Google introduced Gemini 1.5 Flash, a faster, more cost-efficient model that keeps multimodal reasoning and long-context capabilities.
- 🎓 Google is developing LearnLM, a new family of learning models aimed at making education more personal and interactive.
- 🤖 Google described its vision for AI agents: intelligent systems that can carry out tasks on a user's behalf, such as shopping, planning, and organizing information.
- 📹 Google showed its latest work in generative video with Veo, a model that creates high-definition video from text, image, and video prompts.
- 💡 Gemini updates and new features will roll out later this year, including new tools and improvements for developers and new experiences for consumers.
- 🌐 Google stressed responsible AI development, including watermarking AI-generated content with SynthID and working with partners on standards for digital-media transparency.
Q & A
What are Google's latest moves in artificial intelligence?
-Google launched Gemini, a generative AI that is changing the way we work. It can understand inputs across text, images, video, code, and more, and turn them into any form of output.
How has Gemini improved Google Search?
-Google Search offers a new Gemini-powered search experience, including AI Overviews that answer complex questions, perform multi-step reasoning, and deliver AI-organized results pages.
What are the main features of the Gemini 1.5 Pro model?
-Gemini 1.5 Pro can process up to 1 million tokens of context, more than any other large-scale foundation model to date, and it runs consistently in production.
How is Google using AI to improve the user experience of its products?
-By integrating Gemini into products such as Search, Photos, Workspace, and Android, Google provides more personal and powerful features, like searching your memories in Photos, automated email summaries, and suggested replies.
How does Google ensure AI is used responsibly?
-Google follows its AI Principles, using red-teaming and AI-assisted red-teaming, feedback from internal safety specialists and independent experts, and tools such as SynthID to improve model safety and prevent misuse.
How does Gemini help developers and startups?
-Google offers multiple Gemini models, including 1.5 Pro and 1.5 Flash, along with a private preview of a 2-million-token context window, so developers around the world can build the next generation of AI applications (see the sketch below).
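As a concrete starting point, here is a minimal sketch of calling these models from Python, assuming the google-generativeai SDK and an API key from Google AI Studio; the file name and prompts are illustrative placeholders, not part of the keynote.

```python
# Minimal sketch: calling Gemini 1.5 Flash / Pro through the google-generativeai SDK.
# Assumes `pip install google-generativeai` and an API key created in Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# 1.5 Flash targets low-latency, cost-sensitive tasks; swap in "gemini-1.5-pro"
# for harder reasoning over the same long-context window.
model = genai.GenerativeModel("gemini-1.5-flash")

# Long context in practice: upload a large document and ask questions about it.
report = genai.upload_file("annual_report.pdf")  # hypothetical local file
response = model.generate_content(
    [report, "Summarize the key risks discussed in this document."]
)
print(response.text)

# See how much of the 1M-token context window a prompt would actually use.
print(model.count_tokens([report, "Summarize the key risks."]))
```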
How is Google using AI to drive innovation in education?
-Google introduced the LearnLM family of models, fine-tuned for learning, to make learning experiences more personal and engaging, and is integrating them into products such as Search, Android, Gemini, and YouTube.
What infrastructure investments has Google made for AI?
-Google has invested in world-leading technical infrastructure, including custom Tensor Processing Units (TPUs) and performance-optimized hardware and open software, to support advances in AI.
How does Google's AI help users plan trips more effectively?
-With the trip-planning experience in Gemini Advanced, users can upload flight and hotel details, and Gemini uses that data to build a dynamic graph of travel options and generate a personalized vacation plan.
What progress has Google made on accessibility with AI?
-Google is improving TalkBack with Gemini Nano's multimodal capabilities, giving blind and low-vision users richer, clearer descriptions of photos to help with online shopping and everyday navigation.
How does Google use AI to protect users from scams?
-On Android, the Gemini Nano model can detect suspicious activity during a call, such as a "bank" asking you to move money to keep it safe, and alert the user on-device in real time, protecting both privacy and security.
Outlines
🚀 A new era of Google AI: the launch and impact of Gemini
Google laid out its ambitions in artificial intelligence with Gemini, a generative AI that is fundamentally changing the way we work. From new beginnings to new solutions to age-old problems, a lot has changed over the past year. Google CEO Sundar Pichai welcomed developers to Google I/O and highlighted AI innovation at every layer of the stack: research, products, and infrastructure. Gemini, a natively multimodal model that can handle text, images, video, and code, is a major step for AI. More than 1.5 million developers are already using Gemini models to debug code, gain new insights, and build the next generation of AI applications.
🔍 Gemini's transformative role in Google Search
Powered by Gemini, Google Search offers a generative search experience: users can search in entirely new ways, including asking more complex questions and searching with photos. Google has been testing this experience and plans to roll it out to more countries. Google Photos has also been improved with Gemini, making it easier to search and organize photos.
📚 Gemini's multimodal and long-context capabilities
Gemini's multimodality lets it understand different types of input and find connections between them. Long context lets it take in far more information, such as hundreds of pages of text, hours of audio, or a full video. Developers have already used these capabilities in creative ways, for example turning a video of a bookshelf into a searchable database.
🌐 Global rollout and new features of Gemini 1.5 Pro
Google is rolling out an improved Gemini 1.5 Pro to developers globally and offering it to consumers in Gemini Advanced, with support for 35 languages. Google also announced that the context window is expanding to 2 million tokens, available to developers in private preview. This marks the next step toward the ultimate goal of infinite context.
🤖 Gemini in Google Workspace
With Gemini integrated into Google Workspace, email search becomes much more powerful. For example, a parent can ask Gemini to summarize recent emails from school; it identifies the relevant emails, analyzes attachments, and returns a summary of key points and action items. Gemini can also pull highlights from a long meeting recording and help draft replies.
🎓 LearnLM: a family of education models built on Gemini
Google announced LearnLM, a family of models based on Gemini and fine-tuned for learning. Grounded in educational research, LearnLM aims to make learning experiences more personal and engaging. These models will be integrated into products people already use, such as Search, Android, Gemini, and YouTube. LearnLM will also be tested and refined with educational institutions, and Google is working with teachers to develop more helpful generative AI tools.
📱 Android meets Gemini: a new era for smartphones
Google is making AI core to the Android experience, using it to improve the whole smartphone. Android is the first mobile operating system with a built-in, on-device foundation model, which makes experiences faster while protecting user privacy. Starting later this year, Pixel phones will expand Gemini Nano's capabilities, including multimodality. Google also plans to build more Gemini-powered AI features directly into Android, such as improved TalkBack accessibility and scam-call warnings.
🤖 Gemini as a system-level AI assistant on Android
On Android, Gemini is more than an app; it is becoming a foundational part of the Android experience. Google is making Gemini context-aware so it can anticipate what users need and offer timely help. For example, users can create images or ask about a video directly from within a messaging app. These improvements will roll out to hundreds of millions of devices over the coming months.
📈 The Gemini 1.5 series: Pro and Flash, available globally
Google announced two models in the Gemini 1.5 series, 1.5 Pro and the new 1.5 Flash. Both are multimodal, available in more than 200 countries and territories, and can be tried directly in AI Studio or Vertex AI. Google also introduced new developer features such as video frame extraction, parallel function calling, and context caching to make long-context work more efficient and affordable (a sketch of function calling follows below).
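To make the developer features above concrete, here is a hedged sketch of function calling with the google-generativeai Python SDK. The two tool functions are hypothetical stubs, and the automatic function-calling flow shown reflects the SDK's documented behavior rather than anything demoed in the keynote; in a single turn the model may also request several tool calls in parallel.

```python
# Sketch of function calling with Gemini 1.5 Pro via the google-generativeai SDK.
# The tools below are illustrative stubs, not Google APIs.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

def get_store_hours(city: str) -> str:
    """Stub: return opening hours for the store in a given city."""
    return "9am-6pm"

def get_stock(item: str) -> int:
    """Stub: return how many units of an item are in stock."""
    return 12

# Register plain Python functions as tools; the model decides when to call them.
model = genai.GenerativeModel("gemini-1.5-pro", tools=[get_store_hours, get_stock])

# Automatic function calling lets the SDK execute the requested tools and feed the
# results back to the model before it writes its final answer.
chat = model.start_chat(enable_automatic_function_calling=True)
reply = chat.send_message(
    "Is the Boston store open right now, and are trail shoes in stock?"
)
print(reply.text)
```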
🌟 Gemma: an open model family for AI innovation and responsibility
Gemma is Google's family of open models, built from the same research and technology as Gemini, with high-performance 2B and 7B parameter models. Since launch, Gemma has been downloaded millions of times and used by developers and researchers in all kinds of custom applications. The newest member, PaliGemma, is the family's first vision-language model, suited to tasks such as image captioning and visual question answering. Gemma 2, arriving in June, adds a 27-billion-parameter model optimized for next-generation GPUs and TPUs (a sketch of running Gemma locally follows below).
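Because the Gemma weights are open, they can be run locally. Below is a minimal sketch, assuming access to the gated google/gemma-2b-it checkpoint on Hugging Face (the Gemma license must be accepted first) and the Transformers library; newer Gemma releases generally follow the same pattern with different model IDs.

```python
# Minimal sketch: running the open Gemma 2B instruction-tuned model locally with
# Hugging Face Transformers. Assumes `pip install transformers torch` and access
# to the gated google/gemma-2b-it checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "In two sentences, what is a vision-language model useful for?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```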
🧪 Responsible AI: how Google works to keep AI safe and beneficial
Google is working to keep AI safe and beneficial in several ways. It improves its models using red-teaming and AI-assisted red-teaming to find weaknesses and prevent misuse, and it works with internal safety specialists and independent experts to identify emerging risks. Google is also developing tools such as SynthID to keep AI-generated content from being used to spread misinformation, and it is helping others build AI responsibly through open-source releases and collaboration across the ecosystem.
🌐 Google AI's global impact and what comes next
Google's AI is helping scientists, educators, and individuals around the world. For example, AlphaFold is helping 1.8 million scientists in 190 countries study neglected diseases, and Data Commons is helping organizations track the UN's 17 Sustainable Development Goals. Google's AI is also opening new possibilities in education, such as a personal AI tutor for every student. Google is working with educational institutions to bring LearnLM into products like Google Classroom to simplify lesson planning and meet students' individual needs.
Keywords
💡Artificial Intelligence
💡Multimodality
💡Long Context
💡Personalization
💡Real-time Information
💡Safety
💡Accessibility
💡Natural Language Processing
💡Machine Learning
💡Education
💡Privacy
Highlights
Google showcased its AI ambitions with Gemini, a new generative AI that is fundamentally changing the way we work.
Google recapped a year of progress in AI, including new beginnings, fresh thinking, and new solutions to age-old problems.
Google CEO Sundar Pichai welcomed developers from around the world to Google I/O and highlighted AI innovation across products and infrastructure.
The Gemini model is natively multimodal: it can understand text, images, video, code, and more, and turn any input into any output.
Gemini 1.5 Pro achieved a breakthrough in long context, handling up to 1 million tokens, more than any previous large-scale foundation model.
More than 1.5 million developers are using Gemini models to debug code, gain new insights, and build the next generation of AI applications.
Google Search's new generative experience, powered by Gemini, lets users search with photos and get back the best the web has to offer.
Google Photos uses Gemini to make searching personal photos easier, for example recognizing frequently seen cars and surfacing details like a license plate number.
Google Workspace uses Gemini to enhance email search, summarizing school emails and analyzing attachments to surface key points and action items.
Google demoed Audio Overviews in NotebookLM, built with Gemini 1.5 Pro, giving students a personalized way to learn.
Google showed how AI agents could simplify complex tasks, such as shopping and moving, with intelligent systems performing multi-step actions on a user's behalf.
DeepMind's Demis Hassabis discussed progress toward universal AI assistants that can understand and respond to our complex, dynamic world.
Google announced Imagen 3, a new image generation model that produces high-resolution, photorealistic images from text prompts.
Veo, Google's latest video generation model, creates 1080p HD video from text, image, and video prompts.
Google showed its latest advances in natural language understanding and computer vision, which are being built into the next generation of Google Search.
Google is developing LearnLM, a family of models based on Gemini and fine-tuned for learning, to make learning more personal and engaging.
Transcripts
[Cheers and Applause]. >>WOMAN: Google’s ambitions in
artificial intelligence. >>MAN: Google launches Gemini,
the generative AI. >> And it's completely changing
the way we work. >> You know, a lot has happened
in a year. There have been new beginnings.
We found new ways to find new ideas.
And new solutions to age-old problems. >> Sorry about your shirt.
We dreamt of things -- >> Never too old for a
treehouse. >> We trained for things.
>> All right! Let’s go go go!
>> And learned about this thing. We found new paths, took the
next step, and made the big leap. Cannon ball!
We filled days like they were weeks.
And more happened in months, than has happened in years.
>> Hey, free eggs. >> Things got bigger,
like waaay bigger.
And it wasn’t all just for him, or for her.
It was for everyone.
And you know what?
We’re just getting started.
>>SUNDAR PICHAI: Hi, everyone. Good morning.
[Cheers and Applause]. Welcome to Google I/O.
It's great to have all of you with us. We have a few thousand
developers with us here today at Shoreline.
Millions more are joining virtually around the world.
Thanks to everyone for being here.
For those of you who haven’t seen I/O before, it’s basically
Google’s version of the Eras Tour, but with fewer costume
changes. [Laughter].
At Google, though, we are fully in our Gemini era. Before we get into it, I want to
reflect on this moment we’re in. We’ve been investing in AI for
more than a decade, and innovating at every layer of the stack:
Research, product, infrastructure. We’re going to talk about it all today.
Still, we are in the early days of the AI platform shift.
We see so much opportunity ahead for creators, for developers, for startups, for everyone.
Helping to drive those opportunities is what our Gemini era is all about.
So let’s get started.
A year ago on this stage, we first shared our plans for
Gemini, a frontier model built to be natively multimodal from
the very beginning, that could reason across text, images,
video, code, and more. It’s a big step in turning any
input into any output. An I/O for a new generation.
Since then we introduced the first Gemini models, our most
capable yet. They demonstrated
state-of-the-art performance on every multimodal benchmark.
And that was just the beginning. Two months later, we introduced
Gemini 1.5 Pro, delivering a big breakthrough in long context.
It can run 1 million tokens in production, consistently.
More than any other large-scale foundation model yet.
We want everyone to benefit from what Gemini can do, so we’ve
worked quickly to share these advances with all of you.
Today, more than 1.5 million developers use Gemini models
across our tools. You’re using it to debug code,
get new insights, and build the next generation of AI
applications. We’ve also been bringing
Gemini’s breakthrough capabilities across our products
in powerful ways. We’ll show examples today across
Search, Photos, Workspace, Android and more.
Today, all of our 2-billion user products use Gemini.
And we’ve introduced new experiences, too, including on
Mobile, where people can interact with Gemini directly
through the app. Now available on Android and
iOS. And through Gemini Advanced,
which provides access to our most capable models.
Over 1 million people have signed up to try it, in just
three months. And it continues to show strong
momentum. One of the most exciting
transformations with Gemini has been in Google Search.
In the past year, we’ve answered billions of queries as part of
our Search Generative Experience.
People are using it to Search in entirely new ways.
And asking new types of questions, longer and more
complex queries, even searching with photos, and getting back
the best the web has to offer. We’ve been testing this
experience outside of Labs, and we’re encouraged to see not only
an increase in Search usage, but also an increase in user
satisfaction. I’m excited to announce that
we’ll begin launching this fully revamped experience, AI
Overviews, to everyone in the U.S. this week.
And we’ll bring it to more countries soon.
[Cheers and Applause]. There’s so much innovation
happening in Search. Thanks to Gemini we can create
much more powerful search experiences, including within
our products. Let me show you an example in
Google Photos. We launched Google Photos almost
nine years ago. Since then, people have used it
to organize their most important memories.
Today that amounts to more than 6 billion photos and videos
uploaded every single day. And people love using Photos to
search across their life. With Gemini, we’re making that a
whole lot easier. Say you’re at a parking station
ready to pay, but you can’t recall your license plate
number. Before, you could search Photos
for keywords and then scroll through years’ worth of photos,
looking for the right one. Now, you can simply ask Photos.
It knows the cars that appear often, it triangulates which one
is yours, and just tells you the license plate number.
[Cheers and Applause]. And Ask Photos can help you
search your memories in a deeper way.
For example, you might be reminiscing about your daughter
Lucia’s early milestones. You can ask photos, when did Lucia learn to swim?
And you can follow up with something more complex.
Show me how Lucia's swimming has progressed. Here, Gemini goes beyond a
simple search, recognizing different contexts from doing
laps in the pool, to snorkeling in the ocean, to the text and
dates on her swimming certificates.
And Photos packages it all up together in a summary, so you
can really take it all in, and relive amazing memories all over
again. We’re rolling out Ask Photos
this summer, with more capabilities to come.
[Cheers and Applause]. Unlocking knowledge across
formats is why we built Gemini to be multimodal from the ground
up. It’s one model, with all the
modalities built in. So not only does it understand
each type of input, it finds connections between them.
Multimodality radically expands the questions we can ask, and
the answers we will get back. Long context takes this a step
further, enabling us to bring in even more information, hundreds
of pages of text, hours of audio, a full hour of video, or
entire code repos. Or, if you want, roughly 96
Cheesecake Factory menus. [Laughter].
For that many menus, you’d need a one million token context
window, now possible with Gemini 1.5 Pro.
Developers have been using it in super interesting ways.
Let’s take a look. >> I remember the announcement,
the 1 million token context window, and my first reaction
was there's no way they were able to achieve this.
>> I wanted to test its technical skills, so I uploaded
a line chart. It was temperatures between like
Tokyo and Berlin and how they were across the 12 months of the
year. >> So
I got in there and I threw in the Python library that I was
really struggling with, and I just asked it a simple question.
And it nailed it. It could find specific
references to comments in the code and specific requests that
people had made and other issues that people had had, but then
suggest a fix for it that related to what I was working
on. >> I immediately tried to kind
of crash it. So I took, you know, four or
five research papers I had on my desktop, and it's a mind-blowing
experience when you add so much text, and then you see the kind
of amount of tokens you add is not even at half the capacity.
>> It felt a little bit like Christmas because you saw things
kind of peppered up to the top of your feed about, like, oh,
wow, I built this thing, or oh, it's doing this, and I would
have never expected. >> Can I shoot a video of my
possessions and turn that into a searchable database?
So I ran to my bookshelf, and I shot video just panning my
camera along the bookshelf and I fed the video into the model.
It gave me the titles and authors of the books, even
though the authors weren't visible on those book spines,
and on the bookshelf there was a squirrel nutcracker sat in
front of the book, truncating the title.
You could just see the word "sightsee", and it still guessed
the correct book. The range of things you can do
with that is almost unlimited. >> So at that point for me was
just like a click, like, this is it.
I thought, like, I had like a super power in my hands.
>> It was poetry. It was beautiful.
I was so happy! This is going to be amazing!
This is going to help people! >> This is kind of where the
future of language models are going.
Personalized to you, not because you trained it to be personal to
you, but personal to you because you can give it such a vast
understanding of who you are. [Applause].
>>SUNDAR PICHAI: We’ve been rolling out Gemini 1.5 Pro with
long context in preview over the last few months.
We’ve made a series of quality improvements across translation,
coding, and reasoning. You’ll see these updates
reflected in the model starting today.
I'm excited to announce that we’re bringing this improved
version of Gemini 1.5 Pro to all developers globally.
[Cheers and Applause]. In addition, today Gemini 1.5
Pro with 1 million context is now directly available for consumers in Gemini Advanced,
and can be used across 35 languages. One million tokens is opening up
entirely new possibilities. It’s exciting, but I think we
can push ourselves even further. So today, we are expanding the
context window to 2 million Tokens.
[Cheers and Applause]. We are making it available
for developers in private preview. It's amazing to look back and
see just how much progress we've made in a few months.
This represents the next step on our journey towards the ultimate goal of infinite context.
Okay. So far,
we’ve talked about two technical advances:
multimodality and long context. Each is powerful on its own.
But together, they unlock deeper capabilities, and more
intelligence. Let’s see how this comes to life
with Google Workspace. People are always searching
their emails in Gmail. We are working to make it much
more powerful with Gemini. Let’s look at how.
As a parent, you want to know everything that’s going on with
your child’s school. Okay, maybe not everything, but
you want to stay informed. Gemini can help you keep up.
Now we can ask Gemini to summarize all recent emails from
the school. In the background, it’s
identifying relevant emails, and even analyzing attachments, like
PDFs. And you get a summary of
the key points and action items. So helpful.
Maybe you were traveling this week and couldn’t make the PTA
meeting. The recording of the meeting is
an hour long. If it’s from Google Meet, you
can ask Gemini to give you the highlights.
[Cheers and Applause]. There’s a parents group looking
for volunteers, and you’re free that day.
So of course, Gemini can draft a reply.
There are countless other examples of how this can make
life easier. Gemini 1.5 Pro is available
today in Workspace Labs. Aparna will share more later on.
[Applause]. We just looked at an example with text outputs.
But with a multimodal model, we can do so much more.
To show you an early demo of an audio output in NotebookLM,
here’s Josh. >>JOSH WOODWARD: Hi, everyone!
Last year, at I/O, we introduced Notebook LM, a research and
writing tool grounded in the information you give it.
Since then, we've seen a lot of momentum with students and
teachers using it. And today, Gemini 1.5 Pro is
coming to Notebook LM, and it's great.
Let me show you. So here we are in Notebook LM.
You can load it up with all of the materials here on the left.
In this notebook, I've been using it with my younger son,
and I've added some of his science worksheets, a few slide decks from the teacher, and even an
open source textbook full of charts and diagrams. With 1.5 Pro, it instantly creates this notebook
guide with a helpful summary and can generate a study guide, an FAQ, or even quizzes.
But for my son Jimmy, he really learns best when he can listen to something.
So we've prototyped a new feature with Gemini, and it's called Audio Overviews.
Notebook LM is going to use all of the materials on the left
as input and output them into a lively science discussion, personalized for him.
Let's take a listen. >> So let's dive into physics.
What's on deck for today? >> Well, we're starting with the
basics. Force in motion.
>> Okay. >> And that, of course, means we
have to talk about Sir Isaac Newton and his three laws of
motion. >> Ah,
yes, the foundation -- For understanding how objects move and interact.
>>JOSH WOODWARD: Oh, yes, this is where multimodal really shines.
Now, it generated this audio discussion based on that text material.
And what's amazing is that my son and I can join into the conversation.
And steer it whichever direction we want.
When I tap "join." >> Hold on.
We have a question. What's up, Josh?
>>JOSH WOODWARD: Yeah, can you give my son Jimmy a basketball
example? >> Hey, Jimmy!
That's a fantastic idea! Basketball is actually a great
way to visualize force in motion.
Let's break it down. >> Okay.
So first, imagine a basketball just sitting there on the court.
It's not moving, right?
That's because all the forces acting on it are balanced.
The downward pull of gravity -- >>JOSH WOODWARD: Pretty cool, right?
[Cheers and Applause]. I gotta say, the first
time my son heard this, you should have seen how big his eyes got!
Because he was gripped. They were talking to him.
He was learning science through the example of basketball.
His favorite sport. Now, what's interesting
is under the hood, you saw that Gemini had used some of the concepts of gravity, Sir Isaac Newton,
but nothing in there was about basketball. It connected the dots and created that
age-appropriate example for him. And this is what's becoming
possible with the power of Gemini. You can give it lots of information in
any format, and it can be transformed in a way that's personalized and interactive for you.
Back to you, Sundar. [Applause].
>>SUNDAR PICHAI: Thanks, Josh. The demo shows the real
opportunity with multimodality. Soon you’ll be able to mix and
match inputs and outputs. This is what we mean when we say
it’s an I/O for a new generation.
And I can see you all out there thinking about the
possibilities. But what if we could go even
further? That’s one of the opportunities
we see with AI agents. Let me take a step back and
explain what I mean by that. I think about them as
intelligent systems that show reasoning, planning, and memory.
Are able to “think” multiple steps ahead, work across
software and systems, all to get something done on your behalf,
and most importantly, under your supervision.
We are still in the early days, and you’ll see glimpses of our
approach throughout the day, but let me show you the kinds of use
cases we are working hard to solve. Let’s talk about shopping.
It’s pretty fun to shop for shoes, and a lot less fun to
return them when they don’t fit. Imagine if Gemini could do all
the steps for you: Searching your inbox for the receipt,
locating the order number from your email, filling out a return
form, and even scheduling a pickup. That's much easier, right?
[Applause]. Let’s take another example
that’s a bit more complex. Say you just moved to Chicago.
You can imagine Gemini and Chrome working together to help
you do a number of things to get ready: Organizing, reasoning,
synthesizing on your behalf. For example, you’ll want to
explore the city and find services nearby, from
dry-cleaners to dog-walkers. You will have to update your new
address across dozens of Web sites. Gemini can work across these
tasks and will prompt you for more information when needed, so
you are always in control. That part is really important.
As we prototype these experiences, we are thinking hard about how to do it in a way
that's private, secure, and works for everyone. These are simple use cases, but
they give you a good sense of the types of problems we want to
solve, by building intelligent systems that think ahead,
reason, and plan, all on your behalf.
The power of Gemini, with multimodality, long context and
agents, brings us closer to our ultimate goal: Making AI helpful
for everyone. We see this as how we will make
the most progress against our mission. Organizing the world’s
information across every input, making it accessible via any
output, and combining the world’s information with the
information in your world in a way that’s truly useful for you.
To fully realize the benefits of AI, we will continue to break
new ground. Google DeepMind is hard at work
on this. To share more, please welcome,
for the first time on the I/O stage, Sir Demis.
[Applause]. >>DEMIS HASSABIS:
Thanks, Sundar.
It's so great to be here. Ever since I was a kid, playing
chess for the England Junior Team, I’ve been thinking about
the nature of intelligence. I was captivated by the idea of
a computer that could think like a person.
It’s ultimately why I became a programmer and studied
neuroscience. I co-founded DeepMind in 2010
with the goal of one day building AGI: Artificial general
intelligence, a system that has human-level cognitive
capabilities. I’ve always believed that if we
could build this technology responsibly, its impact would be
truly profound and it could benefit humanity in incredible
ways. Last year,
we reached a milestone on that path when we formed Google DeepMind, combining AI talent
from across the company into one super unit. Since then, we've built AI systems that can
do an amazing range of things, from turning language and vision into action for robots,
navigating complex virtual environments, solving Olympiad-level math problems, and even discovering
thousands of new materials. Just last week, we announced
our next generation AlphaFold model. It can predict the structure and interactions
of nearly all of life's molecules, including how proteins interact with strands of DNA and RNA.
This will accelerate vitally important biological and medical research from
disease understanding to drug discovery. And all of this was made possible with the
best infrastructure for the AI era, including our highly optimized tensor processing units.
At the center of our efforts is our Gemini model. It's built from the ground up to be natively
multimodal because that's how we interact with and understand the world around us.
We've built a variety of models for different use cases.
We've seen how powerful Gemini 1.5 Pro is, but we also know from user feedback that some
applications need lower latency and a lower cost to serve.
So today we’re introducing Gemini 1.5 Flash.
[Cheers and Applause]. Flash is a lighter-weight model
compared to Pro. It’s designed to be fast and
cost-efficient to serve at scale, while still featuring
multimodal reasoning capabilities and breakthrough
long context. Flash is optimized for tasks
where low latency and efficiency matter most.
Starting today, you can use 1.5 Flash and 1.5 Pro with up to one
million tokens in Google AI Studio and Vertex AI.
And developers can sign up to try two million tokens.
We’re so excited to see what all of you will create with it.
And you'll hear a little more about Flash later on from Josh.
We’re very excited by the progress we’ve made so far with
our family of Gemini models. But we’re always striving to
push the state-of-the-art even further.
At any one time we have many different models in training.
And we use our very large and powerful ones to help teach and
train our production-ready models.
Together with user feedback, this cutting-edge research will
help us to build amazing new products for billions of people.
For example, in December, we shared a glimpse into the future
of how people would interact with multimodal AI, and how this
would end up powering a new set of transformative experiences.
Today, we have some exciting new progress to share about the
future of AI assistants that we’re calling Project Astra.
[Cheers and Applause]. For a long time, we’ve wanted to
build a universal AI agent that can be truly helpful in everyday
life. Our work making this vision a
reality goes back many years. It's why we made Gemini multimodal
from the very beginning. An agent like this has to
understand and respond to our complex and dynamic world just
like we do. It would need to take in and
remember what it sees so it can understand context and take
action. And it would have to be
proactive, teachable and personal, so you can talk to it
naturally, without lag or delay. While we’ve made great strides
in developing AI systems that can understand multimodal
information, getting response time down to something
conversational is a difficult engineering challenge.
Building on our Gemini model, we’ve developed agents that can
process information faster by continuously encoding video
frames, combining the video and speech input into a timeline of
events, and caching this for efficient recall.
We’ve also enhanced how they sound, with a wider range of
intonations. These agents better understand
the context you’re in, and can respond quickly in conversation,
making the pace and quality of interaction feel much more
natural. Here’s a video of our prototype,
which you’ll see has two parts. Each part was captured in a
single take, in real time. >> Okay. Let's do some tests.
Tell me when you see something that makes sound.
>> I see a speaker, which makes sound.
>> What is that part of the speaker called?
>> That is the tweeter. It produces high frequency
sounds. >> Give me a creative
alliteration about these. >> Creative crayons color
cheerfully. They certainly craft colorful
creations. >> What does that part of the
code do? >> This code defines encryption
and decryption functions. It seems to use AES-CBC
encryption to encode and decode data based on a key and an
initialization vector (IV). >> That's right.
What neighborhood do you think I'm in?
>> This appears to be the Kings Cross area of London.
It is known for its railway station and transportation
connections. >> Do you remember where you saw
my glasses? >> Yes, I do.
Your glasses were on the desk near a red apple.
[Applause]. >> What can I add here to make
this system faster? >>
Adding a cache between the server and database could
improve speed. >> What does this remind you of?
>> Schroedinger's cat. >> All right.
Give me a band name for this duo.
>> Golden Stripes. >> Nice. Thanks, Gemini.
[Applause]. >>DEMIS HASSABIS:
I think you'll agree it's amazing to see how
far AI has come, especially when it comes to spatial
understanding, video processing and memory.
It’s easy to envisage a future where you can have an expert
assistant by your side through your phone or new exciting form
factors like glasses. Some of these agent capabilities
will come to Google products like the Gemini app later this
year. For those of you onsite today,
you can try out a live demo version of this experience in
the AI Sandbox area. [Cheers and Applause].
Next, let’s take a look at how our innovations are helping
people bring new creative ideas to life.
Today, we’re introducing a series of updates across our
generative media tools with new models covering image, music and
video. Over the past year, we’ve been
enhancing quality, improving safety and increasing access.
To help tell this story, here’s Doug.
[Applause]. >>DOUG ECK: Thanks, Demis.
Over the past few months, we’ve been working hard
to build a new image generation model from the ground up, with
stronger evaluations, extensive red teaming, and
state-of-the-art watermarking with SynthID.
Today, I’m so excited to introduce Imagen 3.
It’s our most capable image generation model yet.
Imagen 3 is more photorealistic.
You can literally count the whiskers on its snout.
With richer details, like the incredible sunlight in this
shot, and fewer visual artifacts or distorted images.
It understands prompts written the way people write.
The more creative and detailed you are, the better.
And Imagen 3 remembers to incorporate small details like
the ‘wildflowers’ or ‘a small blue bird’ in this longer
prompt. Plus, this is our best model yet
for rendering text, which has been a challenge for image
generation models. In side-by-side comparisons,
independent evaluators preferred Imagen 3 over other
popular image generation models. In sum, Imagen 3 is our
highest-quality image generation model so far.
You can sign up today to try Imagen 3 in ImageFX, part of our
suite of AI tools at labs.Google, and it will be
coming soon to developers and enterprise customers in Vertex
AI. Another area, full of creative
possibility, is generative music.
I’ve been working in this space for over 20 years and this has
been by far the most exciting year of my career. We’re exploring ways of working
with artists to expand their creativity with AI.
Together with YouTube, we’ve been building Music AI Sandbox,
a suite of professional music AI tools that can create new
instrumental sections from scratch, transfer styles between tracks, and more.
To help us design and test them, we’ve been working closely with
incredible musicians, songwriters and producers.
Some of them even made entirely new songs in ways that would not have been
possible without these tools. Let’s hear from some of the
artists we’ve been working with. >>
I'm going to put this right back into the Music AI tool.
The same Boom, boom, bam, boom, boom.
What happens if Haiti meets Brazil?
Dude, I have no clue what's about to be spat out.
This is what excites me. Da da See see see.
As a hip hop producer, we dug in the crates.
We playin’ these vinyls, and the part where there's no vocal, we
pull it, we sample it, and we create an entire song around
that. So right now we digging in the
infinite crate. It’s endless.
Where I found the AI really useful for me, this way to like fill in the sparser sort of elements
of my loops. Okay.
Let's try bongos. We're going to put viola.
We're going to put rhythmic clapping, and we're going to see
what happens there. Oh, and it makes it sound,
ironically, at the end of the day, a little more human.
So then this is entirely Google's loops right here.
These are Gloops. So it's like having, like, this
weird friend that's just like,
try this, try that. And then you're like, Oh, okay.
Yeah. No, that's pretty dope.
(indistinct noises) >> The tools are capable of
speeding up the process of what's in my head, getting it
out. You're able to move lightspeed
with your creativity. This is amazing.
That right there. [Applause].
>>DEMIS HASSABIS: I think this really shows what’s possible
when we work with the artist community on the future of
music. You can find some brand new
songs from these acclaimed artists and songwriters on their
YouTube channels now. There's one more area I'm
really excited to share with you. Our teams have made some
incredible progress in generative video.
Today, I’m excited to announce our newest, most capable
generative video model, called Veo.
[Cheers and Applause]. Veo creates high-quality, 1080P
videos from text, image and video prompts.
It can capture the details of your instructions in different
visual and cinematic styles. You can prompt for things like
aerial shots of a landscape or a time lapse, and further edit
your videos using additional prompts.
You can use Veo in our new experimental tool called
VideoFX. We’re exploring features like
storyboarding and generating longer scenes.
Veo gives you unprecedented creative control.
Techniques for generating static images have come a long way.
But generating video is a different challenge altogether.
Not only is it important to understand where an object or
subject should be in space, it needs to maintain this
consistency over time, just like the car in this video.
Veo builds upon years of our pioneering generative video
model work, including GQN, Phenaki, Walt, VideoPoet,
Lumiere and much more. We combined the best of these
architectures and techniques to improve consistency, quality and
output resolution. To see what Veo can do, we put
it in the hands of an amazing filmmaker.
Let’s take a look. >>DONALD GLOVER: Well, I've been
interested in AI for a couple of years now.
We got in contact with some of the people at Google, and they
had been working on something of their own.
So we're all meeting here at Gilga Farms to make a short
film. >>KORY MATHEWSON: The core
technology is Google DeepMind’s
generative video model that has been trained to convert input
text into output video. [Laughter].
>>DONALD GLOVER: It looks good. >>KORY MATHEWSON: We are able to
bring ideas to life that were otherwise not possible.
We can visualize things of time scale that’s 10 or 100 times
faster than before. >>MATTHIEU KIM LORRAIN: When
you're shooting, you can't really iterate as much as you
wish. And so we've been hearing the
feedback that it allows for more optionality, more iteration,
more improvisation. >>DONALD GLOVER: But that's
what's cool about it. It's like you can make a mistake
faster. That's all you really want at
the end of the day, at least in art, is just to make mistakes
fast. >>KORY MATHEWSON: So, using
Gemini’s multimodal capabilities to optimize the model’s training
process, Veo is better able to capture the nuance from prompts.
So this includes cinematic techniques and visual effects,
giving you total creative
control. >>DONALD GLOVER: Everybody's
going to become a director and everybody should be a director.
Because at the heart of all of this is just storytelling.
The closer we are to being able to tell each other our stories,
the more we will understand each other.
>>KORY MATHEWSON: These models are really enabling us to be
more creative and to share that creativity with each other.
[Cheers and Applause]. >>DEMIS HASSABIS:
Over the coming weeks some of these
features will be available to select creators through VideoFX
at labs.google, and the waitlist is open now.
Of course, these advances in generative video go beyond the
beautiful visuals you’ve seen today.
By teaching future AI models how to solve problems creatively, or
in effect simulate the physics of our world, we can build more
useful systems that help people communicate in new ways, and
thereby advance the frontiers of AI.
When we first began this journey to build AI more than 15 years ago,
we knew that one day it would change everything.
Now that time is here. And we continue to be amazed by
the progress we see and inspired by the advances still to come,
on the path to AGI. Thanks, and back to you Sundar.
[Applause]. >>SUNDAR PICHAI: Thanks, Demis.
A huge amount of innovation is happening at Google DeepMind.
It’s amazing how much progress we have made in a year.
Training state-of-the-art models requires a lot of computing
power. Industry demand for ML compute
has grown by a factor of 1 million in the last six years.
And every year, it increases tenfold.
Google was built for this. For 25 years, we’ve invested in
world-class technical infrastructure, from the
cutting-edge hardware that powers Search, to our custom
tensor processing units that power our AI advances.
Gemini was trained and served entirely on our fourth and fifth
generation TPUs. And other leading AI companies,
like Anthropic, have trained their models on TPUs as well.
Today, we are excited to announce the sixth generation of TPUs called Trillium.
[Cheers and Applause]. Trillium delivers a 4.7x
improvement in compute performance per chip over the
previous generation, making it our most efficient and performant TPU to date.
We will make Trillium available to our cloud customers in late 2024.
Alongside our TPUs, we are proud to offer CPUs and GPUs to support any workload.
That includes the new Axion processors we announced last month, our first custom
Arm-based CPU with industry-leading performance and energy efficiency.
We are also proud to be one of the first cloud providers to offer Nvidia's cutting edge
Blackwell GPUs, available in early 2025.
[Applause]. We’re fortunate to have a
longstanding partnership with Nvidia, and are excited to bring
Blackwell's capabilities to our customers. Chips are a foundational part of
our integrated end-to-end system, from
performance-optimized hardware and open software to flexible
consumption models. This all comes together in our
AI Hypercomputer, a groundbreaking supercomputer
architecture. Businesses and developers are
using it to tackle more complex challenges, with more than twice
the efficiency relative to just buying the raw hardware and
chips. Our AI Hypercomputer
advancements are made possible in part because of our approach
to liquid cooling in our data centers.
We’ve been doing this for nearly a decade, long before it became
state of the art for the industry.
And today, our total deployed fleet capacity for liquid
cooling systems is nearly one gigawatt, and growing.
That’s close to 70 times the capacity of any other fleet.
[Applause]. Underlying this is the sheer
scale of our network, which connects our infrastructure
globally. Our network spans more than 2
million miles of terrestrial and subsea fiber: Over 10 times the
reach of the next leading cloud provider.
We will keep making the investments necessary to advance
AI innovation and deliver state-of-the-art capabilities.
And one of our greatest areas of investment and innovation is in
our founding product, Search. 25 years ago we created Search
to help people make sense of the waves of information moving
online. With each platform shift, we’ve
delivered breakthroughs to help answer your questions better.
On mobile, we unlocked new types of questions and answers, using
better context, location awareness, and real-time
information. With advances in natural
language understanding and computer vision, we enabled new
ways to search with your voice, or a hum to find your new
favorite song, or an image of that flower you saw on your
walk. And now you can even circle to
Search those cool new shoes you might want to buy.
Go for it, you can always return them later!
Of course, Search in the Gemini Era will take this to a whole
new level. Combining our infrastructure
strengths, the latest AI capabilities, our high bar for
information quality, and our decades of experience connecting
you to the richness of the web. The result is a product that
does the work for you. Google Search is generative AI
at the scale of human curiosity. And it’s our most exciting
chapter of Search yet. To tell you more, here’s Liz.
[Applause]. >>LIZ REID:
Thanks, Sundar! With each of these platform
shifts, we haven’t just adapted, we’ve expanded what’s possible
with Google Search. And now, with generative AI,
Search will do more for you than you ever imagined.
So whatever’s on your mind, and whatever you need to get done,
just ask. And Google will do the Googling
for you. All the advancements you’ll see
today are made possible by a new Gemini model, customized for
Google Search. What really sets this apart is
our three unique strengths. First, our real-time information
with over a trillion facts about people, places, and things.
Second, our unparalleled ranking and quality systems, trusted for
decades to get you the very best of the web.
And third, the power of Gemini, which unlocks new agentive
capabilities, right in Search. By bringing these three things
all together, we are able to dramatically expand what's possible with Google Search, yet again.
This is Search in the Gemini era.
So let's dig in. You've heard today about AI
Overviews, and how helpful people are finding them.
With AI Overviews, Google does the work for you.
Instead of piecing together all the information yourself, you
can ask your question, and as you see here, you can get an answer instantly.
Complete with a range of perspectives and links to dive
deeper. As Sundar shared, AI Overviews
will begin rolling out to everyone in the U.S. starting
today, with more countries soon. By the end of the year, AI
Overviews will come to over a billion people in Google Search.
But this is just the first step. We’re making AI Overviews even
more helpful for your most complex questions, the type that
are really more like ten questions in one!
You can ask your entire question, with all its
sub-questions, and get an AI overview in just seconds.
To make this possible, we’re introducing multi-step reasoning
in Google Search. So Google can do the researching
for you. For example, let’s say you’ve
been trying to get into yoga and Pilates.
Finding the right studio can take a lot of research.
There are so many factors to consider!
Soon you’ll be able to ask Search to: Find the best yoga or
Pilates studios in Boston. And show you details on their
intro offers, and walking time from Beacon Hill.
As you can see here, Google gets to work for you, finding the
most relevant information and bringing it together in your AI
Overview. You get some studios with great
ratings and their intro offers. You can see the distance for
each, like this one is just a ten-minute walk away!
Right below, you see where they're located, laid out visually.
And you've got all this from just a single search! Under the hood, our custom
Gemini model acts as your AI agent, using what we call
multi-step reasoning. It breaks your bigger question
down into all its parts, and it figures out which problems it
needs to solve and in what order. And thanks to our real-time info
and ranking expertise, it reasons using the
highest-quality information out there.
So since you're asking about places, it taps into Google's
index of information about the real world, with over 250
million places, and updated in real-time. Including their ratings,
reviews, business hours, and more.
Research that might have taken you minutes or even hours,
Google can now do on your behalf in just seconds. Next, let me show you another
way multi-step reasoning in Google Search can make your life
that much easier. Take planning, for example.
Dreaming up trips and meal plans can be fun, but doing the work
of actually figuring it all out, no, thank you.
With Gemini in Search, Google does the planning with you.
Planning is really hard for AI to get right.
It's the type of problem that takes advanced reasoning and
logic. After all, if you're meal
planning, you probably don’t want mac'n cheese for breakfast,
lunch and dinner. Okay, my kids might.
But say you’re looking for a bit more variety.
Now, you can ask Search to: Create a three-day meal plan for
a group that’s easy to prepare. And here you get a plan with a
wide range of recipes from across the web.
This one for overnight oats looks particularly interesting.
And you can easily head over to the Web site to learn how to prepare them.
If you want to get more veggies in, you can simply ask Search to
swap in a vegetarian dish. And just like that, Search
customizes your meal plan. And you can export your meal
plan or get the ingredients as a list, just by tapping here.
Looking ahead, you could imagine asking Google to add everything
to your preferred shopping cart. Then, we’re really cooking!
These planning capabilities mean Search will be able to help plan
everything from meals and trips to parties, dates, workout
routines and more. So you can get all the fun of
planning without any of the hassle. You’ve seen how Google Search
can help with increasingly complex questions and planning.
But what about all those times when you don't know exactly what to ask and you
need some help brainstorming? When you come to Search for
ideas, you’ll get more than an AI-generated answer.
You’ll get an entire AI-organized page, custom-built
for you and your question. Say you’re heading to Dallas to
celebrate your anniversary and you're looking for the perfect restaurant.
What you get here breaks AI out of the box and it brings it to the whole page.
Our Gemini model uncovers the most interesting angles for you
to explore and organizes these results into helpful clusters.
Like, you might have never considered restaurants with live
music. Or ones with historic charm!
Our model even uses contextual factors, like the time of year.
So since it’s warm in Dallas, you can get rooftop patios as an idea.
And it pulls everything together into a dynamic, whole-page
experience. You’ll start to see this new
AI-organized search results page when you look for inspiration,
starting with dining and recipes, and coming to movies,
music, books, hotels, shopping, and more.
[Applause].
Today, you’ve seen how you can bring any question to Search,
and Google takes the work out of searching.
But your questions aren’t limited to words in a text box,
and sometimes, even a picture can’t tell the whole story.
Earlier, Demis showed you our latest advancements in video
understanding.
And I'm really excited to share that soon you'll be able to ask questions with video,
right in Google Search. Let me introduce Rose to show
you this in a live demo. [Applause].
>>ROSE YAO: Thank you, Liz! I have always wanted a record player,
and I got this one, and some vinyls at a yard sale recently.
But, umm, when I go to play it, this thing keeps sliding off.
I have no idea how to fix it or where to even start!
Before, I would have pieced together a bunch of searches to
try to figure this out, like, what make is this record player?
What’s the model? And, what is this thing actually
called? But now I can just ask with a
video. So let's try it.
Let's do a live demo. I'm going to take a video and ask Google,
why will this not stay in place? And in a near instant,
Google gives me an AI overview. I get some reasons this might be
happening, and steps I can take to troubleshoot.
So it looks like first, this is called a tone arm. Very helpful.
And it looks like it may be unbalanced, and there's some really helpful steps here.
And I love that because I'm new to all this. I can check out this helpful link from Audio
Technica to learn even more. So that was pretty quick!
[Applause].
Let me walk you through what just happened. Thanks to a combination of our
state-of-the-art speech models, our deep visual understanding,
and our custom Gemini model, Search was able to understand
the question I asked out loud and break down the video
frame-by-frame. Each frame was fed into Gemini’s
long context window that you heard about earlier today.
Search could then pinpoint the exact make and model of my
record player. And make sense of the motion
across frames to identify that the tonearm was drifting.
Search fanned out and combed the web to find relevant insights
from articles, forums, videos, and more.
And it stitched all of this together into my AI Overview.
The result was music to my ears! Back to you, Liz.
[Applause]. >>LIZ REID: Everything you saw
today is just a glimpse of how we're reimagining Google Search
in the Gemini era. We’re taking the very best of
what makes Google, Google. All the reasons why billions of
people turn to Google Search, and have relied on us for
decades. And we’re bringing in the power
of Gemini’s agentive capabilities.
So Google will do the searching, the researching.
The planning. The brainstorming.
And so much more. All you need to do, is ask.
You'll start to see these features rolling out in Search
in the coming weeks. Opt in to Search Labs to be
among the first to try them out. Now let's take a look at how
this all comes together in Google Search this year.
>> Why is the lever not moving all the way?
[Applause].
>>APARNA PAPPU:
Since last May, we've been hard at work making Gemini for Workspace
even more helpful for businesses and consumers across the world.
Tens of thousands of customers have been using Help me write,
Help me visualize, and Help me organize since we launched.
And now, we're really excited that the new Gemini powered side
panel will be generally available next month.
[Cheers and Applause]. One of our customers is a local
favorite right here in California, Sports Basement.
They rolled out Gemini for Workspace to the organization.
And this has helped improve the productivity of their customer support team by more than 30%.
Customers love how Gemini grows participation in meetings with
automatic language detection and real-time captions now expanding to 68 languages.
[Applause]. We are really excited about what
Gemini 1.5 Pro unlocks for Workspace and AI Premium
customers. Let me start by showing you
three new capabilities coming to Gmail mobile.
This is my Gmail account. Okay.
So there's an E-mail up top from my husband. Help me sort out the roof repair thing, please.
Now, we've been trying to find a contractor to fix our roof, and with
work travel, I have clearly dropped the ball. It looks like there's an E-mail thread on this
with lots of E-mails that I haven't read. And luckily for me, I can simply tap the
summarize option up top and skip reading this long back and forth.
Now, Gemini pulls up this helpful mobile card as an overlay.
And this is where I can read a nice summary of all the salient information that I need to know.
So here I see that we have a quote from Jeff at Green
Roofing, and he's ready to start. Now, I know we had other bids
and I don't remember the details. Previously, I would have had to do
a number of searches in Gmail and then remember and compare information across different E-mails.
Now, I can simply type out my question right here in the mobile card and say something like, compare
my roof repair bids by price and availability. This new Q&A feature makes it so easy to get
quick answers on anything in my inbox. For example, when are my shoes arriving,
or what time do doors open for the Knicks game, without having to first
search Gmail and open an E-mail and look for the specific information in attachments and so on.
Anyway, back to my roof. It looks like Gemini has found details that I got
from two other contractors in completely different E-mail threads, and I have this really nicely
organized summary and I can do a quick comparison. So it seems like Jeff's quote was right
in the middle and he can start immediately, so Green Roofing it is.
I'll open that last E-mail from Jeff and confirm the project.
And look at that. I see some suggested replies from Gemini.
Now, what is really, really neat about this evolution of smart reply is that it's contextual.
Gemini understood the back-and-forth in that thread and that Jeff was ready to start.
So it offers me a few customized options based on that context.
So, you know, here I see I have decline the service, suggest a new time.
I'll choose proceed and confirm time. I can even see a preview of the
full reply simply by long pressing. This looks reasonable, so I'll hit send.
These new capabilities in Gemini and Gmail will start rolling out this month to Labs users.
[Applause]. Okay.
So one of the really neat things about Workspace apps like
Gmail, Drive, Docs, Calendar, is how well they work together,
and in our daily lives we often have information that flows from
one app to another. Like, say, adding a calendar entry from Gmail.
Or creating reminders from a spreadsheet tracker.
But what if Gemini could make these journeys totally seamless?
Perhaps even automate them for you entirely.
Let me show you what I mean with a real life example.
My sister is a self-employed photographer, and her inbox is
full of appointment bookings, receipts, client feedback on photos and so much more.
Now, if you're a freelancer or a small business, you really want to focus on your
craft and not on bookkeeping and logistics. So let's go to her inbox and take a look.
Lots of unread E-mails. Let's click on the first one.
It's got a PDF attachment. From a hotel, there's a receipt.
And I see a suggestion in the side panel. Help me organize and track my receipts.
Let's click on this prompt. The side panel now will show
me more details about what that really means, and as you can see, there's two steps here.
Step one, create a Drive folder and put this receipt and 37 others it's found into that folder.
Makes sense. Step 2,
extract the relevant information from those receipts in that folder into a new spreadsheet.
Now, this sounds useful. Why not?
I also have the option to edit these actions or just hit okay.
So let's hit okay. Gemini will now complete the two
steps described above, and this is where it gets even better.
Gemini offers the option to automate this so that this particular work flow is run on
all future E-mails, keeping your Drive folder and expense sheet up to date with no effort from you.
[Applause]. Now, we know that creating
complex spreadsheets can be daunting for most people.
But with this automation, Gemini does the hard work of extracting
all the right information from all the files from in that folder and generates the sheet for you.
So let's take a look. Okay.
It's super well organized, and it even has a category for expense type.
Now, we have this sheet. Things can get even more fun.
We can ask Gemini questions. Questions like, show me where the money is spent.
Gemini not only analyzes the data from the sheet, but also creates a nice visual to
help me see the complete breakdown by category. And you can imagine how this extends to all sorts
of use cases in your in box, like travel expenses, shopping, remodeling projects, you name it.
All of that information in Gmail can be put to good use and help you work, plan and play better.
Now, this particular -- [Applause].
I know!
This particular ability to organize your attachments in Drive and generate a sheet
and do data analysis via Q&A will be rolling out to Labs users this September.
And it's just one of the many automations that we're working on in Workspace.
Workspace in the Gemini era will continue to unlock new ways of
getting things done. We’re building advanced agentive
experiences, including customizing how you use Gemini.
Now, as we look to 2025 and beyond, we're exploring
entirely new ways of working with AI. Now, with Gemini, you have an AI-powered
assistant always at your side. But what if you could expand how
you interact with AI? For example, when we work with
other people, we mention them in comments and docs, so we send them E-mail.
We have group chats with them, et cetera. And it's not just how we collaborate with
each other, but we each have a specific role to play in the team.
And as the team works together, we build a set of collective experiences and contexts
to learn from each other. We have the combined set of
skills to draw from when we need help. So how could we introduce AI into this mix
and build on this shared expertise? Well, here’s one way.
We are prototyping a virtual Gemini powered teammate.
This teammate has an identity and a Workspace account, along
with a specific role and objective.
Let me bring Tony up to show you what I mean. Hey, Tony!
>>TONY VINCENT: Hi, Aparna! Hey, everyone.
Okay. So let me start by showing you
how we set up this virtual teammate.
As you can see, the teammate has its very own account.
And we can go ahead and give it a name. We'll do something fun like Chip.
Chip’s been given a specific set of descriptions on how to be helpful
for the team, you can see that here, and some of the jobs are to monitor and track projects,
we've listed a few out, to organize information and provide context, and a few more things.
Now that we've configured our virtual teammate, let's go ahead and see Chip in action.
To do that I'll switch us over here to Google chat.
First, when planning for an event like I/O, we have a ton of chat rooms for various purposes.
Luckily for me, Chip is in all of them. To quickly catch up, I might ask a question like,
anyone know if our I/O storyboards are approved? Because we’ve instructed Chip to
track this project, Chip searches across all the conversations
and knows to respond with an answer. There it is.
Simple, but very helpful. Now, as the team adds Chip to more
group chats, more files, more email threads, Chip builds a collective memory of our work together.
Let's look at an example. To show you I'll switch over to a different room.
How about Project Sapphire over here. Here we are discussing an upcoming product release, and
as usual, many pieces are still in flight, so I can go ahead and ask: are we on track for launch?
Chip gets to work not only searching through everything it has access to,
but also synthesizing what's found and coming back with an up-to-date response.
There it is. A clear timeline, a nice summary, and
notice that even in this first message, Chip flags a potential issue the team should be aware of.
Because we're in a group space, everyone can follow along, anyone can jump in at any time,
as you can see someone just did, asking Chip to help create a
doc to address the issue. A task like this could take me
hours, even dozens of hours. Chip can get it all done in just a few minutes,
sending the doc over right when it's ready. And so much of this practical helpfulness
comes from how we've customized Chip to our team's needs, and how seamlessly this AI is integrated
directly into where we're already working. Back to you, Aparna.
>>APARNA PAPPU: Thank you, Tony! I can imagine a number of
different types of virtual teammates configured by
businesses to help them do what they need. Now, we have a lot of work to do to figure out how
to bring these agentive experiences like virtual teammates into Workspace, including enabling third
parties to make their very own versions of Chip. We're excited about where this is headed,
so stay tuned. And as Gemini and its capabilities continue
to evolve, we're diligently bringing that power directly into WorkSpace to make all our users more
productive and creative, both at home and at work. And now, over to Sissie to tell you more about
the Gemini app. [Applause].
>>SISSIE HSIAO: Our vision for the Gemini app is to be the most
helpful, personal AI assistant by giving you direct access to
Google’s latest AI models. Gemini can help you learn,
create, code, and anything else you can imagine.
And over the past year, Gemini has put Google’s AI in the hands
of millions of people, with experiences designed for your
phone and the web. We also launched Gemini
Advanced, our premium subscription for access to the
latest AI innovations from Google.
Today, we’ll show you how Gemini is delivering our most
intelligent AI experience. Let’s start with the Gemini app,
which is redefining how we interact with AI.
It’s natively multimodal, so you can use text, voice or your
phone’s camera to express yourself naturally.
And this summer, you can have an in-depth conversation with
Gemini using your voice. We’re calling this new
experience "Live". Using Google’s latest speech
models, Gemini can better understand you and answer
naturally. You can even interrupt while
Gemini is responding, and it will adapt to your speech
patterns. And this is just the beginning.
We're excited to bring the speed gains and video understanding capabilities
from Project Astra to the Gemini app. When you go live, you'll be able to
open your camera so Gemini can see what you see and respond to your surroundings in real-time.
Now, the way I use Gemini isn't the way you use Gemini.
So we're rolling out a new feature that lets you customize it for your own needs
and create personal experts on any topic you want. We're calling these "Gems."
[Applause].
They're really simple to set up. Just tap to create a gem, write your instructions
once, and come back whenever you need it. For example, here's a gem that I created
that acts as a personal writing coach. It specializes in short stories with
mysterious twists, and it even builds on the story drafts in my Google Drive.
I call it the Cliffhanger Curator. Now, Gems are a great time saver when
you have specific ways that you want to interact with Gemini again and again.
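For developers curious how a Gem-like setup maps onto the public Gemini API, the closest analogue is attaching a reusable system instruction to a model. The sketch below is a rough illustration using the google-generativeai Python SDK; the persona text and model name are placeholder assumptions, and this is not how the consumer Gems feature is implemented.

```python
# Rough sketch: a "Gem"-like persistent persona via a system instruction.
# Assumes the google-generativeai Python SDK and an API key; the persona
# text and model name below are placeholders for illustration only.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

cliffhanger_curator = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    system_instruction=(
        "You are a personal writing coach who specializes in short stories "
        "with mysterious twists. Give concrete, encouraging feedback."
    ),
)

response = cliffhanger_curator.generate_content(
    "Here's my opening paragraph. How can I make the ending land harder?"
)
print(response.text)
```

The instruction is written once and rides along with every later request, which is the same convenience Gems offer in the app.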
Gems will roll out in the coming months, and our trusted testers
are already finding so many creative ways to put them to
use. They can act as your yoga
bestie, your personal sous chef, a brainy calculus tutor, a peer
reviewer for your code, and so much more.
Next, I’ll show you how Gemini is taking a step closer to being
a true AI assistant by planning and taking action for you.
We all know chatbots can give you ideas for your next
vacation. But there’s a lot more that goes
into planning a great trip. It requires reasoning that
considers space-time logistics, and the intelligence to
prioritize and make decisions. That reasoning and intelligence
all comes together in the new trip planning experience in
Gemini Advanced. Now, it all starts with a prompt.
Okay. So here we go.
We’re going to Miami. My son loves art, my husband
loves seafood, and our flight and hotel details are already in
my Gmail inbox. Now there’s a lot going on in
that prompt. Everyone has their own things
that they want to do. To make sense of those
variables, Gemini starts by gathering all kinds of
information from Search, and helpful extensions like Maps and
G-mail. It uses that data to create a
dynamic graph of possible travel options, taking into account all
my priorities and constraints. The end result is a personalized
vacation plan, presented in Gemini’s new dynamic UI.
Now, based on my flight information, Gemini knows that I
need a two and a half day itinerary.
And you can see how Gemini uses spatial data to make decisions.
Our flight lands in the late afternoon, so Gemini skips a big
activity that day, and finds a highly rated seafood restaurant
close to our hotel. Now, on Sunday, we have a jam-packed day.
I like these recommendations, but my family likes to sleep in.
So I tap to change the start time, and just like that, Gemini adjusted my
itinerary for the rest of the trip. It moved our walking tour to the
next day and added lunch options near the street art museum to
make the most of our Sunday afternoon.
This looks great! It would have taken me hours of
work, checking multiple sources, figuring out schedules, and
Gemini did this in a fraction of the time.
This new trip-planning experience will be rolling out
to Gemini Advanced this summer, just in time to help you plan your
own Labor Day weekend. [Applause].
All right. We saved the best for last.
You heard Sundar say earlier that starting today, Gemini
Advanced subscribers get access to Gemini 1.5 Pro, with one
million tokens. That is the longest context
window of any chatbot in the world.
[Cheers and Applause]. It unlocks incredible new
potential in AI, so you can tackle complex problems that
were previously unimaginable. You can upload a PDF up to 1,500
pages long, or multiple files to get insights across a project.
And soon, you can upload as much as 30,000 lines of code or even an hour-long video.
Gemini Advanced is the only chatbot that lets you process
this amount of information. Now, just imagine how useful
this will be for students. Let’s say you’ve spent months on
your thesis, and you could really use a fresh perspective.
You can upload your entire thesis, your sources, notes,
your research, and soon, interview audio recordings and videos, too,
so Gemini has all this context to give you actionable advice.
It can dissect your main points, identify improvements, and even
role play as your professor. So you can feel confident in
your work. And check out what Gemini
Advanced can do with your spreadsheets, with the new data
analysis feature launching in the coming weeks.
Maybe you have a side hustle selling handcrafted products.
But you’re a better artist than accountant, and it's really hard to understand
which products are worth your time. Simply upload all of your
spreadsheets and ask Gemini to visualize your earnings and help
you understand your profit. Gemini goes to work calculating
your returns and pulling its analysis together into a single
chart, so you can easily understand which products are
really paying off. Behind the scenes, Gemini writes custom Python code to crunch these numbers;
a rough sketch of that kind of analysis follows below.
And of course, your files are not used to train our models.
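To make that "custom Python code" step concrete, here is a minimal sketch of the kind of analysis such code might perform on a sales spreadsheet. The file name and column names are hypothetical, and this is an illustration of the general technique rather than the code Gemini actually generates.

```python
# Minimal sketch of the kind of spreadsheet analysis described above.
# "sales.csv" and its columns ("product", "revenue", "cost") are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")
df["profit"] = df["revenue"] - df["cost"]

# Aggregate profit per product and sort so the best sellers stand out.
by_product = df.groupby("product")["profit"].sum().sort_values(ascending=False)
print(by_product)

# Pull the analysis together into a single chart.
by_product.plot(kind="bar", title="Profit by product")
plt.tight_layout()
plt.show()
```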
Oh, and just one more thing: later this year, we'll be doubling the long context window to 2 million tokens.
[Cheers and Applause]. We absolutely can't wait for
you to try all of this for yourself. Gemini is continuing to evolve
and improve at a breakthrough pace.
We’re making Gemini more multimodal, more agentive, and
more intelligent, with the capacity to process the most
information of any chatbot in the world.
And as you heard earlier, we're also expanding Gemini Advanced
to over 35 supported languages, available today.
[Applause]. But, of course, what makes
Gemini so compelling is how easy
it is to do just about anything you want, with a simple prompt.
Let's take a look. >> Enter prompt here.
Okay. Can't be that hard.
How about generate an image of a cat playing guitar?
Is that how it works? Am I doing AI?
Yeah. Just does whatever you type
What are last minute gift ideas you can make with arts and
crafts? Plan a workout routine to get
bigger calves. Help me think of titles to my
tell-all memoir. What's something smart I can say
about Renoir? Generate another image of a cat
playing guitar. If a girl calls me a snack, how
do I reply? Yeah, that's how it works.
You're doing AI. Make this email sound more
professional before I hit send. What's a good excuse to cancel
dinner with my friends? We're literally sitting right
here. There's no wrong way to prompt.
Yeah, you're doing AI. There's no wrong way to prompt.
It does whatever you type. Just prompt your prompt in the
prompt bar. Or just generate an image of a
cat playing guitar. You know it can do other stuff,
right? [Applause].
>>SAMEER SAMAT: Hi, everyone. It’s great to be back at Google
I/O. Today, you’ve seen how AI is
transforming our products across Gemini, Search, Workspace and
more. We're bringing all of these
innovations right onto your Android phone.
And we're going even further, to make Android the best place to
experience Google AI. This new era of AI is a profound
opportunity to make smartphones truly smart.
Our phones have come a long way in a short time, but if you
think about it, it’s been years since the user experience has
fundamentally transformed. This is a once-in-a-generation
moment to reinvent what phones can do.
So we’ve embarked on a multi-year journey to reimagine
Android, with AI at the core. And it starts with three
breakthroughs you’ll see this year.
First, we're putting AI-powered search right at your fingertips,
creating entirely new ways to get the answers you need.
Second, Gemini is becoming your new AI assistant on Android,
there to help you any time. And third, we’re harnessing
on-device AI to unlock new experiences that work as fast as
you do, while keeping your sensitive data private.
Let's start with AI-powered search.
Earlier this year, we took an important first step at Samsung
Unpacked, by introducing Circle to Search.
It brings the best of Search directly into the user
experience. So you can go deeper on anything
you see on your phone, without switching apps.
Fashionistas are finding the perfect shoes, home chefs are
discovering new ingredients, and with our latest update, it’s
never been easier to translate whatever’s on your screen, like
a social post in another language.
And there are even more ways Circle to Search can help.
One thing we’ve heard from students is that they are doing
more of their schoolwork directly on their phones and
tablets. So, we thought: Could Circle to
Search be your perfect study buddy?
Let’s say my son needs help with a tricky physics word problem,
like this one. My first thought is, oh boy,
it’s been a while since I’ve thought about kinematics.
If he’s stumped on this question, instead of putting me
on the spot, he can circle the exact part he’s stuck on and get
step-by-step instructions. Right where he’s already doing
the work. Ah, of course, final velocity
equals initial velocity plus acceleration times elapsed time.
Right. I was just about to say that.
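For reference, the relation quoted in the demo is the standard constant-acceleration kinematics formula:

```latex
% Constant-acceleration kinematics, as quoted in the demo:
% final velocity = initial velocity + acceleration x elapsed time
v = v_0 + a\,\Delta t
```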
Seriously, though, I love that it shows how to solve the
problem, not just the answer. This new capability is available
today! And later this year, Circle to
Search will be able to tackle more complex problems involving
symbolic formulas, diagrams, graphs and more.
Circle to Search is only on Android.
It’s available on more than 100 million devices today, and we’re
on track to double that by the end of the year.
[Cheers and Applause]. You’ve already heard from Sissie
about the incredible updates coming to the Gemini app.
On Android, Gemini is so much more.
It’s becoming a foundational part of the Android experience.
Here’s Dave to share more. [Applause].
>>DAVE BURKE: Hey, everyone. A couple months ago we launched
Gemini on Android. Like Circle to Search, Gemini
works at the system level. So instead of going to a
separate app, I can bring Gemini right to what I’m doing.
Now, we're making Gemini context aware, so it can
anticipate what you're trying to do and provide more helpful
suggestions in the moment. In other words, to be a more
helpful assistant. So let me show you
how this works.
And I've got my shiny new Pixel 8a here to help me.
[Applause].
So my friend Pete is asking me if I want to play pickleball
this weekend. And I know how to play tennis, sort of.
I have to say that for the demo. But I'm new to this pickleball thing,
so I'm going to reply and try to be funny and say, is that like tennis but with pickles?
This would be actually a lot funnier with a meme, so let me bring up Gemini to help with that,
and I'll say create image of tennis with pickles. Now, one new thing you'll notice
is that the Gemini window hovers in place above the app so that I
stay in the flow. Okay.
So I generated some pretty good images. What's nice is I can drag and drop any of
these images directly into the Messages app below. So cool, let me send that.
[Applause]. All right.
So Pete's typing, and he says -- he's sending me a video on how to play pickleball.
All right. Thanks, Pete.
Let's tap on that. And that launches YouTube but, you know, I only
have one or two burning questions about the game. I could bring up Gemini to help with that,
and because it's context-aware, Gemini knows I'm looking at a video, so it proactively shows me
an "Ask this video" chip. So let me tap on that.
And now, I can ask specific questions about the video.
So, for example, what is the two-bounce rule? Because that's something that I've heard about but
don't quite understand in the game. By the way, this uses signals like
YouTube's captions, which means you can use it on billions of videos.
So give it a moment, and, there. I get a nice, succinct answer.
The ball must bounce once on each side of the court after a serve.
Okay. Cool.
Let me go back to messages and Pete's followed up, and he says,
you're an engineer, so here's the official rule book for pickleball.
Thanks, Pete. Pete is very helpful, by the way.
Okay. So we tap on that.
It launches a PDF. Now, that's an 84-page PDF. I don't know how much time Pete thinks I have.
Anyway, we engineers, as you all know, like to work smarter, not harder,
so instead of trawling through this entire document, I can pull up Gemini to help.
And again, Gemini anticipates what I need, and offers me an "Ask this PDF" option.
So if I tap on that, Gemini now ingests all of the rules to become a pickleball expert,
and that means I can ask very esoteric questions, like, for example, are spin serves allowed?
And let's hit that, because I've heard that rule may be changing.
Now, because I'm a Gemini Advanced user, this works on any PDF and takes full advantage
of the long context window and there's just lots of times where that's useful.
For example, let's say you're looking for a quick answer in an appliance user manual.
And there you have it. It turns out, no, spin serves are not allowed.
So Gemini not only gives me a clear answer to my question, it also shows me exactly where in the
PDF to learn more. Awesome.
Okay. So that’s a few of the ways
that we're enhancing Gemini to be more context aware and helpful in the moment.
And what you've seen here are the first of really many new ways that Gemini will unlock
new experiences at the system level, and they're only available on Android.
You’ll see these, and more, coming to hundreds of millions of
devices over the next couple of months. Now, building Google AI directly
into the OS elevates the entire smartphone experience.
Android is the first mobile operating system to include a
built-in, on-device foundation model.
This lets us bring Gemini goodness from the data center
right into your pocket. So the experience is faster,
while also protecting your privacy. Starting with Pixel later this
year, we’ll be expanding what’s possible with our latest model,
Gemini Nano with Multimodality. This means your phone can
understand the world the way you understand it.
So not just through text input, but also through sights, sounds, and spoken language.
Let me give you an example. 2.2 billion people experience
blindness or low vision. So several years ago, we
developed TalkBack, an accessibility feature that helps
people navigate their phone through touch and spoken feedback.
Helping with images is especially important.
In fact, my colleague Karo, who uses TalkBack, will typically
come across 90 unlabeled images per day.
Thankfully, TalkBack makes them accessible, and now we’re taking
that to the next level with the multimodal capabilities of
Gemini Nano. So when someone sends Karo a
photo, she’ll get a richer and clearer description of what’s
happening. Or, let’s say Karo is shopping
online for an outfit. Now she can get a crystal clear
description of the style and cut to find the perfect look.
Running Gemini Nano on-device helps minimize latency, and the
model even works when there's no network connection.
These improvements to TalkBack are coming later this year.
Let me show you another example of what on-device AI can unlock.
People lost more than one trillion dollars to fraud last
year. And as scams continue to evolve
across texts, phone calls, and even videos, Android can help
protect you from the bad guys, no matter how they try to reach
you. So let’s say I get rudely
interrupted by an unknown caller right in the middle of my
presentation. [Phone ringing].
>> Hello! >> Hi.
I'm calling from Save More Bank Security Department.
Am I speaking to Dave? >>DAVE BURKE: Yes, this is Dave.
I’m kinda in the middle of something.
>> We've detected some suspicious activity on your
account. It appears someone is trying to
make unauthorized charges. >>DAVE BURKE: Oh, yeah?
What kind of charges? >> I can't give you specifics
over the phone, but to protect your account, I’m going to help
you transfer your money to a secure account we’ve set up for
you. [Laughter].
>>DAVE BURKE: And look at this,
my phone gives me a warning that this call might be a scam!
[Applause]. Gemini Nano alerts me the second
it detects suspicious activity, like a bank asking me to move my
money to keep it safe. And everything happens right on
my phone, so the audio processing stays completely
private to me and on my device. We’re currently testing this
feature, and we’ll have more updates to share later in the
summer. And we’re really just scratching
the surface on the kinds of fast, private experiences that
on-device AI unlocks. Later this year, Gemini will be
able to more deeply understand the content of your screen,
without any information leaving your phone, thanks to the
on-device model. So, remember that pickleball
example earlier? Gemini on Android will be able
to automatically understand the conversation and provide
relevant suggestions, like where to find pickleball clubs near
me.
And this is a powerful concept that will work across many apps on your phone.
In fact, later today at the developer keynote, you’ll hear
about how we’re empowering our developer community with our
latest AI models and tools like Gemini Nano and Gemini in
Android Studio. Also, stay tuned tomorrow for
our upcoming Android 15 updates, which we can’t wait to share.
As we said at the outset, we’re reimagining Android with Gemini
at the core. From your favorite apps, to the
OS itself, we’re bringing the power of AI to every aspect of
the smartphone experience. And with that, let me hand over
to Josh to share more on our latest news for developers.
Thank you. [Applause].
>>JOSH WOODWARD: It’s amazing to see Gemini Nano do all of that
directly on Android. That was our plan all along, to
create a natively multimodal Gemini in a range of sizes so
you all, as developers, can choose the one that works best for you.
Throughout the morning, you’ve heard a lot about our Gemini 1.5
series, and I want to talk about the two models you can access today.
1.5 Pro, which is getting a
series of quality improvements that go out, right about now,
and the brand new 1.5 Flash. Both are available globally in
over 200 countries and territories.
[Cheers and Applause]. You can go over to AI Studio,
or Vertex AI if you're a Google Cloud customer, and give them a try.
Now, both models are also natively multimodal.
That means you can interleave text, images, audio, and video as
inputs, and pack in that massive 1 million token context window.
And if you go to ai.google.dev today, you can sign up to try
the 2 million token context window for 1.5 Pro.
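As a hedged sketch of what interleaving modalities looks like in practice with the google-generativeai Python SDK: the file name below is a placeholder, and large media files uploaded through the Files API need a moment to process before they can be referenced in a prompt.

```python
# Sketch: interleaving a video and text as inputs to Gemini 1.5 Pro.
# Assumes the google-generativeai Python SDK; "lecture.mp4" is a placeholder.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload a large media file via the Files API, then wait for processing.
video = genai.upload_file("lecture.mp4")
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    [video, "Summarize the key arguments made in this lecture."]
)
print(response.text)
```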
We're also adding a bunch of new features to the Gemini API, starting with video frame extraction;
parallel function calling, so you can return more than one function call at a time;
and my favorite, context caching, so you can send all of your files to the model once
and not have to re-send them over and over again.
That should make the long context even more useful,
and more affordable. It ships next month.
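Here is a rough sketch of what context caching could look like through the Python SDK's caching module. Because the feature ships next month, the exact surface, model version string, and file below are assumptions; check the official docs once it's available.

```python
# Hedged sketch: cache a large file once, then reuse it across requests.
# The caching surface and versioned model name are assumptions, not the
# confirmed final API; the PDF file name is a placeholder.
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

rulebook = genai.upload_file("pickleball_rulebook.pdf")  # placeholder file

cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # assumed versioned model name
    system_instruction="You answer questions about the attached rulebook.",
    contents=[rulebook],
    ttl=datetime.timedelta(hours=1),
)

model = genai.GenerativeModel.from_cached_content(cached_content=cache)
print(model.generate_content("Are spin serves allowed?").text)
```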
[Applause]. Now, we're using Google's
infrastructure to serve these
models, so developers like all of you can get great prices.
1.5 Pro is $7 per 1 million tokens, and I'm excited to share
that for prompts up to 128K, it will be 50% less, for $3.50.
And 1.5 Flash will start at 35 cents per 1 million tokens.
[Cheers and Applause].
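As a quick back-of-the-envelope check on those input-token prices (output tokens are priced separately and aren't included in this sketch), here is what a prompt around the size of the 93,000-token feedback pile used in the AI Studio demo below would cost:

```python
# Back-of-the-envelope input-token cost at the prices quoted above.
# Output-token pricing is separate and deliberately left out of this sketch.
PRO_PER_M = 7.00        # USD per 1M input tokens, 1.5 Pro
PRO_SHORT_PER_M = 3.50  # USD per 1M input tokens, 1.5 Pro prompts up to 128K
FLASH_PER_M = 0.35      # USD per 1M input tokens, 1.5 Flash

def input_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a given number of input tokens."""
    return tokens / 1_000_000 * price_per_million

prompt_tokens = 93_000  # roughly the feedback pile used in the AI Studio demo
print(f"1.5 Pro (short prompt rate): ${input_cost(prompt_tokens, PRO_SHORT_PER_M):.2f}")
print(f"1.5 Flash:                   ${input_cost(prompt_tokens, FLASH_PER_M):.2f}")
```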
Now, one thing you might be wondering is which model is best for your use case?
Here’s how I think about it. We use 1.5 Pro for complex tasks, where you
really want the highest quality response, and it's okay if it takes a little bit longer to come back.
We're using 1.5 Flash for quick tasks, where the speed of the model is what matters the most.
And as a developer, you can go try them both out today and see what works best for you.
Now, I'm going to show you how it works here in AI Studio, the fastest way to build with Gemini.
And we'll pull it up here, and you can see this is AI Studio.
It's free to use. You don't have to configure anything to get going.
You just go to aistudio.google.com, log in with your Google account, and you can just pick the
model here on the right that works best for you. So one of the ways we've been using 1.5
Flash is to actually learn from customer feedback about some of our labs products.
Flash makes this possible with its low latency. So what we did here is we just took a bunch of
different feedback from our customer forums. You can put it into Flash, load up
a prompt, and hit run. Now, in the background,
it's going to go through that 93,000-token pile of information, and you
can see it start streaming the response back here. Now, this is really helpful because
it pulls out the themes for us. It gives us all the right places
where we can start to look. We can see this is about some of the
benefits of NotebookLM, like we showed earlier. Now, what's great about this is that you can take
something like this in AI Studio, prototyped here in ten seconds, and with one click in
the upper left, get an API key, or over here in the upper right, just tap Get Code, and you've
got all the model configurations, the safety settings, ready to go, straight into your IDE.
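The exported snippet from Get Code looks roughly like the sketch below; the configuration values are placeholders standing in for whatever you set in AI Studio, not the exact code the tool emits.

```python
# Rough shape of an AI Studio "Get code" export; values are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # from "Get API key" in AI Studio

generation_config = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 8192,
}

safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
]

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
    safety_settings=safety_settings,
)

print(model.generate_content("Pull out the main themes in this feedback: ...").text)
```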
Now, over time, if you find that you need more enterprise-grade features
you can use the same Gemini 1.5 models and the same configurations right in Vertex AI.
That way, you can scale up with Google Cloud as your enterprise needs grow.
So that's our newly updated Gemini 1.5 Pro and the new 1.5 Flash, both of which are available today
globally, and you'll hear a lot more about them in the developer keynote later today.
[Applause].
Now, let's shift gears and talk about Gemma, our family of open
models, which are crucial for driving AI innovation and
responsibility. Gemma is being built from the
same research and technology as Gemini. It offers top performance and comes in
lightweight 7B and 2B sizes. Since it launched less than
three months ago, it’s been downloaded millions of times
across all the major model hubs. Developers and researchers have
been using and customizing the base Gemma model, along with some of our pre-trained variants
like RecurrentGemma and CodeGemma. And today's newest member, PaliGemma,
our first vision-language model, is available right now.
[Applause]. It's optimized for
a range of image captioning, visual Q&A and other image labeling tasks, so go give it a try.
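If you want to try the open Gemma weights yourself, here is a minimal sketch using Hugging Face Transformers. It assumes you have accepted the Gemma license on the model hub and have enough memory for the 2B instruction-tuned checkpoint; PaliGemma and the other variants are published alongside it.

```python
# Minimal sketch: run the open Gemma 2B instruction-tuned checkpoint locally.
# Assumes the Gemma license has been accepted on the Hugging Face Hub and
# that the transformers and accelerate packages are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Gemma's tokenizer has a very large vocabulary (roughly 256K entries),
# which is part of what makes it adaptable to many languages and scripts,
# as the Navarasa story later in this keynote highlights.
print("vocab size:", len(tokenizer))

prompt = "Write a two-line haiku about open models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```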
I'm also excited to announce that we have Gemma 2 coming.
It's the next generation of Gemma, and it will be available in June.
One of the top requests we've heard from developers is for a bigger Gemma model,
but one that still fits in a size that's easy for all of you to use.
So in a few weeks, we'll be adding a new 27 billion
parameter model to Gemma 2, and here's what's great about it.
This size is optimized by NVIDIA to run on next-gen GPUs and can run efficiently
on a single TPU host in Vertex AI. So this quality-to-size ratio is
amazing because it will outperform models more than twice its size.
We can't wait to see what you're going to build with it.
[Applause].
To wrap up, I want to share this inspiring story from India, where developers have been
using Gemma and its unique tokenization to create Navarasa, a set of instruction-tuned
models to expand access to 15 Indic languages. This builds on our efforts to make information
accessible in more than 7,000 languages around the world.
Take a look. >>AASHI:
Language is an interesting problem to solve,
actually, and given India has a huge variety of languages and it
changes every five kilometers. >>HARSH: When technology is
developed for a particular culture, it won't be able to
solve and understand the nuances of a country like India.
One of Gemma’s features is an incredibly powerful tokenizer
which enables the model to use hundreds of thousands of words,
symbols, and characters across so many alphabets and language
systems.
This large vocabulary is critical to adapting Gemma to
power projects like Navarasa. >>RAMSRI: Navarasa is a model
that’s trained for Indic languages.
It's a fine-tuned model based on Google’s Gemma.
We built Navarasa to make large language models culturally
rooted where people can talk in their native language and get
the responses in their native language.
Our biggest dream is to build a model to include everyone from
all corners of India. >>GAURAV: We need a technology
that will harness AI so that everyone can use it and no one
is left behind. >>HARSH: Today the language that
you speak in could be the tool and the technology that you use
for solving your real-world problems.
And that's the power of generative AI that we want to
bring to every corner of India and the entire world.
[Applause].
[Cheers and Applause]. >>JAMES MANYIKA: Listening to
everything that’s been announced today, it’s clear that AI is
already helping people, from their everyday tasks to their
most ambitious, productive, and imaginative endeavors.
Our AI innovations, like multimodality, long context and
agents, are at the cutting edge of what this technology can do,
taking its capacity to help people to a whole new level.
Yet, as with any emerging technology, there are still
risks and new questions that will arise as AI advances and
its uses evolve. In navigating these
complexities, we are guided by our AI Principles, and we’re
learning from our users, partners, and our own research.
To us, building AI responsibly means both addressing the risks
and maximizing the benefits for people and society.
Let me begin with what we’re doing to address risks.
Here, I want to focus on how we are improving our models and
protecting against their misuse. Beyond what Demis shared
earlier, we are improving our models with an industry-standard
practice called red-teaming, in which we test our own models and try
to break them to identify weaknesses. Adding to this work, we’re
developing a cutting-edge technique we call AI-assisted
red teaming. This draws on Google DeepMind's
gaming breakthroughs like AlphaGo, where we train AI
agents to compete against each other and improve and expand the
scope of their red teaming capabilities.
We are developing AI models with these capabilities to help
address adversarial prompting and limit problematic outputs.
We’re also improving our models with feedback from two important
groups: thousands of internal safety experts across a range of
disciplines, and independent experts from academia to civil society.
Both groups help us identify emerging risks, from
cybersecurity threats to potentially dangerous
capabilities in areas like Chem-Bio.
Combining human insight with our safety testing methods will help
make our models and products more accurate, reliable, and safe.
This is particularly important as technical advances like better
intonation make interactions with AI feel and sound more human-like.
We're doing a lot of research in this area, including the potential for harm and misuse.
We're also developing new tools to help prevent the misuse of our models.
For example, as Imagen 3 and Veo create more realistic imagery
and videos, we must also consider how they might be
misused to spread misinformation.
To help, last year we introduced SynthID, a tool that adds
imperceptible watermarks to our AI-generated images and audio so
that they’re easier to identify. Today, we’re expanding SynthID
to two new modalities: Text and video.
These launches build on our efforts to deploy
state-of-the-art watermarking capabilities across modalities.
Moving forward, we will keep integrating advances like
watermarking and other emerging techniques, to secure our latest
generations of Gemini, Imagen, Lyria, and Veo models.
We're also committed to working with the ecosystem, and with all of you,
to help others build on the advances we're making. And in the coming months, we'll be open-sourcing
SynthID text watermarking. This will be available in our
updated Responsible Generative AI Toolkit, which we created to
make it easier for developers to build AI responsibly.
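To give a feel for what generation-time text watermarking means, here is a self-contained toy sketch of the general idea: a secret key assigns each token a pseudorandom score, generation nudges sampling toward high-scoring tokens, and a detector checks whether a text's average score is higher than chance. This is emphatically not SynthID's actual algorithm, just an illustration of the concept it builds on.

```python
# Toy illustration of generation-time text watermark detection scoring.
# NOT SynthID's algorithm; it only shows the generic idea of keyed,
# pseudorandom per-token scores whose average rises above chance when
# the generator preferred high-scoring tokens at sampling time.
import hashlib

def g_value(prev_token: str, token: str, key: str) -> float:
    """Keyed pseudorandom score in [0, 1) for a token given its predecessor."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def watermark_score(tokens: list[str], key: str) -> float:
    """Mean g-value: about 0.5 for ordinary text, noticeably higher when a
    watermarking sampler has been steering toward high-g tokens."""
    if len(tokens) < 2:
        return 0.5
    scores = [g_value(a, b, key) for a, b in zip(tokens, tokens[1:])]
    return sum(scores) / len(scores)

tokens = "the quick brown fox jumps over the lazy dog".split()
print(round(watermark_score(tokens, key="secret-watermark-key"), 3))
```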
We also support C2PA,
collaborating with Adobe, Microsoft, startups, and many
others, to build and implement a standard that improves the
transparency of digital media. Now, let’s turn to the second
and equally important part of our responsible AI approach:
How we’re building AI to benefit people and society.
Today, our AI advances are helping to solve real-world
problems, like accelerating the work of 1.8 million scientists
in 190 countries who are using AlphaFold to work on issues like
neglected diseases. Helping to predict floods in
more than 80 countries. And helping organizations like
the United Nations track progress on the world's 17 Sustainable Development
Goals with Data Commons. And now, generative AI is
unlocking new ways for us to make the world’s information,
and knowledge, universally accessible and useful for
learning. Billions of people already use
Google products to learn every day, and generative AI is opening up new possibilities, allowing us to
ask questions like, what if everyone everywhere could have their own
personal AI tutor, on any topic? Or, what if every educator could
have their own assistant in the classroom?
Today marks a new chapter for learning and education at
Google. I am excited to introduce
LearnLM, our new family of models, based on Gemini, and
fine-tuned for learning. LearnLM is grounded in
educational research, making learning experiences more
personal and engaging. And it’s coming to the products
you use every day. Like Search, Android, Gemini and YouTube.
In fact, you've already seen LearnLM on stage today when it helped Sameer
with his son's homework on Android. Now, let's see how it works in
the Gemini app. Earlier, Sissie introduced Gems,
custom versions of Gemini that can act as personal assistive
experts on any topic. We are developing some pre-made
Gems, which will be available in the Gemini App and web
experience, including one called Learning Coach.
With Learning Coach, you can get step-by-step study guidance,
along with helpful practice and memory techniques, designed to
build understanding rather than just give you the answer.
Let’s say you’re a college student studying for an upcoming
biology exam. If you need a tip to remember
the formula for photosynthesis, Learning Coach can help.
Learning Coach, along with other pre-made gems, will launch in
Gemini in the coming months. And you can imagine what
features like Gemini Live can unlock for learning.
Another example is a new feature in YouTube that uses LearnLM to
make educational videos more interactive, allowing you to ask
a clarifying question, get a helpful explanation, or take a
quiz. This even works
for those long lectures or seminars, thanks to the Gemini model's long-context capabilities.
This feature in YouTube is already rolling out to select
Android users. As we work to extend LearnLM
beyond our own products, we are partnering with experts and
institutions like Columbia Teachers College, Arizona State
University and Khan Academy to test and improve the new
capabilities in our models for learning.
And we’ve collaborated with MIT RAISE to develop an online
course to help educators better understand and use generative
AI. We’re also working directly with
educators to build more helpful generative AI tools with LearnLM.
For example, in Google
Classroom, we’re drawing on the advances you’ve heard about
today to develop new ways to simplify and improve lesson planning, and enable
teachers to tailor lessons and content to meet the individual needs of their students.
Standing here today makes me think back to my own time as an
undergraduate. Then, AI was considered
speculative, far from any real world uses.
Today, we can see how much is already real, how much it is
already helping people, from their everyday tasks to their
most ambitious, productive and imaginative endeavors, and how
much more is still to come. This is what motivates us.
I’m excited about what’s ahead and what we’ll build with all of
you. Back to you, Sundar.
[Applause]. >>SUNDAR PICHAI:
Thanks, James. All of this shows the important
progress we’ve made, as we take a bold and responsible approach
to making AI helpful for everyone.
Before we wrap, I have a feeling that someone out there might be
counting how many times we’ve mentioned AI today.
[Laughter]. And since a big theme today has
been letting Google do the work for you, we went ahead and
counted, so that you don’t have to.
[Cheers and Applause]. That might be a record in how
many times someone has said AI. I’m tempted to say it a few more
times. But I won't.
Anyhow, this tally is more than just a punchline.
It reflects something much deeper.
We’ve been AI-first in our approach for a long time.
Our decades of research leadership have pioneered many
of the modern breakthroughs that power AI progress, for us and
for the industry. On top of that, we have
world-leading infrastructure built for the AI Era,
cutting-edge innovation in Search, now powered by Gemini,
products that help at extraordinary scale, including
fifteen products with over half a billion users, and platforms
that enable everyone, partners, customers, creators, and all of
you, to invent the future. This progress is only possible
because of our incredible developer community.
You are making it real, through the experiences you build every
day. So, to everyone here in
Shoreline and the millions more watching around the world,
here’s to the possibilities ahead and creating them
together. Thank you.
[Cheers and Applause]. >> What does this remind you of?
>> Cat. >> Wow.
>> Wow! >> Okay!
>> When all of these tools come together, it's a powerful
combination. >> It's amazing.
>> It's amazing. It's an entire suite of different
kinds of possibilities. >> Hi.
I'm Gemini. >> What neighborhood do you
think I'm in? >> This appears to be the Kings
Cross area of London. >> Together we're creating a new
era.