Google Keynote (Google I/O ‘24)
Summary
TLDR: The Google I/O 2024 event showcased a multitude of AI innovations, highlighting Google's commitment to integrating artificial intelligence into every aspect of technology. Sundar Pichai, CEO of Google, introduced Gemini, a generative AI model, as a cornerstone of Google's AI strategy. Gemini is designed to be multimodal, capable of processing text, images, video, and code, and is set to revolutionize the way people work and interact with technology. The event covered advancements in Google Search, Workspace, Android, and the introduction of new AI models like Gemini 1.5 Pro and 1.5 Flash. The updates aim to make AI more accessible and beneficial for creators, developers, and users worldwide. The narrative emphasized the potential for AI to personalize and enhance various facets of life, from education to daily tasks, while also addressing the importance of responsible AI development and deployment.
Takeaways
- 🚀 Google has launched Gemini, a generative AI model, aiming to revolutionize the way we work by being natively multimodal and capable of reasoning across various forms of data like text, images, video, and code.
- 📈 Over 1.5 million developers are already using Gemini models for tasks such as debugging code, gaining insights, and building AI applications, highlighting their rapid adoption and impact on the developer community.
- 🔍 Google Search has been transformed with Gemini, enabling new ways of searching, including complex queries and photo searches, leading to an increase in user satisfaction and search usage.
- 📱 Gemini's capabilities are being integrated into Google's products across Mobile, Search, Photos, Workspace, and Android, providing a seamless AI experience for users.
- 🎉 Sundar Pichai announced the expansion of Gemini's context window to 2 million tokens, a significant leap towards the goal of infinite context, allowing for even more detailed and long-range reasoning.
- 📈 Google Workspace is set to benefit from Gemini's multimodality and long context features, streamlining tasks like email summarization and meeting highlight generation, enhancing productivity.
- 🎓 LearnLM, a new family of models based on Gemini, is designed to enhance learning experiences and is being integrated into everyday products like Search, Android, Gemini, and YouTube.
- 🤖 The concept of AI agents was introduced, which are intelligent systems capable of reasoning, planning, and memory, designed to perform tasks on behalf of users while ensuring user supervision and control.
- 💬 Gemini's real-time speech models enable a more natural conversational experience with AI, allowing users to interrupt and receive immediate responses, making interactions feel more human-like.
- 🌐 Google is committed to responsible AI development, focusing on improving model safety, preventing misuse, and expanding AI's benefits to society, including education and accessibility.
- 📊 Google's investment in AI infrastructure, such as Tensor Processing Units (TPUs), is pivotal in training and serving state-of-the-art models like Gemini, reinforcing Google's position at the forefront of AI innovation.
Q & A
What is Google's new generative AI model called?
-Google's new generative AI model is called Gemini.
How does Gemini redefine the way we work with AI?
-Gemini redefines the way we work with AI by being natively multimodal, allowing users to interact with it through text, voice, or the phone's camera, and by providing more natural and context-aware responses.
What is the significance of the 1 million token context window in Gemini 1.5 Pro?
-The 1 million token context window in Gemini 1.5 Pro is significant because it is longer than that of any other large-scale foundation model in production, allowing the model to take in hundreds of pages of text, hours of audio, an hour of video, or an entire code repository in a single prompt and reason over amounts of information that were previously impractical to process at once. (A minimal usage sketch follows this Q & A section.)
How does Gemini Advanced's trip planning feature work?
-Gemini Advanced's trip planning feature works by gathering information from various sources like Search, Maps, and Gmail. It uses this data to create a dynamic graph of possible travel options, taking into account the user's priorities and constraints, and then presents a personalized vacation plan.
What is the role of Gemini in the future of Google Search?
-In the future of Google Search, Gemini plays the role of an AI agent that uses multi-step reasoning to break down complex questions, figure out the problems that need to be solved, and in what order. It taps into Google's index of information about the real world to provide comprehensive and customized search results.
How does Google ensure the responsible use of its AI technology?
-Google ensures the responsible use of its AI technology by adhering to its AI Principles, red-teaming to identify weaknesses, involving internal safety experts and independent experts for feedback, and developing tools like SynthID to watermark AI-generated content, making it easier to identify.
What is the new feature called 'Live' in the Gemini app?
-'Live' is a new feature in the Gemini app that allows users to have in-depth conversations with Gemini using their voice. It utilizes Google's latest speech models to better understand users and provide more natural responses.
How does Gemini's 'Gems' feature help users customize their AI experience?
-Gemini's 'Gems' feature allows users to create personalized AI assistants, or 'Gems,' tailored to specific topics or tasks. Users can set up these Gems once with their instructions and then use them whenever needed for a customized AI experience.
What is the purpose of the 'AI-organized search results page' in Google Search?
-The 'AI-organized search results page' in Google Search is designed to provide users with a whole page of AI-generated and AI-organized content that is custom-built for their query. It uncovers the most interesting angles for the user to explore and organizes the results into helpful clusters.
How does the Gemini app integrate with Android to enhance the smartphone experience?
-The Gemini app integrates with Android by becoming a foundational part of the Android experience, working at the system level. It provides context-aware assistance, allowing users to bring Gemini to their current activity without switching apps, and offers features like video understanding and on-device processing for faster and more private experiences.
What is the 'SynthID' tool, and how does it contribute to responsible AI?
-SynthID is a tool developed by Google that adds imperceptible watermarks to AI-generated images, audio, text, and video. This makes the synthetic media easier to identify and helps prevent the misuse of AI-generated content, such as spreading misinformation.
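To make the long-context answer above concrete, here is a minimal sketch of sending one very large input to Gemini 1.5 Pro through the Google AI Python SDK (google-generativeai). This is not code from the keynote; the model name string, the file path, and the prompt are illustrative assumptions.

```python
# Minimal long-context sketch, not keynote code.
# Assumptions: the google-generativeai package is installed, GOOGLE_API_KEY is set,
# and "large_document.txt" stands in for a long report or a dump of a code repo.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Gemini 1.5 Pro accepts very long prompts (on the order of 1 million tokens),
# so the whole document can go into a single request.
model = genai.GenerativeModel("gemini-1.5-pro")

with open("large_document.txt", "r", encoding="utf-8") as f:
    document = f.read()

response = model.generate_content(
    [document, "Summarize the key points and list any open questions."]
)
print(response.text)
```

The point of the sketch is the shape of the call: the entire source goes in as one prompt part, and the model reasons over it directly rather than over retrieved snippets.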
Outlines
🚀 Google's Gemini AI: A Leap Forward in Technology
The first paragraph introduces Google's ambitious strides in artificial intelligence with the launch of Gemini, a generative AI model. Sundar Pichai, CEO of Google, welcomes the audience to Google I/O and emphasizes the transformative impact of Gemini on various sectors, including the way we work, find solutions, and interact with technology. The paragraph highlights Google's commitment to AI innovation across research, product development, and infrastructure, and the potential of Gemini to drive opportunities for creators and developers.
🔍 Google Search Transformation with Generative AI
The second paragraph discusses the transformation of Google Search with the integration of Gemini's capabilities. It talks about the Search Generative Experience that has led to new ways of searching, including complex queries and photo-based searches. The paragraph also mentions the user satisfaction increase and the launch of AI Overviews, which will be available to users in the U.S. with plans for global expansion.
📸 Google Photos Enhancement with Gemini
The third paragraph showcases how Gemini is enhancing Google Photos by making the search process more intuitive and context-aware. It describes a scenario where a user can find their car's license plate number by simply asking Photos, thanks to Gemini's ability to recognize and understand the context. The paragraph also teases the upcoming 'Ask Photos' feature, which will allow for deeper memory search capabilities.
🧠 Multimodality and Long Context in Gemini
The fourth paragraph delves into the technical aspects of Gemini, focusing on its multimodality and long context capabilities. It discusses how Gemini's design allows it to understand different types of inputs and find connections between them. The paragraph also highlights developers' excitement about the 1 million token context window and how it has been used to improve tasks such as coding and data analysis.
📚 Innovative Applications of Gemini
The fifth paragraph presents real-world applications of Gemini, where developers have used its advanced features to perform tasks like turning a video of a bookshelf into a searchable database. It illustrates the potential of Gemini to understand and process vast amounts of data, providing innovative solutions to complex problems.
🌐 Expanding Gemini's Reach and Capabilities
The sixth paragraph discusses the expansion of Gemini's capabilities with the introduction of Gemini 1.5 Pro, which offers long context support and is now available globally. It also announces the expansion of the context window to 2 million tokens for developers and highlights new updates in translation, coding, and reasoning.
🤖 AI Agents and the Future of Intelligent Systems
The seventh paragraph explores the concept of AI agents, which are intelligent systems capable of reasoning, planning, and memory. It provides examples of how these agents can simplify tasks like shopping and moving to a new city by automating multiple steps on behalf of the user. The paragraph emphasizes the importance of privacy, security, and user control in the development of these intelligent systems.
🧑‍🤝‍🧑 Personalized AI for Everyone
The eighth paragraph focuses on the ultimate goal of making AI helpful and accessible to everyone. It discusses the combination of multimodality, long context, and AI agents as a means to organize the world's information and make it useful for individuals. The paragraph also introduces the concept of AI-first approach and the role of Google's infrastructure in supporting AI advancements.
🎓 LearnLM: Advancing Education with AI
The ninth paragraph introduces LearnLM, a new family of models based on Gemini and fine-tuned for educational purposes. It highlights the potential of LearnLM to provide personalized and engaging learning experiences through products like Search, Android, Gemini, and YouTube. The paragraph also mentions partnerships with educational institutions to enhance the capabilities of these models for learning.
🌟 The Impact of AI on Society and Future Innovations
The tenth paragraph emphasizes the real-world impact of AI, its role in solving global issues, and the ethical considerations guiding its development. It discusses Google's AI Principles, the use of red-teaming and AI-assisted red-teaming to improve model safety, and the expansion of the SynthID watermarking tool. The paragraph concludes with a forward-looking statement on the potential of AI to enhance learning and education.
🤝 Collaboration and the Era of AI Innovation
The eleventh paragraph celebrates the developer community's role in bringing AI innovations to life. It acknowledges the collaborative efforts in creating AI technologies and the ongoing journey to explore and build the future of AI. The paragraph ends with a tribute to the possibilities ahead and the commitment to creating them together.
📈 The Significance of AI in Google's Ecosystem
The twelfth paragraph reflects on the frequency of mentioning AI throughout the discussion, symbolizing the integral role of AI in Google's approach and offerings. It underscores Google's AI-first mindset, research leadership, and the infrastructure built for the AI era. The paragraph concludes with a nod to the developer community's contributions to realizing AI's potential.
Keywords
💡Artificial Intelligence (AI)
💡Gemini
💡Multimodal
💡Long Context
💡AI Overviews
💡Google Workspace
💡AI Agents
💡Project Astra
💡Tensor Processing Units (TPUs)
💡AI-Generated Media
💡AI Principles
Highlights
Google has launched Gemini, a generative AI, which is transforming the way we work.
Google I/O introduced new beginnings and innovative solutions to age-old problems through advancements in AI.
Sundar Pichai emphasized that Google is in the early days of the AI platform shift with significant opportunities ahead.
Gemini models have demonstrated state-of-the-art performance on every multimodal benchmark.
Over 1.5 million developers are using Gemini models for various applications such as debugging code and building AI apps.
Google Search has been revolutionized by Gemini, allowing for more complex queries and photo-based searches.
Google Photos integration with Gemini makes searching through personal memories more accessible and efficient.
Google Workspace is set to enhance productivity with Gemini's capabilities, including summarizing emails and automating tasks.
Google is expanding the context window to 2 million tokens for developers, a significant step towards infinite context.
The introduction of Gemini 1.5 Flash, a lighter-weight model optimized for low latency and cost-efficient tasks at scale (see the usage sketch after this list).
Project Astra represents the future of AI assistants, aiming to build a universal AI agent for everyday use.
Imagen 3, Google's most capable image generation model yet, offers more photorealistic and detailed results.
Google's Music AI Sandbox is a suite of tools that can create new instrumental sections and transfer styles between tracks.
Veo, the new generative video model, creates high-quality 1080p videos from various prompts, offering creative control.
Google's AI innovations are enabling more natural and interactive experiences with AI, with real-world applications.
Google is committed to responsible AI development, focusing on safety, privacy, and ethical considerations in AI advancements.
The introduction of LearnLM, a new family of models designed to enhance learning experiences across Google products.
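As a companion to the Gemini 1.5 Flash highlight above, here is a hedged sketch of a low-latency multimodal call using the same Google AI Python SDK; the model name, image file, and prompt are assumptions for illustration, not material shown at I/O.

```python
# Sketch only: Flash is the lighter-weight Gemini 1.5 model aimed at low-latency,
# cost-sensitive workloads while keeping multimodal input support.
# Assumptions: google-generativeai and Pillow are installed, GOOGLE_API_KEY is set,
# and "receipt.jpg" is a hypothetical local image.
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

flash = genai.GenerativeModel("gemini-1.5-flash")

receipt = Image.open("receipt.jpg")
response = flash.generate_content(
    [receipt, "Extract the merchant name, date, and total amount as JSON."]
)
print(response.text)
```

The design point is that only the model name changes: an application can route quick, high-volume requests to Flash and keep Pro for harder, longer-context work.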
Transcripts
[Cheers and Applause]. >>WOMAN: Google’s ambitions in
artificial intelligence. >>MAN: Google launches Gemini,
the generative AI. >> And it's completely changing
the way we work. >> You know, a lot has happened
in a year. There have been new beginnings.
We found new ways to find new ideas.
And new solutions to age-old problems. >> Sorry about your shirt.
We dreamt of things -- >> Never too old for a
treehouse. >> We trained for things.
>> All right! Let’s go go go!
>> And learned about this thing. We found new paths, took the
next step, and made the big leap. Cannon ball!
We filled days like they were weeks.
And more happened in months, than has happened in years.
>> Hey, free eggs. >> Things got bigger,
like waaay bigger.
And it wasn’t all just for him, or for her.
It was for everyone.
And you know what?
We’re just getting started.
>>SUNDAR PICHAI: Hi, everyone. Good morning.
[Cheers and Applause]. Welcome to Google I/O.
It's great to have all of you with us. We have a few thousand
developers with us here today at Shoreline.
Millions more are joining virtually around the world.
Thanks to everyone for being here.
For those of you who haven’t seen I/O before, it’s basically
Google’s version of the Eras Tour, but with fewer costume
changes. [Laughter].
At Google, though, we are fully in our Gemini era. Before we get into it, I want to
reflect on this moment we’re in. We’ve been investing in AI for
more than a decade, and innovating at every layer of the stack:
Research, product, infrastructure. We’re going to talk about it all today.
Still, we are in the early days of the AI platform shift.
We see so much opportunity ahead for creators, for developers, for startups, for everyone.
Helping to drive those opportunities is what our Gemini era is all about.
So let’s get started.
A year ago on this stage, we first shared our plans for
Gemini, a frontier model built to be natively multimodal from
the very beginning, that could reason across text, images,
video, code, and more. It’s a big step in turning any
input into any output. An I/O for a new generation.
Since then we introduced the first Gemini models, our most
capable yet. They demonstrated
state-of-the-art performance on every multimodal benchmark.
And that was just the beginning. Two months later, we introduced
Gemini 1.5 Pro, delivering a big breakthrough in long context.
It can run 1 million tokens in production, consistently.
More than any other large-scale foundation model yet.
We want everyone to benefit from what Gemini can do, so we’ve
worked quickly to share these advances with all of you.
Today, more than 1.5 million developers use Gemini models
across our tools. You’re using it to debug code,
get new insights, and build the next generation of AI
applications. We’ve also been bringing
Gemini’s breakthrough capabilities across our products
in powerful ways. We’ll show examples today across
Search, Photos, Workspace, Android and more.
Today, all of our 2-billion user products use Gemini.
And we’ve introduced new experiences, too, including on
Mobile, where people can interact with Gemini directly
through the app. Now available on Android and
iOS. And through Gemini Advanced,
which provides access to our most capable models.
Over 1 million people have signed up to try it, in just
three months. And it continues to show strong
momentum. One of the most exciting
transformations with Gemini has been in Google Search.
In the past year, we’ve answered billions of queries as part of
our Search Generative Experience.
People are using it to Search in entirely new ways.
And asking new types of questions, longer and more
complex queries, even searching with photos, and getting back
the best the web has to offer. We’ve been testing this
experience outside of Labs, and we’re encouraged to see not only
an increase in Search usage, but also an increase in user
satisfaction. I’m excited to announce that
we’ll begin launching this fully revamped experience, AI
Overviews, to everyone in the U.S. this week.
And we’ll bring it to more countries soon.
[Cheers and Applause]. There’s so much innovation
happening in Search. Thanks to Gemini we can create
much more powerful search experiences, including within
our products. Let me show you an example in
Google Photos. We launched Google Photos almost
nine years ago. Since then, people have used it
to organize their most important memories.
Today that amounts to more than 6 billion photos and videos
uploaded every single day. And people love using Photos to
search across their life. With Gemini, we’re making that a
whole lot easier. Say you’re at a parking station
ready to pay, but you can’t recall your license plate
number. Before, you could search Photos
for keywords and then scroll through years’ worth of photos,
looking for the right one. Now, you can simply ask Photos.
It knows the cars that appear often, it triangulates which one
is yours, and just tells you the license plate number.
[Cheers and Applause]. And Ask Photos can help you
search your memories in a deeper way.
For example, you might be reminiscing about your daughter
Lucia’s early milestones. You can ask Photos, "When did Lucia learn to swim?"
And you can follow up with something more complex.
Show me how Lucia's swimming has progressed. Here, Gemini goes beyond a
simple search, recognizing different contexts from doing
laps in the pool, to snorkeling in the ocean, to the text and
dates on her swimming certificates.
And Photos packages it all up together in a summary, so you
can really take it all in, and relive amazing memories all over
again. We’re rolling out Ask Photos
this summer, with more capabilities to come.
[Cheers and Applause]. Unlocking knowledge across
formats is why we built Gemini to be multimodal from the ground
up. It’s one model, with all the
modalities built in. So not only does it understand
each type of input, it finds connections between them.
Multimodality radically expands the questions we can ask, and
the answers we will get back. Long context takes this a step
further, enabling us to bring in even more information, hundreds
of pages of text, hours of audio, a full hour of video, or
entire code repos. Or, if you want, roughly 96
Cheesecake Factory menus. [Laughter].
For that many menus, you’d need a one million token context
window, now possible with Gemini 1.5 Pro.
Developers have been using it in super interesting ways.
Let’s take a look. >> I remember the announcement,
the 1 million token context window, and my first reaction
was there's no way they were able to achieve this.
>> I wanted to test its technical skills, so I uploaded
a line chart. It was temperatures between like
Tokyo and Berlin and how they were across the 12 months of the
year. >> So
I got in there and I threw in the Python library that I was
really struggling with, and I just asked it a simple question.
And it nailed it. It could find specific
references to comments in the code and specific requests that
people had made and other issues that people had had, but then
suggest a fix for it that related to what I was working
on. >> I immediately tried to kind
of crash it. So I took, you know, four or
five research papers I had on my desktop, and it's a mind-blowing
experience when you add so much text, and then you see the kind
of amount of tokens you add is not even at half the capacity.
>> It felt a little bit like Christmas because you saw things
kind of peppered up to the top of your feed about, like, oh,
wow, I built this thing, or oh, it's doing this, and I would
have never expected. >> Can I shoot a video of my
possessions and turn that into a searchable database?
So I ran to my bookshelf, and I shot video just panning my
camera along the bookshelf and I fed the video into the model.
It gave me the titles and authors of the books, even
though the authors weren't visible on those book spines,
and on the bookshelf there was a squirrel nutcracker sat in
front of the book, truncating the title.
You could just see the word "sightsee", and it still guessed
the correct book. The range of things you can do
with that is almost unlimited. >> So at that point for me was
just like a click, like, this is it.
I thought, like, I had like a super power in my hands.
>> It was poetry. It was beautiful.
I was so happy! This is going to be amazing!
This is going to help people! >> This is kind of where the
future of language models are going.
Personalized to you, not because you trained it to be personal to
you, but personal to you because you can give it such a vast
understanding of who you are. [Applause].
>>SUNDAR PICHAI: We’ve been rolling out Gemini 1.5 Pro with
long context in preview over the last few months.
We’ve made a series of quality improvements across translation,
coding, and reasoning. You’ll see these updates
reflected in the model starting today.
I'm excited to announce that we’re bringing this improved
version of Gemini 1.5 Pro to all developers globally.
[Cheers and Applause]. In addition, today Gemini 1.5
Pro with 1 million context is now directly available for consumers in Gemini Advanced,
and can be used across 35 languages. One million tokens is opening up
entirely new possibilities. It’s exciting, but I think we
can push ourselves even further. So today, we are expanding the
context window to 2 million Tokens.
[Cheers and Applause]. We are making it available
for developers in private preview. It's amazing to look back and
see just how much progress we've made in a few months.
This represents the next step on our journey towards the ultimate goal of infinite context.
Okay. So far,
we’ve talked about two technical advances:
multimodality and long context. Each is powerful on its own.
But together, they unlock deeper capabilities, and more
intelligence. Let’s see how this comes to life
with Google Workspace. People are always searching
their emails in Gmail. We are working to make it much
more powerful with Gemini. Let’s look at how.
As a parent, you want to know everything that’s going on with
your child’s school. Okay, maybe not everything, but
you want to stay informed. Gemini can help you keep up.
Now we can ask Gemini to summarize all recent emails from
the school. In the background, it’s
identifying relevant emails, and even analyzing attachments, like
PDFs. And you get a summary of
the key points and action items. So helpful.
Maybe you were traveling this week and couldn’t make the PTA
meeting. The recording of the meeting is
an hour long. If it’s from Google Meet, you
can ask Gemini to give you the highlights.
[Cheers and Applause]. There’s a parents group looking
for volunteers, and you’re free that day.
So of course, Gemini can draft a reply.
There are countless other examples of how this can make
life easier. Gemini 1.5 Pro is available
today in Workspace Labs. Aparna will share more later on.
[Applause]. We just looked at an example with text outputs.
But with a multimodal model, we can do so much more.
To show you an early demo of an audio output in NotebookLM,
here’s Josh. >>JOSH WOODWARD: Hi, everyone!
Last year, at I/O, we introduced NotebookLM, a research and
writing tool grounded in the information you give it.
Since then, we've seen a lot of momentum with students and
teachers using it. And today, Gemini 1.5 Pro is
coming to NotebookLM, and it's
great. Let me show you. So here we are in NotebookLM.
You can load it up with all of the materials here on the left.
In this notebook, I've been using it with my younger son,
and I've added some of his science worksheets, a few slide decks from the teacher, and even an
open source textbook full of charts and diagrams. With 1.5 Pro, it instantly creates this notebook
guide with a helpful summary and can generate a study guide, an FAQ, or even quizzes.
But for my son Jimmy, he really learns best when he can listen to something.
So we've prototyped a new feature with Gemini, and it's called Audio Overviews.
NotebookLM is going to use all of the materials on the left
as input and output them into a lively science discussion, personalized for him.
Let's take a listen. >> So let's dive into physics.
What's on deck for today? >> Well, we're starting with the
basics. Force in motion.
>> Okay. >> And that, of course, means we
have to talk about Sir Isaac Newton and his three laws of
motion. >> Ah,
yes, the foundation -- For understanding how objects move and interact.
>>JOSH WOODWARD: Oh, yes, this is where multimodal really shines.
Now, it generated this audio discussion based on that text material.
And what's amazing is that my son and I can join into the conversation.
And steer it whichever direction we want.
When I tap "join." >> Hold on.
We have a question. What's up, Josh?
>>JOSH WOODWARD: Yeah, can you give my son Jimmy a basketball
example? >> Hey, Jimmy!
That's a fantastic idea! Basketball is actually a great
way to visualize force in motion.
Let's break it down. >> Okay.
So first, imagine a basketball just sitting there on the court.
It's not moving, right?
That's because all the forces acting on it are balanced.
The downward pull of gravity -- >>JOSH WOODWARD: Pretty cool, right?
[Cheers and Applause]. I gotta say, the first
time my son heard this, you should have seen how big his eyes got!
Because he was gripped. They were talking to him.
He was learning science through the example of basketball.
His favorite sport. Now, what's interesting
is under the hood, you saw that Gemini had used some of the concepts of gravity, Sir Isaac Newton,
but nothing in there was about basketball. It connected the dots and created that
age-appropriate example for him. And this is what's becoming
possible with the power of Gemini. You can give it lots of information in
any format, and it can be transformed in a way that's personalized and interactive for you.
Back to you, Sundar. [Applause].
>>SUNDAR PICHAI: Thanks, Josh. The demo shows the real
opportunity with multimodality. Soon you’ll be able to mix and
match inputs and outputs. This is what we mean when we say
it’s an I/O for a new generation.
And I can see you all out there thinking about the
possibilities. But what if we could go even
further? That’s one of the opportunities
we see with AI agents. Let me take a step back and
explain what I mean by that. I think about them as
intelligent systems that show reasoning, planning, and memory.
They are able to “think” multiple steps ahead, work across
software and systems, all to get something done on your behalf,
and most importantly, under your supervision.
We are still in the early days, and you’ll see glimpses of our
approach throughout the day, but let me show you the kinds of use
cases we are working hard to solve. Let’s talk about shopping.
It’s pretty fun to shop for shoes, and a lot less fun to
return them when they don’t fit. Imagine if Gemini could do all
the steps for you: Searching your inbox for the receipt,
locating the order number from your email, filling out a return
form, and even scheduling a pickup. That's much easier, right?
[Applause]. Let’s take another example
that’s a bit more complex. Say you just moved to Chicago.
You can imagine Gemini and Chrome working together to help
you do a number of things to get ready: Organizing, reasoning,
synthesizing on your behalf. For example, you’ll want to
explore the city and find services nearby, from
dry-cleaners to dog-walkers. You will have to update your new
address across dozens of Web sites. Gemini can work across these
tasks and will prompt you for more information when needed, so
you are always in control. That part is really important.
As we prototype these experiences, we are thinking hard about how to do it in a way
that's private, secure, and works for everyone. These are simple use cases, but
they give you a good sense of the types of problems we want to
solve, by building intelligent systems that think ahead,
reason, and plan, all on your behalf.
The power of Gemini, with multimodality, long context and
agents, brings us closer to our ultimate goal: Making AI helpful
for everyone. We see this as how we will make
the most progress against our mission. Organizing the world’s
information across every input, making it accessible via any
output, and combining the world’s information with the
information in your world in a way that’s truly useful for you.
To fully realize the benefits of AI, we will continue to break
new ground. Google DeepMind is hard at work
on this. To share more, please welcome,
for the first time on the I/O stage, Sir Demis.
[Applause]. >>DEMIS HASSABIS:
Thanks, Sundar.
It's so great to be here. Ever since I was a kid, playing
chess for the England Junior Team, I’ve been thinking about
the nature of intelligence. I was captivated by the idea of
a computer that could think like a person.
It’s ultimately why I became a programmer and studied
neuroscience. I co-founded DeepMind in 2010
with the goal of one day building AGI: Artificial general
intelligence, a system that has human-level cognitive
capabilities. I’ve always believed that if we
could build this technology responsibly, its impact would be
truly profound and it could benefit humanity in incredible
ways. Last year,
we reached a milestone on that path when we formed Google DeepMind, combining AI talent
from across the company into one super unit. Since then, we've built AI systems that can
do an amazing range of things, from turning language and vision into action for robots,
navigating complex virtual environments, solving Olympiad-level math problems, and even discovering
thousands of new materials. Just last week, we announced
our next generation AlphaFold model. It can predict the structure and interactions
of nearly all of life's molecules, including how proteins interact with strands of DNA and RNA.
This will accelerate vitally important biological and medical research from
disease understanding to drug discovery. And all of this was made possible with the
best infrastructure for the AI era, including our highly optimized tensor processing units.
At the center of our efforts is our Gemini model. It's built from the ground up to be natively
multimodal because that's how we interact with and understand the world around us.
We've built a variety of models for different use cases.
We've seen how powerful Gemini 1.5 Pro is, but we also know from user feedback that some
applications need lower latency and a lower cost to serve.
So today we’re introducing Gemini 1.5 Flash.
[Cheers and Applause]. Flash is a lighter-weight model
compared to Pro. It’s designed to be fast and
cost-efficient to serve at scale, while still featuring
multimodal reasoning capabilities and breakthrough
long context. Flash is optimized for tasks
where low latency and efficiency matter most.
Starting today, you can use 1.5 Flash and 1.5 Pro with up to one
million tokens in Google AI Studio and Vertex AI.
And developers can sign up to try two million tokens.
We’re so excited to see what all of you will create with it.
And you'll hear a little more about Flash later on from Josh.
We’re very excited by the progress we’ve made so far with
our family of Gemini models. But we’re always striving to
push the state-of-the-art even further.
At any one time we have many different models in training.
And we use our very large and powerful ones to help teach and
train our production-ready models.
Together with user feedback, this cutting-edge research will
help us to build amazing new products for billions of people.
For example, in December, we shared a glimpse into the future
of how people would interact with multimodal AI, and how this
would end up powering a new set of transformative experiences.
Today, we have some exciting new progress to share about the
future of AI assistants that we’re calling Project Astra.
[Cheers and Applause]. For a long time, we’ve wanted to
build a universal AI agent that can be truly helpful in everyday
life. Our work making this vision a
reality goes back many years. It's why we made Gemini multimodal
from the very beginning. An agent like this has to
understand and respond to our complex and dynamic world just
like we do. It would need to take in and
remember what it sees so it can understand context and take
action. And it would have to be
proactive, teachable and personal, so you can talk to it
naturally, without lag or delay. While we’ve made great strides
in developing AI systems that can understand multimodal
information, getting response time down to something
conversational is a difficult engineering challenge.
Building on our Gemini model, we’ve developed agents that can
process information faster by continuously encoding video
frames, combining the video and speech input into a timeline of
events, and caching this for efficient recall.
We’ve also enhanced how they sound, with a wider range of
intonations. These agents better understand
the context you’re in, and can respond quickly in conversation,
making the pace and quality of interaction feel much more
natural. Here’s a video of our prototype,
which you’ll see has two parts. Each part was captured in a
single take, in real time. >> Okay. Let's do some tests.
Tell me when you see something that makes sound.
>> I see a speaker, which makes sound.
>> What is that part of the speaker called?
>> That is the tweeter. It produces high frequency
sounds. >> Give me a creative
alliteration about these. >> Creative crayons color
cheerfully. They certainly craft colorful
creations. >> What does that part of the