Google I/O 2024 keynote in 17 minutes
Summary
TLDR: Google I/O 2024 unveiled a range of AI advancements across Google's products. The event highlighted the rollout of Gemini 1.5 Pro, with its 1 million token context window, to all developers globally, along with an expansion to 2 million tokens. Gemini 1.5 Flash, a lighter-weight model, and Project Astra, Google's vision for the future of AI assistance, were also announced. New generative media tools were presented, including Imagen 3 for photorealistic images and Veo for high-quality video creation. Additionally, the sixth generation of TPUs, Trillium, was introduced, promising a 4.7x improvement in compute performance per chip. Google also demonstrated multi-step reasoning in Google Search, new Gmail mobile capabilities, and the expansion of AI assistance with Live interactions and personalized Gems. The keynote closed with announcements of Gemma 2, the next generation of Gemma, and plans to open source SynthID text watermarking.
Takeaways
- **Google I/O Launch**: Google is unveiling a revamped AI experience with new features and improvements across its services.
- **Gemini Update**: Gemini 1.5 Pro handles longer context, with the context window expanding from 1 million to 2 million tokens, alongside its multimodal capabilities.
- **Mobile Gmail Enhancements**: Gmail mobile is introducing new features such as summarization and Q&A directly from the email interface.
- **Google Search Updates**: Google Search will incorporate multi-step reasoning to answer complex questions and break down larger queries into manageable parts.
- **AI Media Tools**: New models for image, music, and video are being introduced, offering higher quality and more detailed generative content.
- **Project Astra**: This new AI assistance project was demonstrated identifying objects that make sound and explaining code seen through the camera.
- **TPU Generation**: Google is set to release the sixth generation of TPUs (Tensor Processing Units), called Trillium, offering a 4.7x improvement in compute performance per chip.
- **Workspace and NotebookLM**: Google is integrating AI into Workspace tools, allowing for personalized and automated information synthesis and organization.
- **Virtual Teammate**: A prototype of a virtual Gemini-powered teammate named Chip is being developed to assist with project tracking and information synthesis.
- **Live Interaction**: An upcoming feature called Live will allow Gemini to interact with users in real time through voice and visual inputs.
- **Educational Tools**: LearnLM, a new family of models fine-tuned for learning, is being introduced, along with pre-made Gems such as a learning coach.
Q & A
What is the new feature called that Google is launching to improve the search experience?
-Google is launching AI Overviews, a fully revamped Search experience, to everyone in the US this week, with more countries to follow soon.
How does Gemini help with identifying a user's car in a parking station?
-With Gemini in Google Photos, the user can simply ask Photos. It knows the cars that appear often in the user's photos, triangulates which one is the user's, and provides the license plate number.
What does the term 'multimodality' refer to in the context of Gemini's capabilities?
-Multimodality in Gemini refers to the ability to handle and analyze various types of input, such as text, audio, video, or code, which expands the questions users can ask and the answers they get back.
What is the significance of the 1 million token context window in Gemini 1.5 Pro?
-The 1 million token context window in Gemini 1.5 Pro allows for the processing of long contexts, such as hundreds of pages of text, hours of audio, a full hour of video, or entire code repositories.
How is Gemini 1.5 Pro making it easier for developers globally?
-Google is making the improved Gemini 1.5 Pro available to all developers globally. For consumers, Gemini 1.5 Pro with 1 million token context is available in Gemini Advanced across 35 languages, and the context window is being expanded to 2 million tokens.
What is the purpose of the Flash model in Gemini?
-Gemini 1.5 Flash is a lighter-weight model than 1.5 Pro. Both 1.5 Flash and 1.5 Pro are available with up to 1 million tokens in Google AI Studio and Vertex AI, and Flash is priced lower, starting at 35 cents per 1 million tokens.
How does Google's generative video model Veo handle objects in space and time?
-Veo needs to understand where an object or subject should be in space and to maintain that consistency over time, as shown with the car in the demo video.
What are the new generative media tools introduced by Google?
-Google has introduced new models for image, music, and video as part of its generative media tools, including Imagen 3 for photorealistic images, Music AI Sandbox for music creation, and a new generative video model called Veo.
How do the new Gemini capabilities in Gmail mobile help users?
-Gemini in Gmail mobile summarizes long email threads in a mobile card, lets users ask questions directly in that card to get quick answers from their inbox without searching for and opening individual emails, and offers suggested replies.
What is the 'gems' feature in Gemini that is being introduced?
-Gems are customizable personal experts on any topic created by users in Gemini. They act based on the user's instructions and can be reused whenever needed for specific tasks or information.
What is the significance of the Trillium TPU and when will it be available to customers?
-The Trillium TPU is the sixth generation of Google's tensor processing units, offering a 4.7x improvement in compute performance per chip. It will be made available to Google Cloud customers in late 2024.
How does the new trip planning experience in Gemini Advanced work?
-The trip planning experience in Gemini Advanced gathers information from various sources like search, maps, and Gmail to create a personalized vacation plan. Users can interact with the plan, making adjustments as needed, and Gemini will dynamically update the itinerary.
Outlines
Google I/O Launches Gemini 1.5 Pro and Advanced Features
Google I/O introduces a revamped AI experience with the launch of Gemini 1.5 Pro, which offers a 1 million token context window for developers globally. The context window is set to expand to 2 million tokens, a step toward the goal of infinite context. Gemini's capabilities are showcased through various use cases, including finding a license plate number in Photos at a parking station, tracking a child's swimming progress, and drafting email replies. The script also mentions new AI tools like Imagen 3 for photorealistic images, Music AI Sandbox for music creation, and Veo for generative videos. Project Astra is teased as the future of AI assistance.
New TPU Generation and AI Overviews for Complex Queries
The sixth generation of TPU, Trillium, is announced with a 4.7x improvement in compute performance. Google search is set to receive multi-step reasoning to handle complex queries, such as finding the best yoga studios in Boston, including details on their offers and walking times. Additionally, Google search will soon allow users to ask questions with videos, and Gmail mobile will get new capabilities like summarizing emails and a Q&A feature for quick answers.
Gemini Nano and Personalized AI Tools for Enhanced Accessibility
The script discusses the upcoming improvements to the TalkBack feature with the multimodal capabilities of Gemini Nano, providing richer and clearer descriptions for users, even without a network connection. The introduction of PaliGemma, the first vision-language open model, and the next generation of Gemma, Gemma 2, is also highlighted. SynthID is being expanded to include text and video modalities, and plans for open sourcing SynthID text watermarking are shared.
Learning Tools and Personalized AI Experiences with Gems
Google introduces LearnLM, a new family of models based on Gemini and fine-tuned for learning. Pre-made Gems for the Gemini app and web experience are in development, including a learning coach. The script also mentions the ability to create personalized experts on any topic through Gems and a new trip planning experience in Gemini Advanced that uses information from various sources to create a personalized vacation plan.
Keywords
Google I/O
Gemini
Multimodality
1 million token context window
AI Assistance
Project Astra
TPUs (Tensor Processing Units)
Imagen 3
VideoFX
Gmail Mobile
Gemini Advanced
Highlights
Google I/O introduces a fully revamped AI experience with a focus on multimodality and long context understanding.
AI Overviews in Search is launching to everyone in the US this week and will expand to more countries soon.
Google Photos will use AI to identify and provide license plate numbers of frequently appearing cars, simplifying parking payments.
The new Gemini 1.5 Pro will allow for up to 1 million token context windows, significantly improving the depth of AI understanding.
Google is expanding the context window to 2 million tokens, a step towards the goal of infinite context.
Gemini can provide meeting highlights from Google Meet recordings, aiding in time management for busy professionals.
NotebookLM can turn source materials into a lively, personalized science discussion that users can join and steer.
Gemini 1.5 Flash, a lighter model, is introduced for use in Google AI Studio and Vertex AI with up to 1 million tokens.
Project Astra is a new initiative in AI assistance that will recognize objects and sounds, like speakers, and provide detailed information.
Imagen 3, a new generative media tool, offers highly realistic image generation with rich details and fewer artifacts.
Google and YouTube are developing Music AI Sandbox, a suite of professional music AI tools for creating and transforming music.
Veo, a new generative video model, can create high-quality 1080p videos from text, image, and video prompts in various styles.
Google is introducing Trillium, the sixth generation of TPUs, promising a 4.7x improvement in compute performance per chip.
Multi-step reasoning in Google Search will allow users to ask more complex questions and receive detailed answers.
Google Search will soon support video questions, providing AI overviews and troubleshooting steps for issues shown in videos.
Gmail mobile will receive new capabilities, including a summarize feature and a Q&A card for quick responses.
Gemini's new capabilities will help users organize and track receipts, automating the process of data extraction and analysis.
A virtual Gemini-powered teammate, Chip, is being prototyped to monitor and track projects, organize information, and provide context.
Live, a new Gemini feature, will allow users to have in-depth conversations with Gemini using voice and real-time visual feedback.
Gems, personalized AI experts on any topic, will be introduced, allowing users to create custom AI assistance tailored to their needs.
Gemini Advanced will offer a new trip planning experience, utilizing gathered information to create a personalized vacation plan.
Google is working on making Gemini context-aware, allowing it to generate images and understand video content based on user interactions.
TalkBack, an accessibility feature, will be enhanced with the multimodal capabilities of Gemini Nano for a richer user experience.
Google is expanding SynthID to text and video modalities and plans to open source SynthID text watermarking in the coming months.
LearnLM, a new family of models based on Gemini and fine-tuned for learning, will be introduced with pre-made Gems for educational purposes.
Transcripts
[Applause]
[Music]
Google we all ready to do a little
Googling welcome to Google IO it's great
to have all of you with us we'll begin
launching this fully revamped experience
AI overviews to everyone in the US this
week and we'll bring it to more
countries soon with Gemini you're making
that a whole lot easier say you're at a
parking station ready to pay now you can
simply ask photos it knows the cars that
appear often it triangulates which one
is yours and just tells you the license
plate number you can even follow up with
something more complex show me how Lucia's
swimming has progressed here Gemini goes
beyond a simple search recognizing
different contexts from doing laps in
the pool to snorkeling in the ocean we
are rolling out as photos this this
summer with more capabilities to come
multimodality radically expands the
questions we can ask and the answers we
will get back long context takes this a
step further enabling us to bring in
even more information hundreds of pages
of text hours of audio a full hour of
video or entire code repos you need a 1
million token context window now
possible with Gemini 1.5 Pro I'm excited
to announce that we are bringing this
improved version of Gemini 1.5 Pro to
all developers globally Gemini 1.5 Pro
with 1 million context is now directly
available for consumers in Gemini
Advanced and can be used across 35
languages so today we are expanding the
context window to 2 million
tokens this represents the next step on
our journey towards the ultimate goal of
infinite context and you couldn't make
the PTA meeting the recording of the
meeting is an hour long if it's from
Google meet you can ask Gemini to give
you the
highlights there's a parents group
looking for volunteers you're free that
day of course Gemini can draft a reply
Gemini 1.5 Pro is available today in
Workspace Labs NotebookLM is going to
take all the materials on the left as
input and output them into a lively
science discussion personalized for him
so let's uh let's dive into physics
what's on deck for today well uh we're
starting with the basics force and
motion okay and that of course means we
have to talk about Sir Isaac Newton and
his three laws of motion and what's
amazing is that my son and I can join
into the conversation and steer it
whichever direction we want when I tap
join hold on we have a question what's
up
Josh yeah can you give my son Jimmy a
basketball
example hey Jimmy that's a fantastic
idea basketball is actually a great way
to visualize force and motion let's
break it down okay so first imagine a
basketball just sitting there on the
court it's not moving right that's
because all the forces acting on it are
balanced the downward pull of grav it
connected the dots and created that age
appropriate example for him making AI
helpful for everyone last year we
reached a milestone on that path when we
formed Google DeepMind so today we're
introducing
Gemini 1.5 Flash Flash is a lighter
weight model compared to Pro starting
today you can use 1.5 Flash and 1.5 Pro
with up to 1 million tokens in Google AI
Studio and Vertex AI today we have some
exciting new progress to share about the
future of AI assistance that we're
calling Project Astra tell me when you
see something that makes
sound I see a speaker which makes sound
what is that part of the speaker
called that is the tweeter it produces
high frequency
sounds what does that part of the code
do this code defines encryption and
decryption functions it seems to use AES-CBC
encryption to encode and decode data
based on a key and an initialization
vector
IV what can I add here to make this
system
faster adding a cache between the server
and database could improve speed today
we're introducing a series of updates
across our generative media tools with
new models covering image music and
video today I'm so excited to introduce
Imagen 3 Imagen 3 is more
photorealistic you can literally count
the whiskers on its snout with richer
details like this incredible sunlight in
the shot and fewer visual artifacts or
distorted images you can sign up today
to try Imagen 3 in ImageFX part of our
suite of AI tools at labs.google
together with YouTube we've been
building Music AI Sandbox a suite of
professional music AI tools that can
create new instrumental sections from
scratch transfer Styles between tracks
and more today I'm excited to announce
our newest most capable generative video
model called
Veo Veo creates high quality 1080p videos
from text image and video prompts it can
capture the details of your instructions
in different Visual and cinematic Styles
you can prompt for things like aerial
shots of a landscape or time lapse and
further edit your videos using
additional prompts you can use Veo in our
new experimental tool called VideoFX
we're exploring features like
storyboarding and generating longer
scenes not only is it important to
understand where an object or subject
should be in space it needs to maintain
this consistency over time just like the
car in this video over the coming weeks
some of these features will be available
to select creators through VideoFX
at labs.google and the waitlist is
open now today we are excited to announce
the sixth generation of TPUs called
Trillium Trillium delivers a 4.7x
Improvement in compute performance per
chip over the previous generation we'll
make Trillium available to our Cloud
customers in late 2024 we're making AI
overviews even more helpful for your
most complex questions to make this
possible we're introducing multi-step
reasoning in Google search soon you'll
be able to ask search to find the best
yoga or Pilates studios in Boston and
show you details on their intro offers
and the walking time from Beacon Hill
you get some studios with great ratings
and their introductory offers and you
can see the distance for each like this
one it's just a 10-minute walk away
right below you see where they're
located laid out visually it breaks your
bigger question down into all its parts
and it figures out which problems it
needs to solve and in what
order next take planning for example now
you can ask search to create a 3-day
meal plan for a group that's easy to
prepare and here you get a plan with a
wide range of recipes from across the
web if you want to get more veggies in
you can simply ask search to swap in a
vegetarian dish and you can export your
meal plan or get the ingredients as a
list just by tapping here soon you'll be
able to ask questions with video right
in Google search I'm going to take a
video and ask
Google why will this not stay in
place and in a near instant Google gives me
an AI overview I get some reasons this
might be happening and steps I can take
to troubleshoot you'll start to see
these features rolling out in search in
the coming weeks and now we're really
excited that the new Gemini powered side
panel will be generally available next
month three new capabilities coming to
Gmail mobile it looks like there's an
email thread on this with lots of emails
that I haven't read and luckily for me I
can simply tap the summarize option up
top and Skip reading this long back and
forth now Gemini pulls up this helpful
Mobile card as an overlay and this is
where I can read a nice summary of all
the Salient information that I need to
know now I can simply type out my
question right here in the Mobile card
and say something like compare my roof
repair bids by price and availability
this new Q&A feature makes it so easy to
get quick answers on anything in my
inbox without having to First search
Gmail then open the email and then look
for the specific information and
attachments and so on I see some
suggested replies from Gemini now here I
see I have declined the service
suggested new time these new
capabilities in Gemini and Gmail will
start rolling out this month to Labs
users it's got a PDF that's an
attachment from a hotel as a receipt and
I see a suggestion in the side panel
help me organize and track my receipts
step one create a drive folder and put
this receipt and 37 others it's found
into that folder step two extract the
relevant information from those receipts
in that folder into a new spreadsheet
Gemini offers you the option to automate
this so that this particular workflow is
run on all future emails Gemini does the
hard work of extracting all the right
information from all the files in
that folder and generates this sheet for
you show me where the money is
spent Gemini not only analyzes the data
from the sheet but also creates a nice
visual to help me see the complete
breakdown by category this particular
ability will be rolling out to Labs
users this September we're prototyping a
virtual Gemini powered teammate Chip's
been given a specific job role with a
set of descriptions on how to be helpful
for the team you can see that here and
some of the jobs are to Monitor and
track projects we've listed a few out to
organize information and provide context
and a few more things are we on
track for
launch chip gets to work not only
searching through everything it has
access to but also synthesizing what's
found and coming back with an up-to-date
response there it is a clear timeline a
nice summary and notice even in this
first message here chip Flags a
potential issue the team should be aware
of because we're in a group space
everyone can follow along anyone can
jump in at any time as you see someone
just did asking chip to help create a
doc to help address the issue and this
summer you can have an in-depth
conversation with Gemini using your voice
we're calling this new experience live
when you go live you'll be able to open
your camera so Gemini can see what you
see and respond to your surroundings in
real time so we're rolling out a new
feature that lets you customize it for
your own needs and create personal
experts on any topic you want we're
calling these gems just tap to create a
gem write your instructions once and
come back whenever you need it for
example here's a gem that I created that
acts as a personal writing coach it
specializes in short stories with
mysterious twists and it even Builds on
the story drafts in my Google Drive gems
will roll out in the coming months that
reasoning and intelligence all come
together in the new trip planning
experience in Gemini Advanced we're
going to Miami my son loves art my
husband loves seafood and our flight and
hotel details are already in my Gmail
inbox to make sense of these variables
Gemini starts by gathering all kinds of
information from search and helpful
extensions like maps and Gmail the end
result is a personalized vacation plan
presented in Gemini's new Dynamic UI I
like these recommendations but my family
likes to sleep in so I tap to change the
start time and just like that Gemini
adjusted my itinerary for the rest of
the trip this new trip planning
experience will be rolling out to Gemini
Advanced this summer you can upload your
entire thesis your sources your notes
your research and soon interview audio
recordings and videos too it can dissect
your main points identify improvements
and even role-play as your professor
maybe you have a side hustle selling
handcrafted products simply upload all
of your spreadsheets and ask Gemini to
visualize your
earnings Gemini goes to work calculating
your returns and pulling its analysis
together into a single chart and of
course your files are not used to train
our models later this year we'll be
doubling the long context window to two
million tokens we're putting AI powered
search right at your fingertips create
let's say my son needs help with a
tricky physics word problem like this
one if he's stumped on this question
instead of putting me on the spot he can
Circle the exact part he's stuck on and
get step-by-step
instructions right where he's already
doing the work this new capability is
available today now we're making Gemini
context aware so my friend Pete is
asking if I want to play pickleball
this weekend so I'm going to reply and
try to be funny and I'll say uh is that
like tennis but with uh pickles and I'll
say uh create image of tennis with
Pickles now one new thing you'll notice
is that the Gemini window now hovers in
place above the app so I stay in the
flow okay so that generated some pretty
good images uh what's nice is I can then
drag and drop any of these directly into
the messages app below so like so cool
let me send that and because it's
context aware Gemini knows I'm looking
at a video so it proactively shows me an
ask this video chip what is is can't
type the two bounce rule by the way this
uses signals like YouTube's captions
which means you can use it on billions
of videos so give it a moment and there
starting with pixel later this year
we'll be expanding what's possible with
our latest model Gemini Nano with
multimodality so several years ago we
developed TalkBack an accessibility
feature that helps people navigate their
phone through touch and spoken feedback
and now we're taking that to the next
level with the multimodal capabilities
of Gemini Nano so when someone sends
Cara a photo she'll get a richer and
clearer description of what's happening
and the model even works when there's no
network connection these improvements to
TalkBack are coming later this year 1.5
Pro is $7 per 1 million tokens and I'm
excited to share that for prompts up to
128k it'll be 50% less for
$3.50 and 1.5 Flash will start at 35
cents per 1 million tokens and today's
newest member PaliGemma our first
vision-language open model and it's
available right now I'm also excited
to announce that we have Gemma 2 coming
it's the next generation of Gemma and it
will be available in June today we're
expanding SynthID to two new
modalities text and
video and in the coming months we'll be
open sourcing SynthID text
watermarking I'm excited to introduce
LearnLM our new family of models based on
Gemini and fine-tuned for learning we're
developing some pre-made gems which will
be available in the Gemini app and web
experience including one called learning
coach I have a feeling that someone out
there might be
counting how many times we have
mentioned AI today we went ahead and
counted so that you don't have
[Applause]
to that might be a record in how many
times someone has said
AI here's to the possibilities ahead and
creating them together thank you