Google I/O 2024 keynote in 17 minutes

The Verge
14 May 2024 · 17:03

Summary

TL;DR: At Google I/O 2024, Google announced a broad set of AI updates across its products. Gemini 1.5 Pro, with a 1 million token context window, is now available to developers globally, and the window is being expanded to 2 million tokens. Google also introduced Gemini 1.5 Flash, a lighter-weight model, and previewed Project Astra, its vision for the future of AI assistance. New generative media tools include Imagen 3 for photorealistic images, Music AI Sandbox for music, and Veo for high-quality 1080p video. Trillium, the sixth generation of TPUs, delivers a 4.7x improvement in compute performance per chip over the previous generation. Google also demonstrated multi-step reasoning in Google Search, new Gemini capabilities in Gmail mobile, the Live voice experience, and customizable Gems. The keynote closed with Gemma 2, the next generation of Gemma, and plans to open source SynthID text watermarking.

Takeaways

  • **Google I/O Launch**: Google is rolling out a revamped AI experience with new features and improvements across its services.
  • **Gemini Update**: Gemini 1.5 Pro's context window is being expanded from 1 million to 2 million tokens, extending its long-context and multimodal capabilities.
  • **Mobile Gmail Enhancements**: Gmail mobile is getting new features such as thread summarization and Q&A directly from the email interface.
  • **Google Search Updates**: Google Search will use multi-step reasoning to answer complex questions by breaking larger queries into manageable parts.
  • **AI Media Tools**: New models for image, music, and video generation offer higher quality and more detailed output.
  • **Project Astra**: Google's project on the future of AI assistance; in the demo it identifies objects that make sound and explains code shown to the camera.
  • **TPU Generation**: Google announced Trillium, the sixth generation of TPUs (Tensor Processing Units), with a 4.7x improvement in compute performance per chip.
  • **Workspace and NotebookLM**: Google is integrating AI into Workspace tools, allowing for personalized and automated information synthesis and organization.
  • **Virtual Teammate**: A prototype of a virtual Gemini-powered teammate named Chip is being developed to assist with project tracking and information synthesis.
  • **Live Interaction**: An upcoming feature called Live will let users talk with Gemini in real time using voice and, via the camera, visual input.
  • **Educational Tools**: New models such as LearnLM are being introduced to assist with learning, along with pre-made Gems for specific educational needs.

Q & A

  • What is the new feature called that Google is launching to improve the search experience?

    -Google is launching AI Overviews, a fully revamped Search experience, to everyone in the US this week, with more countries to follow soon.

  • How does Gemini help with identifying a user's car in a parking station?

    -Ask Photos, powered by Gemini, knows which cars appear often in the user's photos, triangulates which one is theirs, and returns the license plate number.

  • What does the term 'multimodality' refer to in the context of Gemini's capabilities?

    -Multimodality in Gemini refers to the ability to handle and analyze various types of data inputs, such as text, audio, video, or code, to provide more comprehensive search results.

  • What is the significance of the 1 million token context window in Gemini 1.5 Pro?

    -The 1 million token context window in Gemini 1.5 Pro allows for the processing of long contexts, such as hundreds of pages of text or hours of audio, to provide more detailed and accurate information.

  • How is Gemini 1.5 Pro making it easier for developers globally?

    -Google is making the improved Gemini 1.5 Pro available to all developers globally. The 1 million token version is also available to consumers in Gemini Advanced across 35 languages, and the context window is being expanded to 2 million tokens.

  • What is the purpose of the 'flash' model in Gemini?

    -The Gemini 1.5 Flash is a lighter weight model compared to the Pro version, designed to be more accessible and cost-effective for users with up to 1 million tokens in Google AI Studio and Vertex AI.

  • How does Veo keep objects consistent in space and time?

    -As described in the video, a generative video model must understand where an object or subject should be in space and also maintain that consistency over time, as with the car in the demo clip.

  • What are the new generative media tools introduced by Google?

    -Google has introduced new models for image, music, and video as part of its generative media tools, including Imagen 3 for photorealistic images, Music AI Sandbox for music, and a new generative video model called Veo.

  • How does the new Gemini powered side panel in Gmail mobile help users?

    -The Gemini powered side panel in Gmail mobile provides a summary of salient information from emails, allows users to ask questions directly from the mobile card, and offers quick answers without the need to open emails.

  • What is the 'gems' feature in Gemini that is being introduced?

    -Gems are customizable personal experts on any topic created by users in Gemini. They act based on the user's instructions and can be reused whenever needed for specific tasks or information.

  • What is the significance of the Trillium TPU and when will it be available to customers?

    -The Trillium TPU is the sixth generation of Google's tensor processing units, offering a 4.7x improvement in compute performance per chip. It will be made available to Google Cloud customers in late 2024.

  • How does the new trip planning experience in Gemini Advanced work?

    -The trip planning experience in Gemini Advanced gathers information from various sources like search, maps, and Gmail to create a personalized vacation plan. Users can interact with the plan, making adjustments as needed, and Gemini will dynamically update the itinerary.
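
On the cost-effectiveness point raised in the Flash answer above: the transcript quotes $7 per 1 million tokens for 1.5 Pro, $3.50 for prompts up to 128k, and a starting price of 35 cents per 1 million tokens for 1.5 Flash. The sketch below turns those quoted figures into a rough prompt-cost estimate. The function name, the model keys, the reading of "128k" as 128,000 tokens, and the rule that the lower Pro rate applies to the whole prompt are all assumptions made for illustration, not Google's published billing logic.

```python
# Prices per 1 million tokens, as quoted in the keynote transcript (USD).
PRO_RATE = 7.00          # Gemini 1.5 Pro
PRO_SHORT_RATE = 3.50    # Gemini 1.5 Pro, prompts up to 128k tokens
FLASH_RATE = 0.35        # Gemini 1.5 Flash ("will start at")

# ASSUMPTION: "128k" is read as 128,000 tokens, and the lower Pro rate
# applies to the whole prompt whenever it fits under that limit.
SHORT_PROMPT_LIMIT = 128_000


def prompt_cost(model: str, tokens: int) -> float:
    """Return an estimated prompt cost in USD for a hypothetical model key."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    if model == "pro":
        rate = PRO_SHORT_RATE if tokens <= SHORT_PROMPT_LIMIT else PRO_RATE
    elif model == "flash":
        rate = FLASH_RATE
    else:
        raise ValueError(f"unknown model: {model}")
    return tokens * rate / 1_000_000


if __name__ == "__main__":
    examples = [("pro", 100_000), ("pro", 1_000_000), ("flash", 1_000_000)]
    for model, tokens in examples:
        print(f"{model:5s} {tokens:>9,} tokens -> ${prompt_cost(model, tokens):.2f}")
```

Under these assumptions, a full 1 million token prompt costs twenty times more on Pro than on Flash, which is the trade-off the lighter model is meant to address.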

Outlines

00:00

Google I/O Launches Gemini 1.5 Pro and Advanced Features

Google I/O introduces a revamped AI experience with the launch of Gemini 1.5 Pro, which offers a 1 million token context window for developers globally. The window is being expanded to 2 million tokens, a step toward the stated goal of infinite context. Gemini's capabilities are showcased through demos such as finding a license plate number with Ask Photos, tracking a child's swimming progress, summarizing a Google Meet recording, and drafting a reply. The script also mentions new AI tools such as Imagen 3 for photorealistic images, Music AI Sandbox for music creation, and Veo for generative video. Project Astra is presented as the future of AI assistance.

05:01

New TPU Generation and AI Overviews for Complex Queries

The sixth generation of TPUs, Trillium, is announced with a 4.7x improvement in compute performance per chip. Google Search is set to receive multi-step reasoning to handle complex queries, such as finding the best yoga or Pilates studios in Boston, including details on their intro offers and walking times. Additionally, Google Search will soon allow users to ask questions with video, and Gmail mobile will get new capabilities such as summarizing email threads and a Q&A feature for quick answers.

10:01

Gemini Nano and Personalized AI Tools for Enhanced Accessibility

The script discusses upcoming improvements to TalkBack, which will use the multimodal capabilities of Gemini Nano to give users richer and clearer descriptions of images, even without a network connection. It also highlights PaliGemma, Google's first vision-language open model, and Gemma 2, the next generation of Gemma, due in June. SynthID is being expanded to text and video, and Google plans to open source SynthID text watermarking in the coming months.

15:03

Learning Tools and Personalized AI Experiences with Gems

Google introduces LearnLM, a new family of models based on Gemini and fine-tuned for learning. Pre-made Gems for the Gemini app and web experience are in development, including one called Learning Coach. The script also mentions the ability to create personalized experts on any topic through Gems and a new trip planning experience in Gemini Advanced that uses information from various sources to create a personalized vacation plan.

Keywords

Google I/O

Google I/O is Google's annual developer conference where the company announces new products, features, and updates. It is a key event for developers and tech enthusiasts to learn about the latest developments in Google's ecosystem. In the script, it is the event where the speakers introduce various AI advancements and new products.

Gemini

Gemini is Google's family of multimodal AI models and the assistant built on them. In the video, Gemini powers features across Google's products, including Ask Photos, Workspace, Gmail, and Search, and is used to show how AI can understand complex queries and return relevant information.

Multimodality

Multimodality refers to the ability of a system to process and understand multiple forms of input, such as text, images, audio, and video. In the script, it is mentioned as a feature that expands the types of questions users can ask and the richness of the answers they receive, highlighting the next generation of AI's ability to interact with users in more natural and comprehensive ways.

1 million token context window

The '1 million token context window' is a feature of Gemini 1.5 Pro that lets the model take in up to one million tokens, the small chunks of text or other input that a model processes, in a single prompt. The script describes this as enough for hundreds of pages of text, hours of audio, a full hour of video, or entire code repositories, and says the window is being expanded to 2 million tokens.
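
To make the scale concrete, here is a minimal sketch that estimates whether a body of text would fit in a 1 million or 2 million token window. It assumes a crude rule of thumb of about 4 characters per token; that ratio, the function names, and the sample page size are illustrative assumptions, and an actual tokenizer will give different counts.

```python
# ASSUMPTION: roughly 4 characters per token. This is a crude rule of thumb,
# not the tokenizer Gemini actually uses.
CHARS_PER_TOKEN = 4

# Context window sizes mentioned in the keynote.
CONTEXT_WINDOWS = {"1M": 1_000_000, "2M": 2_000_000}


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits(text: str, window: str = "1M") -> bool:
    """Return True if the estimated token count fits in the named window."""
    return estimate_tokens(text) <= CONTEXT_WINDOWS[window]


if __name__ == "__main__":
    page = "x" * 3_000        # hypothetical page of about 3,000 characters
    document = page * 500     # a 500-page document
    print(estimate_tokens(document), fits(document, "1M"))
```

For anything beyond a back-of-the-envelope check, count tokens with the model provider's own tooling rather than a character-based estimate.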

AI Assistance

AI Assistance refers to the use of artificial intelligence to help users with tasks, answer questions, and perform various functions. In the script, AI assistance is a central theme, with the introduction of new AI capabilities aimed at making everyday tasks easier and more intuitive, such as summarizing emails, creating meal plans, and providing travel recommendations.

Project Astra

Project Astra is Google's name for its work on the future of AI assistance. In the demo shown in the video, the assistant watches a live camera feed and answers spoken questions: it spots a speaker as something that makes sound, names the tweeter, explains that on-screen code defines AES-CBC encryption and decryption functions, and suggests adding a cache between a server and a database to speed up a system.

TPUs (Tensor Processing Units)

TPUs are specialized hardware accelerators developed by Google that are designed to speed up machine learning tasks. In the script, the sixth generation of TPUs, called Trillium, is announced, which offers a significant improvement in compute performance. This advancement is crucial for the development and deployment of more powerful AI models and applications.

Imagen 3

Imagen 3 is a new image generation model in Google's suite of AI tools. It is described as more photorealistic, with richer details and fewer visual artifacts or distorted images. The script's example is an animal image detailed enough that you can count the whiskers on its snout. Users can sign up to try it in ImageFX at labs.google.

VideoFX

VideoFX is an experimental tool mentioned in the script that lets users create and edit high-quality videos with Veo, Google's generative video model. It is part of Google's generative media tools, and Google is exploring features such as storyboarding and generating longer scenes. The tool reflects the expansion of AI capabilities into video production.

Gmail Mobile

Gmail Mobile refers to the mobile version of Google's email service. In the script, new capabilities for Gmail Mobile are discussed, such as the ability to summarize emails and provide quick answers to questions directly from the inbox. These features aim to make email management more efficient and accessible on mobile devices.

Gemini Advanced

Gemini Advanced is the consumer offering through which, per the script, Gemini 1.5 Pro with a 1 million token context window is now directly available in 35 languages. The script says the long context window will double to 2 million tokens later this year and that a new trip planning experience is rolling out to Gemini Advanced this summer.

Highlights

Google I/O introduces a fully revamped AI experience with a focus on multimodality and long context understanding.

AI Overviews is launching to everyone in the US this week, with more countries to follow soon.

Google Photos will use AI to identify and provide license plate numbers of frequently appearing cars, simplifying parking payments.

The new Gemini 1.5 Pro will allow for up to 1 million token context windows, significantly improving the depth of AI understanding.

Google is expanding the context window to 2 million tokens, a step towards the goal of infinite context.

Gemini can provide meeting highlights from Google Meet recordings, aiding in time management for busy professionals.

NotebookLM can turn source materials into a lively science discussion personalized for the user, enhancing the learning experience.

Gemini 1.5 Flash, a lighter model, is introduced for use in Google AI Studio and Vertex AI with up to 1 million tokens.

Project Astra is a new initiative in AI assistance that will recognize objects and sounds, like speakers, and provide detailed information.

Imagen 3, a new generative media tool, offers highly realistic image generation with rich details and fewer artifacts.

Google and YouTube are developing Music AI Sandbox, a suite of professional music AI tools for creating and transforming music.

Veo, a new generative video model, can create high-quality 1080p videos from text, image, and video prompts in various styles.

Google is introducing Trillium, the sixth generation of TPUs, promising a 4.7x improvement in compute performance per chip.

Multi-step reasoning in Google Search will allow users to ask more complex questions and receive detailed answers.

Google Search will soon support video questions, providing AI overviews and troubleshooting steps for issues shown in videos.

Gmail mobile will receive new capabilities, including a summarize feature and a Q&A card for quick responses.

Gemini's new capabilities will help users organize and track receipts, automating the process of data extraction and analysis.

A virtual Gemini-powered teammate, Chip, is being prototyped to monitor and track projects, organize information, and provide context.

Live, a new Gemini feature, will allow users to have in-depth conversations with Gemini using voice and real-time visual feedback.

Gems, personalized AI experts on any topic, will be introduced, allowing users to create custom AI assistance tailored to their needs.

Gemini Advanced will offer a new trip planning experience, utilizing gathered information to create a personalized vacation plan.

Google is working on making Gemini context-aware, allowing it to generate images and understand video content based on user interactions.

TalkBack, an accessibility feature, will be enhanced with the multimodal capabilities of Gemini Nano for a richer user experience.

Google is expanding SynthID to text and video modalities and plans to open source SynthID text watermarking in the coming months.

LearnLM, a new family of models based on Gemini and fine-tuned for learning, will be introduced with pre-made Gems for educational purposes.

Transcripts

00:00

[Applause]

00:02

[Music]

00:06

Google we all ready to do a little

00:09

Googling welcome to Google IO it's great

00:11

to have all of you with us we'll begin

00:13

launching this fully revamped experience

00:16

AI overviews to everyone in the US this

00:19

week and we'll bring it to more

00:21

countries soon with Gemini you're making

00:24

that a whole lot easier say you're at a

00:26

parking station ready to pay now you can

00:30

simply ask photos it knows the cars that

00:33

appear often it triangulates which one

00:35

is yours and just tells you the license

00:38

plate number you can even follow up with

00:41

something more complex show me how Lucia's

00:44

swimming has progressed here Gemini goes

00:48

beyond a simple search recognizing

00:50

different contexts from doing laps in

00:53

the pool to snorkeling in the ocean we

00:56

are rolling out Ask Photos this

00:58

summer with more capabilities to come

01:01

multimodality radically expands the

01:03

questions we can ask and the answers we

01:04

will get back long context takes this a

01:08

step further enabling us to bring in

01:10

even more information hundreds of pages

01:13

of text hours of audio a full hour of

01:17

video or entire code repos you need a 1

01:20

million token context window now

01:22

possible with Gemini 1.5 Pro I'm excited

01:25

to announce that we are bringing this

01:26

improved version of Gemini 1.5 Pro to

01:30

all developers globally Gemini 1.5 Pro

01:34

with 1 million context is now directly

01:37

available for consumers in Gemini

01:39

Advanced and can be used across 35

01:42

languages so today we are expanding the

01:45

context window to 2 million

01:49

tokens this represents the next step on

01:51

our journey towards the ultimate goal of

01:54

infinite context and you couldn't make

01:55

the PTA meeting the recording of the

01:58

meeting is an hour long if it's from

02:01

Google meet you can ask Gemini to give

02:03

you the

02:04

highlights there's a parents group

02:06

looking for volunteers you're free that

02:08

day of course Gemini can draft a reply

02:12

Gemini 1.5 Pro is available today in

02:14

Workspace Labs NotebookLM is going to

02:17

take all the materials on the left as

02:19

input and output them into a lively

02:23

science discussion personalized for him

02:26

so let's uh let's dive into physics

02:27

what's on deck for today well uh we're

02:30

starting with the basics force and

02:31

motion okay and that of course means we

02:33

have to talk about Sir Isaac Newton and

02:35

his three laws of motion and what's

02:37

amazing is that my son and I can join

02:39

into the conversation and steer it

02:42

whichever direction we want when I tap

02:46

join hold on we have a question what's

02:48

up

02:49

Josh yeah can you give my son Jimmy a

02:53

basketball

02:57

example hey Jimmy that's a fantastic

03:00

idea basketball is actually a great way

03:03

to visualize force and motion let's

03:05

break it down okay so first imagine a

03:07

basketball just sitting there on the

03:09

court it's not moving right that's

03:11

because all the forces acting on it are

03:13

balanced the downward pull of grav it

03:16

connected the dots and created that age

03:18

appropriate example for him making AI

03:22

helpful for everyone last year we

03:24

reached a milestone on that path when we

03:26

formed Google DeepMind so today we're

03:29

introducing

03:30

Gemini 1.5 flash flash is a lighter

03:33

weight model compared to Pro starting

03:35

today you can use 1.5 Flash and 1.5 Pro

03:39

with up to 1 million tokens in Google AI

03:41

studio and vertex AI today we have some

03:44

exciting new progress to share about the

03:47

future of AI assistance that we're

03:49

calling project Astra tell me when you

03:52

see something that makes

03:54

sound I see a speaker which makes sound

04:00

what is that part of the speaker

04:03

called that is the Tweeter it produces

04:06

high frequency

04:08

sounds what does that part of the code

04:13

do this code defines encryption and

04:16

decryption functions it seems to use AES

04:20

CBC encryption to encode and decode data

04:23

based on a key and an initialization

04:25

Vector

04:27

IV what can I add here here to make this

04:30

system

04:33

faster adding a cache between the server

04:36

and database could improve speed today

04:39

we're introducing a series of updates

04:41

across our generative media tools with

04:43

new models covering image music and

04:46

video today I'm so excited to introduce

04:49

Imagen 3 Imagen 3 is more

04:52

photorealistic you can literally count

04:54

the whiskers on its snout with richer

04:55

details like this incredible sunlight in

04:58

the shot and fewer visual artifacts or

05:00

distorted images you can sign up today

05:02

to try Imagen 3 in ImageFX part of our

05:05

suite of AI tools at labs.google

05:08

together with YouTube we've been

05:09

building music AI sandbox a suite of

05:13

professional music AI tools that can

05:15

create new instrumental sections from

05:17

scratch transfer Styles between tracks

05:20

and more today I'm excited to announce

05:22

our newest most capable generative video

05:25

model called

05:27

Veo Veo creates high quality 1080p videos

05:31

from text image and video prompts it can

05:35

capture the details of your instructions

05:36

in different Visual and cinematic Styles

05:39

you can prompt for things like aerial

05:41

shots of a landscape or time lapse and

05:43

further edit your videos using

05:45

additional prompts you can use Veo in our

05:48

new experimental tool called VideoFX

05:51

we're exploring features like

05:52

storyboarding and generating longer

05:54

scenes not only is it important to

05:57

understand where an object or subject

05:58

should be in space it needs to maintain

06:00

this consistency over time just like the

06:03

car in this video over the coming weeks

06:06

some of these features will be available

06:08

to select creators through VideoFX

06:10

at labs.google and the waitlist is

06:13

open now today we are excited to announce

06:16

the sixth generation of TPUs called

06:19

Trillium Trillium delivers a 4.7x

06:23

Improvement in compute performance per

06:25

chip over the previous generation we'll

06:28

make Trillium available to our Cloud

06:30

customers in late 2024 we're making AI

06:33

overviews even more helpful for your

06:35

most complex questions to make this

06:37

possible we're introducing multi-step

06:39

reasoning in Google search soon you'll

06:41

be able to ask search to find the best

06:43

yoga or Pilates studios in Boston and

06:46

show you details on their intro offers

06:48

and the walking time from Beacon Hill

06:50

you get some studios with great ratings

06:52

and their introductory offers and you

06:54

can see the distance for each like this

06:57

one it's just a 10-minute walk away

07:00

right below you see where they're

07:01

located laid out visually it breaks your

07:04

bigger question down into all its parts

07:07

and it figures out which problems it

07:09

needs to solve and in what

07:11

order next take planning for example now

07:15

you can ask search to create a 3-day

07:16

meal plan for a group that's easy to

07:19

prepare and here you get a plan with a

07:22

wide range of recipes from across the

07:24

web if you want to get more veggies in

07:26

you can simply ask search to swap in a

07:28

vegetarian dish and you can export your

07:30

meal plan or get the ingredients as a

07:32

list just by tapping here soon you'll be

07:35

able to ask questions with video right

07:38

in Google search I'm going to take a

07:40

video and ask

07:42

Google why will this not stay in

07:46

place and a near instant Google gives me

07:50

an AI overview I guess some reasons this

07:53

might be happening and steps I can take

07:55

to troubleshoot you'll start to see

07:57

these features rolling out in search in

07:59

the coming weeks and now we're really

08:02

excited that the new Gemini powered side

08:05

panel will be generally available next

08:10

month three new capabilities coming to

08:13

Gmail mobile it looks like there's an

08:17

email thread on this with lots of emails

08:19

that I haven't read and luckily for me I

08:22

can simply tap the summarize option up

08:26

top and Skip reading this long back and

08:28

forth now Gemini pulls up this helpful

08:32

Mobile card as an overlay and this is

08:35

where I can read a nice summary of all

08:38

the Salient information that I need to

08:40

know now I can simply type out my

08:43

question right here in the Mobile card

08:45

and say something like compare my roof

08:48

repair bids by price and availability

08:50

this new Q&A feature makes it so easy to

08:53

get quick answers on anything in my

08:55

inbox without having to First search

08:56

Gmail then open the email and then look

08:58

for the specific information and

09:00

attachments and so on I see some

09:02

suggested replies from Gemini now here I

09:04

see I have declined the service

09:06

suggested new time these new

09:09

capabilities in Gemini and Gmail will

09:11

start rolling out this month to Labs

09:14

users it's got a PDF that's an

09:16

attachment from a hotel as a receipt and

09:19

I see a suggestion in the side panel

09:21

help me organize and track my receipts

09:24

step one create a drive folder and put

09:27

this receipt and 37 others it's found

09:30

into that folder step two extract the

09:33

relevant information from those receipts

09:35

in that folder into a new spreadsheet

09:37

Gemini offers you the option to automate

09:40

this so that this particular workflow is

09:43

run on all future emails Gemini does the

09:46

hard work of extracting all the right

09:48

information from all the files and in

09:50

that folder and generates this sheet for

09:52

you show me where the money is

09:54

spent Gemini not only analyzes the data

09:57

from the sheet but also creates a nice

10:01

visual to help me see the complete

10:03

breakdown by category this particular

10:06

ability will be rolling out to Labs

10:08

users this September we're prototyping a

10:12

virtual Gemini powered teammate Chip's

10:16

been given a specific job role with a

10:18

set of descriptions on how to be helpful

10:20

for the team you can see that here and

10:22

some of the jobs are to Monitor and

10:23

track projects we've listed a few out to

10:25

organize information and provide context

10:27

and a few more things are we on

10:31

track for

10:34

launch chip gets to work not only

10:36

searching through everything it has

10:38

access to but also synthesizing what's

10:40

found and coming back with an up-to-date

10:44

response there it is a clear timeline a

10:47

nice summary and notice even in this

10:48

first message here chip Flags a

10:51

potential issue the team should be aware

10:52

of because we're in a group space

10:54

everyone can follow along anyone can

10:56

jump in at any time as you see someone

10:59

just did asking chip to help create a

11:01

doc to help address the issue and this

11:04

summer you can have an in-depth

11:06

conversation with Gemini using your voice

11:09

we're calling this new experience live

11:12

when you go live you'll be able to open

11:15

your camera so Gemini can see what you

11:17

see and respond to your surroundings in

11:20

real time so we're rolling out a new

11:22

feature that lets you customize it for

11:25

your own needs and create personal

11:27

experts on any topic you want we're

11:30

calling these gems just tap to create a

11:34

gem write your instructions once and

11:36

come back whenever you need it for

11:39

example here's a gem that I created that

11:41

acts as a personal writing coach it

11:44

specializes in short stories with

11:46

mysterious twists and it even Builds on

11:48

the story drafts in my Google Drive gems

11:52

will roll out in the coming months that

11:54

reasoning and intelligence all come

11:56

together in the new trip planning

11:58

experience in Gemini Advanced we're

12:01

going to Miami my son loves art my

12:04

husband loves seafood and our flight and

12:06

hotel details are already in my Gmail

12:09

inbox to make sense of these variables

12:12

Gemini starts by gathering all kinds of

12:15

information from search and helpful

12:17

extensions like maps and Gmail the end

12:20

result is a personalized vacation plan

12:23

presented in Gemini's new Dynamic UI I

12:27

like these recommendations but my family

12:29

likes to sleep in so I tap to change the

12:33

start time and just like that Gemini

12:37

adjusted my itinerary for the rest of

12:39

the trip this new trip planning

12:41

experience will be rolling out to Gemini

12:43

Advanced this summer you can upload your

12:45

entire thesis your sources your notes

12:48

your research and soon interview audio

12:51

recordings and videos too it can dissect

12:54

your main points identify improvements

12:57

and even role-play as your professor

13:00

maybe you have a side hustle selling

13:01

handcrafted products simply upload all

13:04

of your spreadsheets and ask Gemini to

13:06

visualize your

13:08

earnings Gemini goes to work calculating

13:11

your returns and pulling its analysis

13:13

together into a single chart and of

13:15

course your files are not used to train

13:17

our models later this year we'll be

13:20

doubling the long context window to two

13:23

million tokens we're putting AI powered

13:26

search right at your fingertips create

13:29

let's say my son needs help with a

13:30

tricky physics word problem like this

13:33

one if he's stumped on this question

13:36

instead of putting me on the spot he can

13:38

Circle the exact part he's stuck on and

13:41

get step-by-step

13:42

instructions right where he's already

13:44

doing the work this new capability is

13:47

available today now we're making Gemini

13:51

context aware so my friend Pete is

13:55

asking if I want to play pickleball

13:56

this weekend so I'm going to reply and

13:58

try to be funny and I'll say uh is that

14:00

like tennis but with uh pickles and I'll

14:04

say uh create image of tennis with

14:08

Pickles now one new thing you'll notice

14:10

is that the Gemini window now hovers in

14:12

place above the app so I stay in the

14:15

flow okay so that generated some pretty

14:17

good images uh what's nice is I can then

14:19

drag and drop any of these directly into

14:22

the messages app below so like so cool

14:25

let me send that and because it's

14:27

context aware Gemini knows I'm looking

14:30

at a video so it proactively shows me an

14:33

ask this video chip what is is can't

14:38

type the two bounce rule by the way this

14:41

uses signals like YouTube's captions

14:43

which means you can use it on billions

14:45

of videos so give it a moment and there

14:49

starting with pixel later this year

14:51

we'll be expanding what's possible with

14:53

our latest model Gemini Nano with

14:56

multimodality so several years ago we

14:58

developed TalkBack an accessibility

15:01

feature that helps people navigate their

15:03

phone through touch and spoken feedback

15:06

and now we're taking that to the next

15:07

level with the multimodal capabilities

15:09

of Gemini Nano so when someone sends

15:12

Cara a photo she'll get a richer and

15:14

clearer description of what's happening

15:17

and the model even works when there's no

15:18

network connection these improvements to

15:21

TalkBack are coming later this year 1.5

15:24

Pro is $7 per 1 million tokens and I'm

15:29

excited to share that for prompts up to

15:31

128k it'll be 50% less for

15:36

$3.50 and 1.5 flash will start at 35

15:41

cents per 1 million tokens and today's

15:45

newest member PaliGemma our first

15:49

Vision language open model and it's

15:51

available right now I'm also too excited

15:55

to announce that we have Gemma 2 coming

15:59

it's the next generation of Gemma and it

16:01

will be available in June today we're

16:04

expanding SynthID to two new

16:07

modalities text and

16:09

video and in the coming months we'll be

16:12

open sourcing SynthID text

16:15

watermarking I'm excited to introduce

16:18

LearnLM our new family of models based on

16:22

Gemini and fine-tuned for learning we're

16:25

developing some pre-made gems which will

16:28

be available in the Gemini app and web

16:30

experience including one called learning

16:33

coach I have a feeling that someone out

16:35

there might be

16:36

counting how many times we have

16:38

mentioned AI today we went ahead and

16:42

counted so that you don't have

16:45

[Applause]

16:48

to that might be a record in how many

16:50

times someone has said

16:54

AI here's to the possibilities ahead and

16:57

creating them together thank you
