GPT-4o - Full Breakdown + Bonus Details

AI Explained
13 May 2024 18:43

Summary

TLDR: The video script covers OpenAI's release of GPT-4 Omni (GPT-4o), a model that is smarter in most ways, faster, and cheaper than its predecessors. It highlights the model's multimodal capabilities, the accuracy of text rendered in generated images, and OpenAI's aim of scaling to hundreds of millions of users. The script also covers benchmark results in math, translation, and vision understanding, where the model clearly improves on the original GPT-4, alongside more mixed results on reasoning. It then turns to real-time translation and video input, emphasizing the potential impact on accessibility and user engagement. The summary concludes that the model could bring AI to a much broader audience and notes the anticipation around future updates.

Takeaways

  • 🚀 **GPT-4 Omni**: OpenAI's latest model, GPT-4 Omni (GPT-4o), handles multiple modalities, and making it free positions OpenAI to scale from 100 million to hundreds of millions of users.
  • 📈 **Performance Improvements**: GPT-4 Omni shows clear gains on benchmarks, particularly in coding and math, over the original GPT-4 and GPT-4 Turbo.
  • 📸 **Image and Text Generation**: The model renders text inside generated images with high accuracy and can design creative outputs such as movie posters from photos and textual requirements.
  • 🔍 **Multimodal Capabilities**: GPT-4 Omni accepts text, images, audio, and video as input; the presenter notes that it does not yet produce video output.
  • 💬 **Language Translation**: The model has improved multilingual performance and demonstrated real-time translation, which could be revolutionary for communication.
  • 🎓 **Educational Applications**: GPT-4 Omni's ability to understand and respond to complex queries positions it as a useful tool for education, such as tutoring in mathematics.
  • 💻 **Desktop App**: OpenAI has introduced a desktop app that functions as a live coding co-pilot, highlighting the model's practical applications in software development.
  • 📉 **Pricing and Accessibility**: API pricing is $5 per 1 million input tokens and $15 per 1 million output tokens, and the model will be free on the web, which could significantly increase adoption among the general public.
  • 🌟 **User Engagement**: The model is designed to be more engaging, with a focus on response times and interactivity, aiming for human-level conversational ability.
  • 🔊 **Audio and Voice**: GPT-4 Omni can modulate its voice and speaking speed on demand, which could benefit accessibility, including for the visually impaired.
  • ⏱️ **Latency Reduction**: A key innovation of GPT-4 Omni is reduced latency, which enhances the realism and expressiveness of the AI's responses.

Q & A

  • What is the significance of the term 'Omni' in the context of GPT-4?

    -The term 'Omni' in GPT-4 Omni refers to its multimodal capabilities, meaning it can handle different types of data inputs and outputs, signifying its versatility and widespread application potential.

  • What was the initial reaction to GPT-4 Omni in comparison to AGI?

    -The presenter's first reaction was that GPT-4 Omni is more flirtatious than AGI: a notable step forward with significant advances in capability, but not a full-fledged AGI (Artificial General Intelligence).

  • What are the implications of GPT-4 Omni's improved text and image generation accuracy?

    -The improved accuracy in text and image generation implies that GPT-4 Omni can produce more reliable and higher quality outputs, which can be utilized in various applications such as content creation, design, and data analysis.

  • How does GPT-4 Omni's performance on benchmarks compare to previous models?

    -GPT-4 Omni shows a significant improvement over the original GPT-4 on various benchmarks, particularly in math and vision understanding evaluations, although it does not represent an entirely new tier of intelligence.

  • What is the pricing structure for GPT-4 Omni?

    -GPT-4 Omni is priced at $5 per 1 million tokens for input and $15 per 1 million tokens for output, which is competitive when compared to other models like Claude 3 Opus.

  • How does GPT-4 Omni's multilingual performance compare to the original GPT-4?

    -GPT-4 Omni shows a definite improvement in multilingual performance across languages compared to the original GPT-4, although English remains the most suited language for the model.

  • What is the significance of the video-in capacity in GPT-4 Omni?

    -The video-in capacity allows live streaming of video directly to the Transformer architecture behind GPT-4 Omni, which is a significant advancement and could lead to more interactive and engaging AI applications.

  • How does GPT-4 Omni's latency impact the user experience?

    -Reduced latency in GPT-4 Omni enhances the realism and responsiveness of the model, leading to a more human-like interaction and a significant improvement in user experience.

  • What are some of the creative applications demonstrated for GPT-4 Omni?

    -Creative applications demonstrated for GPT-4 Omni include designing movie posters, generating new font styles, transcribing meetings, summarizing videos, and creating caricatures from photos.

  • How does GPT-4 Omni's performance in adversarial reading comprehension compare to other models?

    -GPT-4 Omni does slightly better than the original GPT-4 on adversarial reading comprehension (the DROP benchmark) but slightly worse than Llama 3 400B, which was still training at the time, indicating room for further improvement.

  • What is the potential impact of GPT-4 Omni's free availability on the AI industry?

    -The free availability of GPT-4 Omni, being the smartest model currently available, could significantly increase the accessibility of AI technology, potentially bringing in hundreds of millions more users and further popularizing AI applications.

Outlines

00:00

🚀 Introduction to GPT-4 Omni and its Multimodal Capabilities

The first paragraph introduces GPT-4 Omni, which is presented as a significant advancement in AI, particularly in coding and handling multiple modalities. The speaker expresses initial skepticism but acknowledges the model's progress. GPT-4 Omni's scalability is highlighted, with a hint at an even smarter model in the pipeline. The paragraph also discusses the model's high accuracy in text and image generation, its potential applications in designing movie posters, and the upcoming release of these features. Additionally, a demo showcasing GPT-4 Omni's ability to interact with customer service AI is mentioned, along with other functionalities like creating caricatures, generating new fonts, transcribing meetings, and summarizing videos.

05:01

📊 GPT-4 Omni's Performance and Pricing

The second paragraph focuses on GPT-4 Omni's performance in various benchmarks, especially in math and coding, where it clearly improves on the original GPT-4 and is preferred over GPT-4 Turbo on a human-graded leaderboard. The speaker discusses the model's pricing ($5 per 1 million input tokens and $15 per 1 million output tokens, against $15 and $75 for Claude 3 Opus) and its potential impact on the market. The paragraph also touches on GPT-4 Omni's mixed results in adversarial reading comprehension and its improvements in translation and vision understanding. The speaker emphasizes the model's tokenizer enhancements, which could be revolutionary for non-English speakers, and its multilingual performance, which, while improved, still favors English.
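To make the pricing comparison concrete, here is a minimal sketch that turns the per-million-token prices quoted in the video into a per-request cost. The model labels, the function name, and the token counts of the sample workload are illustrative assumptions, not figures from the video.

```python
# Prices in USD per 1 million tokens, as quoted in the video.
# The dictionary keys are illustrative labels, not official API identifiers.
PRICES = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "claude-3-opus": {"input": 15.00, "output": 75.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the per-million-token prices above."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.4f} per request")
```

For this assumed workload the request costs $0.0175 at the GPT-4 Omni prices and $0.0675 at the Claude 3 Opus prices; the ratio shifts with the mix of input and output tokens because the output prices differ by a larger factor.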

10:03

🎭 Real-time Interactions and Latency Improvements

The third paragraph delves into the real-time capabilities of GPT-4 Omni, emphasizing the reduced latency that enhances the model's realism and expressiveness. The speaker shares their prediction of such AI from a previous video and moves on to discuss various demonstrations of the model's flirtatious nature, its ability to adjust response speed, and its potential to assist blind individuals. The paragraph also covers the model's application in interview preparation, its glitches during a math tutoring demo, and its capacity for video input and real-time translation.

15:04

🌐 GPT-4 Omni's Impact and Future Prospects

The final paragraph speculates on GPT-4 Omni's potential to become widely popular and its impact on making AI accessible to hundreds of millions more people. The speaker mentions that the model can already be prompted with text and images in the OpenAI Playground and that it will be free on the web. They also reference a Bloomberg report that Apple is nearing a deal with OpenAI to put ChatGPT on the iPhone, and hint at upcoming announcements from OpenAI. The paragraph concludes with an invitation for further analysis and discussion on the AI Insiders Discord server and a prompt for viewer engagement.

Keywords

💡GPT-4 Omni

GPT-4 Omni (GPT-4o) is the latest AI model from OpenAI, described in the script as smarter in most ways, cheaper, faster, better at coding, and multimodal in both input and output. It is significant because it is offered for free and aims to serve a very wide range of users and use cases. In the script, it is presented as a notable step forward in AI technology, timed to steal the spotlight from Google.

💡Benchmarks

Benchmarks are standard tests or measurements used to compare the performance of different systems or models. In the context of the video, benchmarks are utilized to evaluate the capabilities of GPT-4 Omni against other AI models. The script highlights that GPT-4 Omni has gone through various benchmarks, showcasing improvements in areas such as math and language translation.
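One benchmark figure from the video is easy to check by hand: the presenter says a 100-point Elo gap on the human-graded leaderboard corresponds to a win rate of around 2/3. The sketch below assumes the leaderboard uses the standard Elo expected-score formula; the function name is illustrative.

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected win rate of the higher-rated model under the standard Elo formula.

    rating_gap is the higher rating minus the lower rating.
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


# A 100-point gap gives the stronger model roughly a 64% expected win rate,
# so the weaker model's outputs are still preferred about a third of the time.
print(round(elo_win_probability(100), 2))  # 0.64
```

Under that assumption the result is about 0.64, consistent with the presenter's "around 2/3" and with his point that the gap is a large step but not night and day.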

💡Multimodal

Multimodal refers to the ability of a system to process and understand multiple forms of input and output, such as text, images, and video. The script emphasizes GPT-4 Omni's multimodal capabilities, which allow it to handle various types of data and interactions, making it more adaptable and user-friendly.

💡Text Generation Accuracy

Text generation accuracy here refers mainly to how accurately the model renders legible, correctly spelled text inside generated images. The script shows GPT-4 Omni producing image text with only minor errors and designing a movie poster from photos and written requirements, which the presenter describes as the most accurate generated text he has seen.

💡AI Assistants

AI assistants are artificial intelligence systems designed to perform tasks or services typically done by a human assistant. In the video, GPT-4 Omni is portrayed as a highly capable AI assistant, capable of real-time interactions, customer service simulations, and providing tutoring, which demonstrates the practical applications of advanced AI models.

💡Reasoning Capabilities

Reasoning capabilities refer to an AI model's ability to process information logically and draw conclusions. The script discusses the DROP benchmark, which tests models' reasoning abilities through complex reading comprehension questions. GPT-4 Omni's performance on such benchmarks is compared to other models, highlighting its strengths and areas for improvement.

💡Translation

Translation involves converting text or speech from one language to another. The video script mentions GPT-4 Omni's improved translation capabilities, noting that it performs better than previous models. This is significant as it suggests the model can effectively facilitate communication across different languages.

💡Tokenizer

A tokenizer is the component that splits text into tokens, the units (whole words or word fragments) that a language model reads and writes. The script highlights GPT-4 Omni's tokenizer improvements as potentially revolutionary for non-English languages such as Gujarati, Hindi, and Arabic: because the same text needs dramatically fewer tokens, conversations become both cheaper and quicker.
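The link between token count, cost, and speed can be sketched with simple arithmetic. The token counts and the generation rate below are hypothetical values chosen for illustration; the video does not give these figures, and the function names are assumptions.

```python
def tokenizer_savings(old_tokens: int, new_tokens: int) -> float:
    """Fraction of tokens saved when the same text needs fewer tokens."""
    return 1.0 - new_tokens / old_tokens


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to produce a reply if the model emits tokens at a fixed rate."""
    return tokens / tokens_per_second


# Hypothetical example: a reply that needed 400 tokens now needs 100.
old_count, new_count = 400, 100
rate = 50.0  # assumed tokens per second

print(f"tokens saved: {tokenizer_savings(old_count, new_count):.0%}")
print(
    f"reply time: {generation_seconds(old_count, rate):.1f}s -> "
    f"{generation_seconds(new_count, rate):.1f}s"
)
```

Because API usage is billed per token and replies are generated token by token, a reduction in token count lowers both the bill and the waiting time by the same proportion in this simplified model.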

💡Latency

Latency refers to the delay between the initiation of a request and the response from a system. The script discusses how reducing latency in GPT-4 Omni enhances the realism of interactions, making the AI feel more responsive and human-like, which is crucial for user engagement and satisfaction.
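Latency in this sense can be measured by timing the gap between sending a request and receiving the reply. The sketch below uses a stand-in function in place of a real model call; `fake_model` and `measure_latency` are hypothetical names, and the 50 ms delay is an arbitrary simulated value.

```python
import time
from typing import Callable


def measure_latency(respond: Callable[[str], str], prompt: str) -> tuple[str, float]:
    """Return the reply and the seconds elapsed between request and response."""
    start = time.perf_counter()
    reply = respond(prompt)
    return reply, time.perf_counter() - start


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call; sleeps to simulate processing time.
    time.sleep(0.05)
    return f"echo: {prompt}"


reply, seconds = measure_latency(fake_model, "hello")
print(f"{reply!r} took {seconds * 1000:.0f} ms")
```

For a voice assistant the figure that matters most is the time until the first audible response, so a real measurement would time the first chunk of a streamed reply rather than the complete reply.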

💡Video In Capacity

Video in capacity indicates the ability of an AI model to process and understand video input. The script notes that GPT-4 Omni can live-stream video directly to its Transformer architecture, which is an impressive feature that allows for real-time analysis and interaction with visual data.

💡AGI (Artificial General Intelligence)

AGI, or Artificial General Intelligence, refers to an AI system with the ability to understand and perform any intellectual task that a human being can do. The script mentions that while GPT-4 Omni is a step forward, it is not yet considered AGI due to mixed results on reasoning benchmarks and the potential for hallucinations in its outputs.

Highlights

GPT-4 Omni is a notable step forward in AI, offering multimodal capabilities and improved performance in coding and other areas.

GPT-4 Omni may be a precursor to an even smarter model, as OpenAI hinted at scaling up to hundreds of millions of users.

The model rendered text inside generated images with impressive accuracy, with only minor errors.

GPT-4 Omni was able to design a movie poster based on text requirements, showcasing its creative capabilities.

OpenAI's release is timed to compete with Google, potentially stealing the spotlight in the AI industry.

GPT-4 Omni's performance on benchmarks, particularly in math and coding, shows significant improvement over previous models.

The model's ability to handle real-time customer service interactions with another AI demonstrates its practical applications.

GPT-4 Omni's text-to-image generation and video summarization capabilities were showcased, indicating its multimodal functionality.

The model's character consistency across generated images, close to an entire cartoon strip, was demonstrated, highlighting its image generation capabilities.

GPT-4 Omni's pricing model of $5 per 1 million tokens input and $15 per 1 million tokens output is competitive in the market.

The model's performance on the DROP benchmark shows it is slightly better than the original GPT-4 but still has room for improvement.

GPT-4 Omni's translation capabilities and improvements to the tokenizer could be revolutionary for non-English speakers.

The model's real-time translation and harmonization capabilities were demonstrated, showing its potential for language learning and music.

GPT-4 Omni's video input functionality allows for live streaming to the Transformer architecture, a significant technological leap.

The model's flirtatious nature in demos may be designed to maximize engagement, a point of contention for some.

GPT-4 Omni's latency has been reduced, leading to more realistic and expressive AI interactions.

OpenAI's desktop app, a live coding co-pilot, was introduced, indicating the practical integration of AI into development workflows.

The model's potential impact on the popularity of AI and its accessibility to hundreds of millions more users was discussed.

GPT-4 Omni's mixed results on reasoning benchmarks indicate it still has limitations and is not yet an AGI.

The model's ability to generate new font styles and transcribe meetings was demonstrated, showing its versatility.

Transcripts

00:00

it's smarter in most ways cheaper faster

00:03

better at coding multimodal in and out

00:07

and perfectly timed to steal the

00:09

spotlight from Google it's GPT-4 Omni I've

00:14

gone through all the benchmarks and the

00:16

release videos to give you the

00:18

highlights my first reaction was it's

00:21

more flirtatious sigh than AGI but a

00:25

notable step forward nonetheless first

00:28

things first GPT-4o meaning Omni which

00:31

is all or everywhere referencing the

00:34

different modalities it's got is free by

00:37

making GPT-4o free they are either crazy

00:40

committed to scaling up from 100 million

00:42

users to hundreds of millions of users

00:45

or they have an even smarter model

00:47

coming soon and they did hint at that of

00:49

course it could be both but it does have

00:51

to be something just giving paid users

00:54

five times more in terms of message

00:55

limits doesn't seem enough to me next

00:58

OpenAI branded this as GPT-4 level

01:01

intelligence although in a way I think

01:03

they slightly underplayed it so before

01:05

we get to the video demos some of which

01:08

you may have already seen let me get to

01:10

some more under the radar announcements

01:12

take text to image and look at the accuracy

01:16

of the text generated from this prompt

01:18

now I know it's not perfect there aren't

01:20

two question marks on the now there's

01:23

others that you can spot like the I

01:24

being capitalized but overall I've never

01:27

seen text generated with that much

01:29

accuracy and it wasn't even in the demo

01:31

or take this other example where two

01:33

OpenAI researchers submitted their

01:35

photos then they asked GPT-4o to design

01:38

a movie poster and they gave the

01:40

requirements in text now when you see

01:43

the first output you're going to say

01:45

well that isn't that good but then they

01:47

asked GPT-4o something fascinating it

01:49

seemed to be almost reverse psychology

01:52

because they said here is the same

01:53

poster but cleaned up the text is

01:55

crisper and the colors bolder and more

01:57

dramatic the whole image is now improved

02:00

this is the input don't forget the final

02:02

result in terms of the accuracy of the

02:05

photos and of the text was really quite

02:07

impressive I can imagine millions of

02:09

children and adults playing about with

02:11

this functionality of course they can't

02:13

do so immediately because OpenAI said

02:15

this would be released in the next few

02:17

weeks as another bonus here is a video

02:19

that OpenAI didn't put on their YouTube

02:22

channel it mimics a demo that Google

02:24

made years ago but never followed up

02:26

with the OpenAI employee asked GPT-4o to

02:30

call customer service and ask for

02:32

something I've skipped ahead and the

02:34

customer service in this case is another

02:36

AI but here is the conclusion could you

02:39

provide Joe's email address for me sure

02:41

it's joe@example.com

02:43

awesome all right I've just sent the

02:46

email can you check if Joe received it

02:48

we'll check right now please hold sure

02:51

thing Hey Joe could you please check

02:53

your email to see if the shipping label

02:55

and return instructions have arrived

02:56

fingers crossed yes I got the

02:58

instructions perfect Joe has received

03:00

the email they call it a proof of

03:02

concept but it is a hint toward the

03:04

agents that are coming here are five

03:06

more quick things that didn't make it to

03:08

the demo how about a replacement for

03:11

Lensa submit your photo and get a

03:14

caricature of yourself or what about

03:16

text to new font you just ask for a new

03:19

style of font and it will generate one

03:21

or what about meeting transcription the

03:24

meeting in this case had four speakers

03:26

and it was transcribed or video

03:29

summaries remember this model is

03:30

multimodal in and out now it doesn't

03:34

have video out but I'll get to that in a

03:36

moment here though was a demonstration

03:38

of a 45-minute video submitted to GPT-4o

03:42

and a summary of that video we also got

03:44

character consistency across both woman

03:47

and dog almost like an entire cartoon

03:50

strip if those were the quick bonuses

03:52

what about the actual intelligence and

03:54

performance of the model before I get to

03:56

official benchmarks here is a human

03:59

graded leaderboard pitting one model

04:01

against another and yes im-also-a-good-

04:04

gpt2-chatbot is indeed GPT-4o so it

04:09

turns out I've actually been testing the

04:10

model for days overall you can see the

04:13

preference for GPT-4o compared to all

04:16

other models in coding specifically the

04:19

difference is quite stark I would say

04:22

even here though we're not looking at an

04:24

entirely new tier of intelligence

04:27

remember that a 100 Elo gap is a win

04:30

rate of around 2/3 so 1/3 of the time

04:33

GPT-4 Turbo's outputs would be preferred

04:36

that's about the same gap between GPT-4

04:38

Turbo and last year's GPT-4 a huge step

04:42

forward but not completely night and day

04:44

I think one underrated announcement was

04:47

the desktop app a live coding co-pilot

04:50

okay so I'm going to open the ChatGPT

04:53

desktop app like Mira was talking about

04:55

before okay and to give a bit of

04:57

background of what's going on so here we

04:59

have um a computer and on the screen we

05:01

have some code and then the ChatGPT

05:02

voice app is on the right so ChatGPT

05:05

will be able to hear me but it can't see

05:06

anything on the screen so I'm going to

05:08

highlight the code command C it and then

05:10

that will send it to ChatGPT and then

05:12

I'm going to talk about the code to ChatGPT

05:14

okay so I just shared some code with

05:17

you could you give me a really brief

05:18

one-sentence description of what's

05:19

going on in the code this code fetches

05:22

daily weather data for a specific

05:24

location and time period smooths the

05:26

temperature data using a rolling average

05:29

and annotates a significant weather event on

05:31

the resulting plot and then displays the

05:33

plot with the average minimum and

05:35

maximum temperatures over the year I've

05:38

delayed long enough here are the

05:40

benchmarks I was most impressed with

05:42

GPT-4o's performance on the math

05:44

benchmark even though it fails pretty

05:46

much all of my math prompts that is

05:48

still a stark improvement from the

05:50

original GPT-4 on the Google-proof

05:53

graduate test it beats Claude 3 Opus and

05:56

remember that was the headline Benchmark

05:58

for anthropic in fact speaking of

06:00

anthropic they are somewhat challenged

06:02

by this release GPT-4o costs $5 per 1

06:06

million tokens input and $15 per 1

06:08

million tokens output as a quick aside

06:10

it also has 128k token context and an

06:13

October knowledge cut off but remember

06:15

the pricing 5 and 15 Claude 3 Opus is

06:20

15 and 75 and remember for Claude 3 Opus on

06:23

the web you have to sign up with a

06:25

subscription but GPT-4o will be free so

06:28

for Claude Opus to be beaten in its

06:31

headline Benchmark is a concern for them

06:34

in fact I think the results are clear

06:36

enough to say that GPT-4o is the new

06:39

smartest AI however just before you get

06:42

carried away and type on Twitter the AGI

06:44

is here there are some more mixed

06:47

benchmarks take the DROP benchmark I dug

06:50

into this Benchmark and it's about

06:51

adversarial reading comprehension

06:53

questions they're designed to really

06:55

test the reasoning capabilities of

06:58

models if you give models difficult

06:59

passages and they've got to sort through

07:01

references do some counting and other

07:04

operations how do they fare DROP by

07:06

the way is discrete reasoning over the

07:08

content of paragraphs it does slightly

07:10

better than the original GPT-4 but

07:13

slightly worse than Llama 3 400B and as

07:16

they note Llama 3 400B is still training

07:19

so it's just about the new smartest

07:22

model by a hair's breadth however we're

07:24

not done yet it's better at translation

07:27

than Gemini models quick caveat there

07:29

Gemini 2 might be announced tomorrow and

07:32

that could regain the lead then there

07:34

are the vision understanding evaluations

07:37

it was a real step forward on the MMMU

07:40

as you can see a clear 10 points better

07:42

than Claude Opus again I'm curious if

07:45

Google Gemini can exceed it though the

07:47

improvements to the tokenizer could be

07:50

revolutionary for non-English speakers

07:53

the dramatically fewer tokens needed for

07:55

languages like Gujarati Hindi Arabic and

07:58

more don't just mean that conversations

08:01

are cheaper they're also quicker and

08:03

what about multilingual performance well

08:05

this time they didn't compare it to

08:07

other models but compared it to the

08:08

original GPT 4 definitely a step up

08:11

across languages but English is still by

08:14

far the most suited language indeed here

08:16

is a video of some of the model's

08:18

mistakes ending with some dodgy language

08:21

tuition I know enough Mandarin to say it

08:24

wasn't perfect at tutoring let's root

08:27

root root for the home

08:32

what was

08:33

that sorry guys I got carried

08:40

away right for

08:43

round

08:55

Francisco I have feeling I'm very wrong

09:00

hello uh my name is NCH I'm here with my

09:02

coworker hi I'm sh I'm trying to teach

09:05

my coworker how to speak Mandarin we

09:07

want to start from simple words like

09:09

nǐ hǎo can you teach him how to pronounce

09:12

that of course hey Nao nice to meet you

09:15

nǐ hǎo is pretty straightforward to

09:17

pronounce it sounds like nee how just

09:21

make sure to keep the nee part high and

09:23

then go down in pitch on the how give

09:26

it a try me how

09:29

that was great really good first try not

09:31

you you're natural it really wasn't he

09:34

needs to work on his tones and her face

09:36

was the giveaway there were a lot of

09:39

other interesting video demonstrations

09:41

but before them the GPT-4o blog post

09:44

from Sam Altman put out tonight he made

09:46

the argument that putting this capable

09:48

AI tool in the hands of everyone for

09:51

free is justifying that open part of

09:54

OpenAI less about open weights or open

09:57

source more about open to everyone

10:00

without ads or anything like that he

10:02

also draws our attention to the latency

10:05

of the models in many ways that is the

10:07

key innovation with GPT-4o by dialing

10:10

down the latency you dial up the realism

10:14

as he said it feels like AI from the

10:15

movies getting to human level response

10:18

times and expressiveness turns out to be

10:20

a big change indeed I think I should get

10:22

a little credit for predicting Her-like

10:25

AI in a video from a month ago but now I

10:28

want to get to those demos the response

10:30

times were amazing and the model was

10:33

clearly designed to be flirtatious note

10:36

that in Senate testimony last year

10:38

Sam Altman said we try to design systems

10:40

that do not maximize for engagement so

10:43

let me know in the comments whether you

10:44

think these demos are designed to

10:47

maximize engagement okay so this is what

10:49

I wrote down what do you

10:51

see a I see I love chat chpt that's so

10:57

sweet of

10:58

you yeah well I really appreciate all

11:00

the

11:01

help so yeah as we can see

11:07

um wow that's quite the outfit you've

11:10

got on yeah as we can see you know we

11:11

can chat in real time I was impressed

11:13

though that it could speed up its

11:14

talking on demand of course one two

11:19

three hey actually that's um that's a

11:21

little slow could you count

11:24

faster Sure Thing 1 2 3 4 5 6 7 8 9 10

11:27

okay that was a little bit too fast of

11:29

course OpenAI must be given credit

11:31

because for blind people this could be

11:34

revolutionary right now the Royal

11:36

Standard flag is flying above Buckingham

11:39

Palace which is the signal that the

11:41

Monarch is present how exciting to be

11:43

there while the king is home try and

11:46

tell me exactly what they're doing right

11:47

now please um right now the Ducks are

11:50

gently gliding across the water it will

11:53

also help you with interview prep by

11:55

analyzing your appearance how about you

11:58

I'm awesome listen I got some huge

12:00

news oh do tell I'm all ears well in a few

12:05

minutes I'm going to be interviewing at

12:06

OpenAI have you heard of them OpenAI

12:10

huh sounds Vaguely

12:12

Familiar kidding of course that's

12:15

incredible Rocky what kind of interview

12:17

well it's for a software engineering

12:19

role but I just need to know do I look

12:21

presentable

12:23

professional well

12:25

Rocky you definitely have the I've been

12:28

coding all night look down which could

12:31

actually work in your favor maybe just

12:34

run a hand through your hair or lean

12:36

into the mad genius SP your enthusiasm is

12:39

what's really going to shine through I

12:40

don't have a lot of time so I'm just

12:42

going to throw this on what do you

12:47

think oh Rocky that's quite a statement

12:51

piece I I mean you you'll definitely

12:53

stand out though maybe not in the way

12:57

you're hoping for an interview okay I

12:59

got it I got it I could definitely see

13:01

GPT-4o being used for mathematics

13:04

tutoring here is a 30-second clip but

13:06

notice also the slight glitchiness at

13:09

certain moments relative to angle Alpha

13:13

all right so I'm pretty sure this is the

13:15

angle Alpha right here

13:17

right cor correct now looking at the

13:20

triangle which side do you think is a

13:24

hypotenuse

13:26

um remember the hyot I'm not totally

13:29

sure I think I think it might be this

13:32

one but I really am not sure this side

13:35

AC you're close I also noticed quite a

13:38

few glitches in the demo sounds cutting

13:41

in and out the one feature though that

13:44

OpenAI wants you to really focus on is

13:47

the video in capacity you can live

13:49

stream video straight to the Transformer

13:52

architecture behind GPT-4o and so the

13:55

video in functionality is certainly

13:57

impressive note though that the reaction

13:59

time of GPT-4o isn't quite as immediate

14:03

as for audio Greg Brockman the

14:05

co-founder of OpenAI has to prompt it

14:07

to react to the intruder overall stylish

14:11

feel they might be getting ready for a

14:13

conversation or a presentation given

14:15

their direct engagement with the

14:19

camera sounds like we're in for an

14:21

intriguing interaction I'm curious about

14:24

the lighting you mentioned could you

14:26

describe it a bit more is it natural

14:29

light are we talking unique fixtures or

14:32

colors that add to the

14:35

atmosphere the lighting here seems to be

14:37

a mix of natural and artificial there's

14:40

a noticeable bright light overhead

14:42

probably a fixture casting a focused

14:45

beam downwards which creates a spotlight

14:47

effect this adds a dramatic and modern

14:50

feel to the scene the rest of the room

14:53

is lit more softly possibly by natural

14:55

light filtering anything did anything

14:58

unusual happen

15:00

recently Yes actually uh just as you

15:03

asked another person came into view

15:06

behind the first person they playfully

15:09

made bunny ears behind the first

15:10

person's head and then quickly left the

15:13

frame it added a light-hearted and

15:15

unexpected moment to the scene now the

15:18

focus is back on the original person

15:21

with the leather jacket nevertheless GPT-4o

15:23

can produce multiple voices that can

15:26

sing almost in harmony

15:30

and really try to harmonize

15:32

here San Francisco San Francisco in the

15:37

month of May but maybe make it more

15:40

dramatic and make the soprano

15:42

higher San Francisco in the month of May

15:46

San Francisco in the month of May it's a

15:50

Friday C may we are harmonizing are

15:55

Harmon great thank you and I suspect

15:58

this real time translation could soon be

16:01

coming to Siri later for us so every

16:04

time I say something in English can you

16:06

repeat it back in Spanish and every time

16:08

he says something in Spanish can you

16:10

repeat it back in English sure I can do

16:13

that let's get this translation train

16:16

rolling um hey how's it been going have

16:19

you been up to anything interesting

16:21

recently

16:35

hey I've been good just a bit busy here

16:38

preparing for an event next week why do

16:40

I say that because Bloomberg reported

16:42

two days ago that Apple is nearing a

16:44

deal with OpenAI to put ChatGPT on

16:48

iPhone and in case you're wondering

16:49

about GPT-4.5 or even 5 Sam Altman said

16:53

we'll have more stuff to share soon and

16:55

Mira Murati in the official presentation

16:58

said that they would be soon updating us on

17:01

progress on the next big thing whether

17:04

that's empty hype or real you can decide

17:07

no word of course about OpenAI

17:09

co-founder Ilya Sutskever although he was

17:12

listed as a contributor under additional

17:15

leadership overall I think this model

17:18

will be massively more popular even if

17:20

it isn't massively more intelligent you

17:23

can prompt the model now with text and

17:25

images in the OpenAI Playground all the

17:28

links will be in the description note

17:30

also that all the demos you saw were in

17:32

real time at 1X speed that I think was a

17:36

nod to Google's botched demo of course

17:39

let's see tomorrow what Google replies

17:41

with to those who think that GPT-4o is a

17:44

huge stride towards AGI I would point them

17:47

to the somewhat mixed results on the

17:49

reasoning benchmarks expect GPT-4o to

17:52

still suffer from a massive amount of

17:55

hallucinations to those though who think

17:57

that GPT-4o will change nothing I would

18:00

say this look at what ChatGPT did to

18:03

the popularity of the underlying GPT

18:05

series it being a free and chatty model

18:08

brought a 100 million people into

18:11

testing AI GPT-4o being the smartest

18:14

model currently available and free on

18:17

the web and multimodal I think could

18:21

unlock AI for hundreds of millions more

18:24

people but of course only time will tell

18:27

if you want to analyze the announcement

18:29

even more do join me on the AI Insiders

18:32

Discord via Patreon we have live meetups

18:35

around the world and professional best

18:36

practice sharing so let me know what you

18:39

think and as always have a wonderful day

Related Tags
AI Advancements, GPT-4 Omni, Multimodal AI, Tech Industry, Coding Efficiency, Benchmark Analysis, Real-time Interaction, AI Translation, Text Generation, Image Recognition, Latency Reduction, User Engagement, OpenAI Innovation, Smartest AI Model, Free AI Tool, Interview Prep, Mathematics Tutoring, Video Summarization, Language Learning, AI Accessibility, Live Streaming, Voice Harmonization, Tech Demonstrations, AI Reasoning, Mandarin Language, AI Hallucinations