NEW GPT-4o: My Mind is Blown.

Joshua Chang
13 May 2024 · 06:28

Summary

TLDR: OpenAI has announced a new model, GPT-4o, which is faster and more capable than its predecessor, GPT-4. The new model is free to use and retains features like Vision, Browse, and Memory, with improvements in response time and voice expressiveness. The voice feature responds in as little as 232 milliseconds, with an average of 320 milliseconds, and users can interrupt the conversation naturally simply by speaking. The voice also carries emotional tone and can adjust its expressiveness, including singing. A new feature enables real-time object identification through the camera, and a desktop app with screen sharing has been introduced to boost productivity. The 'o' in GPT-4o signifies that multimodal inputs (text, speech, and vision) are processed in a single neural network, improving the model's understanding and the quality of its responses.

Takeaways

  • 🚀 OpenAI has announced a new model, GPT-4o, which is twice as fast as and more capable than its predecessor, GPT-4.
  • 🆓 GPT-4o is available for free, whereas GPT-4 previously required a $20 monthly subscription.
  • 👀 GPT-4o retains GPT-4 features like Vision, Browse, Memory, and complex data analysis.
  • 🎤 The biggest update in GPT-4o is the voice feature, which now responds in as little as 232 milliseconds, averaging 320 milliseconds.
  • 💬 Users can interrupt the AI mid-sentence simply by speaking, making the interaction more natural and intuitive.
  • 🎭 The AI's voice is more expressive and energetic, and its tone can be adjusted on request.
  • 🎶 The AI can now sing, as demonstrated in the presentation, adding another layer of expressiveness.
  • 📷 A new feature lets the AI process real-time visual input: point the camera at an object and ask questions about it.
  • 💻 OpenAI introduced a desktop app that supports text and speech input, image uploads, and screen sharing for enhanced productivity.
  • 📈 The app can analyze visual data such as graphs directly from the user's screen, aiding research and providing immediate insights.
  • 🔄 The 'o' in GPT-4o signifies that multimodal inputs (text, speech, and vision) are processed in a single neural network, improving the richness of interaction.
  • 🔍 The new omni model processes voice input directly, capturing emotional and tonal nuances, unlike previous models that first transcribed voice to text.

Q & A

  • What is the latest model announced by OpenAI?

    -OpenAI has announced GPT-4o, its new flagship model.

  • How does GPT-4o compare to GPT-4 in terms of speed and capability?

    -GPT-4o is twice as fast as and more capable than GPT-4.

  • What was the previous cost associated with using GPT-4?

    -GPT-4 was previously available as a $20 monthly subscription.

  • Which features does GPT-4o carry over from GPT-4?

    -GPT-4o includes Vision for image analysis, Browse for real-time internet data, Memory for remembering facts about the user, and the ability to analyze complex data such as Excel spreadsheets.

  • What is the average response time for GPT-4o?

    -The average response time for GPT-4o's voice mode is around 320 milliseconds, close to the average human response time in a conversation.

  • What is special about the voice feature in GPT-4o?

    -The voice feature in GPT-4o is more expressive and energetic, with the ability to change tones and even sing.

  • How does the new omni model in GPT-4o handle multimodal inputs?

    -The omni model processes text, speech, and vision together in the same neural network, whereas previous models first transcribed speech to text and therefore lost emotional and tonal information (a code sketch contrasting the two approaches follows this Q&A).

  • What is the significance of the 'o' in GPT-4o?

    -The 'o' refers to "omni": the model takes multimodal inputs (text, speech, and vision) and processes them together in one neural network rather than separately.

  • What new feature allows real-time analysis of objects through a camera?

    -A subset of Vision lets users point their camera at objects and ask questions about them in real time.

  • What additional capability was announced with the new desktop app?

    -The new desktop app allows text and speech input and image uploads, and includes a screen-sharing feature so the AI can analyze content on the user's screen.

  • How does the screen-sharing feature in the desktop app enhance productivity?

    -It lets the AI analyze and provide insights on whatever the user is currently viewing on their computer, which is useful for research and idea generation.

  • What is the presenter's opinion on the expressiveness of the voice in GPT-4o?

    -The presenter feels the voice is overly energetic, like speaking to an over-caffeinated friend, and suggests that an option to customize the voice in the future would be a smart move.
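To make the "texting versus calling" comparison concrete, here is a minimal sketch of the older cascaded voice pipeline described above, assuming the OpenAI Python SDK, an OPENAI_API_KEY in the environment, and a placeholder audio file named question.wav. The model names (whisper-1, gpt-4, tts-1) are used purely for illustration; the point of GPT-4o is that one omni model hears the audio directly, so this lossy hand-off is no longer needed.

```python
# Sketch of the OLD cascaded voice pipeline (GPT-3.5/GPT-4 era), for illustration only.
# Every hop below narrows the signal: by the time the LLM sees the request,
# the user's tone, pacing, and emotion have already been discarded.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# 1) Speech -> text: the audio is transcribed, stripping tone and emotion.
with open("question.wav", "rb") as audio_file:  # hypothetical input file
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2) Text -> text: a text-only model answers the flattened transcript.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = reply.choices[0].message.content

# 3) Text -> speech: a separate TTS model reads the answer back.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply_text,
)
speech.stream_to_file("reply.mp3")  # newer SDK versions may prefer the streaming helper

# GPT-4o ("omni") collapses steps 1-3 into one neural network that takes the
# audio itself as input, so emotional and tonal cues survive end to end.
```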

Outlines

00:00

🚀 Introducing GPT-4o: A Leap in AI Technology

Josh introduces GPT-4o, OpenAI's new flagship model, which is twice as fast as its predecessor, GPT-4, and will be available for free. Unlike GPT-4's subscription model, GPT-4o offers the existing features (image uploads, internet browsing, memory, and complex data analysis) at no cost. The highlight of the presentation was the demo, which showcased the model answering various queries, including math problems and storytelling. The biggest improvements come in voice interaction, which now allows conversations with minimal delay, close to human response times, and adds emotional expressiveness and voice modulation for a more dynamic, personal feel.

05:00

🧠 GPT-4o's Omni Model and Desktop App

This section covers the omni (multimodal) capabilities of GPT-4o, which processes text, speech, and vision inputs through the same neural network, improving the model's responsiveness and accuracy. This is an improvement over previous models, which handled these inputs separately and lost nuances like tone and emotion along the way. A new desktop application is also announced, supporting text and speech input, image uploads, and screen sharing. The app aims to boost productivity by letting the AI see and analyze content directly from the user's screen, promising a versatile tool for both professional and personal use.

Keywords

💡OpenAI

OpenAI is a research and deployment company that aims to develop artificial general intelligence (AGI) in a way that benefits all of humanity. In the video, it is the organization announcing the new model, GPT-4o, which is significant for its advances in AI capabilities.

💡GPT-4o

GPT-4o is the latest flagship model from OpenAI. It is described as twice as fast as and more capable than its predecessor, GPT-4. The model is highlighted for its speed, improved voice feature, and multimodal input capabilities, which are central to the video's discussion of advances in AI.

💡Free to use

This term refers to the fact that GPT-4o is available without a subscription fee, in contrast to the previous model, GPT-4, which required a $20 monthly subscription. The change matters because it makes the advanced features of the model far more widely accessible.

💡Vision

Vision is a feature of GPT-4o that allows the AI to process and understand images. Users can upload images and ask questions about them, and the AI responds based on its analysis. This feature is part of the multimodal input capabilities discussed in the video.
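ChatGPT exposes Vision through its upload button; the same capability is also reachable programmatically. Below is a minimal, hedged sketch of asking GPT-4o a question about an image via the OpenAI Python SDK's chat completions endpoint; the image URL and the prompt are placeholders, not taken from the video.

```python
# Minimal sketch: asking GPT-4o a question about an image, assuming the
# OpenAI Python SDK and a publicly accessible image URL (placeholder below).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image, and what stands out?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/whiteboard-graph.png"},  # placeholder URL
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The real-time camera feature mentioned later in the video is conceptually the same idea applied to camera frames rather than a single uploaded image.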

💡Browse

The Browse feature enables GPT-4o to search the internet for real-time, up-to-date information, which lets the AI provide current and relevant answers and enhances its usefulness as a research tool.

💡Memory

Memory, in the context of GPT-4o, refers to the AI's ability to remember facts about the user. This personalization allows the AI to give more tailored responses and is a step toward more individualized interactions.

💡Analyzing complex data

This capability allows GPT-4o to process and analyze complex datasets, such as Excel spreadsheets. Users can ask questions about the data and the AI can provide insights, making it a useful tool for data analysis.

💡Voice feature

The voice feature of GPT-4o is a significant update that allows for more natural, interactive communication. It includes quick response times, the ability to interrupt the AI simply by speaking, and a more expressive voice, making the interaction more human-like.

💡Expressiveness

Expressiveness, in the context of GPT-4o's voice feature, refers to the AI's ability to convey emotion and energy through its voice. This is demonstrated in the video through the AI's storytelling and singing, which aim to make interactions more engaging and personal.

💡Desktop app

The new desktop app announced alongside GPT-4o allows users to interact with the AI through text, speech, and image input. It also introduces screen sharing, which lets the AI analyze content on the user's screen in real time, potentially boosting productivity and research capabilities.

💡Multimodal inputs

Multimodal inputs refer to the AI's ability to process different types of input, such as text, speech, and vision, simultaneously. This is a key advance in GPT-4o: the model considers all forms of input together, leading to more comprehensive and contextually aware responses.

Highlights

OpenAI has announced GPT-4o, a new flagship model that is twice as fast as and more capable than GPT-4.

GPT-4o is free to use, a change from the previous $20 monthly subscription for GPT-4.

GPT-4o retains the features of GPT-4, including Vision for image analysis, Browse for internet data, and Memory for personalization.

The new model will also be able to analyze complex data, such as Excel spreadsheets.

GPT-4o demonstrated impressive response times, averaging 320 milliseconds, close to the average human response time in conversation.

Users can now interrupt the conversation simply by speaking, making interactions more intuitive.

The expressiveness and energy of the assistant's voice have been dialed up, making it feel like talking to an over-caffeinated friend.

GPT-4o can change its tone on command, such as being more dramatic or adopting a robotic voice.

A new feature allows the AI to analyze real-time visual input from a camera, effectively giving it a form of 'vision'.

A new desktop app has been announced, enabling text and speech input, image uploads, and screen sharing for productivity.

The 'o' in GPT-4o signifies the integration of multimodal inputs into a single neural network, improving the richness of responses.

The omni model processes voice, text, and vision together, capturing more emotional and tonal information than previous models.

The faster response times address the kind of latency the Humane AI Pin, which ran on the slower GPT-4, was widely criticized for.

The presenter suggests that an option to customize the voice in future updates would be a smart move for user satisfaction.

The new model's capabilities are expected to significantly enhance productivity for computer-based tasks and research.

The integration of voice, text, and vision in GPT-4o is a major step forward in conversational AI.

The announcement raises curiosity about Google's upcoming response, hinting at a competitive landscape in AI.

The video showcases the practical applications and potential impact of GPT-4o across a variety of use cases.

Transcripts

00:00

What's up, Josh here. So in case you missed it, OpenAI has just announced ChatGPT-4o, which is their brand new flagship model that is 2 times faster and more capable than GPT-4, and the good news for all of us is that it's going to be free to use. Now, GPT-4 was previously a $20-a-month subscription, but with 4o being completely free, we also get the benefits of everything that we got with GPT-4. There's Vision, where you can upload images and ask it questions about those images. There's also Browse, where it can scrub the internet for more real-time and up-to-date data. There's also Memory, where it can actually remember facts about you. And then lastly, there's analyzing complex data, so you can actually give it something like an Excel spreadsheet and ask it questions about that. All of those features are going to be coming to 4o in the next couple of weeks. But first of all, let's just start with everything that's going to be new with GPT-4o. In the presentation, the most impressive part was obviously the demo. They did a bunch of stuff: they asked it all kinds of questions, gave it math equations, and asked it to read bedtime stories. For the most part, I think the intelligence level and the answers it's giving are pretty similar to the current GPT-4, which is why I don't think they updated the name to GPT-5. But surprisingly, the biggest updates in 4o actually come in the voice feature.

01:18

"Hey ChatGPT, how are you doing?" "I'm doing fantastic, thanks for asking. How about you?" "Pretty good. What's up? So my friend Barrett here, he's been having trouble sleeping lately, and I want you to tell him a bedtime story about robots and love." "Oh, a bedtime story about robots and love? I got you covered." So now we have response times as quick as 232 milliseconds, with an average of 320 milliseconds, which is roughly the average human response time in a conversation. You can also now just interrupt the conversation simply by speaking, which I think is pretty intuitive. They even put a disclaimer on the website that all of their videos are played at 1x speed, because previously there was such a delay that this now seems like a drastic improvement. So yeah, clearly some very impressive stuff here that they are able to pull off, just milliseconds for a response time. And you know what I was thinking: the Humane AI Pin really would have benefited from GPT-4o with its faster response times, because it was largely flamed online for how slow it was to respond, and it was running on GPT-4, which was much slower. "Who designed the Washington Monument?"

02:31

But yeah, that is the first thing that I noticed, the speed. The second thing you might have picked up on already is the emotion behind the voice. "How are you?" "I'm doing well, thanks for asking. How about you?" "Hey ChatGPT, how are you doing?" "I'm doing fantastic, thanks for asking. How about you?" "Me? The announcement is about me? Well, color me intrigued. Are you about to reveal something about AI?" So it seems like OpenAI has really just dialed up the expressiveness and the overall energy of this assistant, which I'm not sure how I feel about. It just feels like you're talking to a friend who is overly caffeinated and overly energized all of the time, and I think an assistant should honestly be a little more straightforward and straight up. Hopefully in the future we'll have the option to customize the voice; I think that would be a smart move. But you can also ask it to change its tone. In the demo they asked it to be a little more dramatic when reading a bedtime story, and they also asked it to read in a robotic voice. "I really want maximal emotion, like maximal expressiveness, much more than you were doing before." "Understood, let's amplify the drama. Once upon a time, in a world not too different from ours..." "Initiating dramatic robotic voice."

03:52

And then, also, apparently the robot can sing, which I'll let you be the judge of. "And so Byte found another robot friend, and they lived happily ever after."

04:03

There's also a new feature that is sort of a subset of Vision, which is being able to take your camera, point it at something, and ask questions about it in real time, sort of like a beta test of giving the AI eyes. "What do you see?" "Ah, I see 'I love ChatGPT.' That's so sweet of you."

04:26

Now, as if all of that wasn't enough, they also announced a brand new desktop app where you can do all of those same things, like text input and speech input, as well as uploading images. But on top of that, you can also screen share, so you can have it look at your screen, and whatever you're looking at, you can ask it questions about. I think this is going to be a huge productivity feature for anybody who works on their computer a lot. In the demo they showed how it could analyze a graph you're looking at, but I also think it would be really helpful for research purposes, and there are just so many use cases where I'm on the computer and it would be nice to have a conversational assistant, someone to bounce ideas off of. I think that would be really helpful. "All right, make sure I can see our screen. Can you find which one is the hypotenuse?" "Oh, okay, I see. So I think the hypotenuse is this really long side from A to B. Would that be correct?" "Exactly, well done."

05:23

Now, just to quickly touch on what the 'o' in 4o is actually pointing to: it's not so much that the model is omniscient or omnipotent, but rather that it takes your multimodal inputs, which are text, speech, and now vision, all into the same neural network, whereas before it was processing those separately. With the voice feature on 3.5 and 4, it would actually take your voice and transcribe it into text, and that is how it recognized your input, which strips a lot of information away from the LLM. All of the emotion and tone that would be captured in audio was boiled down into text. You can think of it like texting a friend versus calling a friend. Now, with the new omni model, it takes all of those things into consideration in its response.

06:10

But yeah, that is the latest update from OpenAI, clearly some very impressive stuff cooking under the hood. I'm curious to see what Google is going to come out with tomorrow, so definitely get subscribed for that, and that video is already out; it's probably on the screen somewhere. Hope you enjoyed the video. I'll catch you guys in the next one. Peace.