NEW GPT-4o: My Mind is Blown.
Summary
TLDR: OpenAI has announced a new model, GPT-4o, which is faster and more capable than its predecessor, GPT-4. The new model is free to use and retains features like Vision, Browse, and Memory, with improvements in response time and voice expressiveness. The voice feature allows for quick response times, averaging 320 milliseconds, and users can interrupt the conversation naturally. The model also has an emotional tone and can adjust its expressiveness, including singing. A new feature enables real-time object identification using a camera, and a desktop app has been introduced to enhance productivity. The 'o' in GPT-4o signifies the integration of multimodal inputs into a single neural network, improving the model's understanding and response quality.
Takeaways
- 🚀 OpenAI has announced a new model, GPT-4o, which is twice as fast as and more capable than its predecessor, GPT-4.
- 🆓 GPT-4o is available for free, whereas GPT-4 previously required a $20 monthly subscription.
- 👀 GPT-4o retains features like Vision, Browse, Memory, and complex data analysis, which were present in GPT-4.
- 🎤 A significant update in GPT-4o is the voice feature, which now has quicker response times, averaging 320 milliseconds.
- 💬 Users can interrupt the AI mid-sentence by speaking, making the interaction more natural and intuitive.
- 🎭 The AI's voice has been enhanced with more expressiveness and energy, although the tone can be adjusted upon request.
- 🎶 The AI can now sing, as demonstrated in the presentation, adding another layer of expressiveness to its capabilities.
- 📷 A new feature lets users point their camera at objects and ask the AI questions about them, processing visual information in real time.
- 💻 OpenAI introduced a desktop app that supports text and speech input, image uploads, and screen sharing for enhanced productivity.
- 📈 The app can analyze visual data such as graphs directly from the user's screen, aiding in research and providing immediate insights.
- 🔄 The 'o' in GPT-4o signifies the integration of multimodal inputs (text, speech, and vision) into a single neural network, improving the richness of interaction.
- 🔍 The new Omni model processes voice inputs directly, capturing emotional and tonal nuances, unlike previous models that transcribed voice to text.
Q & A
What is the latest model announced by OpenAI?
-OpenAI has announced GPT-4o, its new flagship model.
How does GPT-4o compare to GPT-4 in terms of speed and capability?
-GPT-4o is twice as fast as and more capable than GPT-4.
What was the previous cost associated with using GPT-4?
-GPT-4 was previously available through a $20 monthly subscription.
What are the features that GPT-4o will be incorporating from GPT-4?
-GPT-4o will include features like Vision for image analysis, Browse for real-time internet data, Memory for remembering facts about users, and the ability to analyze complex data such as Excel spreadsheets (a minimal image-input sketch follows this Q&A section).
What is the average response time for GPT-4o?
-The average response time for GPT-4o is around 320 milliseconds, which is close to the average human response time in a conversation.
What is special about the voice feature in GPT-4o?
-The voice feature in GPT-4o is more expressive and energetic, with the ability to change tones and even sing.
How does the new Omni model in GPT-4o handle multimodal inputs?
-The Omni model in GPT-4o processes text, speech, and vision inputs together in the same neural network, unlike previous models that transcribed speech to text first, so it captures more emotional and tonal information.
What is the significance of the 'o' in GPT-4o?
-The 'o' in GPT-4o signifies that it takes multimodal inputs (text, speech, and vision) and processes them together in one neural network, rather than separately.
What is the new feature that allows real-time analysis of objects through a camera?
-The new feature is a subset of Vision that enables users to point their camera at objects and ask questions about them in real time.
What additional capability was announced with the new desktop app?
-The new desktop app allows for text and speech input, image uploading, and also includes a screen-sharing feature for the AI to analyze content on the user's screen.
How does the screen-sharing feature in the desktop app enhance productivity?
-The screen-sharing feature allows users to have the AI analyze and provide insights on the content they are currently viewing on their computer, which can be beneficial for research and idea generation.
What is the presenter's opinion on the expressiveness of the voice in GPT-4o?
-The presenter feels that the voice in GPT-4o is overly energetic, like speaking to a hyper-caffeinated friend, and suggests that a future option to customize the voice would be a smart move.
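As a concrete illustration of the Vision capability described in the features answer above, here is a minimal sketch of asking a question about an image in a single request. This assumes the publicly available OpenAI Python SDK and the gpt-4o model name; it is not the ChatGPT app itself, and the image URL and prompt are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# One request carries both the question and the image, so the model can
# reason over them together rather than in separate passes.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sales-chart.png"}},  # placeholder image
        ],
    }],
)
print(response.choices[0].message.content)
```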
Outlines
🚀 Introducing GPT-4o: A Leap in AI Technology
Josh introduces GPT-4o, a new flagship model from OpenAI that promises to be twice as fast as its predecessor, GPT-4, and will be available for free. Unlike GPT-4's $20-per-month subscription, GPT-4o offers the existing features like image uploads, internet browsing, memory, and complex data analysis at no cost. The highlight of the presentation was a demonstration showcasing the model's capabilities in answering various queries, including math problems and storytelling. Significant improvements were noted in voice interaction, allowing conversations with minimal delays, close to human response times. The update also adds emotional expressiveness and voice modulation, giving interactions a more dynamic and personalized feel.
🧠 GPT-4o's Omni Upgrade and Desktop App
The second section discusses the omnimodal capabilities of GPT-4o, which processes text, speech, and vision inputs together through the same neural network, improving the AI's responsiveness and accuracy. This integration is an improvement over previous models, which processed these inputs separately and could lose nuances like tone and emotion. Additionally, a new desktop application was announced that supports text and speech input, image uploads, and screen sharing. The app is aimed at boosting productivity by allowing the AI to interact with and analyze content directly from the user's screen, promising a versatile tool for both professional and personal use.
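To make the "same neural network" point concrete, here is a rough sketch of the older transcribe-then-respond pipeline that the video contrasts GPT-4o against. It uses publicly documented OpenAI SDK calls (whisper-1 for speech-to-text, tts-1 for speech output); the file names are placeholders, and this is an outside approximation of a cascaded voice assistant, not ChatGPT's actual internals.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: speech-to-text. The audio is flattened into plain text here, so
# tone, pacing, and emotion never reach the language model.
with open("question.wav", "rb") as audio_file:  # placeholder recording
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Step 2: a text-only model answers the transcribed question.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# Step 3: text-to-speech turns the written answer back into audio.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
speech.write_to_file("reply.mp3")

# GPT-4o collapses this three-stage cascade into a single model that accepts
# audio, images, and text natively, which is where the latency and
# expressiveness gains described above come from.
```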
Keywords
💡OpenAI
💡GPT-4o
💡Free to use
💡Vision
💡Browse
💡Memory
💡Analyzing complex data
💡Voice feature
💡Expressiveness
💡Desktop app
💡Multimodal inputs
Highlights
OpenAI has announced GPT-4o, a new flagship model that is twice as fast as and more capable than GPT-4.
GPT-4o will be free to use, a change from the previous $20 monthly subscription for GPT-4.
GPT-4o retains the features of GPT-4, including Vision for image analysis, Browse for internet data, and Memory for personalization.
The new model will also include the ability to analyze complex data, such as Excel spreadsheets.
GPT-4o demonstrated impressive response times, averaging 320 milliseconds, close to the average human response time in conversation.
Users can now interrupt the conversation by speaking, making interactions more intuitive.
The expressiveness and energy of the assistant's voice have been enhanced, making it feel more like a caffeinated friend.
GPT-4o can change its tone on command, such as being more dramatic or adopting a robotic voice.
A new feature allows the AI to analyze real-time visual input from a camera, giving it a form of 'vision'.
A new desktop app has been announced, enabling text and speech input, image uploads, and screen sharing for productivity.
The 'o' in GPT-4o signifies the integration of multimodal inputs into a single neural network, improving the richness of responses.
The Omni model processes voice, text, and vision together, capturing more emotional and tonal information than previous models.
The presenter notes that the Humane AI Pin, criticized for its slow response times while running on GPT-4, would have benefited from GPT-4o's faster responses.
The potential for customization of the voice in future updates is suggested as a smart move for user satisfaction.
The new model's capabilities are expected to significantly enhance productivity for computer-based tasks and research.
The integration of voice, text, and vision in GPT-4o is a major step forward in conversational AI technology.
The announcement raises curiosity about the upcoming response from Google, hinting at a competitive landscape in AI advancements.
The video showcases the practical applications and potential impact of GPT-4o in a variety of use cases.
Transcripts
What's up, Josh here. So in case you missed it, OpenAI has just announced ChatGPT-4o, which is their brand new flagship model that is two times faster and more capable than GPT-4, and the good news for all of us is that it's going to be free to use. Now, GPT-4 was previously a $20-a-month subscription, but with 4o being completely free, we also get the benefits of everything that we got with GPT-4. There's Vision, where you can upload images and ask it questions about those images. There's also Browse, where it can scrub the internet for more real-time and up-to-date data. There's also Memory, where it can actually remember facts about you. And then lastly, there's analyzing complex data, so you can actually give it something like an Excel spreadsheet and ask it questions about it. All of those features are going to be coming to 4o in the next couple of weeks. But first of all, let's just start with everything that's going to be new with GPT-4o.

In the presentation, the most impressive part was obviously the demo. They did a bunch of stuff: they asked it all kinds of questions, gave it math equations, and asked it to read bedtime stories. For the most part, I think the intelligence level and the answers it's giving are pretty similar to the current GPT-4, which is why I don't think they updated the name to GPT-5. But surprisingly, the biggest updates in 4o actually come in the voice feature.

"Hey ChatGPT, how are you doing?" "I'm doing fantastic, thanks for asking. How about you?" "Pretty good. What's up? So my friend Barrett here, he's been having trouble sleeping lately, and I want you to tell him a bedtime story about robots and love." "Oh, a bedtime story about robots and love? I got you covered."

So now we have response times as quick as 232 milliseconds, with an average of 320 milliseconds, which is roughly the average human response time in a conversation. You can also now interrupt the conversation simply by speaking, which I think is pretty intuitive. They even put a disclaimer on the website that all of their videos are played at 1x speed, because previously there was such a delay that this now seems like a drastic improvement. So yeah, clearly some very impressive stuff here that they're able to pull off, just milliseconds for a response time. And you know what I was thinking: the Humane AI Pin really would have benefited from GPT-4o with its faster response times, because it was largely flamed online for how slow it took to respond, and it was running on GPT-4, which was much slower.

"Who designed the Washington Monument?"
But yeah, that is the first thing that I noticed: the speed. The second thing you might have picked up on already is the emotion behind the voice.

"How are you?" "I'm doing well, thanks for asking. How about you?" "Hey ChatGPT, how are you doing?" "I'm doing fantastic, thanks for asking. How about you?" "Me? The announcement is about me? Well, color me intrigued. Are you about to reveal something about AI?"

So it seems like OpenAI has really just dialed up the expressiveness and the overall energy of this assistant, which I'm not sure how I feel about. It just feels like you're talking to a friend who is overly caffeinated and overly energized all of the time, and I think an assistant should honestly be a little bit more straightforward and straight up. Hopefully in the future we'll have the option to customize the voice; I think that would be a smart move. But you can also ask it to change its tone. In the demo they asked it to be a little more dramatic when reading a bedtime story, and they also asked it to read in a robotic voice.

"I really want maximal emotion, like maximal expressiveness, much more than you were doing before." "Understood, let's amplify the drama. Once upon a time, in a world not too different from ours..." "Initiating dramatic robotic voice."

And then, apparently, the robot can also sing, which I'll let you be the judge of. "And so Byte found another robot friend, and they lived circly ever after."
There's also a new feature that is sort of a subset of Vision, which is being able to take your camera, point it at something, and ask it questions about it in real time, sort of like a beta test of giving the AI eyes.

"What do you see?" "Aww, I see 'I love ChatGPT.' That's so sweet of you."

Now, as if all of that wasn't enough, they also announced a brand new desktop app where you can do all of those same things, like text input, speech input, and uploading images, but on top of that, you can also screen share. So you can have it just look at your screen, and whatever you're looking at, you can ask it questions. I think this is going to be a huge productivity feature for anybody who works on their computer a lot. In the demo they showed how it could analyze a graph that you're looking at, but I also think it would be really helpful for research purposes, and there are just so many use cases where I'm on the computer and it would be nice to have a conversational assistant, someone to bounce ideas off of. I think that would be really helpful.

"All right, make sure I can see our screen. Can you find which one is the hypotenuse?" "Oh, okay, I see. So I think the hypotenuse is this really long side from A to B. Would that be correct?" "Exactly, well done."
Now, just to quickly touch on what the 'o' in 4o is actually pointing to: it's not pointing so much to the fact that it's omniscient or omnipotent, but rather to the fact that it takes your multimodal inputs, which are text, speech, and now vision, all into the same neural network, whereas before it was processing those separately. Before, with the voice feature on 3.5 and 4, it would actually take your voice and transcribe it into text, and that's how it was recognizing your input, which basically strips a lot of information away from the LLM. All of the emotion and tone that would be captured in an audio format gets boiled down into text, so you can think of it like texting a friend versus calling a friend. Now, with the new Omni model, it is taking all of those things into consideration in its response.

But yeah, that is the latest update from OpenAI, clearly some very impressive stuff cooking under the hood. I'm curious to see what Google is going to come out with tomorrow, so definitely get subscribed for that. And that video is already out; it's probably on the screen somewhere. Hope you enjoyed the video, I'll catch you guys in the next one. Peace.