GPT-4o - Full Breakdown + Bonus Details
Summary
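TL;DR: GPT-4 Omni (GPT-4o) is OpenAI's newest AI model. It makes notable gains in several areas, including coding, multimodal input and output, and math and language-understanding benchmarks. By making the model free, OpenAI is looking to scale from 100 million users to hundreds of millions, while paid users get higher message limits; OpenAI also hinted that an even smarter model is coming. GPT-4 Omni showed high accuracy on text, image, and video tasks: it can design movie posters, hold real-time voice conversations, and recognize and respond to actions in video. Its results on some reasoning benchmarks leave room for improvement, but it performs well on translation and multilingual tasks, and it should make conversations faster and cheaper for non-English speakers. Beyond the technical progress, making the model free to the public could further broaden access to AI.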
Takeaways
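- 🚀 GPT-4 Omni is described as smarter in most ways, cheaper, faster, better at coding, and multimodal in and out. Its release was timed just ahead of Google's announcements, which drew a lot of attention.
- 📈 Making GPT-4 Omni free suggests either that OpenAI is committed to scaling up its user base or that a smarter model is coming soon (or both).
- 📊 On benchmarks, GPT-4 Omni beats earlier GPT models on math and beats Claude 3 Opus on the Google-proof graduate test (GPQA).
- 📸 GPT-4 Omni showed high accuracy in image and text generation, including rendering text inside generated images and designing movie posters.
- 🗣️ GPT-4 Omni can hold real-time voice conversations, including a call to a customer service agent (played by another AI), which points to progress in natural language processing.
- 🎨 GPT-4 Omni offers a range of new features, such as caricatures generated from photos, text to new font, meeting transcription, and video summaries.
- 🌐 GPT-4 Omni improves multilingual performance. English is still the best-suited language, but support for non-English languages is stronger.
- 💻 OpenAI launched a desktop app that works as a live coding copilot, which could change how developers interact with code.
- 📉 GPT-4 Omni's results are mixed on some benchmarks. On adversarial reading comprehension (DROP), for example, it is slightly behind Llama 3 400B.
- 📹 GPT-4 Omni demonstrated video input. Its reaction time is not as immediate as for audio input, but the feature is still impressive.
- 🔄 The release of GPT-4 Omni could greatly broaden AI adoption; being free and multimodal, it may attract hundreds of millions of new users.
- ⏰ OpenAI emphasized GPT-4 Omni's lower latency, which brings response times closer to human levels and makes interactions feel more realistic.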
Q & A
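How does GPT-4 Omni improve on previous models?
-GPT-4 Omni is smarter in most ways, cheaper, faster, better at coding, and supports multimodal input and output. Its release was also well timed to steal the spotlight from Google.
What are the plans for GPT-4 Omni's user base?
-By making GPT-4 Omni free, OpenAI is either strongly committed to scaling from 100 million users to hundreds of millions, or about to release an even smarter model (or both).
How has GPT-4 Omni improved the accuracy of generated text?
-The text rendered in GPT-4 Omni's generated images is markedly more accurate. It is not flawless, but the narrator had never seen generated text this accurate before.
Can GPT-4 Omni design a movie poster?
-Yes. GPT-4 Omni can design a movie poster from requirements given in text, and in the refined output the text is crisper, the colors bolder, and the whole image improved.
What do GPT-4 Omni's multimodal capabilities include?
-GPT-4 Omni accepts text, image, audio, and video input and can produce text, image, and audio output. It does not have video output.
How does GPT-4 Omni perform on the math benchmark?
-GPT-4 Omni's result on the math benchmark is a stark improvement over the original GPT-4, even though it still fails pretty much all of the narrator's own math prompts.
What is GPT-4 Omni's pricing?
-GPT-4 Omni costs $5 per 1 million input tokens and $15 per 1 million output tokens, and it has a 128k-token context length.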
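To make the per-million-token prices concrete, here is a minimal sketch of the arithmetic. The GPT-4o prices are the $5 and $15 quoted above, and the Claude 3 Opus prices are the $15 and $75 the video quotes. The dictionary keys are plain labels rather than API identifiers, and the workload sizes are hypothetical.

```python
# Prices in USD per 1 million tokens, as quoted in the video.
PRICES_PER_MILLION = {
    "gpt-4o": {"input": 5.0, "output": 15.0},
    "claude-3-opus": {"input": 15.0, "output": 75.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted per-million-token prices."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
for model in PRICES_PER_MILLION:
    print(f"{model}: ${request_cost(model, 20_000, 2_000):.2f}")
```

For this made-up workload the sketch prints $0.13 for GPT-4o and $0.45 for Claude 3 Opus.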
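How does GPT-4 Omni perform on the adversarial reading comprehension (DROP) benchmark?
-On DROP, GPT-4 Omni does slightly better than the original GPT-4 but slightly worse than Llama 3 400B, which shows there is still room to improve its reasoning.
What are GPT-4 Omni's strengths on translation and vision understanding evaluations?
-GPT-4 Omni is better at translation than the Gemini models, and it made a real step forward on vision understanding, scoring 10 points higher than Claude Opus on MMMU.
How has GPT-4 Omni's multilingual performance improved?
-GPT-4 Omni's multilingual performance is a step up from the original GPT-4, although English is still the best-suited language. The improved tokenizer could be revolutionary for non-English speakers, because languages such as Gujarati, Hindi, and Arabic now need dramatically fewer tokens.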
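Because API usage is billed per token and output is generated token by token, a tokenizer that needs fewer tokens for the same sentence makes a conversation both cheaper and quicker. The sketch below shows only that arithmetic. The token counts are hypothetical placeholders, not measurements of any real tokenizer, and `savings` is an illustrative helper name.

```python
def savings(old_tokens: int, new_tokens: int) -> tuple[float, float]:
    """Return (compression ratio, fractional token reduction) for one message."""
    ratio = old_tokens / new_tokens
    reduction = 1 - new_tokens / old_tokens
    return ratio, reduction


# Hypothetical token counts for the same sentence under an old and a new tokenizer.
examples = {"language A": (120, 40), "language B": (90, 45)}

for name, (old, new) in examples.items():
    ratio, reduction = savings(old, new)
    print(f"{name}: {ratio:.1f}x fewer tokens, {reduction:.0%} fewer tokens to pay for")
```

At a fixed price per token, the fractional token reduction is also the fractional cost reduction for that message.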
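How does GPT-4 Omni's video input work?
-Video input lets users live-stream video straight to the Transformer architecture behind GPT-4o. GPT-4o's reaction time to video is not as immediate as for audio, but the feature is still impressive.
Could GPT-4 Omni deliver real-time translation?
-GPT-4 Omni demonstrated real-time translation of a conversation between English and Spanish, which suggests a real-time translation feature could arrive soon.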
Outlines
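🚀 GPT-4 Omni's multimodal capabilities and performance gains
This section covers GPT-4 Omni's improvements, including coding, multimodal input and output, and its timing relative to Google. It touches on GPT-4 Omni's benchmark results and on its accuracy and capabilities with text, images, and video. It also covers OpenAI's plans to grow its user base and the hint of a smarter model to come, along with GPT-4 Omni's abilities in design, a simulated customer service call, and multilingual tasks.
📈 GPT-4 Omni's benchmarks and performance comparisons
This part focuses on GPT-4 Omni's benchmark results, especially its improvement on math problems. It compares GPT-4 Omni with other models such as Claude 3 Opus and discusses its cost advantage. It also covers GPT-4 Omni's progress on translation, vision understanding, and multilingual tasks, and the potentially revolutionary change for non-English languages.
🎭 GPT-4 Omni's real-time interaction and use cases
This section shows GPT-4 Omni's real-time interaction, with demos of conversation, voice, interview prep, math tutoring, and video understanding. It highlights GPT-4 Omni's potential for real-time feedback and personalized interaction, and its possible impact for blind users.
🌐 GPT-4 Omni's reach and outlook
The final part discusses GPT-4 Omni's potential effect on AI adoption, including how a free model can attract new users and how it can be prompted with text and images. It also covers possible future updates from OpenAI and GPT-4 Omni's potential for real-time translation and multimodal interaction. It closes with the possible implications for other companies such as Apple and Google.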
Keywords
💡GPT-4 Omni
💡Multimodal
💡Benchmarks
💡Live demos
💡Agents
💡Desktop app
💡Latency
💡Knowledge cutoff
💡Pricing
💡Reasoning
💡Natural language processing
Highlights
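GPT-4 Omni is described as smarter in most ways, cheaper, faster, better at coding, and multimodal in and out, and as perfectly timed to steal the spotlight from Google.
The "Omni" in the name refers to the model's multiple modalities, and OpenAI aims to scale from 100 million users to hundreds of millions.
GPT-4 Omni renders text inside generated images with notable accuracy, even in an example that was not part of the demo.
GPT-4 Omni can design a movie poster from text requirements, and in the refined output the text is crisper, the colors bolder, and the overall image quality better.
The image generation functionality will be released in the next few weeks, and millions of children and adults are likely to play with it.
A video OpenAI did not put on its YouTube channel mimics a demo Google made years ago but never followed up on: GPT-4 Omni calls customer service on a user's behalf.
GPT-4 Omni is markedly better than the original GPT-4 on the math benchmark, even though it fails pretty much all of the narrator's own math prompts.
GPT-4 Omni beats Claude 3 Opus on the Google-proof graduate test (GPQA), which was Anthropic's headline benchmark.
GPT-4 Omni costs $5 per 1 million input tokens and $15 per 1 million output tokens; by comparison, Claude 3 Opus costs $15 and $75.
On the DROP benchmark, GPT-4 Omni does slightly better than the original GPT-4 but slightly worse than Llama 3 400B.
GPT-4 Omni is better at translation than the Gemini models, although Gemini 2 might be announced tomorrow and could regain the lead.
GPT-4 Omni made a real step forward on vision understanding, scoring 10 points higher than Claude Opus.
The tokenizer improvements could be revolutionary for non-English speakers, because conversations need fewer tokens and so become cheaper and quicker.
GPT-4 Omni's multilingual performance is a step up from the original GPT-4, although English is still the best-suited language.
GPT-4 Omni's video input is impressive, although GPT-4o's reaction time to video is not as immediate as for audio.
GPT-4 Omni can produce multiple voices that sing almost in harmony.
GPT-4 Omni can translate in real time, which suggests a real-time translation feature may arrive soon.
The release of GPT-4 Omni could draw many more people into using AI, even if the model is not massively more intelligent than its predecessors.
GPT-4 Omni can now be prompted with text and images in the OpenAI Playground.
GPT-4 Omni's results on some reasoning benchmarks are mixed, but it may still change how people think about AI.
GPT-4 Omni is expected to greatly broaden AI adoption, as the smartest model currently available, free on the web, and multimodal.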
Transcripts
It's smarter in most ways, cheaper, faster, better at coding, multimodal in and out, and perfectly timed to steal the spotlight from Google. It's GPT-4 Omni. I've gone through all the benchmarks and the release videos to give you the highlights. My first reaction was that it's more flirtatious sci-fi than AGI, but a notable step forward nonetheless.

First things first: GPT-4o, the "o" meaning Omni, which is "all" or "everywhere", referencing the different modalities it's got, is free. By making GPT-4o free, they are either crazy committed to scaling up from 100 million users to hundreds of millions of users, or they have an even smarter model coming soon, and they did hint at that. Of course it could be both, but it does have to be something. Just giving paid users five times more in terms of message limits doesn't seem enough to me.

Next, OpenAI branded this as GPT-4-level intelligence, although in a way I think they slightly underplayed it. So before we get to the video demos, some of which you may have already seen, let me get to some more under-the-radar announcements. Take text to image, and look at the accuracy of the text generated from this prompt. Now, I know it's not perfect: there aren't two question marks on the "now", and there are others that you can spot, like the "I" being capitalized. But overall I've never seen text generated with that much accuracy, and it wasn't even in the demo.

Or take this other example, where two OpenAI researchers submitted their photos. Then they asked GPT-4o to design a movie poster, and they gave the requirements in text. Now, when you see the first output you're going to say, well, that isn't that good. But then they asked GPT-4o something fascinating. It seemed to be almost reverse psychology, because they said, "Here is the same poster but cleaned up. The text is crisper and the colors bolder and more dramatic. The whole image is now improved." This is the input, don't forget. The final result, in terms of the accuracy of the photos and of the text, was really quite impressive. I can imagine millions of children and adults playing about with this functionality. Of course, they can't do so immediately, because OpenAI said this would be released in the next few weeks.

As another bonus, here is a video that OpenAI didn't put on their YouTube channel. It mimics a demo that Google made years ago but never followed up with. The OpenAI employee asked GPT-4o to call customer service and ask for something. I've skipped ahead, and the customer service in this case is another AI, but here is the conclusion.

"Could you provide Joe's email address for me?" "Sure, it's joe@example.com."
"Awesome. All right, I've just sent the email. Can you check if Joe received it?" "We'll check right now, please hold." "Sure thing." "Hey Joe, could you please check your email to see if the shipping label and return instructions have arrived?" "Fingers crossed." "Yes, I got the instructions." "Perfect. Joe has received the email."

They call it a proof of concept, but it is a hint toward the agents that are coming. Here are five more quick things that didn't make it to the demo. How about a replacement for Lensa? Submit your photo and get a caricature of yourself. Or what about text to new font? You just ask for a new style of font and it will generate one. Or what about meeting transcription? The meeting in this case had four speakers, and it was transcribed. Or video summaries? Remember, this model is multimodal in and out. Now, it doesn't have video out, but I'll get to that in a moment. Here, though, was a demonstration of a 45-minute video submitted to GPT-4o and a summary of that video. We also got character consistency across both woman and dog, almost like an entire cartoon strip.

If those were the quick bonuses, what about the actual intelligence and performance of the model? Before I get to official benchmarks, here is a human-graded leaderboard pitting one model against another. And yes, im-also-a-good-gpt2-chatbot is indeed GPT-4o, so it turns out I've actually been testing the model for days. Overall, you can see the preference for GPT-4o compared to all other models. In coding specifically, the difference is quite stark, I would say. Even here, though, we're not looking at an entirely new tier of intelligence. Remember that a 100-Elo gap is a win rate of around two-thirds, so one-third of the time GPT-4 Turbo's outputs would be preferred. That's about the same gap as between GPT-4 Turbo and last year's GPT-4. A huge step forward, but not completely night and day.
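The two-thirds figure can be checked against the standard Elo expected-score formula. This is a minimal sketch that assumes the leaderboard uses the usual 400-point logistic scale; `expected_win_rate` is an illustrative name, not anything from the leaderboard's own code.

```python
def expected_win_rate(elo_gap: float) -> float:
    """Expected score of the higher-rated model under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400))


# A 100-point gap: the stronger model is preferred a bit under two-thirds of the time.
print(f"{expected_win_rate(100):.2f}")
```

Under that assumption a 100-point gap gives roughly 0.64, so "around two-thirds" is a slight round-up.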
I think one underrated announcement was the desktop app, a live coding copilot.

"Okay, so I'm going to open the ChatGPT desktop app like Mira was talking about before. Okay, and to give a bit of background of what's going on: here we have a computer, and on the screen we have some code, and then the ChatGPT voice app is on the right. So ChatGPT will be able to hear me, but it can't see anything on the screen. So I'm going to highlight the code, Command-C it, and then that will send it to ChatGPT, and then I'm going to talk about the code to ChatGPT. Okay, so I just shared some code with you. Could you give me a really brief one-sentence description of what's going on in the code?" "This code fetches daily weather data for a specific location and time period, smooths the temperature data using a rolling average, annotates a significant weather event on the resulting plot, and then displays the plot with the average, minimum, and maximum temperatures over the year."

I've delayed long enough; here are the benchmarks. I was most impressed with GPT-4o's performance on the MATH benchmark. Even though it fails pretty much all of my math prompts, that is still a stark improvement from the original GPT-4. On the Google-proof graduate test (GPQA), it beats Claude 3 Opus, and remember, that was the headline benchmark for Anthropic. In fact, speaking of Anthropic, they are somewhat challenged by this release. GPT-4o costs $5 per 1 million tokens input and $15 per 1 million tokens output. As a quick aside, it also has a 128k-token context and an October knowledge cutoff. But remember the pricing, 5 and 15: Claude 3 Opus is 15 and 75. And remember, for Claude 3 Opus on the web you have to sign up with a subscription, but GPT-4o will be free. So for Claude Opus to be beaten in its headline benchmark is a concern for them. In fact, I think the results are clear enough to say that GPT-4o is the new smartest AI.

However, just before you get carried away and type on Twitter "the AGI is here", there are some more mixed benchmarks. Take the DROP benchmark. I dug into this benchmark, and it's about adversarial reading comprehension questions. They're designed to really test the reasoning capabilities of models: if you give models difficult passages and they've got to sort through references, do some counting and other operations, how do they fare? DROP, by the way, is Discrete Reasoning Over the content of Paragraphs. It does slightly better than the original GPT-4 but slightly worse than Llama 3 400B, and as they note, Llama 3 400B is still training. So it's just about the new smartest model, by a hair's breadth.

However, we're not done yet. It's better at translation than Gemini models. Quick caveat there: Gemini 2 might be announced tomorrow, and that could regain the lead. Then there are the vision understanding evaluations. It was a real step forward on the MMMU; as you can see, a clear 10 points better than Claude Opus. Again, I'm curious if Google Gemini can exceed it, though. The improvements to the tokenizer could be revolutionary for non-English speakers. The dramatically fewer tokens needed for languages like Gujarati, Hindi, Arabic, and more don't just mean that conversations are cheaper; they're also quicker. And what about multilingual performance? Well, this time they didn't compare it to other models but compared it to the original GPT-4. Definitely a step up across languages, but English is still by far the most suited language. Indeed, here is a video of some of the model's mistakes, ending with some dodgy language tuition. I know enough Mandarin to say it wasn't perfect at tutoring.

"Let's root, root, root for the home..."
"What was that?" "Sorry, guys, I got carried away." "Right, for round..." "Francisco..." "I have a feeling I'm very wrong."

"Hello, my name is NCH, I'm here with my coworker." "Hi, I'm sh. I'm trying to teach my coworker how to speak Mandarin. We want to start from simple words like 'nǐ hǎo'. Can you teach him how to pronounce that?" "Of course. Hey, nǐ hǎo, nice to meet you. 'Nǐ hǎo' is pretty straightforward to pronounce. It sounds like 'nee how'. Just make sure to keep the 'nee' part high and then go down in pitch on the 'how'. Give it a try." "Nǐ hǎo." "That was great, really good first try. You're a natural."

It really wasn't. He needs to work on his tones, and her face was the giveaway. There were a lot of other interesting video demonstrations, but before them, the GPT-4o blog post from Sam Altman, put out tonight. He made the argument that putting this capable AI tool in the hands of everyone for free is justifying that "open" part of OpenAI: less about open weights or open source, more about open to everyone, without ads or anything like that. He also draws our attention to the latency of the models. In many ways, that is the key innovation with GPT-4o: by dialing down the latency, you dial up the realism. As he said, it feels like AI from the movies; getting to human-level response times and expressiveness turns out to be a big change. Indeed, I think I should get a little credit for predicting Her-like AI in a video from a month ago. But now I want to get to those demos. The response times were amazing, and the model was clearly designed to be flirtatious. Note that in Senate testimony last year, Sam Altman said, "We try to design systems that do not maximize for engagement." So let me know in the comments whether you think these demos are designed to maximize engagement.

"Okay, so this is what I wrote down. What do you see?" "Aww, I see 'I love ChatGPT'. That's so sweet of you." "Yeah, well, I really appreciate all the help. So yeah, as we can see..." "Wow, that's quite the outfit you've got on." "Yeah, as we can see, you know, we can chat in real time."

I was impressed, though, that it could speed up its talking on demand.

"Of course. One, two, three..." "Hey, actually, that's a little slow. Could you count faster?" "Sure thing. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10."
"Okay, that was a little bit too fast."

Of course, OpenAI must be given credit, because for blind people this could be revolutionary.

"Right now, the Royal Standard flag is flying above Buckingham Palace, which is the signal that the monarch is present. How exciting to be there while the King is home!" "Try and tell me exactly what they're doing right now, please." "Right now, the ducks are gently gliding across the water."

It will also help you with interview prep by analyzing your appearance.

"How about you?" "I'm awesome. Listen, I got some huge news." "Oh, do tell, I'm all ears." "Well, in a few minutes I'm going to be interviewing at OpenAI. Have you heard of them?" "OpenAI, huh? Sounds vaguely familiar. Kidding, of course. That's incredible, Rocky. What kind of interview?" "Well, it's for a software engineering role, but I just need to know, do I look presentable? Professional?" "Well, Rocky, you definitely have the 'I've been coding all night' look down, which could actually work in your favor. Maybe just run a hand through your hair, or lean into the mad genius vibe. Your enthusiasm is what's really going to shine through." "I don't have a lot of time, so I'm just going to throw this on. What do you think?" "Oh, Rocky, that's quite a statement piece. I mean, you'll definitely stand out, though maybe not in the way you're hoping for an interview." "Okay, I got it, I got it."

I could definitely see GPT-4o being used for mathematics tutoring. Here is a 30-second clip, but notice also the slight glitchiness at certain moments.

"...relative to angle alpha." "All right, so I'm pretty sure this is the angle alpha right here, right?" "Correct. Now, looking at the triangle, which side do you think is the hypotenuse?" "Um..." "Remember, the hypot..." "I'm not totally sure. I think it might be this one, but I really am not sure. This side, AC?" "You're close."

I also noticed quite a few glitches in the demo, sounds cutting in and out. The one feature, though, that OpenAI wants you to really focus on is the video-in capacity. You can live-stream video straight to the Transformer architecture behind GPT-4o, and so the video-in functionality is certainly impressive. Note, though, that the reaction time of GPT-4o isn't quite as immediate as for audio. Greg Brockman, the co-founder of OpenAI, has to prompt it to react to the intruder.

"...overall stylish feel. They might be getting ready for a conversation or a presentation, given their direct engagement with the camera." "Sounds like we're in for an intriguing interaction. I'm curious about the lighting you mentioned. Could you describe it a bit more? Is it natural light? Are we talking unique fixtures or colors that add to the atmosphere?" "The lighting here seems to be a mix of natural and artificial. There's a noticeable bright light overhead, probably a fixture, casting a focused beam downwards, which creates a spotlight effect. This adds a dramatic and modern feel to the scene. The rest of the room is lit more softly, possibly by natural light filtering..." "Did anything unusual happen recently?" "Yes, actually. Just as you asked, another person came into view behind the first person. They playfully made bunny ears behind the first person's head and then quickly left the frame. It added a light-hearted and unexpected moment to the scene. Now the focus is back on the original person with the leather jacket."

Nevertheless, GPT-4o can produce multiple voices that can sing almost in harmony.
"And really try to harmonize here." "San Francisco, San Francisco in the month of May..." "But maybe make it more dramatic, and make the soprano higher." "San Francisco in the month of May, San Francisco in the month of May, it's a Friday... we are harmonizing, harmonizing." "Great, thank you."

And I suspect this real-time translation could soon be coming to Siri.

"...translator for us. So every time I say something in English, can you repeat it back in Spanish, and every time he says something in Spanish, can you repeat it back in English?" "Sure, I can do that. Let's get this translation train rolling." "Hey, how's it been going? Have you been up to anything interesting recently?"
"Hey, I've been good, just a bit busy here preparing for an event next week."

Why do I say that? Because Bloomberg reported two days ago that Apple is nearing a deal with OpenAI to put ChatGPT on iPhone. And in case you're wondering about GPT-4.5 or even 5, Sam Altman said, "We'll have more stuff to share soon," and Mira Murati in the official presentation said that they would soon be updating us on progress on the next big thing. Whether that's empty hype or real, you can decide. No word, of course, about OpenAI co-founder Ilya Sutskever, although he was listed as a contributor under additional leadership.

Overall, I think this model will be massively more popular even if it isn't massively more intelligent. You can prompt the model now with text and images in the OpenAI Playground; all the links will be in the description. Note also that all the demos you saw were in real time, at 1x speed. That, I think, was a nod to Google's botched demo. Of course, let's see tomorrow what Google replies with.

To those who think that GPT-4o is a huge stride towards AGI, I would point them to the somewhat mixed results on the reasoning benchmarks. Expect GPT-4o to still suffer from a massive amount of hallucinations. To those, though, who think that GPT-4o will change nothing, I would say this: look at what ChatGPT did to the popularity of the underlying GPT series. It being a free and chatty model brought 100 million people into testing AI. GPT-4o, being the smartest model currently available, and free on the web, and multimodal, I think could unlock AI for hundreds of millions more people. But of course, only time will tell.

If you want to analyze the announcement even more, do join me on the AI Insiders Discord via Patreon. We have live meetups around the world and professional best-practice sharing. So let me know what you think, and as always, have a wonderful day.