Mark Zuckerberg - Llama 3, $10B Models, Caesar Augustus, & 1 GW Datacenters
Summary
TL;DR: In a thought-provoking interview, Mark Zuckerberg discusses the future of AI with a focus on Meta AI's advancements. He highlights the release of Llama-3, an open-source AI model, and Meta AI's integration with Google and Bing for real-time knowledge, emphasizing its capabilities in image generation and natural language processing. Zuckerberg also addresses the challenges of building large-scale data centers, the risks of centralized AI control, and the importance of open-source contributions. He stresses the potential of AI to revolutionize sectors such as science and healthcare, and shares his vision of AI as a tool that enhances human productivity rather than replacing it. The conversation covers the implications of AI development, the balance between innovation and safety, and the significance of open-source software in democratizing AI technology.
Takeaways
- The new version of Meta AI, built on Llama-3, is set to be the most intelligent, freely-available AI assistant, integrating with Google and Bing for real-time knowledge and featuring new creation capabilities like animations and real-time image generation.
- Meta is training multiple versions of the Llama-3 model, including an 8 billion parameter model released for the developer community and a 405 billion parameter model still in training, aiming to push the boundaries of AI capabilities.
- The release of Llama-3 is not global but will start in a few countries, with plans for a wider rollout in the coming months, reflecting a strategic approach to introducing advanced AI technologies.
- Mark Zuckerberg emphasizes the importance of open-source AI, believing it to be beneficial for the community and for Meta, allowing for broader innovation and a more level playing field in the AI industry.
- There is a commitment to responsible AI development, including a willingness not to release models whose negative behaviors or risks cannot be mitigated, reflecting a cautious approach to AI's potential downsides.
- Meta is investing in custom silicon to improve the efficiency of AI model training and inference, which could significantly reduce costs and improve performance for its AI-driven services.
- Zuckerberg shares his passion for building new things and his belief in the potential of AI to enable creativity and productivity, reflecting his personal drive and the company's mission.
- The potential of AI is compared to the creation of computing itself, suggesting a fundamental shift in how people work and live, with AI becoming an integral part of various industries and aspects of life.
- Open-source contributions such as PyTorch and React are considered powerful drivers of innovation, with a reach that may rival that of Meta's social media products.
- There's a discussion of the balance of power in AI development, with concerns about the risks of a single entity holding disproportionately strong AI capabilities and an argument for a decentralized approach.
- Zuckerberg draws an analogy between historical shifts in understanding, like the concept of peace under Augustus, and current paradigm shifts in technology and business models, emphasizing the importance of challenging conventional thinking.
Q & A
What is the main update to Meta AI that Mark Zuckerberg discusses in the interview?
-The main update is the rollout of Llama-3, an AI model that is both open source and will power Meta AI. It is considered the most intelligent, freely-available AI assistant at the time of the interview.
How does Meta AI integrate with other search engines?
-Meta AI integrates with Google and Bing for real-time knowledge, making it more prominent across apps like Facebook and Messenger.
What new creation features does Meta AI introduce?
-Meta AI introduces features like animations, where any image can be animated, and real-time high-quality image generation as users type their queries.
What are the technical specifications of the Llama-3 model that Mark Zuckerberg finds exciting?
-Mark Zuckerberg is excited about the Llama-3 model, which includes an 8 billion parameter model and a 70 billion parameter model. There's also a 405 billion parameter model in training.
What is the roadmap for future releases of Meta AI?
-The roadmap includes new releases that will bring multimodality, more multi-linguality, and bigger context windows. There are plans to roll out the 405B model later in the year.
How does Mark Zuckerberg perceive the risk of having a few companies controlling closed AI models?
-He sees it as a significant risk, as it could lead to these companies dictating what others can build, creating a situation similar to the control exerted by Apple over app features.
What is the strategy behind Meta's acquisition of GPUs like the H100?
-The strategy was to ensure they had enough capacity to build something they couldn't foresee on the horizon yet, doubling the order to be prepared for future needs beyond the immediate requirements for Reels and content ranking.
Why did Mark Zuckerberg decide not to sell Facebook in 2006 for $1 billion?
-Mark felt a deep conviction in what they were building and believed that if he sold the company, he would just build another similar one. He also lacked the financial sophistication to engage in the billion-dollar valuation debate.
What is the role of Facebook AI Research (FAIR) in the development of Meta's AI?
-FAIR, established about 10 years prior, has been instrumental in creating innovations that improved Meta's products. It transitioned from a pure research group to a key player in integrating AI into Meta's products, with the creation of the gen AI group.
How does Meta plan to approach the development of more advanced AI models like Llama-4?
-Meta plans to continue training larger models, incorporating more capabilities like reasoning and memory, and focusing on multimodality and emotional understanding. They aim to make AI more integrated into various aspects of their products and services.
What are the potential future challenges in scaling AI models?
-Challenges include physical constraints like energy limitations for training large models, regulatory hurdles for building new power plants and transmission lines, and the balance between open sourcing models and potential risks associated with them.
How does Mark Zuckerberg view the future of AI and its impact on society?
-He sees AI as a fundamental shift, similar to the creation of computing, that will enable new applications and experiences. However, he also acknowledges the need for careful consideration of risks and the importance of a balanced approach to AI development and deployment.
Outlines
AI Innovation and Meta AI's New Features
The speaker expresses an inherent drive to continually innovate and build new features, despite challenges from entities like Apple. The conversation introduces Meta AI's latest advancements, highlighting the release of Llama-3, an open-source AI model that integrates with Google and Bing for real-time knowledge. New features include image animation and real-time high-quality image generation based on user queries. The speaker emphasizes Meta AI's commitment to making AI more accessible and enhancing its capabilities across various applications.
The Future of AI and Meta's Strategic Investments
The discussion delves into the strategic foresight behind Meta's investment in GPUs for AI model training. The speaker reflects on the importance of capacity planning for unforeseen technological advancements, drawing parallels with past decisions that have shaped the company's direction. The conversation also touches on the speaker's personal philosophy on company valuation and the significance of Facebook AI Research (FAIR) in driving product innovation.
AGI and the Evolution of Meta's AI Strategy
The speaker outlines the evolution of Meta's approach to AI, from the inception of FAIR to the current focus on artificial general intelligence (AGI). The importance of coding and reasoning in training AI models is emphasized, highlighting how these capabilities enhance the AI's performance across various domains. The conversation explores the concept of AI as a progressive tool that augments human capabilities rather than replacing them.
Multimodal AI and the Future of Interaction
The speaker envisions a future where AI capabilities become more integrated and sophisticated, covering emotional understanding and multimodal interactions. The potential for personalized AI models and the impact of AI on industrial-scale operations are discussed. The conversation also addresses the idea of AI agents representing businesses and creators, and the importance of open-source AI in maintaining a balanced technological landscape.
Scaling AI Models and Meta's Computational Challenges
The speaker discusses the challenges and strategies related to scaling AI models, including the physical and computational constraints of training large models like Llama-3. The conversation explores the concept of using inference to generate synthetic data for training and the potential for smaller, fine-tuned models to play a significant role in various applications. The speaker also addresses the importance of community contributions in advancing AI technology.
The Impact of Open Source on AI and Technology
The speaker reflects on the impact of open-source contributions from Meta, such as PyTorch and React, and their potential long-term significance. The conversation considers whether open-source efforts could have a more profound impact than Meta's social media products, given their widespread use across the internet. The speaker also discusses the future integration of Llama models with custom silicon for more efficient training.
Navigating Open Source Risks and Future AI Developments
The speaker addresses concerns about the potential risks of open sourcing powerful AI models, including the possibility of misuse. The conversation focuses on the importance of balancing theoretical risks with practical, everyday harms, and the responsibility to mitigate these risks. The speaker also shares thoughts on the future of AI, including the potential for AI to become a commodified training resource and the economic considerations of open sourcing high-value models.
The Value of Focus and Meta's Management Strategy
The speaker discusses the concept of focus as a scarce commodity, especially for large companies, and its importance in driving the company's success. The conversation touches on the challenges of managing multiple projects and the need to maintain a sharp focus on key priorities. The speaker also reflects on the unpredictability of success in technology and the importance of trying new things.
Keywords
- AI Assistant
- Open Source
- Data Center
- Parameter
- Multimodality
- Benchmark
- Inference
- Meta AI
- Training Cluster
- Content Risks
- Economic Constraints
Highlights
Meta AI is releasing an upgraded model called Llama-3, which is set to be the most intelligent, freely-available AI assistant.
Llama-3 will be available as open source for developers and will also power Meta AI, integrating with Google and Bing for real-time knowledge.
New creation features have been added, including the ability to animate any image and generate high-quality images in real time as you type your query.
Meta AI's new version is initially rolling out in a few countries, with plans for broader availability in the coming weeks and months.
Technically, Llama-3 comes in three versions: an 8 billion parameter and a 70 billion parameter model released today, and a 405 billion parameter model still in training.
The 70 billion parameter model of Llama-3 has scored highly on benchmarks for math and reasoning, while the 405 billion parameter model is expected to lead in benchmarks upon completion.
Meta has a roadmap for future releases that include multimodality, more multilinguality, and larger context windows.
The decision to invest in GPUs for AI was driven by the need for more capacity to train models for content recommendation in services like Reels.
The capability of showing content from unconnected sources on platforms like Instagram and Facebook represents a significant unlock for user engagement.
The importance of open source in AI development, ensuring a balanced and competitive ecosystem, and the potential risks of concentrated AI power.
The potential for AI to surpass human intelligence in most domains progressively, and the focus on capabilities like emotional understanding and reasoning.
Meta's commitment to addressing the risks of misinformation and the importance of building AI systems to combat adversarial uses.
The vision of AI as a tool that enhances human capabilities rather than replacing them, aiming for increased productivity and creativity.
The significance of the metaverse in enabling realistic digital presence and its potential impact on socializing, working, and various industries.
Mark Zuckerberg's personal drive to continuously build new things and the philosophy behind investing in large-scale projects like AI and the metaverse.
The historical perspective on the development of peace and economy, drawing parallels to modern innovations in tech and the concept of open source.
The potential for custom silicon to revolutionize the training of large AI models and the strategic move to first optimize inference processes.
Transcripts
That's not even a question for me - whether we're going to go take a swing at building the next thing. I'm just incapable of not doing that. There's a bunch of times when we wanted to launch features and then Apple's just like "nope, you're not launching that." I was like, that sucks. Are we set up for that with AI, where you're going to get a handful of companies that run these closed models that are going to be in control of the APIs and therefore are going to be able to tell you what you can build? Then when you start getting into building a data center that's like 300 megawatts or 500 megawatts or a gigawatt - just no one has built a single gigawatt data center yet. From wherever you sit there's going to be some actor who you don't trust - if they're the ones who have the super strong AI, I think that that's potentially a much bigger risk.
Mark, welcome to the podcast.

Thanks for having me. Big fan of your podcast.

Thank you, that's very nice of you to say. Let's start by talking about the releases that will go out when this interview goes out. Tell me about the models and Meta AI. What's new and exciting about them?

I think the main thing that most people in the world are going to see is the new version of Meta AI. The most important thing that we're doing is the upgrade to the model. We're rolling out Llama-3. We're doing it both as open source for the dev community and it is now going to be powering Meta AI. There's a lot that I'm sure we'll get into around Llama-3, but I think the bottom line on this is that we think now that Meta AI is the most intelligent, freely-available AI assistant that people can use. We're also integrating Google and Bing for real-time knowledge.

We're going to make it a lot more prominent across our apps. At the top of Facebook and Messenger, you'll be able to just use the search box right there to ask any question. There's a bunch of new creation features that we added that I think are pretty cool and that I think people will enjoy. I think animations is a good one. You can basically take any image and just animate it.

One that people are going to find pretty wild is that it now generates high-quality images so quickly that it actually generates them as you're typing and updates them in real time. So you're typing your query and it's honing in. It's like "show me a picture of a cow in a field with mountains in the background, eating macadamia nuts, drinking beer" and it's updating the image in real time. It's pretty wild. I think people are going to enjoy that. So I think that's what most people are going to see in the world. We're rolling that out, not everywhere, but we're starting in a handful of countries and we'll do more over the coming weeks and months. I think that's going to be a pretty big deal and I'm really excited to get that in people's hands. It's a big step forward for Meta AI.

But I think if you want to get under the hood a bit, the Llama-3 stuff is obviously the most technically interesting. We're training three versions: an 8 billion parameter model and a 70 billion, which we're releasing today, and a 405 billion dense model, which is still training. So we're not releasing that today, but I'm pretty excited about how the 8B and the 70B turned out. They're leading for their scale. We'll release a blog post with all the benchmarks so people can check it out themselves. Obviously it's open source so people get a chance to play with it.

We have a roadmap of new releases coming that are going to bring multimodality, more multi-linguality, and bigger context windows as well. Hopefully, sometime later in the year we'll get to roll out the 405B. For where it is right now in training, it is already at around 85 MMLU and we expect that it's going to have leading scores on a bunch of the benchmarks. I'm pretty excited about all of that. The 70 billion is great too. We're releasing that today. It's around 82 MMLU and has leading scores on math and reasoning. I think just getting this in people's hands is going to be pretty wild.

Oh, interesting. That's the first I'm hearing of it as a benchmark. That's super impressive.

The 8 billion is nearly as powerful as the biggest version of Llama-2 that we released. So the smallest Llama-3 is basically as powerful as the biggest Llama-2.
Before we dig into these models, I want to go back in time. I'm assuming 2022 is when you started acquiring these H100s, or you can tell me when. The stock price is getting hammered. People are asking what's happening with all this capex. People aren't buying the metaverse. Presumably you're spending that capex to get these H100s. How did you know back then to get the H100s? How did you know that you'd need the GPUs?

I think it was because we were working on Reels. We always want to have enough capacity to build something that we can't quite see on the horizon yet. We got into this position with Reels where we needed more GPUs to train the models. It was this big evolution for our services. Instead of just ranking content from people or pages you follow, we made this big push to start recommending what we call unconnected content, content from people or pages that you're not following.

The corpus of content candidates that we could potentially show you expanded from on the order of thousands to on the order of hundreds of millions. It needed a completely different infrastructure. We started working on doing that and we were constrained on the infrastructure in catching up to what TikTok was doing as quickly as we wanted to. I basically looked at that and I was like "hey, we have to make sure that we're never in this situation again. So let's order enough GPUs to do what we need to do on Reels and ranking content and feed. But let's also double that." Again, our normal principle is that there's going to be something on the horizon that we can't see yet.

Did you know it would be AI?

We thought it was going to be something that had to do with training large models. At the time I thought it was probably going to be something that had to do with content. It's just the pattern matching of running the company, there's always another thing. At that time I was so deep into trying to get the recommendations working for Reels and other content. That's just such a big unlock for Instagram and Facebook now, being able to show people content that's interesting to them from people that they're not even following.

But that ended up being a very good decision in retrospect. And it came from being behind. It wasn't like "oh, I was so far ahead." Actually, most of the times where we make some decision that ends up seeming good is because we messed something up before and just didn't want to repeat the mistake.
This is a total detour, but I want to ask about this while we're on this. We'll get back to AI in a second. In 2006 you didn't sell for $1 billion but presumably there's some amount you would have sold for, right? Did you write down in your head like "I think the actual valuation of Facebook at the time is this and they're not actually getting the valuation right"? If they'd offered you $5 trillion, of course you would have sold. So how did you think about that choice?

I think some of these things are just personal. I don't know that at the time I was sophisticated enough to do that analysis. I had all these people around me who were making all these arguments for a billion dollars like "here's the revenue that we need to make and here's how big we need to be. It's clearly so many years in the future." It was very far ahead of where we were at the time. I didn't really have the financial sophistication to really engage with that kind of debate.

Deep down I believed in what we were doing. I did some analysis like "what would I do if I weren't doing this? Well, I really like building things and I like helping people communicate. I like understanding what's going on with people and the dynamics between people. So I think if I sold this company, I'd just go build another company like this and I kind of like the one I have. So why?" I think a lot of the biggest bets that people make are often just based on conviction and values. It's actually usually very hard to do the analyses trying to connect the dots forward.

You've had Facebook AI Research for a long time. Now it's become seemingly central to your company. At what point did making AGI, or however you consider that mission, become a key priority of what Meta is doing?

It's been a big deal for a while. We started FAIR about 10 years ago. The idea was that, along the way to general intelligence or whatever you wanna call it, there are going to be all these different innovations and that's going to just improve everything that we do. So we didn't conceive of it as a product. It was more of a research group. Over the last 10 years it has created a lot of different things that have improved all of our products. It's advanced the field and allowed other people in the field to create things that have improved our products too. I think that that's been great.

There's obviously a big change in the last few years with ChatGPT and the diffusion models around image creation coming out. This is some pretty wild stuff that is pretty clearly going to affect how people interact with every app that's out there. At that point we started a second group, the gen AI group, with the goal of bringing that stuff into our products and building leading foundation models that would power all these different products.

When we started doing that, the theory initially was that a lot of the stuff we're doing is pretty social. It's helping people interact with creators, helping people interact with businesses, helping businesses sell things or do customer support. There's also basic assistant functionality, whether it's for our apps or the smart glasses or VR. So it wasn't completely clear at first that you were going to need full AGI to be able to support those use cases. But in all these subtle ways, through working on them, I think it's actually become clear that you do. For example, when we were working on Llama-2, we didn't prioritize coding because people aren't going to ask Meta AI a lot of coding questions in WhatsApp.

Now they will, right?
I don't know. I'm not sure that WhatsApp, or Facebook or Instagram, is the UI where people are going to be doing a lot of coding questions. Maybe the website, meta.ai, that we're launching. But the thing that has been a somewhat surprising result over the last 18 months is that it turns out that coding is important for a lot of domains, not just coding. Even if people aren't asking coding questions, training the models on coding helps them become more rigorous in answering the question and helps them reason across a lot of different types of domains. That's one example where for Llama-3, we really focused on training it with a lot of coding because that's going to make it better on all these things even if people aren't asking primarily coding questions.

Reasoning is another example. Maybe you want to chat with a creator or you're a business and you're trying to interact with a customer. That interaction is not just like "okay, the person sends you a message and you just reply." It's a multi-step interaction where you're trying to think through "how do I accomplish the person's goals?" A lot of times when a customer comes, they don't necessarily know exactly what they're looking for or how to ask their questions. So it's not really the job of the AI to just respond to the question.

You need to kind of think about it more holistically. It really becomes a reasoning problem. So if someone else solves reasoning, or makes good advances on reasoning, and we're sitting here with a basic chatbot, then our product is lame compared to what other people are building. At the end of the day, we basically realized we've got to solve general intelligence and we just upped the ante and the investment to make sure that we could do that.

So the version of Llama that's going to solve all these use cases for users, is that the version that will be powerful enough to replace a programmer you might have in this building?

I just think that all this stuff is going to be progressive over time.

But in the end case: Llama-10.

I think that there's a lot baked into that question. I'm not sure that we're replacing people as much as we're giving people tools to do more stuff.

Is the programmer in this building 10x more productive after Llama-10?

I would hope more. I don't believe that there's a single threshold of intelligence for humanity because people have different skills. I think that at some point AI is probably going to surpass people at most of those things, depending on how powerful the models are. But I think it's progressive and I don't think AGI is one thing. You're basically adding different capabilities. Multimodality is a key one that we're focused on now, initially with photos and images and text but eventually with videos. Because we're so focused on the metaverse, 3D type stuff is important too. One modality that I'm pretty focused on, that I haven't seen as many other people in the industry focus on, is emotional understanding. So much of the human brain is just dedicated to understanding people and understanding expressions and emotions. I think that's its own whole modality, right? You could say that maybe it's just video or image, but it's clearly a very specialized version of those two.

So there are all these different capabilities that you want to train the models to focus on, in addition to getting a lot better at reasoning and memory, which is its own whole thing. I don't think in the future we're going to be primarily shoving things into a query context window to ask more complicated questions. There will be different stores of memory or different custom models that are more personalized to people. These are all just different capabilities. Obviously then there's making them big and small. We care about both. If you're running something like Meta AI, that's pretty server-based. We also want it running on smart glasses and there's not a lot of space in smart glasses. So you want to have something that's very efficient for that.
If you're doing $10Bs worth of inference or even eventually $100Bs, if you're using intelligence at an industrial scale, what is the use case? Is it simulations? Is it the AIs that will be in the metaverse? What will we be using the data centers for?

Our bet is that it's going to basically change all of the products. I think that there's going to be a kind of Meta AI general assistant product. I think that will shift from something that feels more like a chatbot, where you ask a question and it formulates an answer, to things where you're giving it more complicated tasks and then it goes away and does them. That's going to take a lot of inference and it's going to take a lot of compute in other ways too.

Then I think interacting with other agents for other people is going to be a big part of what we do, whether it's for businesses or creators. A big part of my theory on this is that there's not going to be just one singular AI that you interact with. Every business is going to want an AI that represents their interests. They're not going to want to primarily interact with you through an AI that is going to sell their competitors' products.

I think creators is going to be a big one. There are about 200 million creators on our platforms. They basically all have the pattern where they want to engage their community but they're limited by the hours in the day. Their community generally wants to engage them, but they don't know that they're limited by the hours in the day. If you could create something where that creator can basically own the AI, train it in the way they want, and engage their community, I think that's going to be super powerful. There's going to be a ton of engagement across all these things.

These are just the consumer use cases. My wife and I run our foundation, the Chan Zuckerberg Initiative. We're doing a bunch of stuff on science and there's obviously a lot of AI work that is going to advance science and healthcare and all these things. So it will end up affecting basically every area of the products and the economy.

You mentioned AI that can just go out and do something for you that's multi-step. Is that a bigger model? With Llama-4 for example, will there still be a version that's 70B but you'll just train it on the right data and that will be super powerful? What does the progression look like? Is it scaling? Is it just the same size but different banks like you were talking about?

I don't know that we know the answer to that. I think one thing that seems to be a pattern is that you have the Llama model and then you build some kind of other application-specific code around it. Some of it is the fine-tuning for the use case, but some of it is, for example, logic for how Meta AI should work with tools like Google or Bing to bring in real-time knowledge. That's not part of the base Llama model. For Llama-2, we had some of that and it was a little more hand-engineered. Part of our goal for Llama-3 was to bring more of that into the model itself. For Llama-3, as we start getting into more of these agent-like behaviors, I think some of that is going to be more hand-engineered. Our goal for Llama-4 will be to bring more of that into the model.

At each step along the way you have a sense of what's going to be possible on the horizon. You start messing with it and hacking around it. I think that helps you then hone your intuition for what you want to try to train into the next version of the model itself. That makes it more general because obviously for anything that you're hand-coding you can unlock some use cases, but it's just inherently brittle and non-general.

When you say "into the model itself," you train it on the thing that you want in the model itself? What do you mean by "into the model itself"?

For Llama-2, the tool use was very specific, whereas Llama-3 has much better tool use. We don't have to hand-code all the stuff to have it use Google and go do a search. It can just do that. Similarly for coding and running code and a bunch of stuff like that. Once you kind of get that capability, then you get a peek at what we can start doing next. We don't necessarily want to wait until Llama-4 is around to start building those capabilities, so we can start hacking around it. You do a bunch of hand-coding and that makes the products better, if only for the interim. That helps show the way then of what we want to build into the next version of the model.
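
To make the idea of "hand-engineered code around the base model" concrete, here is a minimal sketch of what such a tool-use layer might look like. It is illustrative only: `llm_complete` and `web_search` are hypothetical stubs, not Meta's actual implementation.

```python
# Illustrative sketch of a hand-engineered tool-use layer around a base model.
# llm_complete() and web_search() are hypothetical stubs for a model API and
# a search API (e.g. Google or Bing); this is not Meta's actual code.

def llm_complete(prompt: str) -> str:
    """Call the base language model and return its completion (stub)."""
    raise NotImplementedError

def web_search(query: str) -> list[str]:
    """Return text snippets from a search engine (stub)."""
    raise NotImplementedError

def answer(user_query: str) -> str:
    # Step 1: ask the model whether the question needs real-time knowledge.
    decision = llm_complete(
        "Does answering this need up-to-date information? Reply YES or NO.\n"
        f"Question: {user_query}"
    )
    # Step 2: if so, fetch search results and splice them into the prompt.
    context = ""
    if decision.strip().upper().startswith("YES"):
        snippets = web_search(user_query)
        context = "Search results:\n" + "\n".join(snippets[:5]) + "\n\n"
    # Step 3: generate the final answer, grounded in any fetched context.
    return llm_complete(context + f"Answer the question: {user_query}")
```

The contrast drawn in the interview is that Llama-2 needed this routing decision written by hand, while Llama-3 was trained to emit tool calls itself, so the wrapper can be thinner.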
What is the community fine-tune of Llama-3 that you're most excited for? Maybe not the one that will be most useful to you, but the one you'll just enjoy playing with the most. They fine-tune it on antiquity and you'll just be talking to Virgil or something. What are you excited about?

I think the nature of the stuff is that you get surprised. Any specific thing that I thought would be valuable, we'd probably be building. I think you'll get distilled versions. I think you'll get smaller versions. One thing is that I think 8B isn't quite small enough for a bunch of use cases. Over time I'd love to get a 1-2B parameter model, or even a 500M parameter model and see what you can do with that.

If with 8B parameters we're nearly as powerful as the largest Llama-2 model, then with a billion parameters you should be able to do something that's interesting, and faster. It'd be good for classification, or a lot of basic things that people do before understanding the intent of a user query and feeding it to the most powerful model to hone in on what the prompt should be. I think that's one thing that maybe the community can help fill in. We're also thinking about getting around to distilling some of these ourselves but right now the GPUs are pegged training the 405B.
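
As a rough illustration of that "small model in front of the most powerful model" pattern, here is a minimal routing sketch. All names (`small_classify`, `large_generate`, the intent labels) are hypothetical, not a real API.

```python
# Illustrative routing sketch: a small, cheap model classifies intent first,
# and only hard queries reach the most powerful model. All names are
# hypothetical stubs, not a real API.

CHEAP_INTENTS = {"greeting", "acknowledgement", "simple_lookup"}

def small_classify(query: str) -> str:
    """A ~1B-parameter classifier labels the intent of the query (stub)."""
    raise NotImplementedError

def large_generate(prompt: str) -> str:
    """The most powerful model handles the hard cases (stub)."""
    raise NotImplementedError

def handle(query: str) -> str:
    intent = small_classify(query)  # fast, cheap first pass
    if intent in CHEAP_INTENTS:
        # Simple intents never touch the expensive model.
        return f"[handled cheaply as {intent}]"
    # Attach the detected intent so the expensive call is better targeted.
    return large_generate(f"Intent: {intent}\nUser query: {query}")
```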
So you have all these GPUs. I think you said 350,000 by the end of the year.

That's the whole fleet. We built two clusters of, I think, 22,000 or 24,000 GPUs each; those are the single clusters that we have for training the big models, obviously across a lot of the stuff that we do. A lot of our stuff goes towards training Reels models and Facebook News Feed and Instagram Feed. Inference is a huge thing for us because we serve a ton of people. Our ratio of inference compute required to training is probably much higher than most other companies that are doing this stuff just because of the sheer volume of the community that we're serving.

In the material they shared with me before, it was really interesting that you trained it on more data than is compute optimal just for training. The inference is such a big deal for you guys, and also for the community, that it makes sense to just have this thing and have trillions of tokens in there.

Although one of the interesting things about it, even with the 70B, is that we thought it would get more saturated. We trained it on around 15 trillion tokens. I guess our prediction going in was that it was going to asymptote more, but even by the end it was still learning. We probably could have fed it more tokens and it would have gotten somewhat better.

At some point you're running a company and you need to do these meta reasoning questions. Do I want to spend our GPUs on training the 70B model further? Do we want to get on with it so we can start testing hypotheses for Llama-4? We needed to make that call and I think we got a reasonable balance for this version of the 70B. There'll be others in the future, the 70B multimodal one, that'll come over the next period. But that was fascinating, that the architectures at this point can just take so much data.
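
For context on "more data than is compute optimal": the Chinchilla scaling analysis (Hoffmann et al., 2022) suggests roughly 20 training tokens per parameter as the compute-optimal ratio. Treating that as a rule of thumb for the 70B model:

$$
D_{\text{optimal}} \approx 20 \times 70 \times 10^{9} = 1.4 \times 10^{12}\ \text{tokens},
\qquad
\frac{D_{\text{actual}}}{D_{\text{optimal}}} \approx \frac{15 \times 10^{12}}{1.4 \times 10^{12}} \approx 10.7
$$

So the 70B model saw roughly ten times the compute-optimal token count, trading extra training compute for a smaller model that is cheaper to serve at inference, which is exactly the trade-off the question describes.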
That's really interesting. What does this imply about future models? You mentioned that the Llama-3 8B is better than the Llama-2 70B.

No, no, it's nearly as good. I don't want to overstate it. It's in a similar order of magnitude.

Does that mean the Llama-4 70B will be as good as the Llama-3 405B? What does the future of this look like?

This is one of the great questions, right? I think no one knows. One of the trickiest things in the world to plan around is an exponential curve. How long does it keep going for? I think it's likely enough that we'll keep going. I think it's worth investing the $10Bs or $100B+ in building the infrastructure and assuming that if it keeps going you're going to get some really amazing things that are going to make amazing products. I don't think anyone in the industry can really tell you that it will continue scaling at that rate for sure. In general in history, you hit bottlenecks at certain points. Now there's so much energy on this that maybe those bottlenecks get knocked over pretty quickly. I think that's an interesting question.

What does the world look like where there aren't these bottlenecks? Suppose progress just continues at this pace, which seems plausible. Zooming out and forgetting about Llamas…

Well, there are going to be different bottlenecks. Over the last few years, I think there was this issue of GPU production. Even companies that had the money to pay for the GPUs couldn't necessarily get as many as they wanted because there were all these supply constraints. Now I think that's sort of getting less. So you're seeing a bunch of companies thinking now about investing a lot of money in building out these things. I think that that will go on for some period of time. There is a capital question. At what point does it stop being worth it to put the capital in?

I actually think before we hit that, you're going to run into energy constraints. I don't think anyone's built a gigawatt single training cluster yet. You run into these things that just end up being slower in the world. Getting energy permitted is a very heavily regulated government function. You're going from software, which is somewhat regulated and I'd argue it's more regulated than a lot of people in the tech community feel. Obviously it's different if you're starting a small company, maybe you feel that less. We interact with different governments and regulators and we have lots of rules that we need to follow and make sure we do a good job with around the world. But I think that there's no doubt about energy.

If you're talking about building large new power plants or large build-outs and then building transmission lines that cross other private or public land, that's just a heavily regulated thing. You're talking about many years of lead time. If we wanted to stand up some massive facility, powering that is a very long-term project. I think people do it, but I don't think this is something that can be quite as magical as just getting to a level of AI, getting a bunch of capital and putting it in, and then all of a sudden the models are just going to… You do hit different bottlenecks along the way.

Is there something, maybe an AI-related project or maybe not, that even a company like Meta doesn't have the resources for? Something where if your R&D budget or capex budget were 10x what it is now, then you could pursue it? Something that's in the back of your mind but with Meta today, you can't even issue stock or bonds for it? It's just like 10x bigger than your budget?

I think energy is one piece. I think we would probably build out bigger clusters than we currently can if we could get the energy to do it.

That's fundamentally money-bottlenecked in the limit? If you had $1 trillion…

I think it's time. It depends on how far the exponential curves go. Right now a lot of data centers are on the order of 50 megawatts or 100MW, or a big one might be 150MW. Take a whole data center and fill it up with all the stuff that you need to do for training and you build the biggest cluster you can. I think a bunch of companies are running at stuff like that.

But when you start getting into building a data center that's like 300MW or 500MW or 1 GW, no one has built a 1GW data center yet. I think it will happen. This is only a matter of time, but it's not going to be next year. Some of these things will take some number of years to build out. Just to put this in perspective, I think a gigawatt would be the size of a meaningful nuclear power plant, only going towards training a model.
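
For a rough sense of what a gigawatt buys, assume an H100-class accelerator draws about 700 W and that all-in facility power per GPU (host, networking, cooling) lands around 1.4 kW; both figures are outside assumptions, not from the interview:

$$
\frac{1\ \text{GW}}{1.4\ \text{kW per GPU}} \approx 7 \times 10^{5}\ \text{GPUs}
$$

On those assumptions, a 1 GW site could power on the order of 700,000 accelerators, roughly thirty times the ~24,000-GPU training clusters mentioned earlier.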
Didn't Amazon do this? They have a 950MW…

I'm not exactly sure what they did. You'd have to ask them.

But it doesn't have to be in the same place, right? If distributed training works, it can be distributed.

Well, I think that is a big question, how that's going to work. It seems quite possible that in the future, more of what we call training for these big models is actually more along the lines of inference generating synthetic data to then go feed into the model. I don't know what that ratio is going to be but I consider the generation of synthetic data to be more inference than training today. Obviously if you're doing it in order to train a model, it's part of the broader training process. So that's an open question, the balance of that and how that plays out.
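
Here is a minimal sketch of what "inference generating synthetic data to then go feed into the model" can look like in practice. The stubs (`generate`, `score`, `train_step`) are illustrative placeholders, not a description of Meta's pipeline:

```python
# Illustrative synthetic-data loop: the current model produces candidates at
# inference time, a scorer filters them, and survivors are folded back into
# training. generate(), score(), and train_step() are hypothetical stubs.

def generate(model, prompt: str, n: int) -> list[str]:
    """Sample n candidate completions from the model (stub)."""
    raise NotImplementedError

def score(candidate: str) -> float:
    """Rate a candidate, e.g. with a reward model or a verifier (stub)."""
    raise NotImplementedError

def train_step(model, batch: list[str]) -> None:
    """Run one optimization step on the accepted examples (stub)."""
    raise NotImplementedError

def synthetic_data_round(model, prompts: list[str], threshold: float = 0.8) -> int:
    accepted = []
    for prompt in prompts:
        # The bulk of the compute here is sampling, i.e. inference.
        for candidate in generate(model, prompt, n=8):
            if score(candidate) >= threshold:  # keep only high-quality data
                accepted.append(candidate)
    if accepted:
        train_step(model, accepted)  # the training share of the loop
    return len(accepted)
```

Most of the compute in a loop like this goes to sampling rather than gradient updates, which is why he counts synthetic-data generation as closer to inference than training.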
Would that potentially also be the case with Llama-3, and maybe Llama-4 onwards? As in, you put this out and if somebody has a ton of compute, then they can just keep making these things arbitrarily smarter using the models that you've put out. Let's say there's some random country, like Kuwait or the UAE, that has a ton of compute and they can actually just use Llama-4 to make something much smarter.

I do think there are going to be dynamics like that, but I also think there is a fundamental limitation on the model architecture. I think like a 70B model that we trained with a Llama-3 architecture can get better, it can keep going. As I was saying, we felt that if we kept on feeding it more data or rotated the high-value tokens through again, then it would continue getting better. We've seen a bunch of different companies around the world basically take the Llama-2 70B model architecture and then build a new model. But it's still the case that when you make a generational improvement to something like the Llama-3 70B or the Llama-3 405B, there isn't anything like that open source today. I think that's a big step function. What people are going to be able to build on top of that I think can't go infinitely from there. There can be some optimization in that until you get to the next step function.
Let's zoom out a little bit from specific models and even the multi-year lead times you would need to get energy approvals and so on. Big picture, what's happening with AI these next couple of decades? Does it feel like another technology like the metaverse or social, or does it feel like a fundamentally different thing in the course of human history?

I think it's going to be pretty fundamental. I think it's going to be more like the creation of computing in the first place. You'll get all these new apps in the same way as when you got the web or you got mobile phones. People basically rethought all these experiences as a lot of things that weren't possible before became possible. So I think that will happen, but I think it's a much lower-level innovation. My sense is that it's going to be more like people going from not having computers to having computers.

It's very hard to reason about exactly how this goes. In the cosmic scale obviously it'll happen quickly, over a couple of decades or something. There is some set of people who are afraid of it really spinning out and going from being somewhat intelligent to extremely intelligent overnight. I just think that there's all these physical constraints that make that unlikely to happen. I just don't really see that playing out. I think we'll have time to acclimate a bit. But it will really change the way that we work and give people all these creative tools to do different things. I think it's going to really enable people to do the things that they want a lot more.

So maybe not overnight, but is it your view that on a cosmic scale we can think of these milestones in this way? Humans evolved, and then AI happened, and then they went out into the galaxy. Maybe it takes many decades, maybe it takes a century, but is that the grand scheme of what's happening right now in history?

Sorry, in what sense?

In the sense that there were other technologies, like computers and even fire, but the development of AI itself is as significant as humans evolving in the first place.

I think that's tricky. The history of humanity has been people basically thinking that certain aspects of humanity are really unique in different ways and then coming to grips with the fact that that's not true, but that humanity is actually still super special. We thought that the earth was the center of the universe and it's not, but humans are still pretty awesome and pretty unique, right?

I think another bias that people tend to have is thinking that intelligence is somehow fundamentally connected to life. It's not actually clear that it is. I don't know that we have a clear enough definition of consciousness or life to fully interrogate this. There's all this science fiction about creating intelligence where it starts to take on all these human-like behaviors and things like that. The current incarnation of all this stuff feels like it's going in a direction where intelligence can be pretty separated from consciousness, agency, and things like that, which I think just makes it a super valuable tool.

Obviously it's very difficult to predict what direction this stuff goes in over time, which is why I don't think anyone should be dogmatic about how they plan to develop it or what they plan to do. You want to look at it with each release. We're obviously very pro open source, but I haven't committed to releasing every single thing that we do. I'm basically very inclined to think that open sourcing is going to be good for the community and also good for us because we'll benefit from the innovations. If at some point however there's some qualitative change in what the thing is capable of, and we feel like it's not responsible to open source it, then we won't. It's all very difficult to predict.
What is a kind of specific qualitative change where you'd be training Llama-5 or Llama-4, and if you see it, it'd make you think "you know what, I'm not sure about open sourcing it"?

It's a little hard to answer that in the abstract because there are negative behaviors that any product can exhibit where as long as you can mitigate it, it's okay. There's bad things about social media that we work to mitigate. There's bad things about Llama-2 where we spend a lot of time trying to make sure that it's not like helping people commit violent acts or things like that. That doesn't mean that it's a kind of autonomous or intelligent agent. It just means that it's learned a lot about the world and it can answer a set of questions that we think would be unhelpful for it to answer. I think the question isn't really what behaviors would it show, it's what things would we not be able to mitigate after it shows that.

I think that there's so many ways in which something can be good or bad that it's hard to actually enumerate them all up front. Look at what we've had to deal with in social media and the different types of harms. We've basically gotten to like 18 or 19 categories of harmful things that people do and we've basically built AI systems to identify what those things are and to make sure that doesn't happen on our network as much as possible. Over time I think you'll be able to break this down into more of a taxonomy too. I think this is a thing that we spend time researching as well, because we want to make sure that we understand that.

It seems to me that it would be a good idea. I would be disappointed in a future where AI systems aren't broadly deployed and everybody doesn't have access to them. At the same time, I want to better understand the mitigations. If the mitigation is the fine-tuning, the whole thing about open weights is that you can then remove the fine-tuning, which is often superficial on top of these capabilities. If it's like talking on Slack with a biology researcher… I think models are very far from this. Right now, they're like Google search. But if I can show them my Petri dish and they can explain why my smallpox sample didn't grow and what to change, how do you mitigate that? Because somebody can just fine-tune that in there, right?
That's true. I think a lot of people will basically use the off-the-shelf model and some people who have basically bad faith are going to try to strip out all the bad stuff. So I do think that's an issue. On the flip side, one of the reasons why I'm philosophically so pro open source is that I do think that a concentration of AI in the future has the potential to be as dangerous as it being widespread. I think a lot of people think about the questions of "if we can do this stuff, is it bad for it to be out in the wild and just widely available?" I think another version of this is that it's probably also pretty bad for one institution to have an AI that is way more powerful than everyone else's AI.

There's one security analogy that I think of. There are so many security holes in so many different things. If you could travel back in time a year or two years, let's say you just have one or two years more knowledge of the security holes. You can pretty much hack into any system. That's not AI. So it's not that far-fetched to believe that a very intelligent AI probably would be able to identify some holes and basically be like a human who could go back in time a year or two and compromise all these systems.

So how have we dealt with that as a society? One big part is open source software that makes it so that when improvements are made to the software, it doesn't just get stuck in one company's products but can be broadly deployed to a lot of different systems, whether they're banks or hospitals or government stuff. As the software gets hardened, which happens because more people can see it and more people can bang on it, there are standards on how this stuff works. The world can get upgraded together pretty quickly.

I think that a world where AI is very widely deployed, in a way where it's gotten hardened progressively over time, is one where all the different systems will be in check in a way. That seems fundamentally more healthy to me than one where this is more concentrated. So there are risks on all sides, but I think that's a risk that I don't hear people talking about quite as much. There's the risk of the AI system doing something bad. But I stay up at night worrying more about an untrustworthy actor having the super strong AI, whether it's an adversarial government or an untrustworthy company or whatever. I think that that's potentially a much bigger risk.

As in, they could overthrow our government because they have a weapon that nobody else has?

Or just cause a lot of mayhem. I think the intuition is that this stuff ends up being pretty important and valuable for both economic and security reasons and other things. If someone whom you don't trust or an adversary gets something more powerful, then I think that that could be an issue. Probably the best way to mitigate that is to have good open source AI that becomes the standard and in a lot of ways can become the leader. It just ensures that it's a much more even and balanced playing field.
That seems plausible to me. If that works out, that would be the future I prefer. But I want to understand mechanistically: how does the fact that there are open source AI systems in the world prevent somebody from causing mayhem with their AI system? With the specific example of somebody coming with a bioweapon, is it just that we'll do a bunch of R&D in the rest of the world to figure out vaccines really fast? What's happening?

If you take the security one that I was talking about, I think someone with a weaker AI trying to hack into a system that is protected by a stronger AI will succeed less. In terms of software security…

How do we know everything in the world is like that? What if bioweapons aren't like that?

I mean, I don't know that everything in the world is like that. Bioweapons are one of the areas where the people who are most worried about this stuff are focused, and I think it makes a lot of sense. There are certain mitigations. You can try to not train certain knowledge into the model. There are different things, but at some level if you get a sufficiently bad actor, and you don't have other AI that can balance them and understand what the threats are, then that could be a risk. That's one of the things that we need to watch out for.
Is there something you could see in the deployment of these systems where you're training Llama-4 and it lied to you because it thought you weren't noticing or something and you're like "whoa, what's going on here?" This is probably not likely with a Llama-4 type system, but is there something you can imagine like that where you'd be really concerned about deceptiveness and billions of copies of this being out in the wild?

I mean, right now we see a lot of hallucinations. It's more so that. I think it's an interesting question, how you would tell the difference between hallucination and deception. There are a lot of risks and things to think about. I try, in running our company at least, to balance these longer-term theoretical risks with what I actually think are quite real risks that exist today. So when you talk about deception, the form of that that I worry about most is people using this to generate misinformation and then pump that through our networks or others. The way that we've combated this type of harmful content is by building AI systems that are smarter than the adversarial ones.

This informs part of my theory on this. If you look at the different types of harm that people do or try to do through social networks, there are ones that are not very adversarial. For example, hate speech is not super adversarial in the sense that people aren't getting better at being racist. That's one where I think the AIs are generally getting way more sophisticated faster than people are at those issues. And we have issues both ways. People do bad things, whether they're trying to incite violence or something, but we also have a lot of false positives where we basically censor stuff that we shouldn't. I think that understandably makes a lot of people annoyed. So I think having an AI that gets increasingly precise on that is going to be good over time.

But let me give you another example: nation states trying to interfere in elections. That's an example where they absolutely have cutting-edge technology and absolutely get better each year. So we block some technique, they learn what we did and come at us with a different technique. It's not like a person trying to say mean things. They have a goal. They're sophisticated. They have a lot of technology. In those cases, I still think about the ability to have our AI systems grow in sophistication at a faster rate than theirs do. It's an arms race, but I think we're at least winning that arms race currently. This is a lot of the stuff that I spend time thinking about.

Yes, whether it's Llama-4 or Llama-6, we need to think about what behaviors we're observing and it's not just us. Part of the reason why you make this open source is that there are a lot of other people who study this too. So we want to see what other people are observing, what we're observing, what we can mitigate, and then we'll make our assessment on whether we can make it open source. For the foreseeable future I'm optimistic we will be able to. In the near term, I don't want to take our eye off the ball in terms of what are actual bad things that people are trying to use the models for today. Even if they're not existential, there are pretty bad day-to-day harms that we're familiar with in running our services. That's actually a lot of what we have to spend our time on as well.
With current models it makes sense why there mightĀ be an asymptote with just doing the synthetic dataĀ Ā
again and again. But letās say they get smarterĀ and you use the kinds of techniquesāyou talk aboutĀ Ā
in the paper or the blog posts that are coming outĀ on the day this will be releasedāwhere it goes toĀ Ā
the thought chain that is the most correct.Ā Why do you think this wouldn't lead to a loopĀ Ā
where it gets smarter, makes better output, getsĀ smarter and so forth. Of course it wouldn't beĀ Ā
overnight, but over many months or years ofĀ training potentially with a smarter model.Ā
I think it could, within the parameters of whatever the model architecture is. It's just that with today's 8B parameter models, I don't think you're going to get to be as good as the state-of-the-art multi-hundred-billion parameter models that are incorporating new research into the architecture itself.

But those will be open source as well, right?

Well yeah, subject to all the questions that we just talked about, but yes, we would hope that that'll be the case. But I think that at each point, when you're building software, there's a ton of stuff you can do with software, but then at some level you're constrained by the chips that it's running on. So there are always going to be different physical constraints. How big the models are is going to be constrained by how much energy you can get and use for inference. I'm simultaneously very optimistic that this stuff will continue to improve quickly, and also a little more measured than I think some people are about it. I don't think the runaway case is a particularly likely one.

I think it makes sense to keep your options open. There's so much we don't know. There's a case in which it's really important to keep the balance of power so nobody becomes a totalitarian dictator. There's a case in which you don't want to open source the architecture because China can use it to catch up to America's AIs, and there is an intelligence explosion and they win that. A lot of things seem possible. Keeping your options open, considering all of them, seems reasonable.

Yeah.
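To make the synthetic-data loop from this exchange concrete, here is a minimal sketch of one self-improvement round, in the spirit of rejection-sampling approaches: sample several candidate chains of thought per prompt, keep the one a verifier scores as most correct, and fine-tune on the survivors. This is an illustrative outline, not Meta's actual pipeline; generate, score, and finetune are hypothetical stand-ins.

```python
import random

def generate(model, prompt):
    # Hypothetical stand-in: sample one chain of thought from the model.
    return f"{prompt} -> reasoning (quality={random.random():.2f})"

def score(chain):
    # Hypothetical verifier: rate how correct a chain looks, e.g. a reward
    # model or an answer checker on problems with known solutions.
    return float(chain.split("quality=")[1].rstrip(")"))

def finetune(model, data):
    # Hypothetical stand-in: a real pipeline would update weights here.
    return model

def self_improvement_round(model, prompts, n_samples=8):
    kept = []
    for prompt in prompts:
        # Sample several candidate reasoning chains per prompt.
        candidates = [generate(model, prompt) for _ in range(n_samples)]
        # Keep only the chain the verifier scores highest (best-of-n).
        kept.append((prompt, max(candidates, key=score)))
    # The next round starts from the slightly better model, which is the
    # loop the question describes; the open issue raised above is whether
    # the verifier and the architecture cap how far this can go.
    return finetune(model, kept)

model = "llama-sketch"
for _ in range(3):
    model = self_improvement_round(model, ["prove X", "solve Y"])
```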
Let's talk about some other things. Metaverse.Ā What time period in human history would you beĀ Ā
most interested in going into? 100,000 BCE toĀ now, you just want to see what it was like?Ā
It has to be the past? Oh yeah, it has to be the past.āØĀ
I'm really interested in American history and classical history. I'm really interested in the history of science too. I actually think seeing and trying to understand more about how some of the big advances came about would be interesting. All we have are somewhat limited writings about some of that stuff. I'm not sure the metaverse is going to let you do that because it's going to be hard to go back in time for things that we don't have records of. I'm actually not sure that going back in time is going to be that important of a thing. I think it's going to be cool for history classes and stuff, but that's probably not the use case that I'm most excited about for the metaverse overall.

The main thing is just the ability to feel present with people, no matter where you are. I think that's going to be killer. In the AI conversation that we were having, so much of it is about the physical constraints that underlie all of this. I think one lesson of technology is that you want to move things from the physical-constraint realm into software as much as possible, because software is so much easier to build and evolve. You can democratize it more, because not everyone is going to have a data center, but a lot of people can write code and take open source code and modify it. The metaverse version of this is enabling realistic digital presence. That's going to be an absolutely huge difference, so people don't feel like they have to be physically together for as many things. Now, I think there can be things that are better about being physically together. These things aren't binary. It's not going to be like "okay, now you don't need to do that anymore." But overall, I think it's just going to be really powerful for socializing, for feeling connected with people, for working, for parts of industry, for medicine, for so many things.
I want to go back to something you said at the beginning of the conversation. You didn't sell the company for a billion dollars. And with the metaverse, you knew you were going to do this even though the market was hammering you for it. I'm curious: what is the source of that edge? You said "oh, values, I have this intuition," but everybody says that. If you had to say something that's specific to you, how would you express what that is? Why were you so convinced about the metaverse?

I think those are different questions. What are the things that power me? We've talked about a bunch of the themes. I just really like building things. I specifically like building things around how people communicate, and understanding how people express themselves and how people work. When I was in college I studied computer science and psychology. I think a lot of other people in the industry studied just computer science, so it's always been the intersection of those two things for me.

It's also sort of this really deep drive. I don't know how to explain it, but I just feel constitutionally that I'm doing something wrong if I'm not building something new. Even when we were putting together the business case for investing $100 billion in AI, or some huge amount in the metaverse, we had plans that I think made it pretty clear that if our stuff works, it'll be a good investment. But you can't know for certain from the outset. There are all these arguments that people have, with advisors or different folks. It's like, "how are you confident enough to do this?" Well, the day I stop trying to build new things, I'm just done. I'm going to go build new things somewhere else. I'm fundamentally incapable of running something, or living my own life, without trying to build new things that I think are interesting. That's not even a question for me, whether we're going to take a swing at building the next thing. I'm just incapable of not doing that. I don't know.

I'm kind of like this in all the different aspects of my life. Our family built this ranch in Kauai and I worked on designing all these buildings. We started raising cattle and I'm like, "alright, I want to make the best cattle in the world, so how do we architect this so that we can figure this out and build up all the stuff we need to try to do that?" I don't know, that's me. What was the other part of the question?
I'm not sure, but I'm actually curious about something else. So a 19-year-old Mark reads a bunch of antiquity and classics in high school and college. What important lesson did you learn from it? Not just interesting things you found. There aren't that many tokens you consume by the time you're 19, and a bunch of them were about the classics. Clearly that was important in some way.

There aren't that many tokens you consume... That's a good question. Here's one of the things I thought was really fascinating. Augustus became emperor and he was trying to establish peace. There was no real conception of peace at the time. People's understanding of peace was the temporary time between when your enemies would inevitably attack you, so you get a short rest. He had this view of changing the economy from being something mercenary and militaristic to an actually positive-sum thing. It was a very novel idea at the time. That's something that's really fundamental: the bounds on what people can conceive of, at a given time, as rational ways to work. This applies to both the metaverse and the AI stuff. A lot of investors, and other people, can't wrap their heads around why we would open source this. It's like, "I don't understand. It's open source. That must just be the temporary time between when you're making things proprietary, right?" I think it's this very profound thing in tech that it actually creates a lot of winners.

I don't want to strain the analogy too much, but I do think that a lot of the time, there are models for building things that people often can't even wrap their heads around. They can't understand how that would be a valuable thing for people to do, or how it would be a reasonable state of the world. I think there are more reasonable things than people think.
That's super fascinating. Can I give you what I was thinking in terms of what you might have gotten from it? This is probably totally off, but I think it's just how young some of these people are who have very important roles in the empire. For example, Caesar Augustus, by the time he's 19, is already one of the most important people in Roman politics. He's leading battles and forming the Second Triumvirate. I wonder if the 19-year-old you was thinking, "I can do this because Caesar Augustus did this."

That's an interesting example, both from a lot of history and from American history too. One of my favorite quotes is this Picasso quote that all children are artists and the challenge is to remain an artist as you grow up. When you're younger, it's just easier to have wild ideas. There are all these analogies to the innovator's dilemma that exist in your life, as well as for your company or whatever you've built. You're earlier on in your trajectory, so it's easier to pivot and take in new ideas without disrupting other commitments to different things. I think that's an interesting part of running a company. How do you stay dynamic?

Let's go back to the investors and open source. The $10B model, suppose it's totally safe. You've done these evaluations, and unlike in this case, the evaluators can also fine-tune the model, which hopefully will be the case in future models. Would you open source the $10 billion model?

As long as it's helping us, then yeah.
But would it? That's $10 billion of R&D, and now it's open source.

That's a question we'll have to evaluate as time goes on too. We have a long history of open sourcing software. We don't tend to open source our products. We don't take the code for Instagram and make it open source. We take a lot of the low-level infrastructure and make that open source. Probably the biggest one in our history was our Open Compute Project, where we took the designs for all of our servers, network switches, and data centers and made them open source, and it ended up being super helpful. Although a lot of people can design servers, the industry has now standardized on our design, which meant that the supply chains basically all got built out around it. So volumes went up, it got cheaper for everyone, and it saved us billions of dollars, which was awesome.

So there are multiple ways open source could be helpful for us. One is if people figure out how to run the models more cheaply. We're going to be spending tens of billions, or a hundred billion dollars or more, over time on all this stuff. So if we can do that 10% more efficiently, we're saving billions or tens of billions of dollars. That's probably worth a lot by itself. Especially if there are other competitive models out there, it's not like our thing is giving away some kind of crazy advantage.
So is your view that the training will be commodified?

I think there are a bunch of ways this could play out, and that's one. "Commodity" implies that it's going to get very cheap because there are lots of options. The other direction this could go in is qualitative improvements. You mentioned fine-tuning. Right now it's pretty limited what you can do with fine-tuning the other major models out there. There are some options, but generally not for the biggest models. Being able to do that, do different app-specific or use-case-specific things, or build the models into specific tool chains will not only enable more efficient development, it could enable qualitatively different things.
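As a rough illustration of that kind of app-specific fine-tuning, here is a sketch of one common approach, attaching small LoRA adapters to an open-weights model. This assumes the Hugging Face transformers and peft libraries and access to an open-weights checkpoint; the checkpoint name is just an example, and LoRA is one technique among several, not something named in the conversation.

```python
# Sketch: specialize an open-weights model for an app-specific use case
# with LoRA adapters, which closed, API-only models generally don't allow.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

checkpoint = "meta-llama/Meta-Llama-3-8B"  # example open-weights model
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Attach small trainable adapter matrices instead of updating all weights,
# so the fine-tune fits on modest hardware.
peft_model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16))
peft_model.print_trainable_parameters()

# ...then train on your app- or use-case-specific data, e.g. with
# transformers.Trainer, and ship the adapter inside your tool chain.
```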
Here's one analogy for this. One thing that I think generally sucks about the mobile ecosystem is that you have these two gatekeeper companies, Apple and Google, that can tell you what you're allowed to build. There's the economic version of that, which is when we build something and they just take a bunch of your money. But then there's the qualitative version, which is actually what upsets me more. There are a bunch of times when we've launched or wanted to launch features and Apple's just like, "nope, you're not launching that." That sucks, right? So the question is, are we set up for a world like that with AI, where you get a handful of companies running these closed models that are in control of the APIs and can therefore tell you what you can build?

For us, I can say it is worth it to go build a model ourselves to make sure that we're not in that position. I don't want any of those other companies telling us what we can build. From an open source perspective, I think a lot of developers don't want those companies telling them what they can build either. So the question is, what is the ecosystem that gets built out around that? What are the interesting new things? How much does that improve our products? I think there are lots of cases where, if this ends up being like our databases or caching systems or architecture, we'll get valuable contributions from the community that will make our stuff better. The app-specific work we do will then still be so differentiated that it won't really matter. We'll be able to do what we do, we'll benefit, and all the systems, ours and the community's, will be better because it's open source.

There is one world where maybe that's not the case. Maybe the model ends up being more of the product itself. Then it's a trickier economic calculation whether you open source it, because you'd be commoditizing yourself a lot. But from what I can see so far, it doesn't seem like we're in that zone.
Do you expect to earn significant revenue from licensing your model to the cloud providers? So they have to pay you a fee to actually serve the model.

We want to have an arrangement like that, but I don't know how significant it'll be. This is basically our license for Llama. In a lot of ways it's a very permissive open source license, except that we have a limit for the largest companies using it. This is why we put that limit in. We're not trying to prevent them from using it. We just want them to come talk to us if they're going to basically take what we built, resell it, and make money off of it. If you're Microsoft Azure or Amazon and you're going to be reselling the model, then we should have some revenue share on that. So just come talk to us before you go do that. That's how that's played out.

So for Llama-2, we just have deals with basically all of the major cloud companies, and Llama-2 is available as a hosted service on all those clouds. I assume that as we release bigger and bigger models, that will become a bigger thing. It's not the main thing that we're doing, but I think if those companies are going to be selling our models, it just makes sense that we should share in the upside somehow.

Regarding other open source dangers, I think you have genuine, legitimate points about the balance-of-power stuff, and about the potential harms you can get rid of because we have better alignment techniques or something. But I wish there were some sort of framework that Meta had. Other labs have this, where they say, "if we see this concrete thing, then that's a no-go on the open source, or even potentially on deployment." Just writing it down so the company is ready for it and people have expectations around it, and so forth.

That's a fair point on the existential risk side. Right now we focus more on the types of risks that we see today, which are more of these content risks. We don't want the model to be doing things that help people commit violence or fraud, or just harm people in different ways. While it is maybe more intellectually interesting to talk about the existential risks, I actually think the real harms that need more energy spent on mitigating them are things where someone takes a model and does something to hurt a person. In practice, for the current models, and I would guess the next generation and maybe even the generation after that, those are the types of more mundane harms we see today, people committing fraud against each other and things like that. I just don't want to shortchange that. I think we have a responsibility to make sure we do a good job on it.

Meta's a big company. You can handle both.
As far as open source goes, I'm actually curious whether you think the impact of open source, from PyTorch, React, Open Compute, and other things, has been bigger for the world than even the social media side of Meta. I've talked to people who use these services, and they think it's plausible, because a big part of the internet runs on these things.

It's an interesting question. Almost half the world uses our consumer products, so it's hard to beat that. But I think open source is really powerful as a new way of building things. I mean, it's possible. It may be one of these things like Bell Labs, where they were working on the transistor because they wanted to enable long-distance calling. They did, and it ended up being really profitable for them. Five to ten years out from that, if you had asked them what the most useful thing they invented was, it's like, "okay, we enabled long-distance calling and now all these people are long-distance calling." But if you asked a hundred years later, maybe it's a different answer.

I think that's true of a lot of the things we're building: Reality Labs, some of the AI stuff, some of the open source stuff. The specific products evolve, and to some degree come and go, but the advances for humanity persist, and that's a cool part of what we all get to do.
By when will the Llama models be trained on your own custom silicon?

Soon, but not Llama-4. The approach we took is that we first built custom silicon that could handle inference for our ranking and recommendation type stuff, so Reels, News Feed ads, etc. That was consuming a lot of GPUs. Once we were able to move that to our own silicon, we could use the more expensive NVIDIA GPUs only for training. At some point we will hopefully have silicon of our own that we can use, first for training some of the simpler things, then eventually for training these really large models. In the meantime, I'd say the program is going quite well, we're rolling it out methodically, and we have a long-term roadmap for it.
Final question. This is totally out of left field. If you were made CEO of Google+, could you have made it work?

Google+? Oof. I don't know. That's a very difficult counterfactual.

Okay, then the real final question will be: when Gemini was launched, was there any chance that somebody in the office uttered "Carthago delenda est"?

No, I think we're tamer now. It's a good question. The problem is there was no CEO of Google+. It was just a division within a company. You asked before about what the scarcest commodities are, but you asked about it in terms of dollars. I actually think for most companies, of this scale at least, it's focus. When you're a startup, maybe you're more constrained on capital. You're just working on one idea and you might not have all the resources. At some point you cross a threshold with the nature of what you're doing. You're building multiple things. You're creating more value across them, but you become more constrained on what you can direct to go well.

There are always cases where something random and awesome happens in the organization and I don't even know about it. Those are great. But I think in general, the organization's capacity is largely limited by what the CEO and the management team are able to oversee and manage. That's been a big focus for us. As Ben Horowitz says, "keep the main thing the main thing," and try to stay focused on your key priorities.

Awesome, that was excellent, Mark. Thanks so much. That was a lot of fun.

Yeah, really fun. Thanks for having me.

Absolutely.