Mark Zuckerberg - Llama 3, $10B Models, Caesar Augustus, & 1 GW Datacenters
Summary
TLDR: In this interview, Mark Zuckerberg discusses the future of AI with a focus on Meta AI's latest advances. He highlights the release of Llama-3, an open-source model that now powers Meta AI, which in turn integrates Google and Bing for real-time knowledge and adds image animation and real-time image generation. Zuckerberg also addresses the challenges of building large-scale data centers, the risks of centralized AI control, and the importance of open-source contributions. He stresses the potential of AI to advance sectors such as science and healthcare, and shares his vision of AI as a tool that enhances human productivity rather than replacing it. The conversation covers the implications of AI development, the balance between innovation and safety, and the role of open-source software in democratizing AI technology.
Takeaways
- The new version of Meta AI, powered by Llama-3, is positioned as the most intelligent freely available AI assistant. It integrates Google and Bing for real-time knowledge and adds creation features such as image animation and real-time image generation.
- Meta is training three versions of Llama-3: 8 billion and 70 billion parameter models released to the developer community, and a 405 billion parameter dense model still in training.
- The new Meta AI is not launching globally at once. It starts in a handful of countries, with a wider rollout planned over the coming weeks and months.
- Mark Zuckerberg emphasizes the importance of open-source AI, believing it benefits both the community and Meta by allowing broader innovation and a more level playing field in the AI industry.
- There is a commitment to responsible AI development, including not releasing certain models if they show negative behaviors or risks that cannot be resolved.
- Meta is investing in custom silicon to improve the efficiency of AI model training and inference, which could significantly reduce costs and improve performance for its AI-driven services.
- Zuckerberg shares his passion for building new things and his belief in the potential of AI to enable creativity and productivity, reflecting his personal drive and the company's mission.
- The potential of AI is compared to the creation of computing itself, suggesting a fundamental shift in how people work and live, with AI becoming an integral part of many industries.
- Open-source contributions such as PyTorch and React are considered powerful drivers of innovation, with an impact on the world that may rival the reach of Meta's social media products.
- There is a discussion of the balance of power in AI development, with concerns about a single entity holding disproportionately strong AI capabilities and an argument for a decentralized approach.
- Zuckerberg draws an analogy between historical shifts in understanding, like the concept of peace under Augustus, and current paradigm shifts in technology and business models, emphasizing the importance of challenging conventional thinking.
Q & A
What is the main update to Meta AI that Mark Zuckerberg discusses in the interview?
-The main update is the rollout of Llama-3, an AI model that is both open source and will power Meta AI. It is considered the most intelligent, freely-available AI assistant at the time of the interview.
How does Meta AI integrate with search engines?
-Meta AI integrates Google and Bing for real-time knowledge. Separately, it is becoming more prominent across Meta's apps: the search box at the top of Facebook and Messenger can be used to ask it any question.
What new creation features does Meta AI introduce?
-Meta AI introduces features like animations, where any image can be animated, and real-time high-quality image generation as users type their queries.
What are the technical specifications of the Llama-3 model that Mark Zuckerberg finds exciting?
-Mark Zuckerberg is excited about the Llama-3 model, which includes an 8 billion parameter model and a 70 billion parameter model. There's also a 405 billion parameter model in training.
What is the roadmap for future releases of Meta AI?
-The roadmap includes new releases that will bring multimodality, more multi-linguality, and bigger context windows. There are plans to roll out the 405B model later in the year.
How does Mark Zuckerberg perceive the risk of having a few companies controlling closed AI models?
-He sees it as a significant risk, as it could lead to these companies dictating what others can build, creating a situation similar to the control exerted by Apple over app features.
What is the strategy behind Meta's acquisition of GPUs like the H100?
-The strategy was to ensure they had enough capacity to build something they couldn't foresee on the horizon yet, doubling the order to be prepared for future needs beyond the immediate requirements for Reels and content ranking.
Why did Mark Zuckerberg decide not to sell Facebook in 2006 for $1 billion?
-Mark felt a deep conviction in what they were building and believed that if he sold the company, he would just build another similar one. He also lacked the financial sophistication to engage in the billion-dollar valuation debate.
What is the role of Facebook AI Research (FAIR) in the development of Meta's AI?
-FAIR, established about 10 years before the interview, was conceived as a research group rather than a product team. Its innovations have improved Meta's products and advanced the field. After ChatGPT and diffusion-based image models appeared, Meta started a second group, the gen AI group, to bring that work into its products and build leading foundation models.
How does Meta plan to approach the development of more advanced AI models like Llama-4?
-Meta plans to continue training larger models, incorporating more capabilities like reasoning and memory, and focusing on multimodality and emotional understanding. They aim to make AI more integrated into various aspects of their products and services.
What are the potential future challenges in scaling AI models?
-Challenges include physical constraints like energy limitations for training large models, regulatory hurdles for building new power plants and transmission lines, and the balance between open sourcing models and potential risks associated with them.
How does Mark Zuckerberg view the future of AI and its impact on society?
-He sees AI as a fundamental shift, similar to the creation of computing, that will enable new applications and experiences. However, he also acknowledges the need for careful consideration of risks and the importance of a balanced approach to AI development and deployment.
Outlines
AI Innovation and Meta AI's New Features
The speaker expresses an inherent drive to continually innovate and build new features, despite challenges from entities like Apple. The conversation introduces Meta AI's latest advancements, highlighting the release of Llama-3, an open-source AI model that now powers Meta AI, which integrates Google and Bing for real-time knowledge. New features include image animation and real-time high-quality image generation based on user queries. The speaker emphasizes Meta AI's commitment to making AI more accessible and enhancing its capabilities across various applications.
The Future of AI and Meta's Strategic Investments
The discussion covers the reasoning behind Meta's investment in GPUs for AI model training. The speaker reflects on the importance of capacity planning for unforeseen technological advancements, drawing parallels with past decisions that have shaped the company's direction. The conversation also touches on the speaker's personal philosophy on company valuation and the significance of Facebook AI Research (FAIR) in driving product innovation.
AGI and the Evolution of Meta's AI Strategy
The speaker outlines the evolution of Meta's approach to AI, from the inception of FAIR to the current focus on artificial general intelligence (AGI). The importance of coding and reasoning in training AI models is emphasized, highlighting how these capabilities improve the models' performance across various domains. The conversation explores the concept of AI as a progressive tool that augments human capabilities rather than replacing them.
Multimodal AI and the Future of Interaction
The speaker envisions a future where AI capabilities become more integrated and sophisticated, covering emotional understanding and multimodal interactions. The potential for personalized AI models and the impact of AI on industrial-scale operations are discussed. The conversation also addresses the idea of AI agents representing businesses and creators, and the importance of open-source AI in maintaining a balanced technological landscape.
Scaling AI Models and Meta's Computational Challenges
The speaker discusses the challenges and strategies related to scaling AI models, including the physical and computational constraints of training large models like Llama-3. The conversation explores the concept of using inference to generate synthetic data for training and the potential for smaller, fine-tuned models to play a significant role in various applications. The speaker also addresses the importance of community contributions in advancing AI technology.
The Impact of Open Source on AI and Technology
The speaker reflects on the impact of open-source contributions from Meta, such as PyTorch and React, and their potential long-term significance. The conversation considers whether open-source efforts could have a more profound impact than Meta's social media products, given their widespread use across the internet. The speaker also discusses the future integration of Llama models with custom silicon for more efficient training.
Navigating Open Source Risks and Future AI Developments
The speaker addresses concerns about the potential risks of open sourcing powerful AI models, including the possibility of misuse. The conversation focuses on the importance of balancing theoretical risks with practical, everyday harms, and the responsibility to mitigate these risks. The speaker also shares thoughts on the future of AI, including the potential for AI to become a commodified training resource and the economic considerations of open sourcing high-value models.
The Value of Focus and Meta's Management Strategy
The speaker discusses the concept of focus as a scarce commodity, especially for large companies, and its importance in driving the company's success. The conversation touches on the challenges of managing multiple projects and the need to maintain a sharp focus on key priorities. The speaker also reflects on the unpredictability of success in technology and the importance of trying new things.
Keywords
AI Assistant
Open Source
Data Center
Parameter
Multimodality
Benchmark
Inference
Meta AI
Training Cluster
Content Risks
Economic Constraints
Highlights
Meta AI is being upgraded to a new model, Llama-3, and Meta describes the assistant as the most intelligent freely available AI assistant.
Llama-3 will be available as open source for developers and will also power Meta AI, integrating with Google and Bing for real-time knowledge.
New creation features have been added, including the ability to animate any image and generate high-quality images in real time as you type your query.
Meta AI's new version is initially rolling out in a few countries, with plans for broader availability in the coming weeks and months.
Technically, Llama-3 comes in three versions: 8 billion and 70 billion parameter models, both released today, and a 405 billion parameter model still in training.
The 70 billion parameter model of Llama-3 has scored highly on benchmarks for math and reasoning, while the 405 billion parameter model is expected to lead in benchmarks upon completion.
Meta has a roadmap for future releases that include multimodality, more multilinguality, and larger context windows.
The decision to invest in GPUs for AI was driven by the need for more capacity to train models for content recommendation in services like Reels.
The capability of showing content from unconnected sources on platforms like Instagram and Facebook represents a significant unlock for user engagement.
The importance of open source in AI development, ensuring a balanced and competitive ecosystem, and the potential risks of concentrated AI power.
The potential for AI to surpass human intelligence in most domains progressively, and the focus on capabilities like emotional understanding and reasoning.
Meta's commitment to addressing the risks of misinformation and the importance of building AI systems to combat adversarial uses.
The vision of AI as a tool that enhances human capabilities rather than replacing them, aiming for increased productivity and creativity.
The significance of the metaverse in enabling realistic digital presence and its potential impact on socializing, working, and various industries.
Mark Zuckerberg's personal drive to continuously build new things and the philosophy behind investing in large-scale projects like AI and the metaverse.
The historical perspective on the development of peace and economy, drawing parallels to modern innovations in tech and the concept of open source.
The potential for custom silicon to revolutionize the training of large AI models and the strategic move to first optimize inference processes.
Transcripts
That's not even a question for me - whether we're going to go take a swing at building the next thing. I'm just incapable of not doing that.
There's a bunch of times when we wanted to launch features and then Apple's just like "nope, you're not launching that." I was like, that sucks. Are we set up for that with AI, where you're going to get a handful of companies that run these closed models that are going to be in control of the APIs and therefore are going to be able to tell you what you can build?
Then when you start getting into building a data center that's like 300 megawatts or 500 megawatts or a gigawatt - just no one has built a single gigawatt data center yet.
From wherever you sit there's going to be some actor who you don't trust - if they're the ones who have the super strong AI, I think that that's potentially a much bigger risk.
Mark, welcome to the podcast.
Thanks for having me. Big fan of your podcast.
Thank you, that's very nice of you to say. Let's start by talking about the releases that will go out when this interview goes out. Tell me about the models and Meta AI. What's new and exciting about them?
I think the main thing that most people in the world are going to see is the new version of Meta AI. The most important thing that we're doing is the upgrade to the model. We're rolling out Llama-3. We're doing it both as open source for the dev community and it is now going to be powering Meta AI. There's a lot that I'm sure we'll get into around Llama-3, but I think the bottom line on this is that we think now that Meta AI is the most intelligent, freely-available AI assistant that people can use. We're also integrating Google and Bing for real-time knowledge.
We're going to make it a lot more prominent across our apps. At the top of Facebook and Messenger, you'll be able to just use the search box right there to ask any question. There's a bunch of new creation features that we added that I think are pretty cool and that I think people will enjoy. I think animations is a good one. You can basically take any image and just animate it.
One that people are going to find pretty wild is that it now generates high quality images so quickly that it actually generates it as you're typing and updates it in real time. So you're typing your query and it's honing in. It's like "show me a picture of a cow in a field with mountains in the background, eating macadamia nuts, drinking beer" and it's updating the image in real time. It's pretty wild. I think people are going to enjoy that. So I think that's what most people are going to see in the world. We're rolling that out, not everywhere, but we're starting in a handful of countries and we'll do more over the coming weeks and months. I think that's going to be a pretty big deal and I'm really excited to get that in people's hands. It's a big step forward for Meta AI.
But I think if you want to get under the hood a bit, the Llama-3 stuff is obviously the most technically interesting. We're training three versions: an 8 billion parameter model and a 70 billion, which we're releasing today, and a 405 billion dense model, which is still training. So we're not releasing that today, but I'm pretty excited about how the 8B and the 70B turned out. They're leading for their scale. We'll release a blog post with all the benchmarks so people can check it out themselves. Obviously it's open source so people get a chance to play with it.
We have a roadmap of new releases coming that are going to bring multimodality, more multi-linguality, and bigger context windows as well. Hopefully, sometime later in the year we'll get to roll out the 405B. For where it is right now in training, it is already at around 85 MMLU and we expect that it's going to have leading benchmarks on a bunch of the benchmarks. I'm pretty excited about all of that. The 70 billion is great too. We're releasing that today. It's around 82 MMLU and has leading scores on math and reasoning. I think just getting this in people's hands is going to be pretty wild.
Oh, interesting. That's the first I'm hearing of it as a benchmark. That's super impressive.
The 8 billion is nearly as powerful as the biggest version of Llama-2 that we released. So the smallest Llama-3 is basically as powerful as the biggest Llama-2.
Before we dig into these models, I want to go back in time. I'm assuming 2022 is when you started acquiring these H100s, or you can tell me when. The stock price is getting hammered. People are asking what's happening with all this capex. People aren't buying the metaverse. Presumably you're spending that capex to get these H100s. How did you know back then to get the H100s? How did you know that you'd need the GPUs?
I think it was because we were working on Reels. We always want to have enough capacity to build something that we can't quite see on the horizon yet. We got into this position with Reels where we needed more GPUs to train the models. It was this big evolution for our services. Instead of just ranking content from people or pages you follow, we made this big push to start recommending what we call unconnected content, content from people or pages that you're not following.
The corpus of content candidates that we could potentially show you expanded from on the order of thousands to on the order of hundreds of millions. It needed a completely different infrastructure. We started working on doing that and we were constrained on the infrastructure in catching up to what TikTok was doing as quickly as we wanted to. I basically looked at that and I was like "hey, we have to make sure that we're never in this situation again. So let's order enough GPUs to do what we need to do on Reels and ranking content and feed. But let's also double that." Again, our normal principle is that there's going to be something on the horizon that we can't see yet.
Did you know it would be AI?
We thought it was going to be something that had to do with training large models. At the time I thought it was probably going to be something that had to do with content. It's just the pattern matching of running the company, there's always another thing. At that time I was so deep into trying to get the recommendations working for Reels and other content. That's just such a big unlock for Instagram and Facebook now, being able to show people content that's interesting to them from people that they're not even following. But that ended up being a very good decision in retrospect. And it came from being behind. It wasn't like "oh, I was so far ahead." Actually, most of the times where we make some decision that ends up seeming good is because we messed something up before and just didn't want to repeat the mistake.
This is a total detour, but I want to ask about this while we're on this. We'll get back to AI in a second. In 2006 you didn't sell for $1 billion but presumably there's some amount you would have sold for, right? Did you write down in your head like "I think the actual valuation of Facebook at the time is this and they're not actually getting the valuation right"? If they'd offered you $5 trillion, of course you would have sold. So how did you think about that choice?
I think some of these things are just personal. I don't know that at the time I was sophisticated enough to do that analysis. I had all these people around me who were making all these arguments for a billion dollars like "here's the revenue that we need to make and here's how big we need to be. It's clearly so many years in the future." It was very far ahead of where we were at the time. I didn't really have the financial sophistication to really engage with that kind of debate. Deep down I believed in what we were doing.
I did some analysis like "what would I do if I weren't doing this? Well, I really like building things and I like helping people communicate. I like understanding what's going on with people and the dynamics between people. So I think if I sold this company, I'd just go build another company like this and I kind of like the one I have. So why?" I think a lot of the biggest bets that people make are often just based on conviction and values. It's actually usually very hard to do the analyses trying to connect the dots forward.
You've had Facebook AI Research for a long time. Now it's become seemingly central to your company. At what point did making AGI, or however you consider that mission, become a key priority of what Meta is doing?
It's been a big deal for a while. We started FAIR about 10 years ago. The idea was that, along the way to general intelligence or whatever you wanna call it, there are going to be all these different innovations and that's going to just improve everything that we do. So we didn't conceive of it as a product. It was more of a research group. Over the last 10 years it has created a lot of different things that have improved all of our products. It's advanced the field and allowed other people in the field to create things that have improved our products too. I think that that's been great.
There's obviously a big change in the last few years with ChatGPT and the diffusion models around image creation coming out. This is some pretty wild stuff that is pretty clearly going to affect how people interact with every app that's out there. At that point we started a second group, the gen AI group, with the goal of bringing that stuff into our products and building leading foundation models that would power all these different products.
When we started doing that the theory initially was that a lot of the stuff we're doing is pretty social. It's helping people interact with creators, helping people interact with businesses, helping businesses sell things or do customer support. There's also basic assistant functionality, whether it's for our apps or the smart glasses or VR. So it wasn't completely clear at first that you were going to need full AGI to be able to support those use cases. But in all these subtle ways, through working on them, I think it's actually become clear that you do. For example, when we were working on Llama-2, we didn't prioritize coding because people aren't going to ask Meta AI a lot of coding questions in WhatsApp.
Now they will, right?
I don't know. I'm not sure that WhatsApp, or Facebook or Instagram, is the UI where people are going to be doing a lot of coding questions. Maybe the website, meta.ai, that we're launching. But the thing that has been a somewhat surprising result over the last 18 months is that it turns out that coding is important for a lot of domains, not just coding. Even if people aren't asking coding questions, training the models on coding helps them become more rigorous in answering the question and helps them reason across a lot of different types of domains. That's one example where for Llama-3, we really focused on training it with a lot of coding because that's going to make it better on all these things even if people aren't asking primarily coding questions.
Reasoning is another example. Maybe you want to chat with a creator or you're a business and you're trying to interact with a customer. That interaction is not just like "okay, the person sends you a message and you just reply." It's a multi-step interaction where you're trying to think through "how do I accomplish the person's goals?" A lot of times when a customer comes, they don't necessarily know exactly what they're looking for or how to ask their questions. So it's not really the job of the AI to just respond to the question.
You need to kind of think about it more holistically. It really becomes a reasoning problem. So if someone else solves reasoning, or makes good advances on reasoning, and we're sitting here with a basic chat bot, then our product is lame compared to what other people are building. At the end of the day, we basically realized we've got to solve general intelligence and we just upped the ante and the investment to make sure that we could do that.
So the version of Llama that's going to solve all these use cases for users, is that the version that will be powerful enough to replace a programmer you might have in this building?
I just think that all this stuff is going to be progressive over time.
But in the end case: Llama-10.
I think that there's a lot baked into that question. I'm not sure that we're replacing people as much as we're giving people tools to do more stuff.
Is the programmer in this building 10x more productive after Llama-10?
I would hope more. I don't believe that there's a single threshold of intelligence for humanity because people have different skills. I think that at some point AI is probably going to surpass people at most of those things, depending on how powerful the models are. But I think it's progressive and I don't think AGI is one thing. You're basically adding different capabilities. Multimodality is a key one that we're focused on now, initially with photos and images and text but eventually with videos. Because we're so focused on the metaverse, 3D type stuff is important too. One modality that I'm pretty focused on, that I haven't seen as many other people in the industry focus on, is emotional understanding. So much of the human brain is just dedicated to understanding people and understanding expressions and emotions. I think that's its own whole modality, right? You could say that maybe it's just video or image, but it's clearly a very specialized version of those two.
So there are all these different capabilities that you want to train the models to focus on, in addition to getting a lot better at reasoning and memory, which is its own whole thing. I don't think in the future we're going to be primarily shoving things into a query context window to ask more complicated questions. There will be different stores of memory or different custom models that are more personalized to people. These are all just different capabilities. Obviously then there's making them big and small. We care about both. If you're running something like Meta AI, that's pretty server-based. We also want it running on smart glasses and there's not a lot of space in smart glasses. So you want to have something that's very efficient for that.
If you're doing $10Bs worth of inference or even eventually $100Bs, if you're using intelligence in an industrial scale what is the use case? Is it simulations? Is it the AIs that will be in the metaverse? What will we be using the data centers for?
Our bet is that it's going to basically change all of the products. I think that there's going to be a kind of Meta AI general assistant product. I think that that will shift from something that feels more like a chatbot, where you ask a question and it formulates an answer, to things where you're giving it more complicated tasks and then it goes away and does them. That's going to take a lot of inference and it's going to take a lot of compute in other ways too.
Then I think interacting with other agents for other people is going to be a big part of what we do, whether it's for businesses or creators. A big part of my theory on this is that there's not going to be just one singular AI that you interact with. Every business is going to want an AI that represents their interests. They're not going to want to primarily interact with you through an AI that is going to sell their competitors' products.
I think creators is going to be a big one. There are about 200 million creators on our platforms. They basically all have the pattern where they want to engage their community but they're limited by the hours in the day. Their community generally wants to engage them, but they don't know that they're limited by the hours in the day. If you could create something where that creator can basically own the AI, train it in the way they want, and engage their community, I think that's going to be super powerful. There's going to be a ton of engagement across all these things.
These are just the consumer use cases. My wife and I run our foundation, Chan Zuckerberg Initiative. We're doing a bunch of stuff on science and there's obviously a lot of AI work that is going to advance science and healthcare and all these things. So it will end up affecting basically every area of the products and the economy.
You mentioned AI that can just go out and do something for you that's multi-step. Is that a bigger model? With Llama-4 for example, will there still be a version that's 70B but you'll just train it on the right data and that will be super powerful? What does the progression look like? Is it scaling? Is it just the same size but different banks like you were talking about?
I don't know that we know the answer to that. I think one thing that seems to be a pattern is that you have the Llama model and then you build some kind of other application specific code around it. Some of it is the fine-tuning for the use case, but some of it is, for example, logic for how Meta AI should work with tools like Google or Bing to bring in real-time knowledge. That's not part of the base Llama model. For Llama-2, we had some of that and it was a little more hand-engineered. Part of our goal for Llama-3 was to bring more of that into the model itself. For Llama-3, as we start getting into more of these agent-like behaviors, I think some of that is going to be more hand-engineered. Our goal for Llama-4 will be to bring more of that into the model.
At each step along the way you have a sense of what's going to be possible on the horizon. You start messing with it and hacking around it. I think that helps you then hone your intuition for what you want to try to train into the next version of the model itself. That makes it more general because obviously for anything that you're hand-coding you can unlock some use cases, but it's just inherently brittle and non-general.
When you say "into the model itself," you train it on the thing that you want in the model itself? What do you mean by "into the model itself"?
For Llama- 2, the tool use was very specific, whereas Llama-3 has much better tool use. We Â
don't have to hand code all the stuff to have it use Google and go do a search. It can just do Â
that. Similarly for coding and running code and a bunch of stuff like that. Once you kind of get Â
that capability, then you get a peek at what we can start doing next. We don't necessarily want Â
to wait until Llama-4 is around to start building those capabilities, so we can start hacking around Â
it. You do a bunch of hand coding and that makes the products better, if only for the Â
interim. That helps show the way then of what we want to build into the next version of the model.Â
What is the community fine tune of Llama-3 that you're most excited for? Maybe not the Â
one that will be most useful to you, but the one you'll just enjoy playing with the most. Â
They fine-tune it on antiquity and you'll just be talking to Virgil Â
or something. What are you excited about?
I think the nature of the stuff is that you
get surprised. Any specific thing that I thought would be valuable, we'd probably be building. I Â
think you'll get distilled versions. I think you'll get smaller versions. One Â
thing is that I think 8B isn't quite small enough for a bunch of use cases. Over time I'd
love to get a 1-2B parameter model, or even a 500M parameter model and see what you can do with that.
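One plausible reason model size matters for "a bunch of use cases" is the memory needed just to hold the weights. The interview does not state this reasoning, so the sketch below is only a back-of-the-envelope illustration. The bytes-per-parameter figures are standard for the named numeric formats; the function name and the choice of formats are assumptions made for this example.

```python
# Rough weights-only memory footprint for the model sizes mentioned above.
# This ignores activations, KV cache, and runtime overhead, so real memory
# use is higher. All names here are illustrative, not from the interview.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Return the weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    sizes = [("8B", 8e9), ("2B", 2e9), ("1B", 1e9), ("500M", 5e8)]
    for name, n in sizes:
        fp16 = weight_memory_gb(n, "fp16")
        int4 = weight_memory_gb(n, "int4")
        print(f"{name:>5}: {fp16:6.2f} GB at fp16, {int4:6.2f} GB at int4")
```

Under these assumptions an 8B model needs about 16 GB for fp16 weights alone, while a 1B model needs about 2 GB, which is the kind of gap that decides whether a model fits on a phone or a small accelerator.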
If with 8B parameters we're nearly as powerful as the largest Llama-2 model,
then with a billion parameters you should be able to do something that's interesting, and faster.
It'd be good for classification, or a lot of basic things that people do before understanding
the intent of a user query and feeding it to the most powerful model to hone in on Â
what the prompt should be. I think that's one thing that maybe the community can help fill Â
in. We're also thinking about getting around to distilling some of these ourselves but right now Â
the GPUs are pegged training the 405B.
So you have all these GPUs. I think you
said 350,000 by the end of the year.
That's the whole fleet. We built two clusters of,
I think, 22,000 or 24,000 GPUs each, which are the single clusters that we have for training the big
models, obviously across a lot of the stuff that we do. A lot of our stuff goes towards training
Reels models and Facebook News Feed and Instagram Feed. Inference is a huge thing for us because we Â
serve a ton of people. Our ratio of inference compute required to training is probably much Â
higher than most other companies that are doing this stuff just because of the sheer volume of Â
the community that we're serving.
In the material they shared with
me before, it was really interesting that you trained it on more data than is compute optimal Â
just for training. The inference is such a big deal for you guys, and also for the community, Â
that it makes sense to just have this thing and have trillions of tokens in there.Â
Although one of the interesting things about it, even with the 70B, Â
is that we thought it would get more saturated. We trained it on around 15 trillion tokens. I guess Â
our prediction going in was that it was going to asymptote more, but even by the end it was Â
still learning. We probably could have fed it more tokens and it would have gotten somewhat better.
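To put "more data than is compute optimal" in numbers, here is a minimal sketch using two widely cited rules of thumb that are not from the interview: roughly 20 training tokens per parameter as a compute-optimal ratio, and roughly 6 x parameters x tokens as the training FLOP count. Both are approximations, and the function names are made up for this example. Only the 70B size and the 15 trillion tokens come from the conversation.

```python
# Back-of-the-envelope comparison of the 70B run described above against a
# compute-optimal heuristic. The constants are assumptions (common rules of
# thumb), not figures stated in the interview.

COMPUTE_OPTIMAL_TOKENS_PER_PARAM = 20.0  # assumed heuristic ratio
FLOPS_PER_PARAM_PER_TOKEN = 6.0          # assumed training-cost approximation


def tokens_per_param(tokens: float, params: float) -> float:
    """How many training tokens were used per model parameter."""
    return tokens / params


def overtraining_factor(tokens: float, params: float) -> float:
    """Ratio of actual tokens to the heuristic compute-optimal token count."""
    optimal_tokens = params * COMPUTE_OPTIMAL_TOKENS_PER_PARAM
    return tokens / optimal_tokens


def training_flops(tokens: float, params: float) -> float:
    """Approximate total training FLOPs as 6 * params * tokens."""
    return FLOPS_PER_PARAM_PER_TOKEN * params * tokens


if __name__ == "__main__":
    params, tokens = 70e9, 15e12  # 70B parameters, ~15 trillion tokens
    print(f"tokens per parameter: {tokens_per_param(tokens, params):.0f}")
    print(f"vs. heuristic optimum: {overtraining_factor(tokens, params):.1f}x")
    print(f"approx. training FLOPs: {training_flops(tokens, params):.2e}")
```

Under these assumptions the 70B model saw on the order of ten times more tokens than the heuristic optimum, which fits the point that spending extra training compute pays off when inference dominates.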
At some point you're running a company and you need to do these meta reasoning questions. Do I Â
want to spend our GPUs on training the 70B model further? Do we want to get on with it so we can Â
start testing hypotheses for Llama-4? We needed to make that call and I think we got a reasonable Â
balance for this version of the 70B. There'll be others in the future, the 70B multimodal one, Â
that'll come over the next period. But that was fascinating that the architectures at Â
this point can just take so much data.
That's really interesting. What does this
imply about future models? You mentioned that the Llama-3 8B is better than the Llama-2 70B.
No, no, it's nearly as good. I don't want to overstate
it. It's in a similar order of magnitude.
Does that mean the Llama-4 70B will be
as good as the Llama-3 405B? What does the future of this look like?
This is one of the great questions, right? I think no one knows. One of the trickiest things in the Â
world to plan around is an exponential curve. How long does it keep going for? Â
I think it's likely enough that we'll keep going. I think it's worth investing the $10Bs or $100B+
in building the infrastructure and assuming that if it keeps going you're going to get some really Â
amazing things that are going to make amazing products. I don't think anyone in the industry Â
can really tell you that it will continue scaling at that rate for sure. In general in history, Â
you hit bottlenecks at certain points. Now there's so much energy on this that Â
maybe those bottlenecks get knocked over pretty quickly. I think that's an interesting question.
What does the world look like where there aren't these bottlenecks? Suppose progress just continues Â
at this pace, which seems plausible. Zooming out and forgetting about Llamas...
Well, there are going to be different bottlenecks. Over the last few years, I think there was this Â
issue of GPU production. Even companies that had the money to pay for the GPUs couldn't necessarily Â
get as many as they wanted because there were all these supply constraints. Now I think that's sort Â
of getting less. So you're seeing a bunch of companies thinking now about investing a lot Â
of money in building out these things. I think that that will go on for some period of time. Â
There is a capital question. At what point does it stop being worth it to put the capital in?Â
I actually think before we hit that, you're going to run into energy constraints. I don't Â
think anyone's built a gigawatt single training cluster yet. You run into these things that just Â
end up being slower in the world. Getting energy permitted is a very heavily regulated government Â
function. You're going from software, which is somewhat regulated and I'd argue it's more
regulated than a lot of people in the tech community feel. Obviously it's different if
you're starting a small company, maybe you feel that less. We interact with different Â
governments and regulators and we have lots of rules that we need to follow and make sure Â
we do a good job with around the world. But I think that there's no doubt about energy.Â
If you're talking about building large new power plants or large build-outs and then Â
building transmission lines that cross other private or public land, that's just a heavily
regulated thing. You're talking about many years of lead time. If we wanted to stand up Â
some massive facility, powering that is a very long-term project. I think people do it but I Â
don't think this is something that can be quite as magical as just getting to a level of AI, Â
getting a bunch of capital and putting it in, and then all of a sudden the models are just going to...
You do hit different bottlenecks along the way.
Is there something, maybe an AI-related project or
maybe not, that even a company like Meta doesn't have the resources for? Something where if your Â
R&D budget or capex budget were 10x what it is now, then you could pursue it? Something that's
in the back of your mind but with Meta today, you can't even issue stock or bonds for it?
It's just like 10x bigger than your budget?
I think energy is one piece. I think we
would probably build out bigger clusters than we currently can if we could get the energy to do it.Â
That's fundamentally money-bottlenecked in the limit? If you had $1 trillion...
I think it's time. It depends on how far the exponential curves go. Right now a lot of
data centers are on the order of 50 megawatts or 100MW, or a big one might be 150MW. Take a whole Â
data center and fill it up with all the stuff that you need to do for training and you build Â
the biggest cluster you can. I think a bunch of companies are running at stuff like that.Â
But when you start getting into building a data center that's like 300MW or 500MW or 1GW,
no one has built a 1GW data center yet. I think it will happen. This is only a matter of time but Â
it's not going to be next year. Some of these things will take some number of years to build Â
out. Just to put this in perspective, I think a gigawatt would be the size of a meaningful nuclear Â
power plant only going towards training a model.
Didn't Amazon do this? They have a 950MW...
I'm not exactly sure what they did. You'd have to ask them.
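The power figures in this exchange (50 to 150MW data centers today, 1GW in the future) can be turned into rough accelerator counts. This is a sketch under loudly stated assumptions: a per-GPU draw of 700 W and a facility overhead multiplier of 1.5 for cooling, networking, and host machines are illustrative values chosen for this example, not numbers from the interview, and the function names are hypothetical.

```python
# Rough conversion between facility power and GPU count.
# ASSUMPTIONS (not from the interview): 700 W per GPU and a 1.5x multiplier
# for everything else in the building. Change both to match a real design.

ASSUMED_GPU_WATTS = 700.0
ASSUMED_OVERHEAD = 1.5


def gpus_supported(facility_watts: float,
                   gpu_watts: float = ASSUMED_GPU_WATTS,
                   overhead: float = ASSUMED_OVERHEAD) -> int:
    """Number of GPUs a facility of the given power could run."""
    return int(facility_watts // (gpu_watts * overhead))


def facility_megawatts(num_gpus: int,
                       gpu_watts: float = ASSUMED_GPU_WATTS,
                       overhead: float = ASSUMED_OVERHEAD) -> float:
    """Facility power in megawatts needed for the given GPU count."""
    return num_gpus * gpu_watts * overhead / 1e6


if __name__ == "__main__":
    for label, watts in [("50MW", 50e6), ("150MW", 150e6), ("1GW", 1e9)]:
        print(f"{label:>6}: about {gpus_supported(watts):,} GPUs")
    print(f"24,000 GPUs: about {facility_megawatts(24_000):.1f} MW")
```

With these assumed values a 24,000-GPU cluster draws on the order of 25MW, and a 1GW site could power roughly forty times that many GPUs, which gives a sense of the jump being discussed.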
But it doesn't have to be in the same place, right? If distributed
training works, it can be distributed.
Well, I think that is a big question, how
that's going to work. It seems quite possible that in the future, more of what we call training for Â
these big models is actually more along the lines of inference generating synthetic data to then go Â
feed into the model. I don't know what that ratio is going to be but I consider the generation of Â
synthetic data to be more inference than training today. Obviously if you're doing it in order Â
to train a model, it's part of the broader training process. So that's an open question, Â
the balance of that and how that plays out.
Would that potentially also be the case with
Llama-3, and maybe Llama-4 onwards? As in, you put this out and if somebody has a ton of compute, Â
then they can just keep making these things arbitrarily smarter using the models that Â
you've put out. Let's say there's some random country, like Kuwait or the UAE,
that has a ton of compute and they can actually just use Llama-4 to make something much smarter.Â
I do think there are going to be dynamics like that, but I also think Â
there is a fundamental limitation on the model architecture. I think like a 70B model that we Â
trained with a Llama-3 architecture can get better, it can keep going. As I was saying, Â
we felt that if we kept on feeding it more data or rotated the high value tokens through again, Â
then it would continue getting better. We've seen a bunch of different companies around Â
the world basically take the Llama-2 70B model architecture and then build a new model. But it's Â
still the case that when you make a generational improvement to something like the Llama-3 70B or Â
the Llama-3 405B, there isn't anything like that open source today. I think that's a big
step function. What people are going to be able to build on top of that I think can't go infinitely
from there. There can be some optimization in that until you get to the next step function.Â
Let's zoom out a little bit from specific models and even the multi-year lead times Â
you would need to get energy approvals and so on. Big picture, what's happening with AI these Â
next couple of decades? Does it feel like another technology like the metaverse or Â
social, or does it feel like a fundamentally different thing in the course of human history?Â
I think it's going to be pretty fundamental. I think it's going to be more like the creation Â
of computing in the first place. You'll get all these new apps in the same way as when you got Â
the web or you got mobile phones. People basically rethought all these experiences as a lot of things Â
that weren't possible before became possible. So I think that will happen, but I think it's Â
a much lower-level innovation. My sense is that it's going to be more like people going Â
from not having computers to having computers. It's very hard to reason about exactly how this
goes. In the cosmic scale obviously it'll happen quickly, over a couple of decades or something. Â
There is some set of people who are afraid of it really spinning out and going from being somewhat Â
intelligent to extremely intelligent overnight. I just think that there's all these physical Â
constraints that make that unlikely to happen. I just don't really see that playing out. I think Â
we'll have time to acclimate a bit. But it will really change the way that we work and give people Â
all these creative tools to do different things. I think it's going to really enable people to do Â
the things that they want a lot more.
So maybe not overnight, but is it your
view that on a cosmic scale we can think of these milestones in this way? Humans evolved, Â
and then AI happened, and then they went out into the galaxy. Maybe it takes many decades, Â
maybe it takes a century, but is that the grand scheme of what's happening right now in history?
Sorry, in what sense?
In the sense that there were
other technologies, like computers and even fire, but the development of AI itself is as
significant as humans evolving in the first place.
I think that's tricky. The history of humanity
has been people basically thinking that certain aspects of humanity are really unique in different Â
ways and then coming to grips with the fact that that's not true, but that humanity is actually Â
still super special. We thought that the earth was the center of the universe and it's not, Â
but humans are still pretty awesome and pretty unique, right?Â
I think another bias that people tend to have is thinking that intelligence Â
is somehow fundamentally connected to life. It's not actually clear that it is. I don't Â
know that we have a clear enough definition of consciousness or life to fully interrogate this. Â
There's all this science fiction about creating intelligence where it starts to take on all these Â
human-like behaviors and things like that. The current incarnation of all this stuff feels like Â
it's going in a direction where intelligence can be pretty separated from consciousness, Â
agency, and things like that, which I think just makes it a super valuable tool.
Obviously it's very difficult to predict what direction this stuff goes in over time, Â
which is why I don't think anyone should be dogmatic about how they plan to develop it Â
or what they plan to do. You want to look at it with each release. We're obviously Â
very pro open source, but I haven't committed to releasing every single thing that we do.
I'm basically very inclined to think that open sourcing is going to be good for the
community and also good for us because we'll benefit from the innovations. If at some point Â
however there's some qualitative change in what the thing is capable of, and we feel like it's Â
not responsible to open source it, then we won't. It's all very difficult to predict.Â
What is a kind of specific qualitative change where you'd be training Llama-5 or Llama-4, Â
and if you see it, it'd make you think "you know what, I'm not sure about open sourcing it"?
It's a little hard to answer that in the abstract because there are negative Â
behaviors that any product can exhibit where as long as you can mitigate it, Â
it's okay. There's bad things about social media that we work to mitigate. There's bad things about
Llama-2 where we spend a lot of time trying to make sure that it's not like helping people Â
commit violent acts or things like that. That doesn't mean that it's a kind of autonomous or Â
intelligent agent. It just means that it's learned a lot about the world and it can answer a set of Â
questions that we think would be unhelpful for it to answer. I think the question isn't really what Â
behaviors would it show, it's what things would we not be able to mitigate after it shows that.Â
I think that there's so many ways in which something can be good or bad that it's hard Â
to actually enumerate them all up front. Look at what we've had to deal with in social media and Â
the different types of harms. We've basically gotten to like 18 or 19 categories of harmful Â
things that people do and we've basically built AI systems to identify what those things are and Â
to make sure that doesn't happen on our network as much as possible. Over time I think you'll Â
be able to break this down into more of a taxonomy too. I think this is a thing that Â
we spend time researching as well, because we want to make sure that we understand that.
It seems to me that it would be a good idea. I would be disappointed in a future where AI Â
systems aren't broadly deployed and everybody doesn't have access to them. At the same time, Â
I want to better understand the mitigations. If the mitigation is the fine-tuning, Â
the whole thing about open weights is that you can then remove the fine-tuning, which is often Â
superficial on top of these capabilities. If it's like talking on Slack with a biology researcher...
I think models are very far from this. Right now, they're like Google search. But if I can
show them my Petri dish and they can explain why my smallpox sample didn't grow and what to change,
how do you mitigate that? Because somebody can just fine-tune that in there, right?Â
That's true. I think a lot of people will basically use the off-the-shelf model and some Â
people who have basically bad faith are going to try to strip out all the bad stuff. So I do think Â
that's an issue. On the flip side, one of the reasons why I'm philosophically so pro open source Â
is that I do think that a concentration of AI in the future has the potential to be as dangerous as Â
it being widespread. I think a lot of people think about the questions of "if we can do this stuff,
is it bad for it to be out in the wild and just widely available?" I think another version of
this is that it's probably also pretty bad for one institution to have an AI that is Â
way more powerful than everyone else's AI. There's one security analogy that I think
of. There are so many security holes in so many different things. If you could travel back in Â
time a year or two years, let's say you just have one or two years more knowledge of the security Â
holes. You can pretty much hack into any system. That's not AI. So it's not that far-fetched to
believe that a very intelligent AI probably would be able to identify some holes and basically Â
be like a human who could go back in time a year or two and compromise all these systems.Â
So how have we dealt with that as a society? One big part is open source software that Â
makes it so that when improvements are made to the software, it doesn't just get stuck in one Â
company's products but can be broadly deployed to a lot of different systems, whether they're banks
or hospitals or government stuff. As the software gets hardened, which happens because more people Â
can see it and more people can bang on it, there are standards on how this stuff works. The world Â
can get upgraded together pretty quickly. I think that a world where AI is very widely Â
deployed, in a way where it's gotten hardened progressively over time, is one where all the Â
different systems will be in check in a way. That seems fundamentally more healthy to me than one Â
where this is more concentrated. So there are risks on all sides, but I think that's a risk Â
that I don't hear people talking about quite as much. There's the risk of the AI system doing Â
something bad. But I stay up at night worrying more about an untrustworthy actor having the super Â
strong AI, whether it's an adversarial government or an untrustworthy company or whatever. I think Â
that that's potentially a much bigger risk.
As in, they could overthrow our government because
they have a weapon that nobody else has?
Or just cause a lot of mayhem. I think the
intuition is that this stuff ends up being pretty important and valuable for both Â
economic and security reasons and other things. If someone whom you don't trust or an adversary Â
gets something more powerful, then I think that that could be an issue. Probably the best way Â
to mitigate that is to have good open source AI that becomes the standard and in a lot of Â
ways can become the leader. It just ensures that it's a much more even and balanced playing field.Â
That seems plausible to me. If that works out, that would be the future I prefer. I want to Â
understand mechanistically how the fact that there are open source AI systems in the world Â
prevents somebody causing mayhem with their AI system? With the specific example of somebody Â
coming with a bioweapon, is it just that we'll do a bunch of R&D in the rest of the world to figure Â
out vaccines really fast? What's happening?
If you take the security one that I was
talking about, I think someone with a weaker AI trying to hack into a Â
system that is protected by a stronger AI will succeed less. In terms of software security...
How do we know everything in the world is like that? What if bioweapons aren't like that?
I mean, I don't know that everything in the world is like that. Bioweapons are one of the Â
areas where the people who are most worried about this stuff are focused and I think it makes a lot Â
of sense. There are certain mitigations. You can try to not train certain knowledge into Â
the model. There are different things but at some level if you get a sufficiently bad actor, Â
and you don't have other AI that can balance them and understand what the threats are, Â
then that could be a risk. That's one of the things that we need to watch out for.Â
Is there something you could see in the deployment of these systems where you're training Llama-4 and Â
it lied to you because it thought you weren't noticing or something and you're like "whoa
what's going on here?" This is probably not likely with a Llama-4 type system, but is
there something you can imagine like that where you'd be really concerned about deceptiveness and
billions of copies of this being out in the wild?
I mean right now we see a lot of hallucinations.
It's more so that. I think it's an interesting question, how you would tell the difference Â
between hallucination and deception. There are a lot of risks and things to think about. I try, Â
in running our company at least, to balance these longer-term theoretical risks with Â
what I actually think are quite real risks that exist today. So when you talk about deception, Â
the form of that that I worry about most is people using this to generate misinformation Â
and then pump that through our networks or others. The way that we've combated this type Â
of harmful content is by building AI systems that are smarter than the adversarial ones.Â
This informs part of my theory on this. If you look at the different types of harm that people Â
do or try to do through social networks, there are ones that are not very adversarial. For example, Â
hate speech is not super adversarial in the sense that people aren't getting better at being racist. Â
That's one where I think the AIs are generally getting way more sophisticated faster than people Â
are at those issues. And we have issues both ways. People do bad things, whether they're Â
trying to incite violence or something, but we also have a lot of false positives where we Â
basically censor stuff that we shouldn't. I think that understandably makes a lot of people annoyed. Â
So I think having an AI that gets increasingly precise on that is going to be good over time.Â
But let me give you another example: nation states trying to interfere in elections. That's Â
an example where they absolutely have cutting edge technology and absolutely get better each year. So Â
we block some technique, they learn what we did and come at us with a different technique. It's Â
not like a person trying to say mean things. They have a goal. They're sophisticated. They have a
lot of technology. In those cases, I still think about the ability to have our AI systems grow in Â
sophistication at a faster rate than theirs do. It's an arms race but I think we're at least Â
winning that arms race currently. This is a lot of the stuff that I spend time thinking about.Â
Yes, whether it's Llama-4 or Llama-6, we need to think about what behaviors we're observing and Â
it's not just us. Part of the reason why you make this open source is that there are a lot of other Â
people who study this too. So we want to see what other people are observing, what we're observing,
what we can mitigate, and then we'll make our assessment on whether we can make it Â
open source. For the foreseeable future I'm optimistic we will be able to. In the near term, Â
I don't want to take our eye off the ball in terms of what are actual bad things that Â
people are trying to use the models for today. Even if they're not existential, there are Â
pretty bad day-to-day harms that we're familiar with in running our services. That's actually a Â
lot of what we have to spend our time on as well.
I found the synthetic data thing really curious.
With current models it makes sense why there might be an asymptote with just doing the synthetic data Â
again and again. But let's say they get smarter and you use the kinds of techniques, which you talk about
in the paper or the blog posts that are coming out on the day this will be released, where it goes to
the thought chain that is the most correct. Why do you think this wouldn't lead to a loop
where it gets smarter, makes better output, gets smarter and so forth? Of course it wouldn't be
overnight, but over many months or years of training potentially with a smarter model.Â
I think it could, within the parameters of whatever the model architecture is. It's just Â
that with today's 8B parameter models, I don't think you're going to get to be as good as the Â
state-of-the-art multi-hundred billion parameter models that are incorporating Â
new research into the architecture itself.
But those will be open source as well, right?
Well yeah, subject to all the questions that we just talked about but yes. We would hope that Â
that'll be the case. But I think that at each point, when you're building software there's a Â
ton of stuff that you can do with software but then at some level you're constrained by the Â
chips that it's running on. So there are always going to be different physical constraints. How Â
big the models are is going to be constrained by how much energy you can get and use for Â
inference. I'm simultaneously very optimistic that this stuff will continue to improve quickly Â
and also a little more measured than I think some people are about it. I don't think the
runaway case is a particularly likely one.
I think it makes sense to keep your options
open. There's so much we don't know. There's a case in which it's really important to keep the Â
balance of power so nobody becomes a totalitarian dictator. There's a case in which you don't want Â
to open source the architecture because China can use it to catch up to America's AIs and there is Â
an intelligence explosion and they win that. A lot of things seem possible. Keeping your options open Â
considering all of them seems reasonable.
Yeah.
Let's talk about some other things. Metaverse. What time period in human history would you be Â
most interested in going into? 100,000 BCE to now, you just want to see what it was like?Â
It has to be the past?
Oh yeah, it has to be the past.
I'm really interested in American history and classical history. I'm really interested in the Â
history of science too. I actually think seeing and trying to understand more about how some of Â
the big advances came about would be interesting. All we have are somewhat limited writings about Â
some of that stuff. I'm not sure the metaverse is going to let you do that because it's going Â
to be hard to go back in time for things that we don't have records of. I'm actually not sure Â
that going back in time is going to be that important of a thing. I think it's going to Â
be cool for like history classes and stuff, but that's probably not the use case that I'm Â
most excited about for the metaverse overall. The main thing is just the ability to feel Â
present with people, no matter where you are. I think that's going to be killer. In the AI Â
conversation that we were having, so much of it is about physical constraints that underlie all Â
of this. I think one lesson of technology is that you want to move things from the physical Â
constraint realm into software as much as possible because software is so much easier to build and Â
evolve. You can democratize it more because not everyone is going to have a data center but Â
a lot of people can write code and take open source code and modify it. The metaverse
version of this is enabling realistic digital presence. That's going to be an absolutely huge
difference so people don't feel like they have to be physically together for as many things. Â
Now I think that there can be things that are better about being physically together. These Â
things aren't binary. It's not going to be like "okay, now you don't need to do that anymore."
But overall, I think it's just going to be really powerful for socializing, for feeling Â
connected with people, for working, for parts of industry, for medicine, for so many things.
I want to go back to something you said at the beginning of the conversation. You didn't sell Â
the company for a billion dollars. And with the metaverse, you knew you were going to Â
do this even though the market was hammering you for it. I'm curious. What is the source Â
of that edge? You said "oh, values, I have this intuition," but everybody says that. If
you had to say something that's specific to you, how would you express what that is? Why
were you so convinced about the metaverse?
I think that those are different questions.
What are the things that power me? We've talked about a bunch of the themes. I just Â
really like building things. I specifically like building things around how people communicate and Â
understanding how people express themselves and how people work. When I was in college Â
I studied computer science and psychology. I think a lot of other people in the industry Â
studied computer science. So, it's always been the intersection of those two things for me.
It's also sort of this really deep drive. I don't know how to explain it but I just feel
constitutionally that I'm doing something wrong if I'm not building something new. Even when we were Â
putting together the business case for investing $100 billion in AI or some huge amount in the
metaverse, we had plans that I think made it pretty clear that if our stuff works,
it'll be a good investment. But you can't know for certain from the outset. There are all these Â
arguments that people have, with advisors or different folks. It's like, "how are you
confident enough to do this?" Well the day I stop trying to build new things, I'm just done. I'm
going to go build new things somewhere else. I'm fundamentally incapable of running something, Â
or in my own life, and not trying to build new things that I think are interesting. That's not Â
even a question for me, whether we're going to take a swing at building the next thing. I'm Â
just incapable of not doing that. I don't know. I'm kind of like this in all the different aspects Â
of my life. Our family built this ranch in Kauai and I worked on designing all these buildings. We Â
started raising cattle and I'm like "alright, I want to make the best cattle in the world so how
do we architect this so that way we can figure this out and build all the stuff up that we
need to try to do that." I don't know, that's me. What was the other part of the question?
I'm not sure but I'm actually curious about something else. So a 19-year-old Â
Mark reads a bunch of antiquity and classics in high school and college. Â
What important lesson did you learn from it? Not just interesting things you found, Â
but there aren't that many tokens you consume by the time you're 19. A bunch of them were about the Â
classics. Clearly that was important in some way.
There aren't that many tokens you consume...
That's a good question. Here's one of the things I thought was really fascinating. Augustus became
emperor and he was trying to establish peace. There was no real conception of peace at the Â
time. The people's understanding of peace was peace as the temporary time between when your Â
enemies inevitably attack you. So you get a short rest. He had this view of changing the Â
economy from being something mercenary and militaristic to this actually positive-sum Â
thing. It was a very novel idea at the time. Thatâs something that's really fundamental:Â Â
the bounds on what people can conceive of at the time as rational ways to work. Â
This applies to both the metaverse and the AI stuff. A lot of investors, and other people, Â
can't wrap their head around why we would open source this. Itâs like âI don't understand, itâs Â
open source. That must just be the temporary time between which you're making things proprietary, Â
right?â I think it's this very profound thing in tech that it actually creates a lot of winners.Â
I don't want to strain the analogy too much but I do think that a lot of the time, Â
there are models for building things that people often can't even wrap their head Â
around. They canât understand how that would be a valuable thing for people to do or how it would be Â
a reasonable state of the world. I think there are more reasonable things than people think.Â
That's super fascinating. Can I give you what I was thinking in terms of what you might have
gotten from it? This is probably totally off, but I think it's just how young some of these
people are, who have very important roles in the empire. For example, Caesar Augustus,
by the time he's 19, is already one of the most important people in Roman politics. He's leading
battles and forming the Second Triumvirate. I wonder if the 19-year-old you was thinking "I
can do this because Caesar Augustus did this."
That's an interesting example, both from a lot
of history and American history too. One of my favorite quotes is this Picasso quote that all
children are artists and the challenge is to remain an artist as you grow up. When you're
younger, it's just easier to have wild ideas. There are all these analogies to the innovator's
dilemma that exist in your life as well as for your company or whatever you've built. You're
earlier on in your trajectory so it's easier to pivot and take in new ideas without disrupting
other commitments to different things. I think that's an interesting part of
running a company. How do you stay dynamic?
Let's go back to the investors and open source.
The $10B model, suppose it's totally safe. You've done these evaluations and unlike in this case
the evaluators can also fine-tune the model, which hopefully will be the case in future models. Would
you open source the $10 billion model?
As long as it's helping us then yeah.
But would it? $10 billion of R&D and now it's open source.
That's a question which we'll have to evaluate as time goes on too. We have a long history of
open sourcing software. We don't tend to open source our product. We don't take the code for
Instagram and make it open source. We take a lot of the low-level infrastructure and
we make that open source. Probably the biggest one in our history was our Open Compute Project
where we took the designs for all of our servers, network switches, and data centers, and made it
open source and it ended up being super helpful. Although a lot of people can design servers, the
industry now standardized on our design, which meant that the supply chains basically all got
built out around our design. So volumes went up, it got cheaper for everyone, and it saved
us billions of dollars which was awesome.
So there's multiple ways where open source
could be helpful for us. One is if people figure out how to run the models more cheaply. We're
going to be spending tens, or a hundred billion dollars or more over time on all this stuff. So
if we can do that 10% more efficiently, we're saving billions or tens of billions of dollars.
That's probably worth a lot by itself. Especially if there are other competitive models out there,
it's not like our thing is giving away some kind of crazy advantage.
So is your view that the training will be commodified?
I think there's a bunch of ways that this could play out and that's one. So "commodity" implies
that it's going to get very cheap because there are lots of options. The other direction that this
could go in is qualitative improvements. You mentioned fine-tuning. Right now it's pretty
limited what you can do with fine-tuning the other major models out there. There are some options
but generally not for the biggest models. There's being able to do that, different app specific
things or use case specific things or building them into specific tool chains. I think that will
not only enable more efficient development, but it could enable qualitatively different things.
Here's one analogy on this. One thing that I think generally sucks about the mobile ecosystem is that
you have these two gatekeeper companies, Apple and Google, that can tell you what you're allowed to
build. There's the economic version of that which is like when we build something and they just
take a bunch of your money. But then there's the qualitative version, which is actually what upsets
me more. There's a bunch of times when we've launched or wanted to launch features and Apple's
just like "nope, you're not launching that." That sucks, right? So the question is, are we set up
for a world like that with AI? You're going to get a handful of companies that run these closed
models that are going to be in control of the APIs and therefore able to tell you what you can build?
For us I can say it is worth it to go build a model ourselves to make sure that we're not
in that position. I don't want any of those other companies telling us what we can build.
From an open source perspective, I think a lot of developers don't want those companies telling them
what they can build either. So the question is, what is the ecosystem that gets built out around
that? What are interesting new things? How much does that improve our products? I think there
are lots of cases where if this ends up being like our databases or caching systems or architecture,
we'll get valuable contributions from the community that will make our stuff better.
Our app specific work that we do will then still be so differentiated that it won't really matter.
We'll be able to do what we do. We'll benefit and all the systems, ours and the communities',
will be better because it's open source.
There is one world where maybe
that's not the case. Maybe the model ends up being more of the product itself. I think it's
a trickier economic calculation then, whether you open source that. You are commoditizing
yourself then a lot. But from what I can see so far, it doesn't seem like we're in that zone.
Do you expect to earn significant revenue from licensing your model to the cloud
providers? So they have to pay you a fee to actually serve the model.
We want to have an arrangement like that but I don't know how significant it'll be. This is
basically our license for Llama. In a lot of ways it's a very permissive open source license, except
that we have a limit for the largest companies using it. This is why we put that limit in. We're
not trying to prevent them from using it. We just want them to come talk to us if they're going to
just basically take what we built and resell it and make money off of it. If you're like Microsoft
Azure or Amazon, if you're going to be reselling the model then we should have some revenue share
on that. So just come talk to us before you go do that. That's how that's played out.
So for Llama-2, we just have deals with basically all these major cloud companies and Llama-2 is
available as a hosted service on all those clouds. I assume that as we release bigger
and bigger models, that will become a bigger thing. It's not the main thing that we're doing,
but I think if those companies are going to be selling our models it just makes sense that we
should share the upside of that somehow.
Regarding other open source dangers,
I think you have genuine legitimate points about the balance of power stuff and potentially the
harms you can get rid of because we have better alignment techniques or something. I wish there
were some sort of framework that Meta had. Other labs have this where they say "if we see this
concrete thing, then that's a no go on the open source or even potentially on deployment." Just
writing it down so the company is ready for it and people have expectations around it and so forth.
That's a fair point on the existential risk side. Right now we focus more on the types of
risks that we see today, which are more of these content risks. We don't want the model to be doing
things that are helping people commit violence or fraud or just harming people in different
ways. While it is maybe more intellectually interesting to talk about the existential risks,
I actually think the real harms that need more energy in being mitigated are things where someone
takes a model and does something to hurt a person. In practice for the current models,
and I would guess the next generation and maybe even the generation after that,
those are the types of more mundane harms that we see today, people committing fraud against each
other or things like that. I just don't want to shortchange that. I think we have a responsibility
to make sure we do a good job on that.
Meta's a big company. You can handle both.
As far as open source goes, I'm actually curious if you think the impact of open source,
from PyTorch, React, Open Compute and other things, has been bigger for the world than
even the social media aspects of Meta. I've talked to people who use these services
and they think that it's plausible because a big part of the internet runs on these things.
It's an interesting question. I mean almost half the world uses our consumer products so
it's hard to beat that. But I think open source is really powerful as a new way of
building things. I mean, it's possible. It may be one of these things like Bell Labs,
where they were working on the transistor because they wanted to enable long-distance calling. They
did and it ended up being really profitable for them that they were able to enable long-distance
calling. 5 to 10 years out from that, if you asked them what was the most useful thing
that they invented it's like "okay, we enabled long-distance calling and now all these people
are long-distance calling." But if you asked a hundred years later maybe it's a different answer.
I think that's true of a lot of the things that we're building: Reality Labs, some of the AI
stuff, some of the open source stuff. The specific products evolve, and to some degree come and go,
but the advances for humanity persist and that's a cool part of what we all get to do.
By when will the Llama models be trained on your own custom silicon?
Soon, not Llama-4. The approach that we took is we first built custom silicon that could handle
inference for our ranking and recommendation type stuff, so Reels, News Feed ads, etc. That
was consuming a lot of GPUs. When we were able to move that to our own silicon, we're now able
to use the more expensive NVIDIA GPUs only for training. At some point we will hopefully have
silicon ourselves that we can be using for at first training some of the simpler things, then
eventually training these really large models. In the meantime, I'd say the program is going quite
well and we're just rolling it out methodically and we have a long-term roadmap for it.
Final question. This is totally out of left field. If you were made CEO of Google+
could you have made it work?
Google+? Oof. I don't know.
That's a very difficult counterfactual.
Okay, then the real final question will be:
when Gemini was launched, was there any chance that somebody
in the office uttered: "Carthago delenda est".
No, I think we're tamer now. It's a good question.
The problem is there was no CEO of Google+. It was just a division within a company. You asked
before about what are the scarcest commodities but you asked about it in terms of dollars. I
actually think for most companies, of this scale at least, it's focus. When you're a startup maybe
you're more constrained on capital. You're just working on one idea and you might not have all
the resources. You cross some threshold at some point with the nature of what you're doing. You're
building multiple things. You're creating more value across them but you become more
constrained on what you can direct to go well. There are always the cases where something
random awesome happens in the organization and I don't even know about it. Those are great. But I
think in general, the organization's capacity is largely limited by what the CEO and the
management team are able to oversee and manage. That's been a big focus for us. As Ben Horowitz
says "keep the main thing, the main thing" and try to stay focused on your key priorities.
Awesome, that was excellent, Mark. Thanks so much. That was a lot of fun.
Yeah, really fun. Thanks for having me.
Absolutely.