Mark Zuckerberg - Llama 3, $10B Models, Caesar Augustus, & 1 GW Datacenters

Dwarkesh Podcast
18 Apr 2024, 78:38


TLDR: In this interview, Mark Zuckerberg discusses the future of AI with a focus on Meta AI's advancements. He highlights the release of Llama-3, an open-source model that now powers the Meta AI assistant, which integrates Google and Bing for real-time knowledge and adds image animation and real-time image generation. Zuckerberg also addresses the challenges of building large-scale data centers, the risks of centralized AI control, and the importance of open-source contributions. He stresses the potential of AI to revolutionize various sectors, including science and healthcare, and shares his vision of AI as a tool that enhances human productivity rather than replacing it. The conversation covers the implications of AI development, the balance between innovation and safety, and the role of open-source software in democratizing AI technology.


  • 🤖 The new version of Meta AI, now powered by Llama-3, is presented as the most intelligent freely available AI assistant. It integrates Google and Bing for real-time knowledge and adds creation features such as image animation and real-time image generation.
  • 🚀 Meta is training three versions of Llama-3: 8 billion and 70 billion parameter models released to the developer community, and a 405 billion parameter dense model that is still in training.
  • 🌐 The new Meta AI is not launching everywhere at once; it starts in a handful of countries, with a wider rollout planned over the coming weeks and months.
  • 📈 Mark Zuckerberg emphasizes the importance of open-source AI, believing it to be beneficial for the community and for Meta, allowing for broader innovation and a more level playing field in the AI industry.
  • 🛡️ There is a commitment to responsible AI development, including not releasing certain models if they show irresolvable negative behaviors or risks.
  • ⚙️ Meta is investing in custom silicon to improve the efficiency of AI model training and inference, which could significantly reduce costs and improve performance for their AI-driven services.
  • 🌟 Zuckerberg shares his passion for building new things and his belief in the potential of AI to enable creativity and productivity, reflecting his personal drive and the company's mission.
  • 🔮 The potential of AI is compared to the creation of computing itself, suggesting a fundamental shift in how people work and live, with AI becoming an integral part of various industries and aspects of life.
  • 💡 Open-source contributions such as PyTorch and React are considered powerful drivers of innovation, with an impact on the world that may rival the reach of Meta's social media products.
  • ⚖️ There's a discussion on the balance of power in AI development, with concerns about the risks of having a single entity with disproportionately strong AI capabilities, advocating for a decentralized approach.
  • 🏛 Zuckerberg draws an analogy between historical shifts in understanding, like the concept of peace under Augustus, and current paradigm shifts in technology and business models, emphasizing the importance of challenging conventional thinking.

Q & A

  • What is the main update to Meta AI that Mark Zuckerberg discusses in the interview?

    -The main update is the rollout of Llama-3, a model that is released as open source and also powers Meta AI. Zuckerberg says that, with this upgrade, Meta AI is the most intelligent freely available AI assistant.

  • How does Meta AI integrate with other search engines?

    -Meta AI integrates Google and Bing for real-time knowledge. It is also becoming more prominent across Meta's apps: the search box at the top of Facebook and Messenger can be used to ask it any question.

  • What new creation features does Meta AI introduce?

    -Meta AI introduces features like animations, where any image can be animated, and real-time high-quality image generation as users type their queries.

  • What are the technical specifications of the Llama-3 model that Mark Zuckerberg finds exciting?

    -Mark Zuckerberg is excited about the Llama-3 model, which includes an 8 billion parameter model and a 70 billion parameter model. There's also a 405 billion parameter model in training.

  • What is the roadmap for future releases of Meta AI?

    -The roadmap includes new releases that will bring multimodality, more multi-linguality, and bigger context windows. There are plans to roll out the 405B model later in the year.

  • How does Mark Zuckerberg perceive the risk of having a few companies controlling closed AI models?

    -He sees it as a significant risk, as it could lead to these companies dictating what others can build, creating a situation similar to the control exerted by Apple over app features.

  • What is the strategy behind Meta's acquisition of GPUs like the H100?

    -The strategy was to ensure they had enough capacity to build something they couldn't foresee on the horizon yet, doubling the order to be prepared for future needs beyond the immediate requirements for Reels and content ranking.

  • Why did Mark Zuckerberg decide not to sell Facebook in 2006 for $1 billion?

    -Mark felt a deep conviction in what they were building and believed that if he sold the company, he would just build another similar one. He also lacked the financial sophistication to engage in the billion-dollar valuation debate.

  • What is the role of Facebook AI Research (FAIR) in the development of Meta's AI?

    -FAIR, established about 10 years before the interview, was conceived as a research group rather than a product. Its innovations have improved Meta's products and advanced the field. After ChatGPT and diffusion models for image creation appeared, Meta started a second group, the gen AI group, to bring that work into its products and build leading foundation models.

  • How does Meta plan to approach the development of more advanced AI models like Llama-4?

    -Meta plans to continue training larger models, incorporating more capabilities like reasoning and memory, and focusing on multimodality and emotional understanding. They aim to make AI more integrated into various aspects of their products and services.

  • What are the potential future challenges in scaling AI models?

    -Challenges include physical constraints like energy limitations for training large models, regulatory hurdles for building new power plants and transmission lines, and the balance between open sourcing models and potential risks associated with them.

  • How does Mark Zuckerberg view the future of AI and its impact on society?

    -He sees AI as a fundamental shift, similar to the creation of computing, that will enable new applications and experiences. However, he also acknowledges the need for careful consideration of risks and the importance of a balanced approach to AI development and deployment.



🚀 AI Innovation and Meta AI's New Features

The speaker expresses an inherent drive to keep building new things, despite constraints imposed by platform owners such as Apple. The conversation introduces Meta AI's latest advancements, highlighting the release of Llama-3, an open-source model that now powers Meta AI. Meta AI integrates Google and Bing for real-time knowledge, and its new features include image animation and real-time high-quality image generation based on user queries. The speaker emphasizes Meta AI's commitment to making AI more accessible and enhancing its capabilities across various applications.


🤖 The Future of AI and Meta's Strategic Investments

The discussion delves into the strategic foresight behind Meta's investment in GPUs for AI model training. The speaker reflects on the importance of capacity planning for unforeseen technological advancements, drawing parallels with past decisions that have shaped the company's direction. The conversation also touches on the speaker's personal philosophy on company valuation and the significance of Facebook AI Research (FAIR) in driving product innovation.


🧠 AGI and the Evolution of Meta's AI Strategy

The speaker outlines the evolution of Meta's approach to AI, from the inception of FAIR to the current focus on artificial general intelligence (AGI). The importance of coding and reasoning in training AI models is emphasized, highlighting how these capabilities improve the models' performance across various domains. The conversation explores the concept of AI as a progressive tool that augments human capabilities rather than replacing them.


🌐 Multimodal AI and the Future of Interaction

The speaker envisions a future where AI capabilities become more integrated and sophisticated, covering emotional understanding and multimodal interactions. The potential for personalized AI models and the impact of AI on industrial-scale operations are discussed. The conversation also addresses the idea of AI agents representing businesses and creators, and the importance of open-source AI in maintaining a balanced technological landscape.


📈 Scaling AI Models and Meta's Computational Challenges

The speaker discusses the challenges and strategies related to scaling AI models, including the physical and computational constraints of training large models like Llama-3. The conversation explores the concept of using inference to generate synthetic data for training and the potential for smaller, fine-tuned models to play a significant role in various applications. The speaker also addresses the importance of community contributions in advancing AI technology.


🌟 The Impact of Open Source on AI and Technology

The speaker reflects on the impact of open-source contributions from Meta, such as PyTorch and React, and their potential long-term significance. The conversation considers whether open-source efforts could have a more profound impact than Meta's social media products, given their widespread use across the internet. The speaker also discusses the future integration of Llama models with custom silicon for more efficient training.


🤔 Navigating Open Source Risks and Future AI Developments

The speaker addresses concerns about the potential risks of open sourcing powerful AI models, including the possibility of misuse. The conversation focuses on the importance of balancing theoretical risks with practical, everyday harms, and the responsibility to mitigate these risks. The speaker also shares thoughts on the future of AI, including the potential for AI to become a commodified training resource and the economic considerations of open sourcing high-value models.


🌟 The Value of Focus and Meta's Management Strategy

The speaker discusses the concept of focus as a scarce commodity, especially for large companies, and its importance in driving the company's success. The conversation touches on the challenges of managing multiple projects and the need to maintain a sharp focus on key priorities. The speaker also reflects on the unpredictability of success in technology and the importance of trying new things.



💡AI Assistant

An AI assistant is software that uses artificial intelligence to perform tasks or services for users, such as answering questions, setting reminders, or providing recommendations. The script discusses Meta AI, an assistant now powered by the Llama-3 model and described as the most intelligent freely available AI assistant. It is built into apps like Facebook and Messenger, where users can ask it questions directly from the search box.

💡Open Source

Open source refers to a type of software where the source code is made available to the public, allowing anyone to view, use, modify, and distribute the software. The script discusses Meta's decision to release the Llama-3 model as open source, emphasizing the benefits of community contributions and the prevention of a single entity having control over advanced AI capabilities.

💡Data Center

A data center is a facility that houses a large number of servers, storage systems, and other components connected through a network. The script mentions data centers drawing 300 megawatts, 500 megawatts, or 1 gigawatt of power for AI workloads, and notes that no one has yet built a single 1 gigawatt data center.


💡Parameter

In the context of AI, a parameter is a variable in a model that the machine learning algorithm adjusts during training to improve the model's performance. The script discusses versions of the Llama-3 model with different numbers of parameters: 8 billion, 70 billion, and 405 billion, which indicates the scale and complexity of these systems.
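To give a feel for what these parameter counts mean in practice, the sketch below estimates how much memory the weights alone would occupy. It assumes 2 bytes per parameter (16-bit weights); the interview does not state a storage precision, so `BYTES_PER_PARAM` and the helper `weight_memory_gb` are illustrative assumptions, not anything Meta describes.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption (not from the interview): weights are stored as 16-bit values.
BYTES_PER_PARAM = 2  # hypothetical precision choice


def weight_memory_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the memory needed to hold the weights, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts quoted in the interview.
LLAMA3_SIZES = {
    "8B": 8_000_000_000,
    "70B": 70_000_000_000,
    "405B": 405_000_000_000,
}

for name, num_params in LLAMA3_SIZES.items():
    print(f"Llama-3 {name}: ~{weight_memory_gb(num_params):.0f} GB of weights")
```

Under that assumption the three sizes come to roughly 16 GB, 140 GB, and 810 GB. This counts weights only; activations, optimizer state during training, and inference caches add to the total.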


💡Multimodality

Multimodality in AI refers to the ability of a system to process and understand information from multiple modes of input, such as text, images, and video. The script mentions Meta's focus on developing multimodal capabilities in their AI models to enhance their functionality and user interaction.


💡Benchmark

A benchmark is a standard or point of reference against which things may be compared or assessed. In AI, benchmarks are used to evaluate the performance of models on specific tasks. The script discusses the Llama-3 models' benchmark results, such as MMLU scores of around 82 for the 70 billion parameter model and around 85 for the 405 billion parameter model still in training.


💡Inference

In AI, inference is the process of running a trained model on new inputs to produce outputs. The script talks about the significant role of inference in serving a large user base, as applying trained models to new data or situations at that scale requires a substantial amount of computational resources.

💡Meta AI

Meta AI is the AI assistant built into the apps of Meta (formerly known as Facebook, Inc.). The script discusses the new version of Meta AI, now powered by the Llama-3 model, which Zuckerberg describes as the most intelligent freely available AI assistant.

💡Training Cluster

A training cluster is a group of interconnected computers that work together to train machine learning models. The script mentions the development and scaling of training clusters, which are essential for handling the large-scale computations required to train complex AI models like Llama-3.

💡Content Risks

Content risks refer to the potential negative outcomes or harms that can arise from the use of AI systems, such as the spread of misinformation, promotion of harmful behavior, or facilitation of violence. The script emphasizes the importance of mitigating content risks associated with AI models, particularly in preventing the use of these models to cause harm to individuals or society.

💡Economic Constraints

Economic constraints refer to the limitations or restrictions faced by an organization due to financial considerations. The script discusses how economic constraints, such as the cost of GPUs and energy, impact the development and scaling of AI models and data centers.


The new version of Meta AI runs on an upgraded model, Llama-3, and is presented as the most intelligent freely available AI assistant.

Llama-3 will be available as open source for developers and will also power Meta AI, which integrates Google and Bing for real-time knowledge.

New creation features have been added, including the ability to animate any image and generate high-quality images in real time as you type your query.

Meta AI's new version is initially rolling out in a few countries, with plans for broader availability in the coming weeks and months.

Technically, Llama-3 comes in three versions: 8 billion and 70 billion parameter models released today, and a 405 billion parameter dense model still in training.

The 70 billion parameter model scores around 82 on MMLU with leading scores on math and reasoning, while the 405 billion parameter model is already at around 85 MMLU partway through training and is expected to lead on a number of benchmarks.

Meta has a roadmap for future releases that include multimodality, more multilinguality, and larger context windows.

The decision to invest in GPUs for AI was driven by the need for more capacity to train models for content recommendation in services like Reels.

The capability of showing content from unconnected sources on platforms like Instagram and Facebook represents a significant unlock for user engagement.

The importance of open source in AI development, ensuring a balanced and competitive ecosystem, and the potential risks of concentrated AI power.

The potential for AI to surpass human intelligence in most domains progressively, and the focus on capabilities like emotional understanding and reasoning.

Meta's commitment to addressing the risks of misinformation and the importance of building AI systems to combat adversarial uses.

The vision of AI as a tool that enhances human capabilities rather than replacing them, aiming for increased productivity and creativity.

The significance of the metaverse in enabling realistic digital presence and its potential impact on socializing, working, and various industries.

Mark Zuckerberg's personal drive to continuously build new things and the philosophy behind investing in large-scale projects like AI and the metaverse.

The historical perspective on the development of peace and economy, drawing parallels to modern innovations in tech and the concept of open source.

The potential for custom silicon to revolutionize the training of large AI models and the strategic move to first optimize inference processes.



That's not even a question for me - whether  we're going to go take a swing at building  


the next thing. I'm just incapable of not doing  that. There's a bunch of times when we wanted to  


launch features and then Apple's just like  nope you're not launching that I was like  


that sucks. Are we set up for that with AI where  you're going to get a handful of companies that  


run these closed models that are going to be in  control of the apis and therefore are going to be  


able to tell you what you can build? Then when  you start getting into building a data center  


that's like 300 Megawatts or 500 Megawatts or a  Gigawatt - just no one has built a single Gigawatt  


data center yet. From wherever you sit there's  going to be some actor who you don't trust - if  


they're the ones who have the super strong AI I  think that that's potentially a much bigger risk


Mark, welcome to the podcast. Thanks for having me. Big fan of your podcast. 


Thank you, that's very nice of you to say.  Let's start by talking about the releases  


that will go out when this interview  goes out. Tell me about the models and  


Meta AI. What's new and exciting about them? I think the main thing that most people in the


world are going to see is the new version of  Meta AI. The most important thing that we're  


doing is the upgrade to the model. We're  rolling out Llama-3. We're doing it both  


as open source for the dev community and it is  now going to be powering Meta AI. There's a lot  


that I'm sure we'll get into around Llama-3,  but I think the bottom line on this is that  


we think now that Meta AI is the most intelligent,  freely-available AI assistant that people can use.  


We're also integrating Google  and Bing for real-time knowledge. 


We're going to make it a lot more prominent across  our apps. At the top of Facebook and Messenger,  


you'll be able to just use the search box right  there to ask any question. There's a bunch of new  


creation features that we added that I think are  pretty cool and that I think people will enjoy.  


I think animations is a good one. You can  basically take any image and just animate it. 


One that people are going to find pretty wild  is that it now generates high quality images  


so quickly that it actually generates it as  you're typing and updates it in real time.  


So you're typing your query and it's honing in. It's like "show me a picture of a cow in


a field with mountains in the background, eating macadamia nuts, drinking beer" and it's updating


the image in real time. It's pretty wild. I  think people are going to enjoy that. So I  


think that's what most people are going to see in  the world. We're rolling that out, not everywhere,  


but we're starting in a handful of countries and  we'll do more over the coming weeks and months.  


I think that's going to be a pretty big deal and I'm really excited to get that in people's


hands. It's a big step forward for Meta AI. But I think if you want to get under the hood  


a bit, the Llama-3 stuff is obviously the most  technically interesting. We're training three  


versions: an 8 billion parameter model and a 70  billion, which we're releasing today, and a 405  


billion dense model, which is still training. So  we're not releasing that today, but I'm pretty  


excited about how the 8B and the 70B turned out.  They're leading for their scale. We'll release a  


blog post with all the benchmarks so people can  check it out themselves. Obviously it's open  


source so people get a chance to play with it. We have a roadmap of new releases coming that  


are going to bring multimodality, more  multi-linguality, and bigger context  


windows as well. Hopefully, sometime later in the  year we'll get to roll out the 405B. For where it  


is right now in training, it is already  at around 85 MMLU and we expect that it's  


going to have leading benchmarks on a bunch of the  benchmarks. I'm pretty excited about all of that.  


The 70 billion is great too. We're releasing that  today. It's around 82 MMLU and has leading scores  


on math and reasoning. I think just getting this  in people's hands is going to be pretty wild. 


Oh, interesting. That's the first I'm hearing of it as a benchmark. That's super impressive.


The 8 billion is nearly as powerful as the  biggest version of Llama-2 that we released.  


So the smallest Llama-3 is basically  as powerful as the biggest Llama-2. 


Before we dig into these models, I want to go  back in time. I'm assuming 2022 is when you  


started acquiring these H100s, or you can tell me  when. The stock price is getting hammered. People  


are asking what's happening with all this  capex. People aren't buying the metaverse.  


Presumably you're spending that capex to get  these H100s. How did you know back then to get the  


H100s? How did you know that you'd need the GPUs? I think it was because we were working on Reels.


We always want to have enough capacity to build  something that we can't quite see on the horizon  


yet. We got into this position with Reels where we  needed more GPUs to train the models. It was this  


big evolution for our services. Instead of just  ranking content from people or pages you follow,  


we made this big push to start recommending what  we call unconnected content, content from people  


or pages that you're not following. 
 The corpus of content candidates that  


we could potentially show you expanded from  on the order of thousands to on the order of  


hundreds of millions. It needed a completely  different infrastructure. We started working  


on doing that and we were constrained on  the infrastructure in catching up to what  


TikTok was doing as quickly as we wanted to. I basically looked at that and I was like "hey,


we have to make sure that we're never in this situation again. So let's order enough GPUs to do


what we need to do on Reels and ranking content and feed. But let's also double that." Again,


our normal principle is that there's going to be  something on the horizon that we can't see yet. 


Did you know it would be AI? We thought it was going to be something that  


had to do with training large models. At the time  I thought it was probably going to be something  


that had to do with content. It's just the pattern matching of running the company, there's always


another thing. At that time I was so deep into  trying to get the recommendations working for  


Reels and other content. That's just such a big unlock for Instagram and Facebook now, being


able to show people content that's interesting to  them from people that they're not even following. 


But that ended up being a very good decision  in retrospect. And it came from being behind.  


It wasn't like "oh, I was so far ahead." Actually, most of the times where we make


some decision that ends up seeming good  is because we messed something up before  


and just didn't want to repeat the mistake. This is a total detour, but I want to ask  


about this while we're on this. We'll get back  to AI in a second. In 2006 you didn't sell for  


$1 billion but presumably there's some amount you  would have sold for, right? Did you write down  


in your head like "I think the actual valuation of Facebook at the time is this and they're not


actually getting the valuation right"? If they'd offered you $5 trillion, of course you would have


sold. So how did you think about that choice? 
 I think some of these things are just personal.  


I don't know that at the time I was sophisticated  enough to do that analysis. I had all these people  


around me who were making all these arguments for a billion dollars like "here's the revenue that


we need to make and here's how big we need to be. It's clearly so many years in the future." It was


very far ahead of where we were at the time. I  didn't really have the financial sophistication  


to really engage with that kind of debate. Deep down I believed in what we were doing.  


I did some analysis like "what would I do if I weren't doing this? Well, I really like building


things and I like helping people communicate. I  like understanding what's going on with people and  


the dynamics between people. So I think if I sold  this company, I'd just go build another company  


like this and I kind of like the one I have. So why?" I think a lot of the biggest bets that


people make are often just based on conviction and  values. It's actually usually very hard to do the  


analyses trying to connect the dots forward. You've had Facebook AI Research for a long  


time. Now it's become seemingly central to  your company. At what point did making AGI,  


or however you consider that mission,  become a key priority of what Meta is doing? 


It's been a big deal for a while. We started  FAIR about 10 years ago. The idea was that,  


along the way to general intelligence or whatever  you wanna call it, there are going to be all these  


different innovations and that's going to  just improve everything that we do. So we  


didn't conceive of it as a product. It was  more of a research group. Over the last 10  


years it has created a lot of different things that have improved all of our products. It's


advanced the field and allowed other people in  the field to create things that have improved our  


products too. I think that that's been great. There's obviously a big change in the last  


few years with ChatGPT and the diffusion  models around image creation coming out.  


This is some pretty wild stuff that is  pretty clearly going to affect how people  


interact with every app that's out there. At that  point we started a second group, the gen AI group,  


with the goal of bringing that stuff into our  products and building leading foundation models  


that would power all these different products. 
 When we started doing that the theory initially  


was that a lot of the stuff we're doing is  pretty social. It's helping people interact  


with creators, helping people interact with  businesses, helping businesses sell things or  


do customer support. There's also basic assistant functionality, whether it's for our apps or the


smart glasses or VR. So it wasn't completely  clear at first that you were going to need full  


AGI to be able to support those use cases. But in  all these subtle ways, through working on them,  


I think it's actually become clear that you do.  For example, when we were working on Llama-2,  


we didn't prioritize coding because people  aren't going to ask Meta AI a lot of coding  


questions in WhatsApp. Now they will, right? 


I don't know. I'm not sure that WhatsApp, or  Facebook or Instagram, is the UI where people are  


going to be doing a lot of coding questions. Maybe the website that we're launching. But


the thing that has been a somewhat surprising  result over the last 18 months is that it turns  


out that coding is important for a lot of domains,  not just coding. Even if people aren't asking  


coding questions, training the models on coding  helps them become more rigorous in answering the  


question and helps them reason across a lot of  different types of domains. That's one example  


where for Llama-3, we really focused on training  it with a lot of coding because that's going  


to make it better on all these things even if  people aren't asking primarily coding questions. 


Reasoning is another example. Maybe you want  to chat with a creator or you're a business and  


you're trying to interact with a customer. That interaction is not just like "okay,


the person sends you a message and you just reply." It's a multi-step interaction


where you're trying to think through "how do I accomplish the person's goals?" A lot of times


when a customer comes, they don't necessarily  know exactly what they're looking for or how  


to ask their questions. So it's not really the  job of the AI to just respond to the question. 


You need to kind of think about it  more holistically. It really becomes  


a reasoning problem. So if someone else solves  reasoning, or makes good advances on reasoning,  


and we're sitting here with a basic chat bot,  then our product is lame compared to what other  


people are building. At the end of the day, we  basically realized we've got to solve general  


intelligence and we just upped the ante and the  investment to make sure that we could do that. 


So the version of Llama that's going to solve all these use cases for users, is that the


version that will be powerful enough to replace  a programmer you might have in this building? 


I just think that all this stuff is  going to be progressive over time. 


But in the end case: Llama-10. I think that there's a lot baked  


into that question. I'm not sure that we're replacing people as much as we're giving


people tools to do more stuff. Is the programmer in this building  


10x more productive after Llama-10? 
 I would hope more. I don't believe that  


there's a single threshold of intelligence for  humanity because people have different skills.  


I think that at some point AI is probably going to  surpass people at most of those things, depending  


on how powerful the models are. But I think it's  progressive and I don't think AGI is one thing.  


You're basically adding different capabilities.  Multimodality is a key one that we're focused on  


now, initially with photos and images and text but  eventually with videos. Because we're so focused  


on the metaverse, 3D type stuff is important  too. One modality that I'm pretty focused on,  


that I haven't seen as many other people in the  industry focus on, is emotional understanding. So  


much of the human brain is just dedicated  to understanding people and understanding  


expressions and emotions. I think that's  its own whole modality, right? You could  


say that maybe it's just video or image, but it's  clearly a very specialized version of those two. 


So there are all these different capabilities  that you want to train the models to focus  


on, in addition to getting a lot better at  reasoning and memory, which is its own whole  


thing. I don't think in the future we're going to  be primarily shoving things into a query context  


window to ask more complicated questions. There  will be different stores of memory or different  


custom models that are more personalized to  people. These are all just different capabilities.  


Obviously then there’s making them big and small.  We care about both. If you're running something  


like Meta AI, that's pretty server-based. We also  want it running on smart glasses and there's not  


a lot of space in smart glasses. So you want to  have something that's very efficient for that. 


If you're doing $10Bs worth of  inference or even eventually $100Bs,  


if you're using intelligence in an industrial  scale what is the use case? Is it simulations?  


Is it the AIs that will be in the metaverse?  What will we be using the data centers for? 


Our bet is that it's going to basically change  all of the products. I think that there's going  


to be a kind of Meta AI general assistant  product. I think that that will shift from  


something that feels more like a chatbot, where  you ask a question and it formulates an answer,  


to things where you're giving it more complicated  tasks and then it goes away and does them. That's  


going to take a lot of inference and it's going  to take a lot of compute in other ways too. 


Then I think interacting with other agents for  other people is going to be a big part of what  


we do, whether it's for businesses or creators. A  big part of my theory on this is that there's not  


going to be just one singular AI that you interact  with. Every business is going to want an AI that  


represents their interests. They're not going to  want to primarily interact with you through an AI  


that is going to sell their competitors’ products. I think creators is going to be a big one. There  


are about 200 million creators on our platforms.  They basically all have the pattern where they  


want to engage their community but they're limited  by the hours in the day. Their community generally  


wants to engage them, but they don't know that  they're limited by the hours in the day. If  


you could create something where that creator  can basically own the AI, train it in the way  


they want, and engage their community, I think  that's going to be super powerful. There's going  


to be a ton of engagement across all these things. These are just the consumer use cases. My wife and  


I run our foundation, Chan Zuckerberg Initiative.  We're doing a bunch of stuff on science and  


there's obviously a lot of AI work that is going  to advance science and healthcare and all these  


things. So it will end up affecting basically  every area of the products and the economy. 


You mentioned AI that can just go out and do  something for you that's multi-step. Is that  


a bigger model? With Llama-4 for example, will  there still be a version that's 70B but you'll  


just train it on the right data and that will  be super powerful? What does the progression  


look like? Is it scaling? Is it just the same size  but different banks like you were talking about? 


I don't know that we know the answer to that. I  think one thing that seems to be a pattern is that  


you have the Llama model and then you build some  kind of other application specific code around it.  


Some of it is the fine-tuning for the use case,  but some of it is, for example, logic for how  


Meta AI should work with tools like Google or Bing  to bring in real-time knowledge. That's not part  


of the base Llama model. For Llama-2, we had some  of that and it was a little more hand-engineered.  


Part of our goal for Llama-3 was to bring more  of that into the model itself. For Llama-3,  


as we start getting into more of these agent-like  behaviors, I think some of that is going to be  


more hand-engineered. Our goal for Llama-4  will be to bring more of that into the model. 


At each step along the way you have a sense of  what's going to be possible on the horizon. You  


start messing with it and hacking around it. I  think that helps you then hone your intuition  


for what you want to try to train into the next  version of the model itself. That makes it more  


general because obviously for anything that you're  hand-coding you can unlock some use cases, but  


it's just inherently brittle and non-general. When you say “into the model itself,” you train it  


on the thing that you want in the model itself? What do you mean by “into the model itself”? 


For Llama-2, the tool use was very specific,  whereas Llama-3 has much better tool use. We  


don't have to hand code all the stuff to have  it use Google and go do a search. It can just do  


that. Similarly for coding and running code and  a bunch of stuff like that. Once you kind of get  


that capability, then you get a peek at what we  can start doing next. We don't necessarily want  


to wait until Llama-4 is around to start building  those capabilities, so we can start hacking around  


it. You do a bunch of hand coding and that  makes the products better, if only for the  


interim. That helps show the way then of what we  want to build into the next version of the model. 


What is the community fine tune of Llama-3  that you're most excited for? Maybe not the  


one that will be most useful to you, but the  one you'll just enjoy playing with the most.  


They fine-tune it on antiquity and  you'll just be talking to Virgil  


or something. What are you excited about? I think the nature of the stuff is that you  


get surprised. Any specific thing that I thought  would be valuable, we'd probably be building. I  


think you'll get distilled versions. I  think you'll get smaller versions. One  


thing is that I think 8B isn’t quite small  enough for a bunch of use cases. Over time I'd  


love to get a 1-2B parameter model, or even a 500M  parameter model and see what you can do with that. 


If with 8B parameters we’re nearly as  powerful as the largest Llama-2 model,  


then with a billion parameters you should be able  to do something that's interesting, and faster.  


It’d be good for classification, or a lot of  basic things that people do before understanding  


the intent of a user query and feeding it  to the most powerful model to hone in on  


what the prompt should be. I think that's one  thing that maybe the community can help fill  


in. We're also thinking about getting around to  distilling some of these ourselves but right now  


the GPUs are pegged training the 405B. 
 So you have all these GPUs. I think you  


said 350,000 by the end of the year. 
 That's the whole fleet. We built two,  


I think 22,000 or 24,000 clusters that are the  single clusters that we have for training the big  


models, obviously across a lot of the stuff that  we do. A lot of our stuff goes towards training  


Reels models and Facebook News Feed and Instagram  Feed. Inference is a huge thing for us because we  


serve a ton of people. Our ratio of inference  compute required to training is probably much  


higher than most other companies that are doing  this stuff just because of the sheer volume of  


the community that we're serving. In the material they shared with  


me before, it was really interesting that you  trained it on more data than is compute optimal  


just for training. The inference is such a big  deal for you guys, and also for the community,  


that it makes sense to just have this thing  and have trillions of tokens in there. 


Although one of the interesting  things about it, even with the 70B,  


is that we thought it would get more saturated. We  trained it on around 15 trillion tokens. I guess  


our prediction going in was that it was going  to asymptote more, but even by the end it was  


still learning. We probably could have fed it more  tokens and it would have gotten somewhat better. 
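
As a rough illustration of what "more data than is compute optimal" means for the figures quoted here (a 70B model, around 15 trillion tokens), the arithmetic can be sketched as below. The figure of about 20 training tokens per parameter is an assumed rule of thumb (the Chinchilla heuristic), not something stated in the conversation, and the function names are hypothetical.

```python
# Back-of-the-envelope check on "more data than is compute optimal".
# ASSUMPTION: ~20 training tokens per parameter as the compute-optimal rule of
# thumb (the Chinchilla heuristic); this number is not from the interview.
ASSUMED_OPTIMAL_TOKENS_PER_PARAM = 20


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def overtraining_factor(tokens: float, params: float,
                        optimal_ratio: float = ASSUMED_OPTIMAL_TOKENS_PER_PARAM) -> float:
    """How many times past the assumed compute-optimal token budget a run goes."""
    return tokens_per_param(tokens, params) / optimal_ratio


# Figures quoted in the conversation: a 70B model and around 15 trillion tokens.
PARAMS_70B = 70e9
TOKENS_15T = 15e12

ratio = tokens_per_param(TOKENS_15T, PARAMS_70B)
factor = overtraining_factor(TOKENS_15T, PARAMS_70B)
print(f"tokens per parameter: {ratio:.0f}")
print(f"multiple of assumed compute-optimal budget: {factor:.1f}x")
```

Under that assumed heuristic, the 70B run lands at roughly ten times the compute-optimal token budget, which fits the point that the extra training pays off at inference time.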


At some point you're running a company and you  need to do these meta reasoning questions. Do I  


want to spend our GPUs on training the 70B model  further? Do we want to get on with it so we can  


start testing hypotheses for Llama-4? We needed  to make that call and I think we got a reasonable  


balance for this version of the 70B. There'll  be others in the future, the 70B multimodal one,  


that'll come over the next period. But that  was fascinating that the architectures at  


this point can just take so much data. That's really interesting. What does this  


imply about future models? You mentioned that  the Llama-3 8B is better than the Llama-2 70B. 


No, no, it's nearly as good.  I don’t want to overstate  


it. It’s in a similar order of magnitude. Does that mean the Llama-4 70B will be  


as good as the Llama-3 405B? What  does the future of this look like? 


This is one of the great questions, right? I think  no one knows. One of the trickiest things in the  


world to plan around is an exponential  curve. How long does it keep going for?  


I think it's likely enough that we'll keep going.  I think it’s worth investing the $10Bs or $100B+  


in building the infrastructure and assuming that  if it keeps going you're going to get some really  


amazing things that are going to make amazing  products. I don't think anyone in the industry  


can really tell you that it will continue scaling  at that rate for sure. In general in history,  


you hit bottlenecks at certain points.  Now there's so much energy on this that  


maybe those bottlenecks get knocked over pretty  quickly. I think that’s an interesting question.


What does the world look like where there aren't  these bottlenecks? Suppose progress just continues  


at this pace, which seems plausible.  Zooming out and forgetting about Llamas… 


Well, there are going to be different bottlenecks.  Over the last few years, I think there was this  


issue of GPU production. Even companies that had  the money to pay for the GPUs couldn't necessarily  


get as many as they wanted because there were all  these supply constraints. Now I think that's sort  


of getting less. So you're seeing a bunch of  companies thinking now about investing a lot  


of money in building out these things. I think  that that will go on for some period of time.  


There is a capital question. At what point does  it stop being worth it to put the capital in? 


I actually think before we hit that, you're  going to run into energy constraints. I don't  


think anyone's built a gigawatt single training  cluster yet. You run into these things that just  


end up being slower in the world. Getting energy  permitted is a very heavily regulated government  


function. You're going from software, which  is somewhat regulated and I'd argue it’s more  


regulated than a lot of people in the tech  community feel. Obviously it’s different if  


you're starting a small company, maybe you  feel that less. We interact with different  


governments and regulators and we have lots  of rules that we need to follow and make sure  


we do a good job with around the world. But  I think that there's no doubt about energy. 


If you're talking about building large new  power plants or large build-outs and then  


building transmission lines that cross other  private or public land, that’s just a heavily  


regulated thing. You're talking about many  years of lead time. If we wanted to stand up  


some massive facility, powering that is a very  long-term project. I think people do it but I  


don't think this is something that can be quite  as magical as just getting to a level of AI,  


getting a bunch of capital and putting it in, and  then all of a sudden the models are just going to…  


You do hit different bottlenecks along the way. Is there something, maybe an AI-related project or  


maybe not, that even a company like Meta doesn't  have the resources for? Something where if your  


R&D budget or capex budget were 10x what it is  now, then you could pursue it? Something that’s  


in the back of your mind but with Meta today,  you can't even issue stock or bonds for it?  


It's just like 10x bigger than your budget? I think energy is one piece. I think we  


would probably build out bigger clusters than we  currently can if we could get the energy to do it. 


That's fundamentally money-bottlenecked  in the limit? If you had $1 trillion… 


I think it’s time. It depends on how far the  exponential curves go. Right now a lot of  


data centers are on the order of 50 megawatts or  100MW, or a big one might be 150MW. Take a whole  


data center and fill it up with all the stuff  that you need to do for training and you build  


the biggest cluster you can. I think a bunch  of companies are running at stuff like that. 


But when you start getting into building a  data center that's like 300MW or 500MW or 1 GW,  


no one has built a 1GW data center yet. I think  it will happen. This is only a matter of time but  


it's not going to be next year. Some of these  things will take some number of years to build  


out. Just to put this in perspective, I think a  gigawatt would be the size of a meaningful nuclear  


power plant only going towards training a model. Didn't Amazon do this? They have a 950MW…


I'm not exactly sure what they  did. You'd have to ask them. 


But it doesn’t have to be in the  same place, right? If distributed  


training works, it can be distributed. Well, I think that is a big question, how  


that's going to work. It seems quite possible that  in the future, more of what we call training for  


these big models is actually more along the lines  of inference generating synthetic data to then go  


feed into the model. I don't know what that ratio  is going to be but I consider the generation of  


synthetic data to be more inference than training  today. Obviously if you're doing it in order  


to train a model, it's part of the broader  training process. So that's an open question,  


the balance of that and how that plays out. Would that potentially also be the case with  


Llama-3, and maybe Llama-4 onwards? As in, you  put this out and if somebody has a ton of compute,  


then they can just keep making these things  arbitrarily smarter using the models that  


you've put out. Let’s say there’s some  random country, like Kuwait or the UAE,  


that has a ton of compute and they can actually  just use Llama-4 to make something much smarter. 


I do think there are going to be  dynamics like that, but I also think  


there is a fundamental limitation on the model  architecture. I think like a 70B model that we  


trained with a Llama-3 architecture can get  better, it can keep going. As I was saying,  


we felt that if we kept on feeding it more data  or rotated the high value tokens through again,  


then it would continue getting better. We've  seen a bunch of different companies around  


the world basically take the Llama-2 70B model  architecture and then build a new model. But it's  


still the case that when you make a generational  improvement to something like the Llama-3 70B or  


the Llama-3 405B, there isn’t anything like  that open source today. I think that's a big  


step function. What people are going to be able to  build on top of that I think can’t go infinitely  


from there. There can be some optimization in  that until you get to the next step function. 


Let's zoom out a little bit from specific  models and even the multi-year lead times  


you would need to get energy approvals and so  on. Big picture, what's happening with AI these  


next couple of decades? Does it feel like  another technology like the metaverse or  


social, or does it feel like a fundamentally  different thing in the course of human history? 


I think it's going to be pretty fundamental. I  think it's going to be more like the creation  


of computing in the first place. You'll get all  these new apps in the same way as when you got  


the web or you got mobile phones. People basically  rethought all these experiences as a lot of things  


that weren't possible before became possible.  So I think that will happen, but I think it's  


a much lower-level innovation. My sense is  that it's going to be more like people going  


from not having computers to having computers. It’s very hard to reason about exactly how this  


goes. In the cosmic scale obviously it'll happen  quickly, over a couple of decades or something.  


There is some set of people who are afraid of it  really spinning out and going from being somewhat  


intelligent to extremely intelligent overnight.  I just think that there's all these physical  


constraints that make that unlikely to happen. I  just don't really see that playing out. I think  


we'll have time to acclimate a bit. But it will  really change the way that we work and give people  


all these creative tools to do different things.  I think it's going to really enable people to do  


the things that they want a lot more. So maybe not overnight, but is it your  


view that on a cosmic scale we can think of  these milestones in this way? Humans evolved,  


and then AI happened, and then they went out  into the galaxy. Maybe it takes many decades,  


maybe it takes a century, but is that the grand  scheme of what's happening right now in history? 


Sorry, in what sense? In the sense that there were  


other technologies, like computers and even  fire, but the development of AI itself is as  


significant as humans evolving in the first place. I think that's tricky. The history of humanity  


has been people basically thinking that certain  aspects of humanity are really unique in different  


ways and then coming to grips with the fact that  that's not true, but that humanity is actually  


still super special. We thought that the earth  was the center of the universe and it's not,  


but humans are still pretty  awesome and pretty unique, right? 


I think another bias that people tend  to have is thinking that intelligence  


is somehow fundamentally connected to life.  It's not actually clear that it is. I don't  


know that we have a clear enough definition of  consciousness or life to fully interrogate this.  


There's all this science fiction about creating  intelligence where it starts to take on all these  


human-like behaviors and things like that. The  current incarnation of all this stuff feels like  


it's going in a direction where intelligence  can be pretty separated from consciousness,  


agency, and things like that, which I  think just makes it a super valuable tool. 


Obviously it's very difficult to predict  what direction this stuff goes in over time,  


which is why I don't think anyone should be  dogmatic about how they plan to develop it  


or what they plan to do. You want to look  at it with each release. We're obviously  


very pro open source, but I haven't committed  to releasing every single thing that we do.  


I’m basically very inclined to think that  open sourcing is going to be good for the  


community and also good for us because we'll  benefit from the innovations. If at some point  


however there's some qualitative change in what  the thing is capable of, and we feel like it's  


not responsible to open source it, then we  won't. It's all very difficult to predict. 


What is a kind of specific qualitative change  where you'd be training Llama-5 or Llama-4,  


and if you see it, it’d make you think “you know what, I'm not sure about open sourcing it”? 


It's a little hard to answer that in  the abstract because there are negative  


behaviors that any product can exhibit  where as long as you can mitigate it,  


it's okay. There’s bad things about social media  that we work to mitigate. There's bad things about  


Llama-2 where we spend a lot of time trying  to make sure that it's not like helping people  


commit violent acts or things like that. That  doesn't mean that it's a kind of autonomous or  


intelligent agent. It just means that it's learned  a lot about the world and it can answer a set of  


questions that we think would be unhelpful for it  to answer. I think the question isn't really what  


behaviors would it show, it's what things would  we not be able to mitigate after it shows that. 


I think that there's so many ways in which  something can be good or bad that it's hard  


to actually enumerate them all up front. Look at  what we've had to deal with in social media and  


the different types of harms. We've basically  gotten to like 18 or 19 categories of harmful  


things that people do and we've basically built  AI systems to identify what those things are and  


to make sure that doesn't happen on our network  as much as possible. Over time I think you'll  


be able to break this down into more of a  taxonomy too. I think this is a thing that  


we spend time researching as well, because we  want to make sure that we understand that. 


It seems to me that it would be a good idea.  I would be disappointed in a future where AI  


systems aren't broadly deployed and everybody  doesn't have access to them. At the same time,  


I want to better understand the mitigations.  If the mitigation is the fine-tuning,  


the whole thing about open weights is that you  can then remove the fine-tuning, which is often  


superficial on top of these capabilities. If it's  like talking on Slack with a biology researcher…  


I think models are very far from this. Right  now, they’re like Google search. But if I can  


show them my Petri dish and they can explain why  my smallpox sample didn’t grow and what to change,  


how do you mitigate that? Because somebody  can just fine-tune that in there, right? 


That's true. I think a lot of people will  basically use the off-the-shelf model and some  


people who have basically bad faith are going to  try to strip out all the bad stuff. So I do think  


that's an issue. On the flip side, one of the  reasons why I'm philosophically so pro open source  


is that I do think that a concentration of AI in  the future has the potential to be as dangerous as  


it being widespread. I think a lot of people think  about the questions of “if we can do this stuff,  


is it bad for it to be out in the wild and just widely available?” I think another version of  


this is that it's probably also pretty bad  for one institution to have an AI that is  


way more powerful than everyone else's AI. There’s one security analogy that I think  


of. There are so many security holes in so many  different things. If you could travel back in  


time a year or two years, let's say you just have  one or two years more knowledge of the security  


holes. You can pretty much hack into any system.  That’s not AI. So it's not that far-fetched to  


believe that a very intelligent AI probably would  be able to identify some holes and basically  


be like a human who could go back in time a  year or two and compromise all these systems. 


So how have we dealt with that as a society?  One big part is open source software that  


makes it so that when improvements are made to  the software, it doesn't just get stuck in one