Mark Zuckerberg - Llama 3, $10B Models, Caesar Augustus, & 1 GW Datacenters

Dwarkesh Podcast
18 Apr 2024 · 78:38

Summary

TL;DR: In a thought-provoking interview, Mark Zuckerberg discusses the future of AI with a focus on Meta AI's advancements. He highlights the release of Llama-3, an open-source model that powers Meta AI and integrates with Google and Bing for real-time knowledge, and emphasizes its capabilities in image generation and natural language processing. Zuckerberg also addresses the challenges of building large-scale data centers, the risks of centralized AI control, and the importance of open-source contributions. He stresses the potential of AI to revolutionize sectors such as science and healthcare, and shares his vision of AI as a tool that enhances human productivity rather than replacing it. The conversation delves into the implications of AI development, the balance between innovation and safety, and the significance of open-source software in democratizing AI technology.

Takeaways

  • The new version of Meta AI, powered by Llama-3, is set to be the most intelligent, freely-available AI assistant, integrating with Google and Bing for real-time knowledge and featuring enhanced creation capabilities like animations and real-time image generation.
  • Meta is training multiple versions of the Llama model, including 8 billion and 70 billion parameter models released for the developer community and a 405 billion parameter model still in training, aiming to push the boundaries of AI capabilities.
  • The release of Llama-3 is not global but will start in a few countries, with plans for a wider rollout in the coming weeks and months, reflecting a strategic approach to introducing advanced AI technologies.
  • Mark Zuckerberg emphasizes the importance of open-source AI, believing it to be beneficial both for the community and for Meta, allowing broader innovation and a more level playing field in the AI industry.
  • There is a commitment to responsible AI development, including a willingness not to release models that exhibit irresolvable negative behaviors or risks, reflecting a cautious approach to AI's potential downsides.
  • Meta is investing in custom silicon to improve the efficiency of AI model training and inference, which could significantly reduce costs and improve performance for its AI-driven services.
  • Zuckerberg shares his passion for building new things and his belief in the potential of AI to enable creativity and productivity, reflecting his personal drive and the company's mission.
  • The potential of AI is compared to the creation of computing itself, suggesting a fundamental shift in how people work and live, with AI becoming an integral part of many industries and aspects of life.
  • Open-source contributions such as PyTorch and React are considered powerful drivers of innovation whose impact on the world may rival the reach of Meta's social media products.
  • There's a discussion of the balance of power in AI development, with concerns about the risks of a single entity holding disproportionately strong AI capabilities and advocacy for a decentralized approach.
  • Zuckerberg draws an analogy between historical shifts in understanding, like the concept of peace under Augustus, and current paradigm shifts in technology and business models, emphasizing the importance of challenging conventional thinking.

Q & A

  • What is the main update to Meta AI that Mark Zuckerberg discusses in the interview?

    - The main update is the rollout of Llama-3, an AI model that is both open source and powers Meta AI. It is considered the most intelligent, freely-available AI assistant at the time of the interview.

  • How does Meta AI integrate with other search engines?

    - Meta AI integrates with Google and Bing for real-time knowledge, and it is being made more prominent across apps like Facebook and Messenger.

  • What new creation features does Meta AI introduce?

    - Meta AI introduces features like animations, where any image can be animated, and real-time, high-quality image generation as users type their queries.

  • What are the technical specifications of the Llama-3 model that Mark Zuckerberg finds exciting?

    - Mark Zuckerberg is excited about the Llama-3 lineup, which includes an 8 billion parameter model and a 70 billion parameter model, with a 405 billion parameter model still in training.

  • What is the roadmap for future releases of Meta AI?

    - The roadmap includes new releases that will bring multimodality, more multi-linguality, and bigger context windows. There are plans to roll out the 405B model later in the year.

  • How does Mark Zuckerberg perceive the risk of having a few companies controlling closed AI models?

    - He sees it as a significant risk, as it could lead to these companies dictating what others can build, similar to the control Apple exerts over app features.

  • What is the strategy behind Meta's acquisition of GPUs like the H100?

    - The strategy was to ensure enough capacity to build something they couldn't yet see on the horizon, doubling the order to be prepared for future needs beyond the immediate requirements for Reels and content ranking.

  • Why did Mark Zuckerberg decide not to sell Facebook in 2006 for $1 billion?

    - Mark felt a deep conviction in what they were building and believed that if he sold the company, he would just build another similar one. He also lacked the financial sophistication to engage in the billion-dollar valuation debate.

  • What is the role of Facebook AI Research (FAIR) in the development of Meta's AI?

    - FAIR, established about 10 years prior, has been instrumental in creating innovations that improved Meta's products. It transitioned from a pure research group to a key player in integrating AI into Meta's products, alongside the creation of the gen AI group.

  • How does Meta plan to approach the development of more advanced AI models like Llama-4?

    - Meta plans to continue training larger models, incorporating more capabilities like reasoning and memory, and focusing on multimodality and emotional understanding. The aim is to make AI more integrated into various aspects of their products and services.

  • What are the potential future challenges in scaling AI models?

    - Challenges include physical constraints like energy limitations for training large models, regulatory hurdles for building new power plants and transmission lines, and the balance between open sourcing models and the potential risks associated with them.

  • How does Mark Zuckerberg view the future of AI and its impact on society?

    - He sees AI as a fundamental shift, similar to the creation of computing, that will enable new applications and experiences. He also acknowledges the need for careful consideration of risks and a balanced approach to AI development and deployment.

Outlines

00:00

AI Innovation and Meta AI's New Features

The speaker expresses an inherent drive to continually innovate and build new features, despite challenges from entities like Apple. The conversation introduces Meta AI's latest advancements, highlighting the release of Llama-3, an open-source AI model that integrates with Google and Bing for real-time knowledge. New features include image animation and real-time high-quality image generation based on user queries. The speaker emphasizes Meta AI's commitment to making AI more accessible and enhancing its capabilities across various applications.

05:00

The Future of AI and Meta's Strategic Investments

The discussion delves into the strategic foresight behind Meta's investment in GPUs for AI model training. The speaker reflects on the importance of capacity planning for unforeseen technological advancements, drawing parallels with past decisions that have shaped the company's direction. The conversation also touches on the speaker's personal philosophy on company valuation and the significance of Facebook AI Research (FAIR) in driving product innovation.

10:01

AGI and the Evolution of Meta's AI Strategy

The speaker outlines the evolution of Meta's approach to AI, from the inception of FAIR to the current focus on general AI (AGI). The importance of coding and reasoning in training AI models is emphasized, highlighting how these capabilities enhance the AI's performance across various domains. The conversation explores the concept of AI as a progressive tool that augments human capabilities rather than replacing them.

15:01

šŸŒ Multimodal AI and the Future of Interaction

The speaker envisions a future where AI capabilities become more integrated and sophisticated, covering emotional understanding and multimodal interactions. The potential for personalized AI models and the impact of AI on industrial-scale operations are discussed. The conversation also addresses the idea of AI agents representing businesses and creators, and the importance of open-source AI in maintaining a balanced technological landscape.

20:05

Scaling AI Models and Meta's Computational Challenges

The speaker discusses the challenges and strategies related to scaling AI models, including the physical and computational constraints of training large models like Llama-3. The conversation explores the concept of using inference to generate synthetic data for training and the potential for smaller, fine-tuned models to play a significant role in various applications. The speaker also addresses the importance of community contributions in advancing AI technology.

25:06

The Impact of Open Source on AI and Technology

The speaker reflects on the impact of open-source contributions from Meta, such as PyTorch and React, and their potential long-term significance. The conversation considers whether open-source efforts could have a more profound impact than Meta's social media products, given their widespread use across the internet. The speaker also discusses the future integration of Llama models with custom silicon for more efficient training.

30:07

Navigating Open Source Risks and Future AI Developments

The speaker addresses concerns about the potential risks of open sourcing powerful AI models, including the possibility of misuse. The conversation focuses on the importance of balancing theoretical risks with practical, everyday harms, and the responsibility to mitigate these risks. The speaker also shares thoughts on the future of AI, including the potential for AI to become a commodified training resource and the economic considerations of open sourcing high-value models.

35:17

The Value of Focus and Meta's Management Strategy

The speaker discusses the concept of focus as a scarce commodity, especially for large companies, and its importance in driving the company's success. The conversation touches on the challenges of managing multiple projects and the need to maintain a sharp focus on key priorities. The speaker also reflects on the unpredictability of success in technology and the importance of trying new things.

Keywords

AI Assistant

An AI assistant is an artificial intelligence software that performs tasks or services for users, such as answering questions, setting reminders, or providing recommendations. In the script, the development of Meta AI's Llama-3 model is discussed, which is designed to be an intelligent, freely-available AI assistant that integrates with platforms like Facebook and Messenger, allowing users to interact with it through search boxes for real-time queries and responses.

Open Source

Open source refers to a type of software where the source code is made available to the public, allowing anyone to view, use, modify, and distribute the software. The script discusses Meta's decision to release the Llama-3 model as open source, emphasizing the benefits of community contributions and the prevention of a single entity having control over advanced AI capabilities.

Data Center

A data center is a facility that houses a large number of servers, storage systems, and other components connected through a network. The script mentions the construction of data centers with high energy consumption, such as 300 Megawatts or 1 Gigawatt, which are necessary for training large AI models like Llama-3.
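
As a rough illustration of why those power figures matter, here is a back-of-the-envelope sketch (not from the interview; the per-GPU power draw and the PUE overhead are assumed figures) of how many accelerators a facility of a given power budget could support:

```python
# Illustrative sketch: accelerators supportable by a data center of a
# given power budget. H100 board power and PUE are assumptions.

H100_WATTS = 700   # approximate board power of one H100 GPU (assumed)
PUE = 1.3          # assumed power usage effectiveness (cooling, networking, ...)

def max_gpus(facility_watts: float) -> int:
    """Rough upper bound on GPUs the facility can power."""
    return int(facility_watts / (H100_WATTS * PUE))

for label, watts in [("300 MW", 300e6), ("1 GW", 1e9)]:
    print(f"{label}: ~{max_gpus(watts):,} GPUs")
```

Under these assumptions a 1 GW site supports roughly a million accelerators, which is why the interview treats the jump from 300 MW to 1 GW as a qualitative change.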

Parameter

In the context of AI, a parameter is a variable in a model that the machine learning algorithm can adjust to improve the model's performance. The script discusses different versions of the Llama model with varying numbers of parameters, such as an 8 billion parameter model and a 70 billion parameter model, highlighting the scale and complexity of these AI systems.
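
To make those parameter counts concrete, a quick sketch of the memory needed just to store the weights at common numeric precisions; the model sizes are the ones discussed in the interview, while the precision choices and the mapping to any real deployment are illustrative assumptions:

```python
# Illustrative sketch: memory to hold model weights only (no activations,
# KV cache, or optimizer state) at different precisions.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_gb(n_params: float, precision: str) -> float:
    """Gigabytes needed to store the raw weights."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

for name, n in [("8B", 8e9), ("70B", 70e9), ("405B", 405e9)]:
    print(f"{name}: {weight_gb(n, 'fp16'):g} GB in fp16")
```
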

Multimodality

Multimodality in AI refers to the ability of a system to process and understand information from multiple different modes of input, such as text, images, and video. The script mentions Meta's focus on developing multimodal capabilities in their AI models to enhance their functionality and user interaction.

Benchmark

A benchmark is a standard or point of reference against which things may be compared or assessed. In AI, benchmarks are used to evaluate the performance of models against specific tasks. The script discusses the Llama-3 model's performance on benchmarks, indicating its effectiveness and reasoning capabilities.
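
A score like the "85 MMLU" mentioned in the transcript is essentially multiple-choice accuracy. A minimal sketch, with made-up data, of how such a score is computed:

```python
# Minimal sketch of multiple-choice benchmark scoring: compare the
# model's chosen letter to the answer key and report percent correct.

def accuracy(predictions, answer_key):
    """Percent of questions where the model picked the right option."""
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return 100 * correct / len(answer_key)

preds = ["A", "C", "B", "D", "C"]   # hypothetical model choices
gold  = ["A", "C", "D", "D", "C"]   # hypothetical answer key
print(accuracy(preds, gold))        # 4 of 5 correct
```
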

Inference

In AI, inference is the process of deriving conclusions or making decisions based on known information. The script talks about the significant role of inference in serving a large user base, as it requires a substantial amount of computational resources to apply the trained AI models to new data or situations.
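
One common rule of thumb (not stated in the interview) is that generating one token with a dense transformer costs roughly 2 × N floating-point operations for N parameters; the per-accelerator throughput figure below is an assumed number used only for illustration:

```python
# Illustrative sketch of per-token inference cost using the
# ~2 FLOPs-per-parameter-per-token rule of thumb for dense models.

def tokens_per_second(n_params: float, sustained_flops: float) -> float:
    """Rough generation rate on hardware sustaining `sustained_flops`."""
    flops_per_token = 2 * n_params
    return sustained_flops / flops_per_token

# e.g. a 70B model on hardware sustaining an assumed 1e15 FLOP/s
print(round(tokens_per_second(70e9, 1e15)), "tokens/s (upper bound)")
```

This ignores memory-bandwidth limits and batching, which dominate in practice, but it conveys why serving a large user base consumes so much compute.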

Meta AI

Meta AI refers to the artificial intelligence division within the company Meta (formerly known as Facebook, Inc.). The script discusses the advancements in Meta AI, particularly the release of the Llama-3 model, which is intended to be the most intelligent AI assistant available to the public.

Training Cluster

A training cluster is a group of interconnected computers that work together to train machine learning models. The script mentions the development and scaling of training clusters, which are essential for handling the large-scale computations required to train complex AI models like Llama-3.
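
The scale of such clusters can be sketched with the standard estimate that dense-transformer training takes roughly 6 × parameters × training tokens FLOPs; the cluster size, per-GPU throughput, and utilization below are illustrative assumptions, not figures from the interview:

```python
# Illustrative sketch: wall-clock training time from the ~6*N*D FLOPs
# estimate for dense transformers. All hardware figures are assumptions.

def training_days(n_params, n_tokens, n_gpus, flops_per_gpu, utilization):
    """Rough training time in days for a dense model on a GPU cluster."""
    total_flops = 6 * n_params * n_tokens
    cluster_flops = n_gpus * flops_per_gpu * utilization
    return total_flops / cluster_flops / 86_400  # 86,400 seconds per day

# 70B params, 15e12 tokens, 16,384 GPUs at an assumed 4e14 FLOP/s, 40% utilization
print(round(training_days(70e9, 15e12, 16_384, 4e14, 0.4), 1), "days")
```
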

Content Risks

Content risks refer to the potential negative outcomes or harms that can arise from the use of AI systems, such as the spread of misinformation, promotion of harmful behavior, or facilitation of violence. The script emphasizes the importance of mitigating content risks associated with AI models, particularly in preventing the use of these models to cause harm to individuals or society.

Economic Constraints

Economic constraints refer to the limitations or restrictions faced by an organization due to financial considerations. The script discusses how economic constraints, such as the cost of GPUs and energy, impact the development and scaling of AI models and data centers.

Highlights

Meta AI is releasing an upgraded model called Llama-3, which is set to be the most intelligent, freely-available AI assistant.

Llama-3 will be available as open source for developers and will also power Meta AI, integrating with Google and Bing for real-time knowledge.

New creation features have been added, including the ability to animate any image and generate high-quality images in real time as you type your query.

Meta AI's new version is initially rolling out in a few countries, with plans for broader availability in the coming weeks and months.

Technically, Llama-3 comes in three versions: an 8 billion parameter model and a 70 billion parameter model released today, and a 405 billion parameter model still in training.

The 70 billion parameter model of Llama-3 has scored highly on benchmarks for math and reasoning, while the 405 billion parameter model is expected to lead in benchmarks upon completion.

Meta has a roadmap for future releases that include multimodality, more multilinguality, and larger context windows.

The decision to invest in GPUs for AI was driven by the need for more capacity to train models for content recommendation in services like Reels.

The capability of showing content from unconnected sources on platforms like Instagram and Facebook represents a significant unlock for user engagement.

The importance of open source in AI development, ensuring a balanced and competitive ecosystem, and the potential risks of concentrated AI power.

The potential for AI to surpass human intelligence in most domains progressively, and the focus on capabilities like emotional understanding and reasoning.

Meta's commitment to addressing the risks of misinformation and the importance of building AI systems to combat adversarial uses.

The vision of AI as a tool that enhances human capabilities rather than replacing them, aiming for increased productivity and creativity.

The significance of the metaverse in enabling realistic digital presence and its potential impact on socializing, working, and various industries.

Mark Zuckerberg's personal drive to continuously build new things and the philosophy behind investing in large-scale projects like AI and the metaverse.

The historical perspective on the development of peace and economy, drawing parallels to modern innovations in tech and the concept of open source.

The potential for custom silicon to revolutionize the training of large AI models and the strategic move to first optimize inference processes.

Transcripts

00:00

That's not even a question for me - whether we're going to go take a swing at building

00:03

the next thing. I'm just incapable of not doing that. There's a bunch of times when we wanted to

00:08

launch features and then Apple's just like nope, you're not launching that. I was like

00:12

that sucks. Are we set up for that with AI where you're going to get a handful of companies that

00:19

run these closed models that are going to be in control of the APIs and therefore are going to be

00:22

able to tell you what you can build? Then when you start getting into building a data center

00:27

that's like 300 Megawatts or 500 Megawatts or a Gigawatt - just no one has built a single Gigawatt

00:33

data center yet. From wherever you sit there's going to be some actor who you don't trust - if

00:37

they're the ones who have the super strong AI, I think that that's potentially a much bigger risk.

00:43

Mark, welcome to the podcast. Thanks for having me. Big fan of your podcast.

00:47

Thank you, that's very nice of you to say. Let's start by talking about the releases

00:52

that will go out when this interview goes out. Tell me about the models and

00:57

Meta AI. What's new and exciting about them? I think the main thing that most people in the

01:02

world are going to see is the new version of Meta AI. The most important thing that we're

01:08

doing is the upgrade to the model. We're rolling out Llama-3. We're doing it both

01:12

as open source for the dev community and it is now going to be powering Meta AI. There's a lot

01:19

that I'm sure we'll get into around Llama-3, but I think the bottom line on this is that

01:24

we think now that Meta AI is the most intelligent, freely-available AI assistant that people can use.

01:30

We're also integrating Google and Bing for real-time knowledge.

01:34

We're going to make it a lot more prominent across our apps. At the top of Facebook and Messenger,

01:42

you'll be able to just use the search box right there to ask any question. There's a bunch of new

01:48

creation features that we added that I think are pretty cool and that I think people will enjoy.

01:54

I think animations is a good one. You can basically take any image and just animate it.

02:00

One that people are going to find pretty wild is that it now generates high quality images

02:07

so quickly that it actually generates it as you're typing and updates it in real time.

02:12

So you're typing your query and it's honing in. It's like "show me a picture of a cow in

02:21

a field with mountains in the background, eating macadamia nuts, drinking beer" and it's updating

02:29

the image in real time. It's pretty wild. I think people are going to enjoy that. So I

02:35

think that's what most people are going to see in the world. We're rolling that out, not everywhere,

02:39

but we're starting in a handful of countries and we'll do more over the coming weeks and months.

02:46

I think that's going to be a pretty big deal and I'm really excited to get that in people's

02:50

hands. It's a big step forward for Meta AI. But I think if you want to get under the hood

02:57

a bit, the Llama-3 stuff is obviously the most technically interesting. We're training three

03:05

versions: an 8 billion parameter model and a 70 billion, which we're releasing today, and a 405

03:11

billion dense model, which is still training. So we're not releasing that today, but I'm pretty

03:20

excited about how the 8B and the 70B turned out. They're leading for their scale. We'll release a

03:31

blog post with all the benchmarks so people can check it out themselves. Obviously it's open

03:34

source so people get a chance to play with it. We have a roadmap of new releases coming that

03:41

are going to bring multimodality, more multi-linguality, and bigger context

03:46

windows as well. Hopefully, sometime later in the year we'll get to roll out the 405B. For where it

03:59

is right now in training, it is already at around 85 MMLU and we expect that it's

04:09

going to have leading benchmarks on a bunch of the benchmarks. I'm pretty excited about all of that.

04:14

The 70 billion is great too. We're releasing that today. It's around 82 MMLU and has leading scores

04:22

on math and reasoning. I think just getting this in people's hands is going to be pretty wild.

04:26

Oh, interesting. That's the first I'm hearing of it as a benchmark. That's super impressive.

04:30

The 8 billion is nearly as powerful as the biggest version of Llama-2 that we released.

04:38

So the smallest Llama-3 is basically as powerful as the biggest Llama-2.

04:43

Before we dig into these models, I want to go back in time. I'm assuming 2022 is when you

04:49

started acquiring these H100s, or you can tell me when. The stock price is getting hammered. People

04:56

are asking what's happening with all this capex. People aren't buying the metaverse.

05:00

Presumably you're spending that capex to get these H100s. How did you know back then to get the

05:04

H100s? How did you know that you'd need the GPUs? I think it was because we were working on Reels.

05:14

We always want to have enough capacity to build something that we can't quite see on the horizon

05:23

yet. We got into this position with Reels where we needed more GPUs to train the models. It was this

05:31

big evolution for our services. Instead of just ranking content from people or pages you follow,

05:41

we made this big push to start recommending what we call unconnected content, content from people

05:49

or pages that you're not following. The corpus of content candidates that

05:56

we could potentially show you expanded from on the order of thousands to on the order of

06:01

hundreds of millions. It needed a completely different infrastructure. We started working

06:08

on doing that and we were constrained on the infrastructure in catching up to what

06:14

TikTok was doing as quickly as we wanted to. I basically looked at that and I was like "hey,

06:19

we have to make sure that we're never in this situation again. So let's order enough GPUs to do

06:25

what we need to do on Reels and ranking content and feed. But let's also double that." Again,

06:31

our normal principle is that there's going to be something on the horizon that we can't see yet.

06:35

Did you know it would be AI? We thought it was going to be something that

06:40

had to do with training large models. At the time I thought it was probably going to be something

06:44

that had to do with content. It's just the pattern matching of running the company, there's always

06:52

another thing. At that time I was so deep into trying to get the recommendations working for

07:00

Reels and other content. That's just such a big unlock for Instagram and Facebook now, being

07:05

able to show people content that's interesting to them from people that they're not even following.

07:09

But that ended up being a very good decision in retrospect. And it came from being behind.

07:18

It wasn't like "oh, I was so far ahead." Actually, most of the times where we make

07:25

some decision that ends up seeming good is because we messed something up before

07:29

and just didn't want to repeat the mistake. This is a total detour, but I want to ask

07:32

about this while we're on this. We'll get back to AI in a second. In 2006 you didn't sell for

07:37

$1 billion but presumably there's some amount you would have sold for, right? Did you write down

07:41

in your head like "I think the actual valuation of Facebook at the time is this and they're not

07:45

actually getting the valuation right"? If they'd offered you $5 trillion, of course you would have

07:48

sold. So how did you think about that choice? I think some of these things are just personal.

07:58

I don't know that at the time I was sophisticated enough to do that analysis. I had all these people

08:03

around me who were making all these arguments for a billion dollars like "here's the revenue that

08:10

we need to make and here's how big we need to be. It's clearly so many years in the future." It was

08:16

very far ahead of where we were at the time. I didn't really have the financial sophistication

08:23

to really engage with that kind of debate. Deep down I believed in what we were doing.

08:30

I did some analysis like "what would I do if I weren't doing this? Well, I really like building

08:40

things and I like helping people communicate. I like understanding what's going on with people and

08:46

the dynamics between people. So I think if I sold this company, I'd just go build another company

08:51

like this and I kind of like the one I have. So why?" I think a lot of the biggest bets that

09:03

people make are often just based on conviction and values. It's actually usually very hard to do the

09:12

analyses trying to connect the dots forward. You've had Facebook AI Research for a long

09:18

time. Now it's become seemingly central to your company. At what point did making AGI,

09:26

or however you consider that mission, become a key priority of what Meta is doing?

09:33

It's been a big deal for a while. We started FAIR about 10 years ago. The idea was that,

09:41

along the way to general intelligence or whatever you wanna call it, there are going to be all these

09:48

different innovations and that's going to just improve everything that we do. So we

09:52

didn't conceive of it as a product. It was more of a research group. Over the last 10

10:00

years it has created a lot of different things that have improved all of our products. It's

10:07

advanced the field and allowed other people in the field to create things that have improved our

10:11

products too. I think that that's been great. There's obviously a big change in the last

10:17

few years with ChatGPT and the diffusion models around image creation coming out.

10:24

This is some pretty wild stuff that is pretty clearly going to affect how people

10:29

interact with every app that's out there. At that point we started a second group, the gen AI group,

10:40

with the goal of bringing that stuff into our products and building leading foundation models

10:46

that would power all these different products. When we started doing that the theory initially

10:54

was that a lot of the stuff we're doing is pretty social. It's helping people interact

11:01

with creators, helping people interact with businesses, helping businesses sell things or

11:07

do customer support. There's also basic assistant functionality, whether it's for our apps or the

11:13

smart glasses or VR. So it wasn't completely clear at first that you were going to need full

11:24

AGI to be able to support those use cases. But in all these subtle ways, through working on them,

11:29

I think it's actually become clear that you do. For example, when we were working on Llama-2,

11:37

we didn't prioritize coding because people aren't going to ask Meta AI a lot of coding

11:42

questions in WhatsApp. Now they will, right?

11:44

I don't know. I'm not sure that WhatsApp, or Facebook or Instagram, is the UI where people are

11:47

going to be doing a lot of coding questions. Maybe the website, meta.ai, that we're launching. But

12:00

the thing that has been a somewhat surprising result over the last 18 months is that it turns

12:08

out that coding is important for a lot of domains, not just coding. Even if people aren't asking

12:14

coding questions, training the models on coding helps them become more rigorous in answering the

12:21

question and helps them reason across a lot of different types of domains. That's one example

12:26

where for Llama-3, we really focused on training it with a lot of coding because that's going

12:30

to make it better on all these things even if people aren't asking primarily coding questions.

12:36

Reasoning is another example. Maybe you want to chat with a creator or you're a business and

12:43

you're trying to interact with a customer. That interaction is not just like "okay,

12:47

the person sends you a message and you just reply." It's a multi-step interaction

12:53

where you're trying to think through "how do I accomplish the person's goals?" A lot of times

12:57

when a customer comes, they don't necessarily know exactly what they're looking for or how

13:01

to ask their questions. So it's not really the job of the AI to just respond to the question.

13:06

You need to kind of think about it more holistically. It really becomes

13:09

a reasoning problem. So if someone else solves reasoning, or makes good advances on reasoning,

13:14

and we're sitting here with a basic chat bot, then our product is lame compared to what other

13:19

people are building. At the end of the day, we basically realized we've got to solve general

13:26

intelligence and we just upped the ante and the investment to make sure that we could do that.

13:32

So the version of Llama that's going to solve all these use cases for users, is that the

13:41

version that will be powerful enough to replace a programmer you might have in this building?

13:46

I just think that all this stuff is going to be progressive over time.

13:49

But in the end case: Llama-10. I think that there's a lot baked

13:55

into that question. I'm not sure that we're replacing people as much as we're giving

14:00

people tools to do more stuff. Is the programmer in this building

14:03

10x more productive after Llama-10? I would hope more. I don't believe that

14:09

there's a single threshold of intelligence for humanity because people have different skills.

14:14

I think that at some point AI is probably going to surpass people at most of those things, depending

14:21

on how powerful the models are. But I think it'sĀ  progressive and I don't think AGI is one thing.Ā Ā 

14:29

You're basically adding different capabilities.Ā  Multimodality is a key one that we're focused onĀ Ā 

14:34

now, initially with photos and images and text butĀ  eventually with videos. Because we're so focusedĀ Ā 

14:40

on the metaverse, 3D type stuff is importantĀ  too. One modality that I'm pretty focused on,Ā Ā 

14:46

that I haven't seen as many other people in theĀ  industry focus on, is emotional understanding. SoĀ Ā 

14:54

much of the human brain is just dedicatedĀ  to understanding people and understandingĀ Ā 

15:00

expressions and emotions. I think that'sĀ  its own whole modality, right? You couldĀ Ā 

15:06

say that maybe it's just video or image, but it'sĀ  clearly a very specialized version of those two.Ā 

15:10

So there are all these different capabilitiesĀ  that you want to train the models to focusĀ Ā 

15:17

on, in addition to getting a lot better atĀ  reasoning and memory, which is its own wholeĀ Ā 

15:22

thing. I don't think in the future we're going toĀ  be primarily shoving things into a query contextĀ Ā 

15:29

window to ask more complicated questions. ThereĀ  will be different stores of memory or differentĀ Ā 

15:35

custom models that are more personalized toĀ  people. These are all just different capabilities.Ā Ā 

15:42

Obviously then thereā€™s making them big and small.Ā  We care about both. If you're running somethingĀ Ā 

15:47

like Meta AI, that's pretty server-based. We alsoĀ  want it running on smart glasses and there's notĀ Ā 

15:55

a lot of space in smart glasses. So you want toĀ  have something that's very efficient for that.Ā 
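
The "different stores of memory" idea mentioned above can be sketched as a simple retrieval layer that pulls only the relevant stored facts into the prompt, rather than shoving the whole history into the context window. Everything here (the function names, the word-overlap scoring, the sample memories) is invented for illustration; it is not how Meta AI actually works, and a real system would use learned embeddings rather than word overlap.

```python
# Toy sketch of a per-user memory store: retrieve only the memories
# relevant to the current query instead of putting everything into the
# model's context window. All names and the scoring rule are illustrative.

def score(query: str, memory: str) -> int:
    """Count words the query and a stored memory have in common."""
    return len(set(query.lower().split()) & set(memory.lower().split()))

def retrieve(query: str, memories: list[str], k: int = 2) -> list[str]:
    """Return the k stored memories most relevant to the query."""
    ranked = sorted(memories, key=lambda m: score(query, m), reverse=True)
    return ranked[:k]

memories = [
    "User asked for a vegetarian dinner recipe last week",
    "User is training for a marathon in October",
    "User's favorite language is Rust",
]

# Only the relevant memories get prepended to the prompt; the Rust
# fact is left out of this query entirely.
context = retrieve("suggest a dinner recipe", memories)
```

The design point is that the prompt stays small and personalized as the memory store grows, which is exactly what a fixed context window cannot do.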

16:01

If you're doing $10Bs worth of inference or even eventually $100Bs,

16:06

if you're using intelligence at an industrial scale, what is the use case? Is it simulations?

16:11

Is it the AIs that will be in the metaverse? What will we be using the data centers for?

16:19

Our bet is that it's going to basically change all of the products. I think that there's going

16:24

to be a kind of Meta AI general assistant product. I think that that will shift from

16:32

something that feels more like a chatbot, where you ask a question and it formulates an answer,

16:37

to things where you're giving it more complicated tasks and then it goes away and does them. That's

16:43

going to take a lot of inference and it's going to take a lot of compute in other ways too.

16:48

Then I think interacting with other agents for other people is going to be a big part of what

16:56

we do, whether it's for businesses or creators. A big part of my theory on this is that there's not

17:02

going to be just one singular AI that you interact with. Every business is going to want an AI that

17:09

represents their interests. They're not going to want to primarily interact with you through an AI

17:13

that is going to sell their competitors' products. I think creators is going to be a big one. There

17:25

are about 200 million creators on our platforms. They basically all have the pattern where they

17:31

want to engage their community but they're limited by the hours in the day. Their community generally

17:35

wants to engage them, but they don't know that they're limited by the hours in the day. If

17:40

you could create something where that creator can basically own the AI, train it in the way

17:47

they want, and engage their community, I think that's going to be super powerful. There's going

17:55

to be a ton of engagement across all these things. These are just the consumer use cases. My wife and

18:04

I run our foundation, Chan Zuckerberg Initiative. We're doing a bunch of stuff on science and

18:12

there's obviously a lot of AI work that is going to advance science and healthcare and all these

18:17

things. So it will end up affecting basically every area of the products and the economy.

18:25

You mentioned AI that can just go out and do something for you that's multi-step. Is that

18:30

a bigger model? With Llama-4 for example, will there still be a version that's 70B but you'll

18:36

just train it on the right data and that will be super powerful? What does the progression

18:40

look like? Is it scaling? Is it just the same size but different banks like you were talking about?

18:49

I don't know that we know the answer to that. I think one thing that seems to be a pattern is that

18:56

you have the Llama model and then you build some kind of other application-specific code around it.

19:06

Some of it is the fine-tuning for the use case, but some of it is, for example, logic for how

19:14

Meta AI should work with tools like Google or Bing to bring in real-time knowledge. That's not part

19:21

of the base Llama model. For Llama-2, we had some of that and it was a little more hand-engineered.

19:30

Part of our goal for Llama-3 was to bring more of that into the model itself. For Llama-3,

19:36

as we start getting into more of these agent-like behaviors, I think some of that is going to be

19:41

more hand-engineered. Our goal for Llama-4 will be to bring more of that into the model.

19:48

At each step along the way you have a sense of what's going to be possible on the horizon. You

19:54

start messing with it and hacking around it. I think that helps you then hone your intuition

19:59

for what you want to try to train into the next version of the model itself. That makes it more

20:04

general because obviously for anything that you're hand-coding you can unlock some use cases, but

20:10

it's just inherently brittle and non-general. When you say "into the model itself," you train it

21:21

on the thing that you want in the model itself? What do you mean by "into the model itself"?

21:33

For Llama-2, the tool use was very specific, whereas Llama-3 has much better tool use. We

21:41

don't have to hand code all the stuff to have it use Google and go do a search. It can just do

21:49

that. Similarly for coding and running code and a bunch of stuff like that. Once you kind of get

22:00

that capability, then you get a peek at what we can start doing next. We don't necessarily want

22:06

to wait until Llama-4 is around to start building those capabilities, so we can start hacking around

22:10

it. You do a bunch of hand coding and that makes the products better, if only for the

22:16

interim. That helps show the way then of what we want to build into the next version of the model.
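
The hand-engineered tool-use pattern described here can be sketched roughly as follows. The function names, the fake search call, and the keyword-based routing rule are all invented for illustration; Meta's actual Google/Bing integration is obviously far more involved than this.

```python
# Toy sketch of hand-engineered tool use wrapped around a base model,
# in the spirit of the Llama-2-era approach described above. All names
# and the routing heuristic are made up for illustration.

def base_model(prompt: str) -> str:
    """Stand-in for a call to a base LLM."""
    return f"[model answer to: {prompt}]"

def web_search(query: str) -> str:
    """Stand-in for a real-time search API (e.g. Google or Bing)."""
    return f"[search results for: {query}]"

# Crude hand-coded router: if the question looks like it needs fresh
# information, fetch search results and prepend them to the prompt.
FRESHNESS_CUES = ("today", "latest", "current", "news", "price")

def answer(question: str) -> str:
    if any(cue in question.lower() for cue in FRESHNESS_CUES):
        context = web_search(question)
        return base_model(f"{context}\n\n{question}")
    return base_model(question)
```

The brittleness being described is visible in the sketch: any question that needs fresh data but avoids the cue words slips past the router. Training tool use "into the model itself" means the model decides when to call the tool, replacing this fixed rule.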

22:21

What is the community fine-tune of Llama-3 that you're most excited for? Maybe not the

22:25

one that will be most useful to you, but the one you'll just enjoy playing with the most.

22:29

They fine-tune it on antiquity and you'll just be talking to Virgil

22:32

or something. What are you excited about? I think the nature of the stuff is that you

22:39

get surprised. Any specific thing that I thought would be valuable, we'd probably be building. I

22:53

think you'll get distilled versions. I think you'll get smaller versions. One

22:58

thing is that I think 8B isn't quite small enough for a bunch of use cases. Over time I'd

23:07

love to get a 1-2B parameter model, or even a 500M parameter model and see what you can do with that.

23:18

If with 8B parameters we're nearly as powerful as the largest Llama-2 model,

23:23

then with a billion parameters you should be able to do something that's interesting, and faster.

23:28

It'd be good for classification, or a lot of basic things that people do before understanding

23:35

the intent of a user query and feeding it to the most powerful model to hone in on

23:41

what the prompt should be. I think that's one thing that maybe the community can help fill

23:46

in. We're also thinking about getting around to distilling some of these ourselves but right now

23:52

the GPUs are pegged training the 405B. So you have all these GPUs. I think you

24:00

said 350,000 by the end of the year. That's the whole fleet. We built two,

24:06

I think 22,000 or 24,000 clusters that are the single clusters that we have for training the big

24:13

models, obviously across a lot of the stuff that we do. A lot of our stuff goes towards training

24:18

Reels models and Facebook News Feed and Instagram Feed. Inference is a huge thing for us because we

24:24

serve a ton of people. Our ratio of inference compute required to training is probably much

24:33

higher than most other companies that are doing this stuff just because of the sheer volume of

24:37

the community that we're serving. In the material they shared with

24:41

me before, it was really interesting that you trained it on more data than is compute optimal

24:45

just for training. The inference is such a big deal for you guys, and also for the community,

24:49

that it makes sense to just have this thing and have trillions of tokens in there.

24:53

Although one of the interesting things about it, even with the 70B,

24:57

is that we thought it would get more saturated. We trained it on around 15 trillion tokens. I guess

25:06

our prediction going in was that it was going to asymptote more, but even by the end it was

25:12

still learning. We probably could have fed it more tokens and it would have gotten somewhat better.
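
As a rough back-of-the-envelope on what "more data than is compute optimal" means here: the Chinchilla scaling heuristic of roughly 20 training tokens per parameter (an outside rule of thumb, not a figure from this conversation) would put a 70B model's compute-optimal budget around 1.4 trillion tokens, far below the ~15 trillion described:

```python
# Back-of-envelope comparison of the Chinchilla compute-optimal heuristic
# (~20 tokens per parameter; an external rule of thumb, not Meta's stated
# methodology) against the ~15T tokens mentioned for Llama-3 70B.
TOKENS_PER_PARAM = 20

params = 70e9                               # 70B parameters
optimal_tokens = TOKENS_PER_PARAM * params  # 1.4e12, i.e. ~1.4T tokens
actual_tokens = 15e12                       # ~15T tokens, per the conversation

# Training ~10x past the compute-optimal point trades extra training
# compute for a stronger model at a fixed, cheap-to-serve size.
factor = actual_tokens / optimal_tokens
print(f"~{factor:.1f}x beyond compute-optimal")  # ~10.7x
```

That trade is what the inference-heavy framing in the question points at: the extra training cost is paid once, while the serving savings of a smaller-but-stronger model accrue on every query.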

25:19

At some point you're running a company and you need to do these meta reasoning questions. Do I

25:24

want to spend our GPUs on training the 70B model further? Do we want to get on with it so we can

25:31

start testing hypotheses for Llama-4? We needed to make that call and I think we got a reasonable

25:39

balance for this version of the 70B. There'll be others in the future, the 70B multimodal one,

25:45

that'll come over the next period. But that was fascinating that the architectures at

25:53

this point can just take so much data. That's really interesting. What does this

25:57

imply about future models? You mentioned that the Llama-3 8B is better than the Llama-2 70B.

26:03

No, no, it's nearly as good. I don't want to overstate

26:06

it. It's in a similar order of magnitude. Does that mean the Llama-4 70B will be

26:10

as good as the Llama-3 405B? What does the future of this look like?

26:14

This is one of the great questions, right? I think no one knows. One of the trickiest things in the

26:22

world to plan around is an exponential curve. How long does it keep going for?

26:29

I think it's likely enough that we'll keep going. I think it's worth investing the $10Bs or $100B+

26:37

in building the infrastructure and assuming that if it keeps going you're going to get some really

26:43

amazing things that are going to make amazing products. I don't think anyone in the industry

26:49

can really tell you that it will continue scaling at that rate for sure. In general in history,

26:56

you hit bottlenecks at certain points. Now there's so much energy on this that

27:01

maybe those bottlenecks get knocked over pretty quickly. I think that's an interesting question.

27:08

What does the world look like where there aren't these bottlenecks? Suppose progress just continues

27:13

at this pace, which seems plausible. Zooming out and forgetting about Llamas…

27:18

Well, there are going to be different bottlenecks. Over the last few years, I think there was this

27:28

issue of GPU production. Even companies that had the money to pay for the GPUs couldn't necessarily

27:39

get as many as they wanted because there were all these supply constraints. Now I think that's sort

27:44

of getting less. So you're seeing a bunch of companies thinking now about investing a lot

27:52

of money in building out these things. I think that that will go on for some period of time.

28:00

There is a capital question. At what point does it stop being worth it to put the capital in?

28:06

I actually think before we hit that, you're going to run into energy constraints. I don't

28:14

think anyone's built a gigawatt single training cluster yet. You run into these things that just

28:21

end up being slower in the world. Getting energy permitted is a very heavily regulated government

28:30

function. You're going from software, which is somewhat regulated and I'd argue it's more

28:37

regulated than a lot of people in the tech community feel. Obviously it's different if

28:42

you're starting a small company, maybe you feel that less. We interact with different

28:47

governments and regulators and we have lots of rules that we need to follow and make sure

28:53

we do a good job with around the world. But I think that there's no doubt about energy.

28:59

If you're talking about building large new power plants or large build-outs and then

29:04

building transmission lines that cross other private or public land, that's just a heavily

29:11

regulated thing. You're talking about many years of lead time. If we wanted to stand up

29:17

some massive facility, powering that is a very long-term project. I think people do it but I

29:31

don't think this is something that can be quite as magical as just getting to a level of AI,

29:36

getting a bunch of capital and putting it in, and then all of a sudden the models are just going to…

29:42

You do hit different bottlenecks along the way. Is there something, maybe an AI-related project or

29:47

maybe not, that even a company like Meta doesn't have the resources for? Something where if your

29:51

R&D budget or capex budget were 10x what it is now, then you could pursue it? Something that's

29:56

in the back of your mind but with Meta today, you can't even issue stock or bonds for it?

30:01

It's just like 10x bigger than your budget? I think energy is one piece. I think we

30:07

would probably build out bigger clusters than we currently can if we could get the energy to do it.

30:18

That's fundamentally money-bottlenecked in the limit? If you had $1 trillion…

30:23

I think it's time. It depends on how far the exponential curves go. Right now a lot of

30:36

data centers are on the order of 50 megawatts or 100MW, or a big one might be 150MW. Take a whole

30:42

data center and fill it up with all the stuff that you need to do for training and you build

30:46

the biggest cluster you can. I think a bunch of companies are running at stuff like that.

30:53

But when you start getting into building a data center that's like 300MW or 500MW or 1 GW,

31:04

no one has built a 1GW data center yet. I think it will happen. This is only a matter of time but

31:09

it's not going to be next year. Some of these things will take some number of years to build

31:18

out. Just to put this in perspective, I think a gigawatt would be the size of a meaningful nuclear

31:31

power plant only going towards training a model. Didn't Amazon do this? They have a 950MW–

31:39

I'm not exactly sure what they did. You'd have to ask them.

31:44

But it doesn't have to be in the same place, right? If distributed

31:45

training works, it can be distributed. Well, I think that is a big question, how

31:49

that's going to work. It seems quite possible that in the future, more of what we call training for

31:56

these big models is actually more along the lines of inference generating synthetic data to then go

32:05

feed into the model. I don't know what that ratio is going to be but I consider the generation of

32:11

synthetic data to be more inference than training today. Obviously if you're doing it in order

32:16

to train a model, it's part of the broader training process. So that's an open question,

32:24

the balance of that and how that plays out. Would that potentially also be the case with

32:30

Llama-3, and maybe Llama-4 onwards? As in, you put this out and if somebody has a ton of compute,

32:36

then they can just keep making these things arbitrarily smarter using the models that

32:37

you've put out. Let's say there's some random country, like Kuwait or the UAE,

32:43

that has a ton of compute and they can actually just use Llama-4 to make something much smarter.

32:52

I do think there are going to be dynamics like that, but I also think

32:59

there is a fundamental limitation on the model architecture. I think like a 70B model that we

33:13

trained with a Llama-3 architecture can get better, it can keep going. As I was saying,

33:18

we felt that if we kept on feeding it more data or rotated the high value tokens through again,

33:24

then it would continue getting better. We've seen a bunch of different companies around

33:31

the world basically take the Llama-2 70B model architecture and then build a new model. But it's

33:41

still the case that when you make a generational improvement to something like the Llama-3 70B or

33:46

the Llama-3 405B, there isn't anything like that open source today. I think that's a big

33:54

step function. What people are going to be able to build on top of that I think can't go infinitely

33:59

from there. There can be some optimization in that until you get to the next step function.

34:05

Let's zoom out a little bit from specific models and even the multi-year lead times

34:11

you would need to get energy approvals and so on. Big picture, what's happening with AI these

34:15

next couple of decades? Does it feel like another technology like the metaverse or

34:21

social, or does it feel like a fundamentally different thing in the course of human history?

34:29

I think it's going to be pretty fundamental. I think it's going to be more like the creation

34:34

of computing in the first place. You'll get all these new apps in the same way as when you got

34:44

the web or you got mobile phones. People basically rethought all these experiences as a lot of things

34:50

that weren't possible before became possible. So I think that will happen, but I think it's

34:56

a much lower-level innovation. My sense is that it's going to be more like people going

35:01

from not having computers to having computers. It's very hard to reason about exactly how this

35:16

goes. In the cosmic scale obviously it'll happen quickly, over a couple of decades or something.

35:27

There is some set of people who are afraid of it really spinning out and going from being somewhat

35:33

intelligent to extremely intelligent overnight. I just think that there's all these physical

35:37

constraints that make that unlikely to happen. I just don't really see that playing out. I think

35:45

we'll have time to acclimate a bit. But it will really change the way that we work and give people

35:51

all these creative tools to do different things. I think it's going to really enable people to do

36:00

the things that they want a lot more. So maybe not overnight, but is it your

36:05

view that on a cosmic scale we can think of these milestones in this way? Humans evolved,

36:09

and then AI happened, and then they went out into the galaxy. Maybe it takes many decades,

36:15

maybe it takes a century, but is that the grand scheme of what's happening right now in history?

36:22

Sorry, in what sense? In the sense that there were

36:25

other technologies, like computers and even fire, but the development of AI itself is as

36:29

significant as humans evolving in the first place. I think that's tricky. The history of humanity

36:39

has been people basically thinking that certain aspects of humanity are really unique in different

36:50

ways and then coming to grips with the fact that that's not true, but that humanity is actually

36:57

still super special. We thought that the earth was the center of the universe and it's not,

37:06

but humans are still pretty awesome and pretty unique, right?

37:12

I think another bias that people tend to have is thinking that intelligence

37:17

is somehow fundamentally connected to life. It's not actually clear that it is. I don't

37:32

know that we have a clear enough definition of consciousness or life to fully interrogate this.

37:42

There's all this science fiction about creating intelligence where it starts to take on all these

37:47

human-like behaviors and things like that. The current incarnation of all this stuff feels like

37:54

it's going in a direction where intelligence can be pretty separated from consciousness,

37:59

agency, and things like that, which I think just makes it a super valuable tool.

38:06

Obviously it's very difficult to predict what direction this stuff goes in over time,

38:10

which is why I don't think anyone should be dogmatic about how they plan to develop it

38:16

or what they plan to do. You want to look at it with each release. We're obviously

38:20

very pro open source, but I haven't committed to releasing every single thing that we do.

38:27

I'm basically very inclined to think that open sourcing is going to be good for the

38:32

community and also good for us because we'll benefit from the innovations. If at some point

38:38

however there's some qualitative change in what the thing is capable of, and we feel like it's

38:43

not responsible to open source it, then we won't. It's all very difficult to predict.

38:52

What is a kind of specific qualitative change where you'd be training Llama-5 or Llama-4,

38:57

and if you see it, it'd make you think "you know what, I'm not sure about open sourcing it"?

39:05

It's a little hard to answer that in the abstract because there are negative

39:09

behaviors that any product can exhibit where as long as you can mitigate it,

39:15

it's okay. There's bad things about social media that we work to mitigate. There's bad things about

39:23

Llama-2 where we spend a lot of time trying to make sure that it's not like helping people

39:28

commit violent acts or things like that. That doesn't mean that it's a kind of autonomous or

39:34

intelligent agent. It just means that it's learned a lot about the world and it can answer a set of

39:38

questions that we think would be unhelpful for it to answer. I think the question isn't really what

39:49

behaviors would it show, it's what things would we not be able to mitigate after it shows that.

39:59

I think that there's so many ways in which something can be good or bad that it's hard

40:03

to actually enumerate them all up front. Look at what we've had to deal with in social media and

40:10

the different types of harms. We've basically gotten to like 18 or 19 categories of harmful

40:15

things that people do and we've basically built AI systems to identify what those things are and

40:23

to make sure that doesn't happen on our network as much as possible. Over time I think you'll

40:29

be able to break this down into more of a taxonomy too. I think this is a thing that

40:34

we spend time researching as well, because we want to make sure that we understand that.

41:46

It seems to me that it would be a good idea. I would be disappointed in a future where AI

41:50

systems aren't broadly deployed and everybody doesn't have access to them. At the same time,

41:55

I want to better understand the mitigations. If the mitigation is the fine-tuning,

42:00

the whole thing about open weights is that you can then remove the fine-tuning, which is often

42:06

superficial on top of these capabilities. If it's like talking on Slack with a biology researcher…

42:12

I think models are very far from this. Right now, they're like Google search. But if I can

42:17

show them my Petri dish and they can explain why my smallpox sample didn't grow and what to change,

42:23

how do you mitigate that? Because somebody can just fine-tune that in there, right?

42:29

That's true. I think a lot of people will basically use the off-the-shelf model and some

42:35

people who have basically bad faith are going to try to strip out all the bad stuff. So I do think

42:41

that's an issue. On the flip side, one of the reasons why I'm philosophically so pro open source

42:52

is that I do think that a concentration of AI in the future has the potential to be as dangerous as

43:02

it being widespread. I think a lot of people think about the questions of "if we can do this stuff,

43:08

is it bad for it to be out in the wild and just widely available?" I think another version of

43:15

this is that it's probably also pretty bad for one institution to have an AI that is

43:25

way more powerful than everyone else's AI. There's one security analogy that I think

43:31

of. There are so many security holes in so many different things. If you could travel back in

43:42

time a year or two years, let's say you just have one or two years more knowledge of the security

43:50

holes. You can pretty much hack into any system. That's not AI. So it's not that far-fetched to

43:55

believe that a very intelligent AI probably would be able to identify some holes and basically

44:03

be like a human who could go back in time a year or two and compromise all these systems.

44:07

So how have we dealt with that as a society? One big part is open source software that

44:13

makes it so that when improvements are made to the software, it doesn't just get stuck in one

44:18

company's products but can be broadly deployed to a lot of different systems, whether they're banks

44:24

or hospitals or government stuff. As the software gets hardened, which happens because more people

44:31

can see it and more people can bang on it, there are standards on how this stuff works. The world

44:37

can get upgraded together pretty quickly. I think that a world where AI is very widely

44:44

deployed, in a way where it's gotten hardened progressively over time, is one where all the

44:52

different systems will be in check in a way. That seems fundamentally more healthy to me than one

44:58

where this is more concentrated. So there are risks on all sides, but I think that's a risk

45:05

that I don't hear people talking about quite as much. There's the risk of the AI system doing

45:13

something bad. But I stay up at night worrying more about an untrustworthy actor having the super

45:27

strong AI, whether it's an adversarial government or an untrustworthy company or whatever. I think

45:39

that that's potentially a much bigger risk. As in, they could overthrow our government because

45:47

they have a weapon that nobody else has? Or just cause a lot of mayhem. I think the

45:55

intuition is that this stuff ends up being pretty important and valuable for both

46:01

economic and security reasons and other things. If someone whom you don't trust or an adversary

46:11

gets something more powerful, then I think that that could be an issue. Probably the best way

46:16

to mitigate that is to have good open source AI that becomes the standard and in a lot of

46:24

ways can become the leader. It just ensures that it's a much more even and balanced playing field.

46:33

That seems plausible to me. If that works out, that would be the future I prefer. I want to

46:38

understand mechanistically how the fact that there are open source AI systems in the world

46:47

prevents somebody causing mayhem with their AI system? With the specific example of somebody

46:50

coming with a bioweapon, is it just that we'll do a bunch of R&D in the rest of the world to figure

46:55

out vaccines really fast? What's happening? If you take the security one that I was

46:59

talking about, I think someone with a weaker AI trying to hack into a

47:03

system that is protected by a stronger AI will succeed less. In terms of software security–

47:12

How do we know everything in the world is like that? What if bioweapons aren't like that?

47:16

I mean, I don't know that everything in the world is like that. Bioweapons are one of the

47:25

areas where the people who are most worried about this stuff are focused and I think it makes a lot

47:33

of sense. There are certain mitigations. You can try to not train certain knowledge into

47:42

the model. There are different things but at some level if you get a sufficiently bad actor,

47:51

and you don't have other AI that can balance them and understand what the threats are,

48:00

then that could be a risk. That's one of the things that we need to watch out for.

48:05

Is there something you could see in the deploymentĀ  of these systems where you're training Llama-4 andĀ Ā 

48:12

it lied to you because it thought you weren'tĀ  noticing or something and you're like ā€œwhoaĀ Ā 

48:17

what's going on here?ā€ This is probably notĀ  likely with a Llama-4 type system, but isĀ Ā 

48:22

there something you can imagine like that whereĀ  you'd be really concerned about deceptiveness andĀ Ā 

48:27

billions of copies of this being out in the wild? I mean right now we see a lot of hallucinations.Ā Ā 

48:37

It's more so that. I think it's an interestingĀ  question, how you would tell the differenceĀ Ā 

48:43

between hallucination and deception. There areĀ  a lot of risks and things to think about. I try,Ā Ā 

48:57

in running our company at least, to balanceĀ  these longer-term theoretical risks withĀ Ā 

49:07

what I actually think are quite real risks thatĀ  exist today. So when you talk about deception,Ā Ā 

49:14

the form of that that I worry about most isĀ  people using this to generate misinformationĀ Ā 

49:18

and then pump that through our networks orĀ  others. The way that we've combated this typeĀ Ā 

49:26

of harmful content is by building AI systemsĀ  that are smarter than the adversarial ones.Ā 

49:33

This informs part of my theory on this. If youĀ  look at the different types of harm that peopleĀ Ā 

49:38

do or try to do through social networks, there areĀ  ones that are not very adversarial. For example,Ā Ā 

49:50

hate speech is not super adversarial in the senseĀ  that people aren't getting better at being racist.Ā Ā 

50:03

That's one where I think the AIs are generallyĀ  getting way more sophisticated faster than peopleĀ Ā 

50:08

are at those issues. And we have issues bothĀ  ways. People do bad things, whether they'reĀ Ā 

50:15

trying to incite violence or something, butĀ  we also have a lot of false positives where weĀ Ā 

50:20

basically censor stuff that we shouldn't. I thinkĀ  that understandably makes a lot of people annoyed.Ā Ā 

50:25

So I think having an AI that gets increasinglyĀ  precise on that is going to be good over time.Ā 

50:30

But let me give you another example: nationĀ  states trying to interfere in elections. That'sĀ Ā 

50:35

an example where they absolutely have cutting edgeĀ  technology and absolutely get better each year. SoĀ Ā 

50:41

we block some technique, they learn what we didĀ  and come at us with a different technique. It'sĀ Ā 

50:46

not like a person trying to say mean things. They have a goal. They're sophisticated. They have a

50:56

lot of technology. In those cases, I still thinkĀ  about the ability to have our AI systems grow inĀ Ā 

51:04

sophistication at a faster rate than theirs do.Ā  It's an arms race but I think we're at leastĀ Ā 

51:09

winning that arms race currently. This is a lotĀ  of the stuff that I spend time thinking about.Ā 

51:18

Yes, whether it's Llama-4 or Llama-6, we need toĀ  think about what behaviors we're observing andĀ Ā 

51:26

it's not just us. Part of the reason why you makeĀ  this open source is that there are a lot of otherĀ Ā 

51:29

people who study this too. So we want to see whatĀ  other people are observing, what weā€™re observing,Ā Ā 

51:35

what we can mitigate, and then we'll makeĀ  our assessment on whether we can make itĀ Ā 

51:40

open source. For the foreseeable future I'mĀ  optimistic we will be able to. In the near term,Ā Ā 

51:49

I don't want to take our eye off the ballĀ  in terms of what are actual bad things thatĀ Ā 

51:53

people are trying to use the models for today.Ā  Even if they're not existential, there areĀ Ā 

51:58

pretty bad day-to-day harms that we're familiarĀ  with in running our services. That's actually aĀ Ā 

52:05

lot of what we have to spend our time on as well. I found the synthetic data thing really curious.Ā Ā 

52:14

With current models it makes sense why there mightĀ  be an asymptote with just doing the synthetic dataĀ Ā 

52:19

again and again. But letā€™s say they get smarterĀ  and you use the kinds of techniquesā€”you talk aboutĀ Ā 

52:23

in the paper or the blog posts that are coming outĀ  on the day this will be releasedā€”where it goes toĀ Ā 

52:29

the thought chain that is the most correct.Ā  Why do you think this wouldn't lead to a loopĀ Ā 

52:36

where it gets smarter, makes better output, gets smarter and so forth? Of course it wouldn't be

52:36

overnight, but over many months or years ofĀ  training potentially with a smarter model.Ā 

52:45

I think it could, within the parameters ofĀ  whatever the model architecture is. It's justĀ Ā 

52:49

that with today's 8B parameter models, I don'tĀ  think you're going to get to be as good as theĀ Ā 

53:04

state-of-the-art multi-hundred billionĀ  parameter models that are incorporatingĀ Ā 

53:08

new research into the architecture itself. But those will be open source as well, right?Ā 

53:15

Well yeah, subject to all the questions that weĀ  just talked about but yes. We would hope thatĀ Ā 

53:23

that'll be the case. But I think that at eachĀ  point, when you're building software there's aĀ Ā 

53:29

ton of stuff that you can do with software butĀ  then at some level you're constrained by theĀ Ā 

53:34

chips that it's running on. So there are alwaysĀ  going to be different physical constraints. HowĀ Ā 

53:42

big the models are is going to be constrainedĀ  by how much energy you can get and use forĀ Ā 

53:49

inference. I'm simultaneously very optimisticĀ  that this stuff will continue to improve quicklyĀ Ā 

53:59

and also a little more measured than I thinkĀ  some people are about it. I donā€™t think theĀ Ā 

54:11

runaway case is a particularly likely one. I think it makes sense to keep your optionsĀ Ā 

54:17

open. There's so much we don't know. There's aĀ  case in which it's really important to keep theĀ Ā 

54:22

balance of power so nobody becomes a totalitarianĀ  dictator. There's a case in which you don't wantĀ Ā 

54:26

to open source the architecture because China canĀ  use it to catch up to America's AIs and there isĀ Ā 

54:32

an intelligence explosion and they win that. A lotĀ  of things seem possible. Keeping your options openĀ Ā 

54:38

considering all of them seems reasonable. Yeah.Ā 

54:42

Let's talk about some other things. Metaverse.Ā  What time period in human history would you beĀ Ā 

54:48

most interested in going into? 100,000 BCE toĀ  now, you just want to see what it was like?Ā 

54:53

It has to be the past? Oh yeah, it has to be the past.

55:04

I'm really interested in American history andĀ  classical history. I'm really interested in theĀ Ā 

55:10

history of science too. I actually think seeingĀ  and trying to understand more about how some ofĀ Ā 

55:19

the big advances came about would be interesting.Ā  All we have are somewhat limited writings aboutĀ Ā 

55:24

some of that stuff. I'm not sure the metaverseĀ  is going to let you do that because it's goingĀ Ā 

55:29

to be hard to go back in time for things thatĀ  we don't have records of. I'm actually not sureĀ Ā 

55:38

that going back in time is going to be thatĀ  important of a thing. I think it's going toĀ Ā 

55:42

be cool for like history classes and stuff,Ā  but that's probably not the use case that I'mĀ Ā 

55:47

most excited about for the metaverse overall. The main thing is just the ability to feelĀ Ā 

55:53

present with people, no matter where you are.Ā  I think that's going to be killer. In the AIĀ Ā 

56:00

conversation that we were having, so much of itĀ  is about physical constraints that underlie allĀ Ā 

56:08

of this. I think one lesson of technology isĀ  that you want to move things from the physicalĀ Ā 

56:14

constraint realm into software as much as possibleĀ  because software is so much easier to build andĀ Ā 

56:20

evolve. You can democratize it more becauseĀ  not everyone is going to have a data center butĀ Ā 

56:26

a lot of people can write code and take open source code and modify it. The metaverse

56:33

version of this is enabling realistic digitalĀ  presence. Thatā€™s going to be an absolutely hugeĀ Ā 

56:43

difference so people don't feel like they haveĀ  to be physically together for as many things.Ā Ā 

56:51

Now I think that there can be things that areĀ  better about being physically together. TheseĀ Ā 

56:57

things aren't binary. It's not going to be likeĀ  ā€œokay, now you don't need to do that anymore.ā€Ā Ā 

57:01

But overall, I think it's just going to beĀ  really powerful for socializing, for feelingĀ Ā 

57:11

connected with people, for working, for parts of industry, for medicine, for so many things.

57:20

I want to go back to something you said at theĀ  beginning of the conversation. You didn't sellĀ Ā 

57:23

the company for a billion dollars. And withĀ  the metaverse, you knew you were going toĀ Ā 

57:26

do this even though the market was hammeringĀ  you for it. I'm curious. What is the sourceĀ Ā 

57:31

of that edge? You said ā€œoh, values, I haveĀ  this intuition,ā€ but everybody says that. IfĀ Ā 

57:37

you had to say something that's specific toĀ  you, how would you express what that is? WhyĀ Ā 

57:41

were you so convinced about the metaverse? I think that those are different questions.

57:52

What are the things that power me? We'veĀ  talked about a bunch of the themes. I justĀ Ā 

58:02

really like building things. I specifically likeĀ  building things around how people communicate andĀ Ā 

58:10

understanding how people express themselvesĀ  and how people work. When I was in collegeĀ Ā 

58:13

I studied computer science and psychology. IĀ  think a lot of other people in the industryĀ Ā 

58:18

studied computer science. So, it's always beenĀ  the intersection of those two things for me.Ā 

58:27

Itā€™s also sort of this really deep drive. IĀ  don't know how to explain it but I just feelĀ Ā 

58:36

constitutionally that I'm doing something wrong ifĀ  I'm not building something new. Even when we wereĀ Ā 

58:50

putting together the business case for investing $100 billion in AI or some huge amount in the

58:58

metaverse, we have plans that I think madeĀ  it pretty clear that if our stuff works,Ā Ā 

59:03

it'll be a good investment. But you can't knowĀ  for certain from the outset. There are all theseĀ Ā 

59:10

arguments that people have, with advisorsĀ  or different folks. It's like, ā€œhow are youĀ Ā 

59:19

confident enough to do this?ā€ Well the day I stopĀ  trying to build new things, I'm just done. I'mĀ Ā 

59:26

going to go build new things somewhere else. I'mĀ  fundamentally incapable of running something,Ā Ā 

59:37

or in my own life, and not trying to build newĀ  things that I think are interesting. That's notĀ Ā 

59:43

even a question for me, whether we're going toĀ  take a swing at building the next thing. I'mĀ Ā 

59:51

just incapable of not doing that. I don't know. I'm kind of like this in all the different aspectsĀ Ā 

60:01

of my life. Our family built this ranch in KauaiĀ  and I worked on designing all these buildings. WeĀ Ā 

60:14

started raising cattle and I'm like ā€œalright, IĀ  want to make the best cattle in the world so howĀ Ā 

60:19

do we architect this so that way we can figureĀ  this out and build all the stuff up that weĀ Ā 

60:24

need to try to do that.ā€ I don't know, that'sĀ  me. What was the other part of the question?Ā 

61:37

I'm not sure but I'm actually curiousĀ  about something else. So a 19-year-oldĀ Ā 

61:42

Mark reads a bunch of antiquity andĀ  classics in high school and college.Ā Ā 

61:48

What important lesson did you learn fromĀ  it? Not just interesting things you found,Ā Ā 

61:50

but there aren't that many tokens you consume byĀ  the time you're 19. A bunch of them were about theĀ Ā 

61:55

classics. Clearly that was important in some way. There aren't that many tokens you consume...Ā Ā 

62:06

That's a good question. Hereā€™s one of the thingsĀ  I thought was really fascinating. Augustus becameĀ Ā 

62:19

emperor and he was trying to establish peace.Ā  There was no real conception of peace at theĀ Ā 

62:30

time. The people's understanding of peace wasĀ  peace as the temporary time between when yourĀ Ā 

62:36

enemies inevitably attack you. So you get aĀ  short rest. He had this view of changing theĀ Ā 

62:44

economy from being something mercenary andĀ  militaristic to this actually positive-sumĀ Ā 

62:53

thing. It was a very novel idea at the time. Thatā€™s something that's really fundamental:Ā Ā 

63:07

the bounds on what people can conceiveĀ  of at the time as rational ways to work.Ā Ā 

63:17

This applies to both the metaverse and the AIĀ  stuff. A lot of investors, and other people,Ā Ā 

63:22

can't wrap their head around why we would openĀ  source this. Itā€™s like ā€œI don't understand, itā€™sĀ Ā 

63:29

open source. That must just be the temporary time before you make things proprietary,

63:34

right?ā€ I think it's this very profound thing inĀ  tech that it actually creates a lot of winners.Ā 

63:49

I don't want to strain the analogy tooĀ  much but I do think that a lot of the time,Ā Ā 

63:56

there are models for building things thatĀ  people often can't even wrap their headĀ Ā 

64:06

around. They canā€™t understand how that would be aĀ  valuable thing for people to do or how it would beĀ Ā 

64:11

a reasonable state of the world. I think thereĀ  are more reasonable things than people think.Ā 

64:20

That's super fascinating. Can I give you whatĀ  I was thinking in terms of what you might haveĀ Ā 

64:24

gotten from it? This is probably totally off,Ā  but I think itā€™s just how young some of theseĀ Ā 

64:29

people are, who have very important rolesĀ  in the empire. For example, Caesar Augustus,Ā Ā 

64:33

by the time heā€™s 19, is already one of the mostĀ  important people in Roman politics. He's leadingĀ Ā 

64:39

battles and forming the Second Triumvirate. IĀ  wonder if the 19-year-old you was thinking ā€œIĀ Ā 

64:42

can do this because Caesar Augustus did this.ā€ That's an interesting example, both from a lotĀ Ā 

64:48

of history and American history too. One of myĀ  favorite quotes is this Picasso quote that allĀ Ā 

64:56

children are artists and the challenge is toĀ  remain an artist as you grow up. When youā€™reĀ Ā 

65:02

younger, itā€™s just easier to have wild ideas.Ā  There are all these analogies to the innovatorā€™sĀ Ā 

65:14

dilemma that exist in your life as well as forĀ  your company or whatever youā€™ve built. Youā€™reĀ Ā 

65:20

earlier on in your trajectory so it's easier toĀ  pivot and take in new ideas without disruptingĀ Ā 

65:26

other commitments to different things.Ā  I think that's an interesting part ofĀ Ā 

65:33

running a company. How do you stay dynamic? Letā€™s go back to the investors and open source.Ā Ā 

65:41

The $10B model, suppose it's totally safe. You'veĀ  done these evaluations and unlike in this caseĀ Ā 

65:47

the evaluators can also fine-tune the model, whichĀ  hopefully will be the case in future models. WouldĀ Ā 

65:52

you open source the $10 billion model? As long as it's helping us then yeah.Ā 

65:57

But would it? $10 billion ofĀ  R&D and now it's open source.Ā 

66:01

Thatā€™s a question which weā€™ll have to evaluateĀ  as time goes on too. We have a long history ofĀ Ā 

66:11

open sourcing software. We donā€™t tend to openĀ  source our product. We don't take the code forĀ Ā 

66:18

Instagram and make it open source. We takeĀ  a lot of the low-level infrastructure andĀ Ā 

66:24

we make that open source. Probably the biggestĀ  one in our history was our Open Compute ProjectĀ Ā 

66:29

where we took the designs for all of our servers,Ā  network switches, and data centers, and made itĀ Ā 

66:36

open source and it ended up being super helpful. Although a lot of people can design servers, the

66:42

industry now standardized on our design, whichĀ  meant that the supply chains basically all gotĀ Ā 

66:46

built out around our design. So volumes wentĀ  up, it got cheaper for everyone, and it savedĀ Ā 

66:50

us billions of dollars which was awesome. So there's multiple ways where open sourceĀ Ā 

66:56

could be helpful for us. One is if people figureĀ  out how to run the models more cheaply. We'reĀ Ā 

67:01

going to be spending tens, or a hundred billionĀ  dollars or more over time on all this stuff. SoĀ Ā 

67:08

if we can do that 10% more efficiently, we'reĀ  saving billions or tens of billions of dollars.Ā Ā 

67:12

That's probably worth a lot by itself. EspeciallyĀ  if there are other competitive models out there,Ā Ā 

67:17

it's not like our thing is givingĀ  away some kind of crazy advantage.Ā 

67:22

So is your view that theĀ  training will be commodified?Ā 

67:29

I think there's a bunch of ways that this couldĀ  play out and that's one. So ā€œcommodityā€ impliesĀ Ā 

67:39

that it's going to get very cheap because thereĀ  are lots of options. The other direction that thisĀ Ā 

67:44

could go in is qualitative improvements. YouĀ  mentioned fine-tuning. Right now it's prettyĀ Ā 

67:51

limited what you can do with fine-tuning other major models out there. There are some options

67:56

but generally not for the biggest models. Thereā€™sĀ  being able to do that, different app specificĀ Ā 

68:05

things or use case specific things or buildingĀ  them into specific tool chains. I think that willĀ Ā 

68:11

not only enable more efficient development, butĀ  it could enable qualitatively different things.Ā 

68:18

Here's one analogy on this. One thing that I thinkĀ  generally sucks about the mobile ecosystem is thatĀ Ā 

68:27

you have these two gatekeeper companies, Apple andĀ  Google, that can tell you what you're allowed toĀ Ā 

68:32

build. There's the economic version of that whichĀ  is like when we build something and they justĀ Ā 

68:38

take a bunch of your money. But then there's theĀ  qualitative version, which is actually what upsetsĀ Ā 

68:45

me more. There's a bunch of times when we'veĀ  launched or wanted to launch features and Apple'sĀ Ā 

68:51

just like ā€œnope, you're not launching that.ā€ ThatĀ  sucks, right? So the question is, are we set upĀ Ā 

69:01

for a world like that with AI? You're going toĀ  get a handful of companies that run these closedĀ Ā 

69:08

models that are going to be in control of the APIsĀ  and therefore able to tell you what you can build?Ā 

69:13

For us I can say it is worth it to go buildĀ  a model ourselves to make sure that we're notĀ Ā 

69:19

in that position. I don't want any of thoseĀ  other companies telling us what we can build.Ā Ā 

69:26

From an open source perspective, I think a lot ofĀ  developers don't want those companies telling themĀ Ā 

69:30

what they can build either. So the question is,Ā  what is the ecosystem that gets built out aroundĀ Ā 

69:36

that? What are interesting new things? How muchĀ  does that improve our products? I think thereĀ Ā 

69:43

are lots of cases where if this ends up being likeĀ  our databases or caching systems or architecture,Ā Ā 

69:50

we'll get valuable contributions from theĀ  community that will make our stuff better.Ā Ā 

69:54

Our app specific work that we do will then stillĀ  be so differentiated that it won't really matter.Ā Ā 

70:00

We'll be able to do what we do. We'll benefit, and all the systems, ours and the community's,

70:03

will be better because it's open source. There is one world where maybeĀ Ā 

70:10

thatā€™s not the case. Maybe the model ends upĀ  being more of the product itself. I think it'sĀ Ā 

70:16

a trickier economic calculation then, whetherĀ  you open source that. You are commoditizingĀ Ā 

70:22

yourself then a lot. But from what I can see soĀ  far, it doesn't seem like we're in that zone.Ā 

70:26

Do you expect to earn significant revenueĀ  from licensing your model to the cloudĀ Ā 

70:30

providers? So they have to pay youĀ  a fee to actually serve the model.Ā 

70:36

We want to have an arrangement like that butĀ  I don't know how significant it'll be. This isĀ Ā 

70:42

basically our license for Llama. In a lot of waysĀ  it's a very permissive open source license, exceptĀ Ā 

70:51

that we have a limit for the largest companiesĀ  using it. This is why we put that limit in. We'reĀ Ā 

70:56

not trying to prevent them from using it. We justĀ  want them to come talk to us if they're going toĀ Ā 

71:00

just basically take what we built and resell it and make money off of it. If you're Microsoft

71:07

Azure or Amazon and you're going to be reselling the model, then we should have some revenue share

71:12

on that. So just come talk to us before youĀ  go do that. That's how that's played out.Ā 

71:15

So for Llama-2, we just have deals with basicallyĀ  all these major cloud companies and Llama-2 isĀ Ā 

71:23

available as a hosted service on all thoseĀ  clouds. I assume that as we release biggerĀ Ā 

71:30

and bigger models, that will become a biggerĀ  thing. It's not the main thing that we're doing,Ā Ā 

71:33

but I think if those companies are going to beĀ  selling our models it just makes sense that weĀ Ā 

71:37

should share the upside of that somehow. Regarding other open source dangers,Ā Ā 

71:42

I think you have genuine legitimate points aboutĀ  the balance of power stuff and potentially theĀ Ā 

71:48

harms you can get rid of because we have betterĀ  alignment techniques or something. I wish thereĀ Ā 

71:52

were some sort of framework that Meta had. OtherĀ  labs have this where they say ā€œif we see thisĀ Ā 

71:57

concrete thing, then that's a no go on the openĀ  source or even potentially on deployment.ā€ JustĀ Ā 

72:03

writing it down so the company is ready for it and people have expectations around it and so forth.

72:09

That's a fair point on the existential riskĀ  side. Right now we focus more on the types ofĀ Ā 

72:14

risks that we see today, which are more of theseĀ  content risks. We don't want the model to be doingĀ Ā 

72:24

things that are helping people commit violenceĀ  or fraud or just harming people in differentĀ Ā 

72:30

ways. While it is maybe more intellectuallyĀ  interesting to talk about the existential risks,Ā Ā 

72:30

I actually think the real harms that need moreĀ  energy in being mitigated are things where someoneĀ Ā 

72:31

takes a model and does something to hurt aĀ  person. In practice for the current models,Ā Ā 

72:35

and I would guess the next generationĀ  and maybe even the generation after that,Ā Ā 

72:42

those are the types of more mundane harms that weĀ  see today, people committing fraud against eachĀ Ā 

73:07

other or things like that. I just don't want toĀ  shortchange that. I think we have a responsibilityĀ Ā 

73:15

to make sure we do a good job on that. Meta's a big company. You can handle both.Ā 

73:22

As far as open source goes, I'm actuallyĀ  curious if you think the impact of open source,Ā Ā 

73:25

from PyTorch, React, Open Compute and otherĀ  things, has been bigger for the world thanĀ Ā 

73:30

even the social media aspects of Meta. I'veĀ  talked to people who use these servicesĀ Ā 

73:33

and they think that it's plausible because aĀ  big part of the internet runs on these things.Ā 

73:39

It's an interesting question. I mean almostĀ  half the world uses our consumer products soĀ Ā 

73:48

it's hard to beat that. But I think openĀ  source is really powerful as a new way ofĀ Ā 

73:56

building things. I mean, it's possible. ItĀ  may be one of these things like Bell Labs,Ā Ā 

74:08

where they were working on the transistor becauseĀ  they wanted to enable long-distance calling. TheyĀ Ā 

74:17

did and it ended up being really profitable forĀ  them that they were able to enable long-distanceĀ Ā 

74:20

calling. 5 to 10 years out from that, if youĀ  asked them what was the most useful thingĀ Ā 

74:29

that they invented it's like ā€œokay, we enabledĀ  long distance calling and now all these peopleĀ Ā 

74:32

are long-distance calling.ā€ But if you asked aĀ  hundred years later maybe it's a different answer.Ā 

74:38

I think that's true of a lot of the things thatĀ  we're building: Reality Labs, some of the AIĀ Ā 

74:44

stuff, some of the open source stuff. The specificĀ  products evolve, and to some degree come and go,Ā Ā 

74:50

but the advances for humanity persist andĀ  that's a cool part of what we all get to do.Ā 

74:58

By when will the Llama models be trained on your own custom silicon?

75:06

Soon, not Llama-4. The approach that we took isĀ  we first built custom silicon that could handleĀ Ā 

75:16

inference for our ranking and recommendationĀ  type stuff, so Reels, News Feed ads, etc. ThatĀ Ā 

75:24

was consuming a lot of GPUs. When we were ableĀ  to move that to our own silicon, we're now ableĀ Ā 

75:31

to use the more expensive NVIDIA GPUs only forĀ  training. At some point we will hopefully haveĀ Ā 

75:43

silicon ourselves that we can be using for atĀ  first training some of the simpler things, thenĀ Ā 

75:48

eventually training these really large models. InĀ  the meantime, I'd say the program is going quiteĀ Ā 

75:57

well and we're just rolling it out methodically and we have a long-term roadmap for it.

76:02

Final question. This is totally out ofĀ  left field. If you were made CEO of Google+Ā Ā 

76:07

could you have made it work? Google+? Oof. I don't know.Ā Ā 

76:14

That's a very difficult counterfactual. Okay, then the real final question will be:

76:21

when Gemini was launched, wasĀ  there any chance that somebodyĀ Ā 

76:24

in the office uttered: ā€œCarthago delenda estā€. No, I think we're tamer now. It's a good question.Ā Ā 

76:38

The problem is there was no CEO of Google+. ItĀ  was just a division within a company. You askedĀ Ā 

76:45

before about what are the scarcest commoditiesĀ  but you asked about it in terms of dollars. IĀ Ā 

76:51

actually think for most companies, of this scaleĀ  at least, it's focus. When you're a startup maybeĀ Ā 

76:58

you're more constrained on capital. Youā€™re justĀ  working on one idea and you might not have allĀ Ā 

77:04

the resources. You cross some threshold at someĀ  point with the nature of what you're doing. You'reĀ Ā 

77:10

building multiple things. You're creatingĀ  more value across them but you become moreĀ Ā 

77:14

constrained on what you can direct to go well. There are always the cases where somethingĀ Ā 

77:22

random awesome happens in the organization and IĀ  don't even know about it. Those are great. But IĀ Ā 

77:28

think in general, the organization's capacityĀ  is largely limited by what the CEO and theĀ Ā 

77:37

management team are able to oversee and manage.Ā  That's been a big focus for us. As Ben HorowitzĀ Ā 

77:49

says ā€œkeep the main thing, the main thingā€ andĀ  try to stay focused on your key priorities.Ā 

77:59

Awesome, that was excellent, Mark. Thanks so much. That was a lot of fun.

78:01

Yeah, really fun. Thanks for having me. Absolutely.
