'Her' AI, Almost Here? Llama 3, Vasa-1, and Altman 'Plugging Into Everything You Want To Do'

AI Explained
18 Apr 2024 17:11

TLDR The video discusses recent advancements in AI, focusing on Meta's release of two smaller Llama 3 models, which are highly competitive with other models in their class. Llama 3, particularly the 70B variant, is noted for continuing to improve when trained on far more data, with an emphasis on coding data. Microsoft's Vasa-1 is highlighted for its realistic deepfake technology, which lets AI imitate human facial expressions and movements from a single photo. The implications for AI in social interaction, healthcare, and the potential for personalized AI are explored. The video also touches on the debate over the timeline for achieving Artificial General Intelligence (AGI), with opinions ranging from outright skepticism to predictions of its imminent arrival.


  • 🚀 Meta (formerly Facebook) has released Llama 3 in smaller but competitive model sizes, reporting that model performance keeps improving even with significantly more training data.
  • 📈 Llama 3 70B is noted to be competitive with other models like Gemini Pro 1.5 and Claude, showcasing the advancements in AI capabilities.
  • 🔍 Meta is planning to release multiple models with enhanced capabilities, such as multimodality, multilingual conversing, extended context window, and stronger overall performance.
  • 🤖 An AI technology that uses a single photo to generate realistic and controllable facial expressions and lip movements has been developed, hinting at future applications like real-time Zoom calls with lifelike avatars.
  • 📚 The Vasa-1 model from Microsoft has achieved high expressiveness in deepfakes with relatively little training data, raising questions about the future of social interactions and AI ethics.
  • 🤖 AI nurses developed by Hippocratic AI and Nvidia are reported to outperform human nurses in certain technical aspects, suggesting a potential shift in healthcare service provision.
  • 📊 The creators of Vasa-1 used a diffusion transformer model that maps audio to facial expressions and head movements, achieving unprecedented lip-syncing accuracy.
  • 🔒 Microsoft has no current plans to release Vasa-1 publicly, emphasizing the need for responsible use and regulation compliance.
  • 📈 Hume AI is focusing on analyzing emotions in the human voice, which could lead to more personalized and emotionally intelligent AI interactions.
  • 📰 The launch of a new newsletter called 'Signal to Noise' aims to provide a high-quality, hype-free source of information on AI developments.
  • 🤖 The progress in robot agility, exemplified by the new Atlas robot from Boston Dynamics, indicates a future where physical and virtual AI capabilities continue to advance at a rapid pace.

🚀 Meta's Llama 3 and AI Model Competition

The video discusses Meta's recent release of two smaller Llama 3 models, including Llama 3 70B, which is competitive with other models like Gemini Pro 1.5 and Claude. The script highlights that Meta's models showed improved performance with a significant increase in training data, especially coding data. The company plans to release multiple models with enhanced capabilities such as multimodality, multilingual support, an extended context window, and stronger overall performance. A comparison is also made between a 'mystery model' that is still in training, GPT-4 Turbo, and Claude 3 Opus, noting that all three perform similarly on various benchmarks. The segment ends with a teaser about an announcement that could change how people interact with AI.


🤖 AI Imitating Human Expressions and the Future of Healthcare

The script introduces a new AI technology that uses a single photo and an audio clip to generate realistic human facial expressions and movements in real time. This technology, referred to as Vasa-1, is particularly impressive for its expressiveness, including blinking and lip movement. The implications for AI social interaction are discussed, with a focus on healthcare and the potential for AI nurses. The AI nurses are reported to outperform human nurses on certain metrics, such as bedside manner and patient education. The technology behind Vasa-1 is explained: a diffusion transformer model maps audio to facial expressions and head movements. However, due to ethical concerns, Microsoft has no immediate plans to release the model publicly.


📰 Launch of 'Signal to Noise' Newsletter and AI Personalization

The speaker announces a new newsletter called 'Signal to Noise,' which aims to maintain a high signal-to-noise ratio by only posting when interesting developments occur. The newsletter will include a 'does it change everything' rating to quickly assess the impact of the news. The speaker then discusses the importance of AI personalization, suggesting that it might be more crucial than raw intelligence. Personalized AI models that integrate well into users' lives could be a key differentiator. The segment also touches on the rapid progress in robot agility, referencing the new Atlas robot from Boston Dynamics and the competitive landscape in robotics design.


🤔 Perspectives on AGI and AI Safety Levels

The video concludes with a discussion on artificial general intelligence (AGI) and its potential timelines. Various experts express their skepticism or belief in AGI, with some suggesting it could be imminent while others believe it's further away. The concept of ASL (AI Safety Levels) is introduced, with ASL 3 and ASL 4 representing different levels of risk and autonomy. The speaker ends with a reflection on the movie 'Her' and the possibility that technology might be capable of replicating a similar AI experience by the following year.



Meta has released Llama 3, a model competitive with Gemini Pro 1.5 and Claude, but without their context window size.

Llama 3 70B shows improved model performance even after training on significantly more data than the optimal amount.

Meta plans to release multiple models with new capabilities, including multimodality, multilingual conversing, and a longer context window.

A mystery model is still in training, expected to compete with GPT-4 Turbo and Claude 3 Opus.

Microsoft's Vasa-1 can generate highly realistic deepfakes using just a single photo and an audio clip.

Vasa-1 allows control over the emotion, distance from the camera, and direction of the gaze of the generated avatar.

The technology behind Vasa-1 could lead to real-time Zoom calls with next-generation models later this year.

AI nurses developed by Hippocratic AI and Nvidia outperform human nurses in bedside manner and patient education.

The Vasa-1 model was trained on a relatively small data set, demonstrating the potential for results with limited data.

Microsoft has no current plans to release Vasa-1 due to concerns about responsible use and regulation.

Hume AI is focusing on analyzing emotions in the human voice for a more personalized AI experience.

The author is launching a new newsletter called 'Signal to Noise' with a focus on quality content and a 'does it change everything' rating system.

Boston Dynamics' new Atlas robot showcases significant advancements in robot agility.

Figure, a company known for mechanical design in robotics, may be influencing the design of the new Atlas.

Personalization of AI might be more important than inherent intelligence for long-term user engagement.

OpenAI's strategy might include personalizing AI through video avatars and user engagement to compete with other tech giants.

There is debate over the timeline for achieving Artificial General Intelligence (AGI), with some believing it's imminent and others skeptical.

Dario Amodei, CEO of Anthropic, predicts ASL 3 could happen within the next year or two, and ASL 4 between 2025 and 2028.

The movie 'Her,' set in 2025, seems increasingly relevant as we approach the technological capabilities depicted in the film.