New AI Video Goes Hard At Open AI!

Theoretically Media
29 Apr 2024 · 11:15

TLDR: The video discusses a new AI video generator named 'Vidu', which is being compared to the upcoming Sora model. Vidu, developed by Shengshu Technology and Tsinghua University, can produce 16-second clips at 1080p resolution. The video showcases a sizzle reel and longer examples of Vidu's output, highlighting its architecture based on the Universal Video Transformer (U-ViT), which combines Vision Transformers with a U-Net model for image generation. While not as detailed as Sora's output, Vidu demonstrates temporal coherence and a distinctive aesthetic. The video also touches on the challenges and post-production work required to refine AI-generated footage for professional use, referencing a short film created using Sora. The speaker, Tim, provides a signup link for Vidu and mentions an upcoming interview about Sora's integration into Adobe Premiere and After Effects.

Takeaways

  • 🎬 A new AI video generator called 'Vidu' has emerged, potentially rivaling Sora in quality.
  • 🚀 Vidu can generate video clips up to 16 seconds at 1080p, showcasing its capabilities through a sizzle reel.
  • 📚 Vidu's architecture is based on the Universal Video Transformer (U-ViT), which combines Vision Transformers and a U-Net for image analysis and generation.
  • 🧠 The U-ViT model uses tokens and long skip connections, allowing it to maintain temporal coherence throughout the video.
  • 📺 Examples of Vidu's output include a panda playing guitar and a beach vacation scene, demonstrating its ability to generate coherent visuals.
  • 🤔 While Vidu's outputs are impressive, they are not as detailed as Sora's, but they maintain a consistent and appealing aesthetic.
  • 📽 A side-by-side comparison with Sora shows that both models have their strengths, with Sora leading in environment realism.
  • 🎥 The production process for AI-generated videos still requires significant human effort for post-production to achieve consistency.
  • 🌐 There is a sign-up link for Vidu on their website, but as of the recording, the submit button may not be working due to high traffic.
  • 📈 The potential of AI video generators like Vidu and Sora is being explored by filmmakers, with examples like the short film 'Airhead'.
  • 🔍 An exclusive interview with Adobe discusses Sora's integration into Premiere and future plans for After Effects.

Q & A

  • What is the name of the new AI video generator discussed in the script?

    -The new AI video generator is called 'Vidu', also referred to as 'Vu' in the video.

  • What is the maximum duration and resolution that the new AI video generator can produce?

    -The AI video generator can produce clips up to 16 seconds at 1080p resolution.

  • Which two models or technologies does the new AI video generator's architecture seem to be based on?

    -The architecture of the new AI video generator is based on U-ViT (Universal Video Transformer), which builds on two separate papers: DPM-Solver and 'All Are Worth Words'.

  • How does the new AI video generator differ from Sora in terms of video generation?

    -While Sora builds videos from spacetime patches, the new AI video generator (Vidu) works with an in point and an out point, using long skip connections to chart a path between the first and last frames of the video.

  • What is the significance of the long skip connections in the new AI video generator?

    -Long skip connections allow the AI to maintain awareness of the first and last frames of the video, which helps in generating more coherent and less hallucinatory transitions between frames.
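
To make the long-skip idea concrete, here is a minimal, illustrative PyTorch sketch. It is not Vidu's actual code; the module names, depth, and sizes are assumptions for the example. Early transformer blocks save their outputs, and the mirrored later blocks fuse those saved activations back in, so the end of the stack still "sees" information from the beginning of the sequence.

```python
import torch
import torch.nn as nn

class LongSkipTransformer(nn.Module):
    """Sketch of a U-ViT-style backbone with long skip connections.
    Early-block outputs are stored and concatenated into the mirrored
    late blocks, then projected back down to the model width."""
    def __init__(self, dim=512, depth=6, heads=8):
        super().__init__()
        assert depth % 2 == 0
        make_block = lambda: nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.in_blocks = nn.ModuleList([make_block() for _ in range(depth // 2)])
        self.mid_block = make_block()
        self.out_blocks = nn.ModuleList([make_block() for _ in range(depth // 2)])
        self.skip_proj = nn.ModuleList(
            [nn.Linear(2 * dim, dim) for _ in range(depth // 2)])

    def forward(self, tokens):             # tokens: (batch, seq_len, dim)
        skips = []
        for blk in self.in_blocks:
            tokens = blk(tokens)
            skips.append(tokens)           # remember early activations
        tokens = self.mid_block(tokens)
        for blk, proj in zip(self.out_blocks, self.skip_proj):
            # fuse the mirrored early activation, then process
            tokens = proj(torch.cat([tokens, skips.pop()], dim=-1))
            tokens = blk(tokens)
        return tokens
```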

  • What is the aesthetic quality of the new AI video generator's outputs compared to Sora?

    -The new AI video generator's outputs look really good but are not as detailed as Sora's. They have a Midjourney V4 kind of look, which is appreciated for its surreal aesthetic.

  • What is the significance of the 'Sizzle reel' mentioned in the script?

    -The 'Sizzle reel' is a promotional video showcasing the capabilities of the new AI video generator. It includes clips that are direct references to the initial Sora video release.

  • How does the new AI video generator handle transitions between video frames?

    -The new AI video generator handles transitions by treating everything as tokens and utilizing its understanding of the beginning and end of the video to chart a coherent path between frames.

  • What is the current status of the sign-up link for the new AI video generator?

    -As of the recording, there is a sign-up link on the website, but the submit button appears to be broken, possibly due to high traffic.

  • What is the role of post-production in refining AI-generated videos like those from Sora?

    -Post-production plays a significant role in cleaning up AI-generated footage. This includes curation, script writing, editing, voice-over, music, sound design, color correction, and other typical post-production processes to achieve a semi-consistent final product.

  • How does the new AI video generator compare to Sora in terms of creating realistic environments?

    -While both the new AI video generator and Sora are capable of creating compelling imagery, Sora tends to produce more action and clearly defined visuals in its environments. However, the new AI video generator also creates realistic-looking places, albeit with some minor discrepancies in movement or detail.

  • What is the future potential of AI video generation technology in film and media production?

    -AI video generation technology can be used to create compelling imagery and can be integrated into full production processes. It allows for the creation of unique and surreal aesthetics, and with further development and refinement, it could play a significant role in film and media production.

Outlines

00:00

🚀 Introduction to a Potential Sora Rival AI Video Generator

The video script introduces a new AI video generator called 'Vidu', which is being compared to Sora, a yet-to-be-released model; the presenter acknowledges the irony of comparing it to Sora before Sora's launch. The video dives into the features of the new model, its potential to match Sora's quality, and the possibility of using it before Sora's release, and the script mentions a signup link for the audience. Vidu is developed by Shengshu Technology and Tsinghua University and targets 16-second clips at 1080p resolution. Its architecture is based on the Universal Video Transformer (U-ViT), which draws on two research papers: DPM-Solver, which helps diffusion models make better predictions during generation, and 'All Are Worth Words', which combines Vision Transformers with a U-Net model. Vidu's strength lies in treating all elements as tokens and using long skip connections for coherent video generation.
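
As a rough illustration of the "everything as tokens" idea described above, the sketch below packs noisy video frames, the diffusion timestep, and a text-prompt embedding into a single token sequence. The shapes, module names, and the pooled text embedding are assumptions made for the example, not Vidu's published interface.

```python
import torch
import torch.nn as nn

class VideoTokenPacker(nn.Module):
    """Illustrative only: split frames into patches and append the diffusion
    timestep and prompt embedding as extra tokens in the same sequence."""
    def __init__(self, patch=16, channels=3, dim=512, text_dim=768):
        super().__init__()
        self.patch = patch
        self.patch_embed = nn.Linear(patch * patch * channels, dim)
        self.time_embed = nn.Linear(1, dim)          # diffusion timestep -> one token
        self.text_embed = nn.Linear(text_dim, dim)   # pooled prompt -> one token

    def forward(self, video, t, text):
        # video: (B, frames, C, H, W); t: (B, 1) float; text: (B, text_dim)
        b, f, c, h, w = video.shape
        p = self.patch
        patches = video.unfold(3, p, p).unfold(4, p, p)       # (B, F, C, H/p, W/p, p, p)
        patches = patches.permute(0, 1, 3, 4, 2, 5, 6).reshape(b, -1, c * p * p)
        tokens = self.patch_embed(patches)                    # spatio-temporal tokens
        extra = torch.stack([self.time_embed(t), self.text_embed(text)], dim=1)
        return torch.cat([extra, tokens], dim=1)              # one joint sequence
```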

05:02

🎥 Analysis of Longer Vidu Outputs and Comparison with Sora

The script provides an analysis of full 16-second clips generated by the Vidu AI, highlighting the references to Sora in the initial hype reel. It discusses the temporal coherence of the generated content, comparing it to Sora's outputs. The presenter appreciates the Midjourney V4 aesthetic of the TVs in one of the clips, which is reminiscent of a favorite model. Another clip features a panda bear playing a guitar, which, while not the most realistic, still impresses with its background coherence and reactive shadow. A beach vacation villa clip showcases an interesting dissolve between shots, hinting at the model's ability to handle transitions. An imaginative clip with a ship in a bedroom demonstrates the model's reaction to movement and environmental interaction. The script also includes a brief comparison with Sora, noting that while Sora's environment realism is slightly superior, Vidu's output still looks like a real place. The presenter reminds the audience that both models have their strengths and that the examples shown are cherry-picked, with Sora also producing less consistent videos that require significant post-production work.

10:05

📚 Post-Production Processes and Future of AI in Filmmaking

The video script concludes with a discussion of the post-production process necessary to refine AI-generated videos into a final product. It mentions the use of AI tools in creating compelling imagery and the effort that goes into making these videos look semi-consistent. The presenter references a production company's use of Sora to create a short film, 'Airhead', and the extensive cleanup required to achieve a polished result. The script also highlights the creative process used by Paul Trillo in his short film 'Notes to My Future Self', where AI imagery was integrated with traditional VFX techniques. Finally, the presenter provides a signup link for Vidu, noting a potential temporary issue with the website's submit button, and teases an upcoming interview with Adobe about Sora's integration into Premiere and future plans for After Effects.

Keywords

💡AI Video Generator

An AI video generator is a technology that uses artificial intelligence to create video content. In the context of the video, it refers to a new model that can generate video clips up to 16 seconds at 1080p resolution, which is being compared to another model called Sora.

💡Sora

Sora is an AI video generation model that is treated as a benchmark in the video. It is mentioned as a comparison point for the new AI video generator being discussed. The video explores whether the new model can match or surpass the quality of Sora.

💡Shengshu Technology and Tsinghua University

These are the developers of the new AI video generator. They are responsible for creating the technology that is being discussed in the video. Their collaboration signifies the academic and industrial partnership in advancing AI video generation.

💡Universal Video Transformer (U-ViT)

U-ViT is the underlying architecture of the new AI video generator. It is a model that combines Vision Transformers, which are good at analyzing images, with a U-Net model, which is adept at generating images. This combination allows U-ViT to treat various elements as tokens and utilize long skip connections for better video generation.

💡Temporal Coherence

Temporal coherence refers to the consistency of visual elements over time in a video. It is an important aspect when evaluating the quality of AI-generated videos. The video discusses the temporal coherence of the new model, noting how objects and scenes maintain their consistency throughout the generated clips.

💡Sizzle Reel

A sizzle reel is a short promotional video that showcases the highlights of a product or service. In the context of the video, the sizzle reel is used to demonstrate the capabilities of the new AI video generator, although it does not show the full 16-second clips.

💡DPM-Solver

DPM-Solver is one of the papers that forms part of the foundation for U-ViT. It is mentioned as helping diffusion models make better predictions about future generations in the context of video generation, and it is one of the more math-heavy papers contributing to the technical strength of the new AI model.
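
For a sense of what DPM-Solver contributes, here is a hedged sketch of its first-order update (a paraphrase of the published formula, not code from the video): it moves a noisy sample from time s to time t using a single noise prediction and the diffusion schedule's alpha/sigma coefficients, which is what lets sampling finish in far fewer steps.

```python
import torch

def dpm_solver_1_step(x, eps, alpha_s, sigma_s, alpha_t, sigma_t):
    """One first-order DPM-Solver step (paraphrased from the paper):
    x is the noisy sample at time s, eps = eps_theta(x, s) is the model's
    noise prediction, and alpha_*/sigma_* are scalar tensors from the
    diffusion schedule at times s and t."""
    lam_s = torch.log(alpha_s / sigma_s)   # log signal-to-noise ratio at s
    lam_t = torch.log(alpha_t / sigma_t)   # log signal-to-noise ratio at t
    h = lam_t - lam_s
    return (alpha_t / alpha_s) * x - sigma_t * torch.expm1(h) * eps
```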

💡All Are Worth Words

This paper, while complex, is less math-intensive than DPM-Solver and contributes to the understanding of how U-ViT works. It describes combining Vision Transformers with a U-Net model to create the U-ViT architecture.

💡Long Skip Connections

Long skip connections are a feature of U-ViT that allows the model to maintain awareness of the first and last frames of a video, enabling it to chart a path between them. This feature helps in generating videos with less distortion and more coherence compared to traditional AI video generators.

💡V4 Aesthetic

V4 refers to Midjourney version 4, an earlier model with a particular aesthetic appeal. The video mentions a preference for the V4 aesthetic, describing it as surreal and visually appealing. This aesthetic is compared to the outputs of the new AI video generator.

💡Post-Production

Post-production involves the processes of editing, sound design, color correction, and other tasks that occur after the initial filming or generation of video content. The video discusses the need for post-production work to make AI-generated videos look consistent and polished, even when using advanced models like Sora.

Highlights

A new AI video generator, potentially rivaling Sora, has been revealed.

The AI can generate clips up to 16 seconds at 1080p resolution.

The model was developed by Shengshu Technology and Tsinghua University.

Vidu's architecture is based on the Universal Video Transformer (U-ViT).

U-ViT combines Vision Transformers with a U-Net model for image generation.

The model treats all elements, including time, as tokens and utilizes long skip connections.

Vidu's output is compared to Sora, showing differences in temporal coherence and video generation methods.

Vidu's 16-second clips showcase temporal coherence and detailed visuals.

The AI-generated videos are noted for their aesthetic appeal, with a Midjourney V4 look.

Vidu's beach vacation video demonstrates an interesting dissolve effect.

A ship in a bedroom video shows the model's ability to react to water movement.

A side-by-side comparison with Sora reveals strengths in camera movement and environment realism.

The Tokyo walk sequence from Vidu shows the model's capability to handle complex scenes.

Sora's video generation still requires significant post-production work for consistency.

AI video generation technology is being used to create compelling imagery, as demonstrated by Paul Trillo's VFX breakdown.

Vidu has a sign-up link on their website, but the submit button may be temporarily broken due to high traffic.

Adobe's integration of Sora into Premiere and future plans for After Effects are discussed in an exclusive interview.