Chinese Company Unveils SORA Competitor - "Vidu" AI Video Generator

AI Search
28 Apr 2024 11:37

TLDR

A Chinese company, Shengshu Technology, has announced a new AI video generator called Vidu, positioned as a competitor to OpenAI's Sora. Vidu is claimed to generate 16-second 1080p video clips with a single click, using a self-developed architecture known as Universal Vision Transformer (U-ViT). This architecture combines diffusion and Transformer models, both of which are central to recent generative AI. The company's research team first proposed the core technology of U-ViT in September 2022, ahead of Sora's model architecture. A showcase reel demonstrates Vidu's ability to produce realistic videos, although some generated scenes contain noticeable inconsistencies. The company is currently accepting applications for access to the tool at shanguai.com. The announcement highlights the competitive landscape in AI, with China emerging as a significant player alongside tech giants in the US.

Takeaways

  • 🎉 A Chinese company named Shengshu has announced a new AI video generator called 'Vidu', positioned as a competitor to OpenAI's Sora.
  • 📹 Vidu is claimed to generate a 16-second 1080p video clip with a single click, using a self-developed architecture known as Universal Vision Transformer (U-ViT).
  • 🔍 The U-ViT architecture integrates two kinds of AI model, diffusion and Transformer. This combination is seen as an advancement in generative AI that overcomes some limitations of previous models.
  • 📈 The Transformer model, which Vidu uses, is adept at understanding context, which should theoretically lead to more coherent and accurate video/image generation.
  • 📅 Vidu's research team first proposed the core technology of U-ViT in September 2022, ahead of Sora's model architecture.
  • 👀 In a side-by-side comparison, Vidu generates hands well, with realistic detail, although there are some inconsistencies in certain elements like hair and leaves.
  • 📊 Despite the lower resolution of the Vidu showcase videos, the Global Times article mentions that Vidu can output 1080p quality.
  • 🌐 To apply for access to Vidu, interested parties can fill out a form on the Shengshu AI website and leave their contact details for a marketing consultant to follow up.
  • 🤖 Recent advancements from China in the AI space include a new language model and a high-speed robot, indicating a surge in innovation and competition.
  • 🏆 While Vidu showcases impressive capabilities, the presenter suggests it may not be on par with OpenAI's Sora, which has not been released.
  • 💬 The presenter encourages viewers to share their thoughts on Vidu and whether they will apply for access, fostering community engagement and discussion.

Q & A

  • What is the name of the AI video generator announced by the Chinese company Shengshu?

    -The name of the AI video generator is 'Vidu'.

  • What is the core technology behind Vidu's AI video generator?

    -The core technology behind Vidu's AI video generator is the Universal Vision Transformer (U-ViT), which integrates two kinds of AI model: the diffusion model and the Transformer model.

  • What length and resolution of video can Vidu generate?

    -Vidu is claimed to generate a 16-second 1080p video clip with a single click.

  • What are some of the limitations of the Stable Diffusion model?

    -Some limitations of the Stable Diffusion model are that it does not render text well and that it struggles to understand context or follow more complicated prompts.

  • How does the Transformer model improve upon the diffusion model?

    -The Transformer model, which is good at understanding context, can be merged with the diffusion model to create more coherent and accurate images or videos.

  • Who is Zhu Jun and what is his role in the development of Vidu's technology?

    -Zhu Jun is the vice dean of the Institute for AI at Tsinghua University and the chief scientist at Shengshu. He states that Sora, once released, turned out to align closely with their technical roadmap, which motivated them to advance their research.

  • What are some of the features that make Vidu's AI video generator stand out?

    -Vidu's AI video generator stands out for its ability to generate realistic hands with five fingers, and its overall realistic and coherent video generation capabilities.

  • How does Vidu's video quality compare to Sora's?

    -While Vidu produces high-quality videos, the details in its videos may not be as crisp or sharp due to a lower resolution compared to Sora's full HD videos. However, the Global Times article mentions that Vidu can output 1080p.

  • What is the process to apply for access to use Vidu's AI video generator?

    -To apply for access to Vidu, visit shanguai.com, scroll down to the video generation section, and fill out a form with your name, phone number, and company name. A marketing consultant then follows up.

  • How does the emergence of Vidu and other Chinese AI advancements impact the global AI race?

    -The emergence of Vidu and other Chinese AI advancements shows that other countries might not be far behind in the AI race, providing more competition and potentially driving innovation in the field.

  • What are some other recent significant AI developments from China?

    -Other recent significant AI developments from China include the launch of SenseNova 5.0 by the Chinese company SenseTime, which reportedly beats GPT-4 Turbo on nearly all benchmarks, and the unveiling of the S1 robot by the company Astribot.

  • What is the general sentiment towards competition in the AI video generation space?

    -The general sentiment is positive towards competition in the AI video generation space, as it is believed to drive innovation and lead to better products and services.

Outlines

00:00

🚀 Introduction to Shengshu's AI Video Generator

The video introduces a new AI video generator developed by a Chinese company named Shengshu. The generator, called Vidu, is presented as a competitor to OpenAI's Sora. It is claimed to create a 16-second 1080p video clip with a single click, using a self-developed architecture known as Universal Vision Transformer (U-ViT). U-ViT combines diffusion and Transformer models, both of which are central to the evolution of generative AI. The video discusses the limitations of previous models and how U-ViT aims to overcome them by adding the Transformer's understanding of context, which should improve coherence. It also compares Shengshu's and OpenAI's video generation capabilities, noting that while Shengshu's results are impressive, they may not yet match the unreleased Sora.

05:01

📊 Comparative Analysis of Video Generation Technologies

This section presents a side-by-side comparison between the video outputs of Shengshu's Vidu and OpenAI's Sora. It highlights the quality and realism of the generated videos, pointing out specific details such as the accurate depiction of hands and the consistency with which elements transform within a video. The narrator also notes inconsistencies in Vidu's videos, such as a green leaf disappearing and a misrepresentation of a wooden toy ship on a carpet. Despite these flaws, Vidu's video quality is acknowledged to be good, although the showcased clips are not in full HD resolution. The section also covers the application process for using Shengshu's technology and provides a link to the company's website.

10:03

🌏 Global AI Competition and Recent Chinese Innovations

The final section shifts the focus to the broader landscape of AI development, emphasizing recent advancements by Chinese companies in the AI space. It mentions the unveiling of a new language model and a high-speed robot by other Chinese companies, suggesting that the global AI race is becoming more competitive. The narrator expresses enthusiasm for the unveiling of Shengshu's AI video generator, Vidu, because it adds competition and may push the boundaries of what is possible in video generation. The narrator encourages viewers to share their thoughts on Vidu and whether they plan to apply for access.

Keywords

💡AI Video Generator

An AI video generator is a tool that uses artificial intelligence to create videos automatically, typically from a text prompt. In the context of the video, it refers to 'Vidu', a product developed by the Chinese company Shengshu. Vidu is presented as a competitor to OpenAI's Sora, and it aims to generate more coherent and accurate videos by merging the diffusion model with the Transformer model.

💡Sora

Sora is OpenAI's text-to-video model. At the time of the video it had not been released for public use, but it was already known for its high-quality video generation. The video uses it as the benchmark for comparison with the 'Vidu' AI video generator, with the expectation that Sora will set a high standard for AI-generated video quality.

💡Universal Vision Transformer (U-ViT)

U-ViT is a self-developed model architecture that integrates two kinds of AI model: the diffusion model and the Transformer model. It is the core technology behind the 'Vidu' AI video generator. The integration of these models is considered a next step in generative AI, aiming to produce more coherent and accurate video or image outputs.
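The video does not describe how the architecture feeds images to its Transformer. As a rough illustration only, a common approach in vision Transformers generally is to cut an image into small square patches and treat each flattened patch as one token. The sketch below shows that step; the function name `patchify` and the toy 4x4 image are hypothetical and are not taken from Vidu.

```python
def patchify(image, patch_size):
    """Split a 2D grid (a list of rows) into flattened square patches.

    Patches are returned in row-major order; each patch becomes one "token".
    This is generic vision-Transformer tokenization, assumed for illustration.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[top + r][left + c]
                     for r in range(patch_size)
                     for c in range(patch_size)]
            tokens.append(patch)
    return tokens


# A toy 4x4 "image" whose pixel values are just their own index.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
toy_tokens = patchify(toy_image, 2)  # four tokens of four values each
```

In a diffusion setting the grid being split would be a noisy image or video frame, and the Transformer would be trained to predict the noise for each token.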

💡Diffusion Model

The diffusion model is a type of generative model used in AI for creating new data samples. In the video, it is part of the Universal Vision Transformer and is combined with the Transformer model to enhance the video generation capabilities of the 'Vidu' AI video generator. The diffusion model is known for its ability to generate images but has limitations in text generation and context understanding.
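To make the idea more concrete, here is a minimal sketch of the noising half of a diffusion model: clean data is blended with Gaussian noise, a little more at each step, and the model is later trained to undo this. The linear schedule, the step count, and all function names are illustrative assumptions; the video gives no details of Vidu's actual training setup.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (an assumed, common choice)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """alpha_bar[t]: product of (1 - beta) up to step t, the signal fraction left."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * value + sigma * rng.gauss(0.0, 1.0) for value in x0]


rng = random.Random(0)
alpha_bars = cumulative_alpha(linear_beta_schedule(1000))
pixels = [0.5, -0.25, 1.0]                        # stand-in for image data
lightly_noised = add_noise(pixels, 10, alpha_bars, rng)
heavily_noised = add_noise(pixels, 999, alpha_bars, rng)
```

At early steps the output is close to the original data; by the last step almost no signal remains, which is why generation can start from pure noise and denoise step by step.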

💡Transformer Model

The Transformer model is a machine learning architecture that is particularly good at understanding context and handling sequential data. It is the backbone of many large language models (LLMs) and is used in the 'Vidu' AI video generator to improve the coherence and accuracy of the generated videos. The architecture was introduced in the paper 'Attention Is All You Need' by researchers at Google.
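The mechanism that gives a Transformer its grasp of context is attention: every token scores its similarity to every other token and then takes a weighted mix of their values. The sketch below is a plain-Python version of scaled dot-product attention written for illustration; it omits the learned projections and multiple heads of a real model, and the toy tokens are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[d] for w, v in zip(weights, values))
                 for d in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


# Self-attention on two toy tokens: each output leans toward its own token
# but still blends in information from the other one.
toy_tokens = [[1.0, 0.0], [0.0, 1.0]]
contextualized = attention(toy_tokens, toy_tokens, toy_tokens)
```

Because every token can draw on every other token, a scene generated with this mechanism has a way to stay consistent from one region or frame to the next.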

💡Generative AI

Generative AI refers to the subset of artificial intelligence that is focused on creating new content, such as images, videos, or text. The video discusses the advancements in generative AI through the merging of the diffusion model and the Transformer model in the 'Vidu' AI video generator, which is seen as a step forward from the limitations of the diffusion model alone.

💡Shengshu

Shengshu is the Chinese company that announced the 'Vidu' AI video generator. The company is positioning its product as a competitor to OpenAI's Sora and is part of a broader trend of AI advancements coming from China, as mentioned in the video.

💡Stable Diffusion

Stable Diffusion is an open-source text-to-image model based on diffusion. The video cites it as an example of a diffusion model that is weak at rendering text and at understanding complex prompts, and discusses how the 'Vidu' AI video generator aims to overcome these limitations by integrating the diffusion model with the Transformer model.

💡Runway and Pika

Runway and Pika are mentioned in the video as two of the best video generators currently available. They are used for comparison with the 'Vidu' AI video generator, which is suggested to potentially outperform these existing solutions in terms of video quality and realism.

💡Resolution

Resolution refers to the clarity and sharpness of a video or image, typically measured in pixels. The video discusses the resolution of the 'Vidu' AI video generator's output, noting that while the examples shown were in 720p, the system is capable of outputting 1080p, which is considered full HD.
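A quick calculation shows how large the gap between the two resolutions is. The sketch assumes the standard 16:9 frame sizes for 720p and 1080p and, purely for illustration, a frame rate of 24 frames per second; the video does not state Vidu's frame rate.

```python
# Standard 16:9 frame sizes (width, height) in pixels.
RESOLUTIONS = {"720p": (1280, 720), "1080p": (1920, 1080)}


def pixels_per_frame(label):
    """Number of pixels in a single frame at the given resolution."""
    width, height = RESOLUTIONS[label]
    return width * height


def frames_in_clip(seconds, fps):
    """Total frames in a clip of the given length and frame rate."""
    return seconds * fps


ratio = pixels_per_frame("1080p") / pixels_per_frame("720p")
clip_frames = frames_in_clip(16, 24)   # 24 fps is an assumed value
print(f"1080p has {ratio:.2f}x the pixels of 720p")
print(f"A 16-second clip at 24 fps has {clip_frames} frames")
```

A 1080p frame holds 2.25 times as many pixels as a 720p frame, which is why details in the 720p showcase clips can look softer than in full HD footage.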

💡WeChat Page

The WeChat Page is mentioned as the source of the 'Vidu' AI video generator's showreel. WeChat is a Chinese multi-purpose messaging, social media, and mobile payment app, and the company uses its WeChat page to showcase their product to a potential audience.

Highlights

Chinese company Shengshu announces a new AI video generator, Vidu, as a competitor to OpenAI's Sora.

Vidu's AI video generator is claimed to be on par with OpenAI's SORA.

Vidu is claimed to generate a 16-second 1080p video clip with a single click.

The technology is built on a self-developed model architecture called Universal Vision Transformer (U-ViT).

U-ViT merges the diffusion and Transformer models, which is considered the next step in generative AI.

The Transformer model is known for its ability to understand context, which could improve the coherence of generated content.

The core technology of U-ViT was proposed by Vidu's research team in September 2022, ahead of Sora's model architecture.

Vidu's video generator produces realistic hands with detail.

Comparisons between Vidu and Sora show potential for Vidu to be a close competitor.

Vidu's video generation has some inconsistencies, such as transforming hair into a red ribbon.

Vidu's videos are currently in 720p resolution, whereas Sora's are in full HD.

The Global Times article states that Vidu can output 1080P videos.

To apply for access to Vidu, one can visit shanguai.com and fill out a form.

Recent advancements from China in AI space include a new language model and a fast robot from other Chinese companies.

The new language model from a Chinese company reportedly beats GPT-4 Turbo on nearly all benchmarks.

The unveiling of Vidu provides more competition in the AI video generation space.

Competition in the AI field is seen as beneficial for innovation and progress.