Chinese Company Unveils SORA Competitor - "Vidu" AI Video Generator

AI Search
28 Apr 2024
11:37

TLDR

A Chinese company, Shengshu Technology, has announced an AI video generator called Vidu, positioned as a competitor to OpenAI's Sora. Vidu is claimed to generate a 16-second 1080p video clip with a single click, using a self-developed architecture known as Universal Vision Transformer (U-ViT). This architecture combines the diffusion and Transformer models, two approaches central to recent generative AI. The company's research team first proposed the core technology of U-ViT in September 2022, ahead of Sora's model architecture. A showreel demonstrates Vidu's ability to produce realistic videos, although some generated scenes contain noticeable inconsistencies. The company is currently accepting applications for access to the tool at shanguai.com. The announcement highlights how competitive AI has become, with China emerging as a significant player alongside US tech giants.

Takeaways

  • A Chinese company, Shengshu Technology, has announced a new AI video generator called 'Vidu', positioned as a competitor to OpenAI's Sora.
  • Vidu can generate a 16-second 1080p video clip with a single click, using a self-developed architecture known as Universal Vision Transformer (U-ViT).
  • The U-ViT architecture integrates two AI models, diffusion and Transformer, a combination seen as an advance in generative AI that overcomes some limitations of earlier models.
  • The Transformer model is adept at understanding context, which should in theory lead to more coherent and accurate video and image generation.
  • Vidu's research team first proposed the core technology of U-ViT in September 2022, ahead of Sora's model architecture.
  • In a side-by-side comparison, Vidu renders hands well, with realistic detail, although some elements such as hair and leaves are inconsistent.
  • Although the Vidu showcase videos are lower resolution, the Global Times article states that Vidu can output 1080p.
  • To apply for access to Vidu, interested parties can fill out a form on the Shengshu website, leaving their contact details for a marketing consultant to follow up.
  • Other recent AI advances from China include a new language model and a high-speed robot, indicating a surge in innovation and competition.
  • While Vidu's capabilities are impressive, the presenter suggests it may not yet be on par with OpenAI's still-unreleased Sora.
  • The presenter encourages viewers to share their thoughts on Vidu and whether they will apply for access.

Q & A

  • What is the name of the AI video generator announced by the Chinese company Shengshu Technology?

    -The name of the AI video generator is 'Vidu'.

  • What is the core technology behind Vidu's AI video generator?

    -The core technology behind Vidu's AI video generator is the Universal Vision Transformer (U-ViT), which integrates two text-to-video AI models: the diffusion model and the Transformer model.

  • How long does it take for Vidu to generate a 16-second 1080p video clip?

    -Vidu is claimed to generate a 16-second 1080p video clip with just one click.

  • What are some of the limitations of Stable Diffusion?

    -Limitations of Stable Diffusion include weak rendering of text and difficulty understanding context or following more complicated prompts.

  • How does the Transformer model improve upon the diffusion model?

    -The Transformer model, which is good at understanding context, can be merged with the diffusion model to create more coherent and accurate images or videos.

  • Who is Zhu Jun and what is his role in the development of Vidu's technology?

    -Zhu Jun is the vice dean of the Institute for AI at Tsinghua University and the chief scientist at Shengshu Technology. He states that when Sora was released, the team found it closely aligned with their own technical roadmap, which motivated them to push their research further.

  • What are some of the features that make Vidu's AI video generator stand out?

    -Vidu's AI video generator stands out for its ability to generate realistic hands with five fingers, and its overall realistic and coherent video generation capabilities.

  • How does Vidu's video quality compare to Sora's?

    -While Vidu produces high-quality videos, the details in its videos may not be as crisp or sharp due to a lower resolution compared to Sora's full HD videos. However, the Global Times article mentions that Vidu can output 1080p.

  • What is the process to apply for access to use Vidu's AI video generator?

    -To apply for access to Vidu, visit shanguai.com, scroll down to the video generation section, and fill out a form with your name, phone number, and company name. A marketing consultant will then follow up.

  • How does the emergence of Vidu and other Chinese AI advancements impact the global AI race?

    -The emergence of Vidu and other Chinese AI advancements shows that other countries might not be far behind in the AI race, providing more competition and potentially driving innovation in the field.

  • What are some other recent significant AI developments from China?

    -Other recent significant AI developments from China include the launch of SenseNova 5.0 by the Chinese company SenseTime, which reportedly beats GPT-4 Turbo on nearly all benchmarks, and the unveiling of the S1 robot by the company Astribot.

  • What is the general sentiment towards competition in the AI video generation space?

    -The general sentiment is positive towards competition in the AI video generation space, as it is believed to drive innovation and lead to better products and services.

Outlines

00:00

Introduction to Shengshu's AI Video Generator

The video introduces a new AI video generator developed by a Chinese company, Shengshu Technology. The generator, called Vidu, is positioned as a competitor to OpenAI's Sora. It is claimed to create a 16-second 1080p video clip with a single click, using a self-developed architecture known as Universal Vision Transformer (U-ViT). U-ViT combines diffusion and Transformer models, both central to the evolution of generative AI. The video discusses the limitations of earlier models and how U-ViT aims to overcome them through better context understanding and coherence. It also compares Vidu's and Sora's video generation, noting that while Vidu's results are impressive, they may not yet match the unreleased Sora.

05:01

Comparative Analysis of Video Generation Technologies

This section presents a side-by-side comparison of videos generated by Vidu and OpenAI's Sora. It highlights the quality and realism of the generated videos, pointing out details such as the accurate depiction of hands and the consistent transformation of elements within a clip. The narrator also notes inconsistencies in Vidu's output, such as a green leaf that disappears and a misrendered wooden toy ship on a carpet. Despite these flaws, Vidu's video quality is acknowledged to be good, although the showcased clips are not in full HD. The section also covers the application process for Vidu and points to the company's website.

10:03

🌏 Global AI Competition and Recent Chinese Innovations

The final paragraph shifts the focus to the broader landscape of AI development, emphasizing the recent advancements made by Chinese companies in the AI space. It mentions the unveiling of a new language model and a fast-speed robot by other Chinese entities, suggesting that the global AI race is becoming more competitive. The script expresses enthusiasm for the unveiling of Shu's AI video generator, VDU, as it adds to the competition and potentially pushes the boundaries of what is possible in video generation technology. The narrator encourages viewers to share their thoughts on VDU and whether they plan to apply for access to the technology.

Mindmap

AI Video Generation Technology
  • Introduction to Sora Competitor - Vidu
    • Announcement
      • Company Introduction
        • Shengshu Technology, a Chinese company
        • AI Video Generator
      • Purpose
        • To cover the announcement and details of Vidu
    • Vidu's Capabilities
      • Generates 16-second 1080p video clip with one click
      • Built on Universal Vision Transformer (U-ViT)
      • Combines diffusion and Transformer models
  • Comparison with Sora
    • Advantages of Vidu
      • Better at generating text
      • Understands context better
      • Follows more complicated prompts
    • Limitations of Stable Diffusion
      • Limited text generation capabilities
      • Lack of context understanding
    • Show Reel Analysis
      • Quality Assessment
        • Outcompetes current video generators
        • Not on par with OpenAI Sora
      • Specific Observations
        • Generates hands well
        • Realistic appearance
        • Flaws in consistency
  • Technical Background
    • U-ViT's Core Technology
      • Proposed by Vidu's research team
      • Prior to Sora's model architecture
    • Influence of Sora
      • Aligned with Vidu's technical roadmap
      • Motivated further research
  • Access and Application
    • How to Use Vidu
      • Website for application: shanguai.com
      • Application process includes leaving contact details
    • Competition and Market
      • Other AI advancements in China
      • Increasing competition in AI space
  • Conclusion and User Engagement
    • User Opinion
      • Asks for comments on Vidu's performance
    • Call to Action
      • Encourages applying for access
      • Invites viewers to share their thoughts

Keywords

AI Video Generator

An AI video generator is a tool that uses artificial intelligence to create videos automatically. In the context of the video, it refers to 'Vidu', a product developed by the Chinese company Shengshu Technology. It is presented as a competitor to Sora and is said to generate more coherent and accurate videos by merging the diffusion model with the Transformer model.

Sora

Sora is OpenAI's text-to-video model. At the time of the video it had not been released for public use, but it was already known for high-quality video generation. The video uses it as the benchmark for comparison with the 'Vidu' AI video generator and expects it to set a high standard for AI-generated video quality.

Universal Vision Transformer (U-ViT)

U-ViT is a self-developed visual transformation model architecture that integrates two AI models: the diffusion model and the Transformer model. It is the core technology behind the 'Vidu' AI video generator. The integration of these models is considered a next step in generative AI, aiming to produce more coherent and accurate video or image outputs.

Diffusion Model

The diffusion model is a type of generative model used in AI for creating new data samples. In the video, it is part of the Universal Vision Transformer and is combined with the Transformer model to enhance the video generation capabilities of the 'Vidu' AI video generator. The diffusion model is known for its ability to generate images but has limitations in text generation and context understanding.
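As a rough illustration of the idea behind diffusion models, the sketch below shows the forward (noising) process in the commonly described DDPM style: an image is gradually blended with Gaussian noise, and a generative model is then trained to reverse that process. This is not Vidu's implementation, which the video does not describe. The linear schedule, its start and end values, the step count, and the four-value "image" are all assumptions chosen for illustration.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (values are illustrative assumptions)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def signal_fraction(betas):
    """Cumulative product of (1 - beta): how much of the original signal survives."""
    kept, out = 1.0, []
    for beta in betas:
        kept *= 1.0 - beta
        out.append(kept)
    return out

def add_noise(pixels, t, alpha_bars, rng):
    """Sample a noisy version of `pixels` at step t of the forward process."""
    keep = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [keep * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]

alpha_bars = signal_fraction(linear_beta_schedule(1000))
rng = random.Random(0)
image = [0.5, -0.25, 1.0, 0.0]  # hypothetical stand-in for pixel values
slightly_noisy = add_noise(image, 10, alpha_bars, rng)
almost_pure_noise = add_noise(image, 999, alpha_bars, rng)
```

At early steps most of the original signal survives; by the final step almost none does, which is why generation can start from pure noise and denoise step by step.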

Transformer Model

The Transformer model is a machine learning architecture that is particularly good at understanding context and handling sequential data. It is the backbone of many large language models (LLMs) and is used in the 'Vidu' AI video generator to improve the coherence and accuracy of the generated videos. The model was introduced in the paper 'Attention Is All You Need' by researchers at Google.
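The context handling attributed to Transformers comes largely from attention, where each token's output is a weighted mix of every token in the sequence. Below is a minimal sketch of scaled dot-product attention in plain Python. It omits the learned projections, multiple heads, and other parts of a real Transformer, and the toy token vectors are hypothetical; it says nothing about how Vidu itself is built.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical token embeddings
contextual = attention(tokens, tokens, tokens)  # self-attention
```

Because every output row blends information from all tokens, each position "sees" the whole sequence, which is the property the video credits for more coherent generation.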

Generative AI

Generative AI refers to the subset of artificial intelligence that is focused on creating new content, such as images, videos, or text. The video discusses the advancements in generative AI through the merging of the diffusion model and the Transformer model in the 'Vidu' AI video generator, which is seen as a step forward from the limitations of the diffusion model alone.

Shengshu Technology

Shengshu Technology is the Chinese company that announced the 'Vidu' AI video generator. The company is positioning its product as a competitor to Sora and is part of a broader trend of advancements in AI technology coming from China, as mentioned in the video.

Stable Diffusion

Stable Diffusion is a widely used diffusion-based image generator. The video cites its limitations in generating text and understanding complex prompts, and discusses how the 'Vidu' AI video generator aims to overcome them by integrating the diffusion model with the Transformer model.

Runway and Pika

Runway and Pika are mentioned in the video as two of the best video generators currently available. They are used for comparison with the 'Vidu' AI video generator, which is suggested to potentially outperform these existing solutions in terms of video quality and realism.

Resolution

Resolution refers to the clarity and sharpness of a video or image, typically measured in pixels. The video discusses the resolution of the 'Vidu' AI video generator's output, noting that while the examples shown were in 720p, the system is capable of outputting 1080p, which is considered full HD.
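To put the 720p versus 1080p gap in numbers, the short calculation below compares pixels per frame. It assumes the common 16:9 frame sizes for each label (1280x720 and 1920x1080); the video does not state the aspect ratio of Vidu's clips.

```python
# Common 16:9 frame sizes behind the "720p" and "1080p" labels (an assumption).
RESOLUTIONS = {"720p": (1280, 720), "1080p": (1920, 1080)}

def pixels_per_frame(label):
    """Total pixel count of a single frame at the given resolution label."""
    width, height = RESOLUTIONS[label]
    return width * height

ratio = pixels_per_frame("1080p") / pixels_per_frame("720p")
print(pixels_per_frame("720p"))   # 921600
print(pixels_per_frame("1080p"))  # 2073600
print(ratio)                      # 2.25
```

Under that assumption, a full HD frame carries 2.25 times as many pixels as a 720p frame, which is consistent with the showcased clips looking less crisp.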

WeChat Page

The WeChat Page is mentioned as the source of the 'Vidu' AI video generator's showreel. WeChat is a Chinese multi-purpose messaging, social media, and mobile payment app, and the company uses its WeChat page to showcase their product to a potential audience.

Highlights

Chinese company Shengshu Technology announces a new AI video generator, Vidu, as a competitor to Sora.

Vidu is claimed to be on par with OpenAI's Sora.

Vidu can generate a 16-second 1080p video clip with a single click.

The technology is built on a self-developed visual transformation model architecture called Universal Vision Transformer (U-ViT).

U-ViT merges the diffusion and Transformer models, which is considered the next step in generative AI.

The Transformer model is known for its ability to understand context, which could improve the coherence of generated content.

The core technology of U-ViT was proposed by Vidu's research team before Sora's model architecture.

Vidu's video generator produces realistic hands with detail.

Comparisons between Vidu and Sora show potential for Vidu to be a close competitor.

Vidu's video generation has some inconsistencies, such as transforming hair into a red ribbon.

Vidu's videos are currently in 720p resolution, whereas Sora's are in full HD.

The Global Times article states that Vidu can output 1080P videos.

To apply for access to Vidu, one can visit shanguai.com and fill out a form.

Recent AI advances from China include a new language model and a high-speed robot, both from other Chinese companies.

The new language model from a Chinese company reportedly beats GPT-4 Turbo on nearly all benchmarks.

The unveiling of Vidu provides more competition in the AI video generation space.

Competition in the AI field is seen as beneficial for innovation and progress.