Dalle-3, Sora, & ChatGPT Plus: Stable Audio vs Suno v3 & New Video Generator!

Theoretically Media
4 Apr 2024 11:16

TLDR: In this week's AI news, OpenAI introduces in-painting for DALL-E 3, a late arrival with a mixed aesthetic reception. Stability AI releases Stable Audio 2.0, offering free music generation with the ability to add user audio for reference. Suno's music generation prowess is highlighted, and ChatGPT 3.5 becomes accessible without login. Sora's first music video, 'Worldweight,' showcases its potential, and a new video model, Higgsfield, is on the horizon with a focus on video editing and generation improvements.

Takeaways

  • 🎨 OpenAI has introduced an in-painting feature in DALL-E 3, allowing users to edit images by adding or changing elements within the picture.
  • 🖌️ The in-painting process in DALL-E 3 is not as intuitive as one might expect: users must manually select the area to edit with a selection brush.
  • 🍞 An example in the video involves adding butter to a piece of toast, which produces an image with an excessive amount of butter, highlighting the need to fine-tune prompts.
  • 🎵 Stability AI released Stable Audio 2.0, which can create full musical tracks up to 3 minutes long from a single prompt and offers users 20 free credits per month.
  • 🏆 Suno has released its version 3 model, which is considered superior in audio fidelity and composition choices, and also allows for the addition of singing.
  • 🆓 OpenAI now allows users to access ChatGPT 3.5 for free without logging in, lowering the barrier to trying the model.
  • 🎶 The first music video created with Sora has been released, featuring the track 'Worldweight' by August Kamp, an ambient electronic track reminiscent of artists like Aphex Twin.
  • 📈 Public perception of Sora may be shifting, with some feeling it is becoming exclusive; the presenter suggests exploring free alternatives like Haiper for similar results.
  • 👤 AniPortrait is a new portrait animator that uses a reference photo and video to create a final output, with an example shown of a character created in Midjourney and upscaled in Leonardo.
  • 🌐 Higgsfield, a new video generator, has emerged from stealth mode, with Alex Mashrabov leading the team, aiming to improve video editing and character modification in videos.

Q & A

  • What new feature has been introduced in DALL-E 3 that was long overdue?

    -The new feature introduced in DALL-E 3 is in-painting, which allows users to edit images by adding or changing elements within the picture, such as adding butter to a piece of toast.

  • What is the presenter's opinion on the aesthetic output of DALL-E 3?

    -The presenter is not a huge fan of DALL-E 3's aesthetic and has never really jibed with it, but appreciates its functionality, especially its integration with ChatGPT.

  • How does one interact with the in-painting feature in DALL-E 3?

    -To use the in-painting feature in DALL-E 3, users click on the image, enter edit mode, and use the selection brush to define an area for editing, such as adding a piece of butter to toast or changing a cup of coffee to a glass of orange juice.

  • What is the name of the AI music generation platform that is considered the current leader in the field?

    -Suno is considered the current leader in AI music generation, known for its high-quality audio output and its range of instrumentation and composition choices.

  • What unique feature does Stable Audio offer that sets it apart from other AI music generation platforms?

    -Stable Audio offers the unique feature of allowing users to add their own audio as a reference for generating music, which can lead to creative uses and personalized outputs.

  • How can one access and use ChatGPT 3.5 for free without logging in?

    -To use ChatGPT 3.5 for free without logging in, users can visit the homepage and click on 'Try It' to start using the model.

  • What is the significance of the first music video created with Sora?

    -It demonstrates the platform's ability to generate visuals for music, showcasing its potential in the creative field, especially when combined with other elements like overlays and textures.

  • What is the presenter's opinion on the general public's perception of Sora?

    -The presenter feels that public opinion on Sora has begun to turn, as it is perceived as exclusive, with only a select group having access and everyone else feeling left out.

  • What is the name of the new video model that is coming up, and who is leading its development?

    -The new video model coming up is called Higgsfield, and it is being led by Alex Mashrabov, the former head of AI at Snap.

  • What is Higgsfield's plan for overcoming hardware limitations in video generation?

    -Higgsfield plans to overcome hardware limitations by running a lean team of 16 people with a cluster of 32 GPUs, aiming to build an improved video editor and train a more powerful video generation model.

  • How does the presenter describe the workflow for creating a character using Visible Maker and other AI tools?

    -The presenter describes the workflow as starting with a reference photo created in Midjourney, upscaling it in Leonardo, generating the voice with ElevenLabs' speech-to-speech, removing the green screen, and adding a soundtrack generated with Suno to create the final character video.

Outlines

00:00

🖌️ OpenAI Updates and DALL-E 3's In-Painting Feature

This section discusses recent updates from OpenAI, highlighting the new in-painting feature in DALL-E 3. The presenter has mixed feelings about DALL-E 3's output: its aesthetic is not to their taste, but the integration with ChatGPT for image generation is appreciated. In-painting lets users edit parts of an image, such as adding butter to toast, though the results do not always match the prompt. The presenter also notes that other image generators have offered this feature for much longer.

05:01

🎵 Stability AI's Audio Update and Suno's Music Generation

This section covers Stability AI's recent release, Stable Audio 2.0, which creates full musical tracks from a single prompt. The presenter welcomes the update but finds it less advanced than Suno's music generation. Suno's version 3 model is praised for its superior audio quality, its adherence to the prompted genre, and its ability to add singing. However, Stable Audio's ability to use the user's own audio as a reference is seen as its secret weapon.

10:02

🎥 Sora's First Music Video and Comparison with Haiper

The first music video created with Sora, 'Worldweight' by August Kamp, is discussed. The presenter comments on the video's visual aesthetics and consistency, noting the use of tracking shots and vintage film looks. The video is then compared with Haiper, a free tool that can achieve similar results with the addition of overlays and textures. The presenter questions how unique Sora's output is and suggests Haiper as a viable alternative for achieving similar video effects.

👤 Introducing AniPortrait and Upcoming Higgsfield

This section introduces AniPortrait, a tool inspired by the emotive avatar talker, which combines a reference photo and a video to animate portraits. A use case shows a character created in Midjourney being upscaled and used in a video with a generated voice and background music. The presenter also discusses an upcoming video generator, Higgsfield, led by Alex Mashrabov, former head of AI at Snap. Higgsfield aims to improve video editing by allowing modifications to characters and objects, and to train a more powerful video generation model.

Keywords

💡AI news

AI news refers to the latest updates and developments in the field of Artificial Intelligence. In the context of the video, it is the primary focus, discussing various AI advancements and their implications. The script mentions a slow week in AI news but emphasizes that AI continues to progress rapidly, akin to the speed of light.

💡OpenAI

OpenAI is an AI research lab that aims to ensure artificial general intelligence (AGI) benefits all of humanity. In the video, OpenAI is highlighted as a source of updates, particularly regarding DALL-E 3, which has recently gained in-painting, a long-awaited feature for users.

💡DALL-E 3

DALL-E 3 is an AI-driven image generator that is part of OpenAI's suite of tools. It is known for creating images based on textual prompts. The video talks about the recent update to DALL-E 3, which now includes the ability to edit images directly within the platform, a feature that was not previously available.

💡Stability AI

Stability AI is a generative AI company best known for its Stable Diffusion image models; it also develops music and audio models. In the video, Stability AI is mentioned in relation to its update, Stable Audio 2.0, which enables the creation of full musical tracks from a single prompt.

💡Suno

Suno is an AI music generation platform that is recognized for its high-quality output and ability to generate music that closely matches the user's prompt. In the context of the video, Suno is compared with Stability AI's music generation capabilities, with Suno being noted as the current leader in AI-generated music.

💡ChatGPT 3.5

ChatGPT 3.5 is an AI chatbot developed by OpenAI, known for its ability to generate human-like text based on given prompts. In the video, it is mentioned that users can now access ChatGPT 3.5 for free without needing to log in, which is a significant change in accessibility.

💡Sora

Sora is OpenAI's text-to-video model, which at the time of the video was available only to a select group of creators. The video discusses the first music video created with Sora, which has generated interest and discussion within the AI community. The model is noted for its visual style and capabilities.

💡AniPortrait

AniPortrait is an AI tool designed for animating portraits, inspired by the emotive avatar talker but taking a different approach by using both a reference photo and a video to generate the final output. It represents an advancement in AI-generated visual content.

💡Higgsfield

Higgsfield is an upcoming AI video generation platform led by Alex Mashrabov, the former head of AI at Snap. It aims to develop a more powerful video generation model and an improved video editor for modifying characters and objects in videos.

💡AI-generated music

AI-generated music refers to the process of using artificial intelligence to create original musical compositions. In the video, this concept is discussed in relation to Stability AI's Stable Audio 2.0 and Suno, platforms that allow users to generate music based on textual prompts.

💡AI filmmaking

AI filmmaking involves the use of artificial intelligence in the creation and production of films, including the generation of scripts, visuals, and even editing. The video touches on this concept by discussing events and tools related to AI in filmmaking, such as the Curious Refuge AI filmmaking Mega party and the world's first AI Esports tournament.

Highlights

OpenAI introduces an in-painting feature in DALL-E 3, allowing users to edit images by adding or changing elements within the scene.

DALL-E 3's in-painting feature is not as intuitive as one might expect, requiring users to manually select and edit areas of the image.

Despite the presenter's personal aesthetic preferences, the new in-painting feature marks a step forward for DALL-E 3, albeit a delayed one compared to other image generators.

Stability AI releases Stable Audio 2.0, which can create full musical tracks up to 3 minutes long from a single prompt, offering 20 free credits per month to users.

Suno, a competitor to Stability AI, is recognized as the current leader in AI-generated music due to its superior audio fidelity and composition choices.

Stability AI's secret weapon is the ability to add a user's own audio as a reference, which can lead to creative uses and improvements in generated music.

OpenAI now allows free access to ChatGPT 3.5 without the need for login, providing a capable model for users to experiment with.

The first music video created with Sora, 'Worldweight' by August Kamp, showcases the potential of Sora in generating visuals for music.

The 'Worldweight' music video demonstrates Sora's ability to create aesthetically consistent visuals, though comparisons can be made with other platforms like Haiper.

A new video model, Higgsfield, is on the horizon with Alex Mashrabov leading the project, aiming to improve video editing and generation capabilities.

Higgsfield operates as a lean startup with a small team and limited hardware, signifying potential for innovation in the AI video generation space.

AniPortrait, a tool inspired by the emotive avatar talker, uses a combination of a reference photo and a video to animate portraits.

Visible Maker demonstrates a workflow involving multiple AI tools, showcasing the potential for complex and creative uses of AI in content creation.

The speaker, Tim, is set to attend the Curious Refuge AI filmmaking Mega party and judge the world's first AI Esports tournament.

Even during a slow week, the AI world continues to push boundaries and innovate, suggesting a surge of new developments on the horizon.

The introduction of various AI tools and platforms, such as DALL-E 3, Stable Audio 2.0, Suno, Sora, and Higgsfield, indicates a rapidly evolving landscape in AI technology.

Public opinion on Sora seems to be shifting, with some feeling excluded from the platform's development and access, leading to discussions on its inclusivity and community engagement.