Dalle-3, Sora, & ChatGPT Plus: Stable Audio vs Suno v3 & New Video Generator!
TLDR
In this week's AI news, OpenAI introduces in-painting for DALL-E 3, despite its late arrival and mixed aesthetic reception. Stability AI releases Stable Audio 2.0, offering free music generation with the ability to add user audio for reference. Suno's music generation prowess is highlighted, and ChatGPT 3.5 becomes accessible without login. Sora's first music video, 'Worldweight,' showcases its potential, and a new video model, Higgsfield, is on the horizon with a focus on video editing and generation improvements.
Takeaways
- OpenAI has introduced an in-painting feature in DALL-E 3, allowing users to edit images by adding or changing elements within the picture.
- The in-painting process in DALL-E 3 is not as intuitive as one might expect and requires users to manually select and edit areas of the image using a selection brush.
- An example given in the script involves adding butter to a piece of toast in an image, which results in an image with an excessive amount of butter, highlighting the need for fine-tuning the prompts.
- Stability AI released Stable Audio 2.0, which can create full musical tracks up to 3 minutes long from a single prompt and offers 20 free credits per month for users.
- Suno has released its version 3 model, which is considered superior in terms of audio fidelity and composition choices, and also allows for the addition of singing and the use of audio references.
- OpenAI now allows users to access ChatGPT 3.5 for free without the need to log in, lowering the barrier to entry for experiencing the capabilities of the model.
- The first music video created with Sora has been released, featuring the track 'Worldweight' by August Kamp, an ambient electronic track reminiscent of artists like Aphex Twin.
- Sora's public perception may be shifting, with some feeling it's becoming more exclusive, while others recommend exploring free alternatives like Haiper for similar results.
- AniPortrait is a new portrait animator that uses a reference photo and video to create a final output, with an example shown of a character created in Midjourney and upscaled in Leonardo.
- Higgsfield, a new video generator, has emerged from stealth mode, with Alex Mashrabov leading the team, aiming to improve video editing and character modification in videos.
Q & A
What new feature has been introduced in DALL-E 3 that was long overdue?
-The new feature introduced in DALL-E 3 is the in-painting capability, which allows users to edit images by adding or changing elements within the photo, such as adding butter to a piece of toast.
What is the user's opinion on the aesthetic output of DALL-E 3?
-The user is not a huge fan of DALL-E 3's aesthetic output, as they have never really jibed with it that much, but they do appreciate its functionality, especially when integrated with ChatGPT.
How does one interact with the in-painting feature in DALL-E 3?
-To use the in-painting feature in DALL-E 3, users need to click on the image, enter the edit mode, and use the selection brush to define an area for editing, such as adding a piece of butter to toast or changing a cup of coffee to a glass of orange juice.
What is the name of the AI music generation platform that is considered the current leader in the field?
-Suno is considered the current leader in AI music generation, known for its high-quality audio output and a range of instrumentation and composition choices.
What unique feature does Stable Audio offer that sets it apart from other AI music generation platforms?
-Stable Audio offers the unique feature of allowing users to add their own audio as a reference for generating music, which can lead to creative uses and personalized outputs.
How can one access and use ChatGPT 3.5 for free without logging in?
-To use ChatGPT 3.5 for free without logging in, users can visit the homepage and click on 'Try It' to start using the model without any login requirements.
What is the significance of the first music video created with Sora?
-The significance of the first music video created with Sora is that it demonstrates the capabilities of the platform in generating visuals for music, showcasing its potential in the creative field, especially when combined with other elements like overlays and textures.
What is the user's opinion on the general public's perception of Sora?
-The user feels that there has been a turning of the screw in terms of public opinion on Sora, as it seems to be perceived as exclusive, with only a select group having access and the rest feeling left out.
What is the name of the new video model that is coming up, and who is leading its development?
-The new video model coming up is called Higgsfield, and it is being led by Alex Mashrabov, the former head of AI at Snap.
What is Higgsfield's plan for overcoming hardware limitations in video generation?
-Higgsfield plans to overcome hardware limitations by running on a lean team of 16 people with a cluster of 32 GPUs, aiming to build an improved video editor and train a more powerful video generation model.
How does the user describe the workflow for creating a character using Visible Maker and other AI tools?
-The user describes the workflow as starting with a reference photo in Midjourney, upscaling in Leonardo, generating voice with ElevenLabs' speech-to-speech, removing the green screen, and adding a generated soundtrack with Suno to create a final character video.
Outlines
OpenAI Updates and DALL-E 3's In-Painting Feature
The paragraph discusses recent updates from OpenAI, highlighting the new in-painting feature in DALL-E 3. The author expresses mixed feelings about DALL-E 3's outputs, noting that while its aesthetic is not their preference, the integration with ChatGPT for image generation is appreciated. The in-painting process is explained, where users can edit parts of an image, such as adding butter to toast, but the results may not always align with the prompt. The author also mentions the limitations of DALL-E 3 compared to other image generators that have had this feature for a longer time.
Stability AI's Audio Update and Suno's Music Generation
This section delves into Stability AI's recent update, Stable Audio 2.0, which enables the creation of full musical tracks from a single prompt. The author notes that while the update is welcomed, it's not as advanced as Suno's music generation capabilities. Suno's version 3 model is praised for its superior audio quality and adherence to the prompted genre. The author also highlights the unique feature of Suno that allows adding singing and using personal audio as a reference. However, Stability AI's ability to incorporate user audio for reference is seen as a secret weapon.
Sora's First Music Video and Comparison with Haiper
The first music video created with Sora is discussed, titled 'Worldweight' by August Kamp. The author comments on the visual aesthetics and the consistency of the video, noting the use of tracking shots and vintage film looks. However, comparisons are made with Haiper, a free tool that can achieve similar results with the addition of overlays and textures. The author questions the uniqueness of Sora's output and suggests that Haiper could be a viable alternative for achieving similar video effects.
Introducing AniPortrait and the Upcoming Higgsfield
The paragraph introduces AniPortrait, a tool inspired by EMO (Emote Portrait Alive), which uses a combination of a reference photo and video to create realistic portraits. A use case is presented, showcasing how a character created in Midjourney was upscaled and used in a video with a generated voice and background music. Additionally, the author discusses an upcoming video generator, Higgsfield, led by Alex Mashrabov, former head of AI at Snap. Higgsfield aims to improve video editing by allowing modifications to characters and objects and training a more powerful video generation model.
Keywords
AI news
OpenAI
DALL-E 3
Stability AI
Suno
ChatGPT 3.5
Sora
AniPortrait
Higgsfield
AI-generated music
AI filmmaking
Highlights
OpenAI introduces an in-painting feature in DALL-E 3, allowing users to edit images by adding or changing elements within the scene.
DALL-E 3's in-painting feature is not as intuitive as one might expect, requiring users to manually select and edit areas of the image.
Despite personal aesthetic preferences, the new in-painting feature marks a step forward for DALL-E 3, albeit a delayed one compared to other image generators.
Stability AI releases Stable Audio 2.0, which can create full musical tracks up to 3 minutes long from a single prompt, offering 20 free credits per month to users.
Suno, a competitor to Stability AI, is recognized as the current leader in AI-generated music due to its superior audio fidelity and composition choices.
Stability AI's secret weapon is the ability to add user's own audio as a reference, which can lead to creative uses and improvements in generated music.
OpenAI now allows free access to ChatGPT 3.5 without the need for login, providing a capable model for users to experiment with.
The first music video created with Sora, titled 'Worldweight' by August Kamp, showcases the potential of Sora in generating visuals for music.
The 'Worldweight' music video demonstrates Sora's ability to create aesthetically consistent visuals, though comparisons can be made with other platforms like Haiper.
A new video model, Higgsfield, is on the horizon with Alex Mashrabov leading the project, aiming to improve video editing and generation capabilities.
Higgsfield operates as a lean startup with a small team and limited hardware, signifying potential for innovation in the AI video generation space.
AniPortrait, an innovative tool inspired by EMO (Emote Portrait Alive), uses a combination of reference photos and videos to generate realistic portraits.
Visible Maker demonstrates a workflow involving multiple AI tools, showcasing the potential for complex and creative uses of AI in content creation.
The speaker, Tim, is set to attend the Curious Refuge AI filmmaking Mega party and judge the world's first AI Esports tournament.
Even during a slow week, the AI world continues to push boundaries and innovate, suggesting a surge of new developments on the horizon.
The introduction of various AI tools and platforms, such as DALL-E 3, Stable Audio 2.0, Suno, Sora, and Higgsfield, indicates a rapidly evolving landscape in AI technology.
Public opinion on Sora seems to be shifting, with some feeling excluded from the platform's development and access, leading to discussions on its inclusivity and community engagement.