Stable Audio 2.0: AI-Generated Sample Creation For Musicians

Yellowgold Studios (Jason Howell)
4 Apr 2024 09:12

TLDR: The video discusses Stability AI's new audio generation model, Stable Audio 2.0, which can now create up to 3 minutes of music based on user-provided words and descriptions. The model offers 20 free credits per month, with each generation consuming two credits. The host shares their experience using the tool, noting improvements in the AI's understanding of music structure and its potential as a creative tool for musicians. They also experiment with blending AI-generated music with their own creative process, highlighting the technology's potential for original content creation.


  • 🚀 Stability AI launched Stable Audio 2.0, an upgraded audio generation model capable of creating up to 3 minutes of music.
  • 🎵 Users provide words and a description of the desired musical style, and the AI generates a piece of up to 3 minutes from that input.
  • 💳 The service offers 20 free credits per month, with each generation of music consuming two credits, which might vary depending on the length of the clip.
  • 🎧 The AI can also incorporate user-uploaded, copyright-free source audio, expanding the possibilities for unique compositions.
  • 📚 AudioSparx's library of 800,000 audio files forms the foundation of the AI's training data, with the option for owners to opt out of training.
  • 🎶 The AI demonstrates a better understanding of musical structure compared to earlier versions, moving beyond random and discordant sounds.
  • 🔄 The AI's generated music is not perfect and has a 'stock music' quality, but it shows potential for development and improvement.
  • 💡 The AI's music can serve as a starting point for human musicians to create original pieces, blending AI-generated elements with human creativity.
  • 🎤 The speaker experimented with different music genres, finding that the AI seemed to better understand the structure of electronic music.
  • 🤖 The AI's potential as a creative tool is highlighted, with the possibility of users improving the output through effective prompting and collaboration.
  • 🌐 The speaker plans to explore further with AI-generated art and illustration, indicating a broader interest in AI's creative applications.

Q & A

  • What is the new feature of Stability AI's audio generation model?

    -Stability AI's audio generation model, known as Stable Audio 2.0, now has the capability to create up to 3 minutes of music instead of the previous 90-second clips.

  • How many credits does a user get per month for free with Stable Audio 2.0?

    -A user is given 20 credits per month for free to use with Stable Audio 2.0.

  • What is the cost of music generation in terms of credits for Stable Audio 2.0?

    -Each music generation process consumes two credits, which might vary depending on the duration of the clip.

  • Can users upload their own source audio to Stable Audio 2.0?

    -Yes, users can upload their own source audio, provided that it is copyright-free.

  • What is the size of the library that Stable Audio 2.0 is trained on?

    -Stable Audio 2.0 is trained on a library of 800,000 audio files.

  • How does the AI model understand the structure of music?

    -The AI model is getting better at understanding the structure of music through its training on a vast library of audio files, which allows it to generate music with more recognizable patterns and sections.

  • What is the speaker's opinion on the quality of the music generated by Stable Audio 2.0?

    -The speaker feels that the music generated has a stock music quality to it and is not something they would listen to daily, but acknowledges that it is improving and has potential for further development.

  • How does the speaker view the role of AI in music creation?

    -The speaker sees AI as a tool for enhancing creativity, allowing for faster and more efficient generation of musical ideas and hooks. They believe that the more it is used this way, the more beneficial and secure it becomes for artists.

  • What did the speaker do with the techno version generated by Stable Audio 2.0?

    -The speaker took the techno version generated by Stable Audio 2.0 and incorporated it into their own music system, turning it into a potential remix or a base for a new song.

  • What is the speaker's hope for the future of AI in music?

    -The speaker hopes that AI can continue to be used as a tool for creativity, allowing musicians to collaborate with the technology to produce unique and original music.

  • How did the speaker come up with the prompt for Stable Audio 2.0?

    -The speaker initially struggled with the prompt, and then used another AI model, Perplexity, to generate a more detailed prompt based on their initial idea of a house music track with energy and pads.



🎵 Stable Audio 2.0: AI-Powered Music Creation

The first paragraph discusses the launch of Stability AI's Stable Audio 2.0, an audio generation model that has evolved from creating 90-second clips to generating up to 3 minutes of music. Users can input desired music themes and receive 20 free credits per month, with each generation consuming two credits. The speaker shares their experience with the platform, noting its improvement in understanding music structure and generating more coherent and structured audio compared to earlier versions. They mention experimenting with different music styles, such as pop-punk and electronic music, and reflect on the AI's progress in grasping musical patterns. The speaker also contemplates the potential of blending AI-generated music with their own creative process.


🎨 Human-AI Collaboration in Music Production

The second paragraph delves into the creative potential of AI tools like Stable Audio. The speaker highlights the efficiency of these tools in generating original music, which can significantly reduce the time spent searching for hooks or browsing music libraries. They express excitement about the possibilities of human-AI collaboration, where AI can serve as a creative assistant. The speaker also discusses the importance of viewing AI as a tool and embracing its use in enhancing human creativity. They mention an upcoming talk and their experiment with generative AI for creating images, emphasizing the desire to better instruct AI in artistic domains. The speaker concludes by sharing their enthusiasm for exploring AI's role in facilitating creative processes.



💡Stability AI

Stability AI is the company responsible for launching Stable Audio 2.0, an audio generation model mentioned in the transcript. This model is capable of creating music clips, initially up to 90 seconds but now extended to up to 3 minutes. The service was initially available only to paying members, but has since been made more accessible. The keyword is central to the video's theme as it sets the context for the discussion on AI-generated music.

💡Audio Generation

Audio generation refers to the process of creating new audio content, such as music or sound effects, using artificial intelligence. In the context of the video, it is the method by which Stability AI's model produces music clips based on user input. This concept is integral to the video's narrative as it explores the capabilities and potential of AI in the music industry.

💡Credits System

The credits system mentioned in the transcript is a mechanism used by Stability AI to regulate the usage of its audio generation service. Users are given a monthly allowance of credits, with each generation of music consuming a certain number of credits. This system is a key aspect of the video's discussion on the accessibility and cost-related aspects of using AI for music creation.
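As a rough illustration of the arithmetic, the sketch below works out how many generations the free tier covers, assuming the figures quoted in the video (20 free credits per month, two credits per generation). The function name is hypothetical and not part of any Stability AI API, and the real cost per generation may vary with clip length.

```python
def generations_per_month(monthly_credits: int = 20,
                          credits_per_generation: int = 2) -> int:
    """Return how many full generations a monthly credit allowance covers.

    The defaults are the figures quoted in the video; they are assumptions,
    not values read from the service.
    """
    if credits_per_generation <= 0:
        raise ValueError("credits_per_generation must be positive")
    # Integer division: a partial generation cannot be purchased.
    return monthly_credits // credits_per_generation


print(generations_per_month())  # 10
```

With the quoted figures, the free allowance covers ten generations per month.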

💡Source Audio

Source audio, as used in the video, is audio that a user uploads for the model to build on, and it must be copyright-free. It is distinct from the model's training data: the transcript mentions that the AI is trained on a library of 800,000 audio files, with the option for owners to opt out of the training process. Both points matter for understanding how the AI generates music and the ethical considerations involved in using such data.

💡Music Structure

Music structure refers to the organization and arrangement of musical elements such as verses, choruses, and bridges in a composition. The video touches on how AI-generated music is improving in its understanding of these structures, moving from random and discordant sounds to more coherent and structured compositions. This concept is central to the discussion on the advancement of AI in music creation and its ability to mimic human-like composition techniques.

💡Electronic Music

Electronic music is a genre of music that employs electronic devices, such as synthesizers and computers, to produce sounds. In the transcript, the speaker explores how AI-generated music might fare with electronic genres like house, techno, and EDM, which are inherently created with electronic instruments. This concept is significant as it examines the compatibility of AI with music forms that are closely tied to electronic and digital production.

💡Human Collaboration

Human collaboration in the context of the video refers to the process of humans working alongside AI to enhance creativity and produce unique content. The speaker discusses the potential of combining AI-generated music with human input to create something original and personalized. This concept is essential to the video's message about the symbiotic relationship between humans and AI in the creative process.


💡Creativity

Creativity in the video is discussed as a vital aspect of music production, where AI tools like Stability AI's audio generator can assist in the creative process by generating original content. The speaker highlights the excitement and potential of using AI to spark new ideas and streamline the search for inspiration in music creation. This concept is central to the video's theme of exploring the intersection of technology and artistic expression.

💡Artificial Intelligence

Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think, learn, and problem-solve like humans. In the context of the video, AI is used to generate music, highlighting its evolving capabilities and the impact it has on creative fields. The concept is fundamental to the video's exploration of AI's role in revolutionizing music production.

💡Music Industry

The music industry encompasses the creation, distribution, and promotion of music, as well as the people and organizations involved in these activities. The video discusses the implications of AI-generated music on this industry, including changes in music production and the potential for new creative tools. This concept is important as it frames the broader context in which AI's role in music is being evaluated and discussed.


Stability AI launched Stable Audio 2.0, an audio generation model that creates music.

Stable Audio 2.0 can now generate up to 3 minutes of music, an increase from the previous 90-second clips.

Users provide words or themes for the music they want, and the AI generates a matching track.

The service offers 20 free credits per month, with each generation using two credits.

The AI can also incorporate user-uploaded, copyright-free source audio for music creation.

AudioSparx's library of 800,000 audio files contributes to the AI's training data, with opt-out options for owners.

The AI demonstrates an improved understanding of music structure compared to earlier versions.

The generated music, while not perfect, shows promise and a better grasp of musical form.

Electronic music genres like house, techno, and EDM might be better suited for AI-generated music due to their digital nature.

The AI-generated techno track has potential for use on the dance floor, showing its practical application.

Human intervention can enhance AI-generated music, turning it into a creative collaboration.

Stable Audio serves as a tool for creativity, allowing users to generate original content more efficiently.

The use of AI as a creative tool is seen as positive and secure, encouraging its adoption in various fields.

The speaker's experience with AI-generated music suggests that prompting can significantly improve output quality.

AI collaboration is exemplified by using one AI to create prompts for another AI in music generation.

The creative potential of AI tools like Stable Audio is highlighted by the speaker's excitement and interest.