Voice Cloning in ElevenLabs vs. Descript

Excelerator
19 Oct 2023 · 07:25

TLDR

The video tests two popular voice cloning apps, ElevenLabs and Descript, for effectiveness and ease of use. It walks through uploading audio, the paid plan that ElevenLabs requires, and Descript's new, faster AI speaker technology. It then compares the quality and usability of both platforms: ElevenLabs offers realistic AI voices at a low cost, while Descript's voice cloning requires the user to read a specific script for training. The reviewer gives a balanced view of each service's strengths and limitations.

Takeaways

  • 🎤 Voice cloning technology allows users to record or upload audio for AI to learn their voice for future text-to-speech purposes.
  • 📱 ElevenLabs is a popular app offering voice cloning; access to the voice cloning features requires a subscription plan.
  • 🚀 ElevenLabs has improved its voice cloning AI to be faster, easier to use, and better in quality.
  • 🔊 In ElevenLabs, users need to upload an audio file at least one minute long for the AI to learn their voice.
  • 🎧 The AI-generated voice can be used to synthesize speech from typed text, mimicking the user's voice.
  • 📌 The technology has limitations: the pacing and the emphasis on certain words do not always sound natural.
  • 🌟 Descript, another service, has recently announced advancements in its voice cloning AI, claiming faster processing and improved quality.
  • 📑 To use Descript's voice cloning, users must read a provided script for authorization and training of the AI.
  • 🔄 Users can only upload a recording of themselves reading the specific script provided by Descript for voice training.
  • 💬 Both ElevenLabs and Descript offer useful features beyond voice cloning; Descript, for example, includes video editing and eye contact adjustment.
  • 💰 The reviewer is an affiliate for both ElevenLabs and Descript, and may receive a commission for purchases made through their links.

Q & A

  • What is voice cloning technology?

    -Voice cloning technology allows users to record or upload audio of their voice, which is then learned by an AI system. This enables the AI to generate text-to-speech audio that sounds as if the user had spoken the words at the time of creation.

  • How does ElevenLabs' voice cloning work?

    -ElevenLabs' voice cloning requires a subscription starting at $5 per month. Users upload a minimum of one minute of audio, and the system then creates a voice profile named after the user. This profile can be used in the speech synthesis section to generate audio from typed text.

  • What is the recommended audio length for ElevenLabs' voice cloning?

    -ElevenLabs suggests that an audio file of at least one minute is ideal for voice cloning. They note that going over five minutes does not provide additional benefits for the cloning process.

  • What improvements have been made to ElevenLabs' voice cloning AI?

    -ElevenLabs has made its voice cloning AI faster, easier to use, and better in quality. The improvements aim to give users a more efficient, higher-quality voice cloning experience.

  • How does the new AI speaker technology from Descript work?

    -Descript's new AI speaker technology lets users clone their voice by recording for a minute or two. The system then processes the recording and provides a voice profile ready for use, claiming better quality than previous methods.

  • What was the issue encountered when trying to upload a recording to Descript's platform?

    -The recording had to match the specific script provided by Descript for authorization and training. Any other recording, even one of the user's own voice, could not be used for training.

  • What are some limitations of the voice cloning technology as demonstrated in the video?

    -Limitations include the need for specific audio lengths and content for training, as well as potential issues with the naturalness of the generated voice, such as too long gaps between words or a lack of emphasis and flavor in the speech.

  • What other features does Descript offer besides voice cloning?

    -Descript offers various features such as editing videos by editing text and an eye contact editing tool, which are considered innovative and useful for users.

  • What is the pricing model for ElevenLabs' voice cloning services?

    -ElevenLabs offers a subscription model starting at $5 per month for access to its voice cloning services.

  • How can users provide feedback on the voice cloning experience?

    -Users can provide feedback by sharing their thoughts and experiences, and if they find the technology helpful, they can support the creator by subscribing through the provided links in the description.

  • What is the role of the provided script in the voice cloning process on Descript's platform?

    -The script provided by Descript serves as both the authorization and the training material for the AI. Users must read this script into their microphone for the AI to learn and clone their voice accurately.

Outlines

00:00

🎤 Exploring Voice Cloning Technology with ElevenLabs

This section introduces voice cloning, a technology that lets users record audio or upload existing recordings so that an AI can learn their voice, and focuses on how usable it is for text-to-speech. The video tests voice cloning with ElevenLabs, a popular app that recently improved its AI for faster and better results. It describes the cloning process in ElevenLabs, including the requirement of a paid plan, the minimum audio length, and the steps to create a cloned voice named 'Bob'. The section concludes with a test of the cloned voice's quality and usability, generating a short phrase and a longer script and noting some minor issues with pacing and emphasis.

05:01

🚦 Challenges and Comparisons in Voice Cloning with Descript and ElevenLabs

The second section covers the challenges faced while using Descript's voice cloning technology and compares it with ElevenLabs. It highlights the issues encountered when attempting to upload a recording longer than two minutes, the requirement to use a specific script provided by Descript for training the AI, and the fact that a non-authorized recording cannot be used. The video then compares the output of Descript and ElevenLabs using the same text. It notes that while Descript's output might lack some 'flavor' and natural pacing, both applications offer useful features: Descript is praised for its video editing capabilities and ElevenLabs for its affordable and realistic AI voices. The section ends with an invitation for feedback and information on how to access both platforms through affiliate links provided in the description.

Keywords

💡Voice Cloning

Voice cloning is a technology that enables the recording and replication of a person's voice, allowing AI to generate speech that sounds like the original speaker. In the context of the video, it is used to discuss the process of creating a synthetic voice for text-to-speech purposes, as demonstrated by the AI learning the user's voice from a 7-minute audio clip.

💡Text-to-Speech (TTS)

Text-to-Speech technology converts written text into spoken words, using digital voices that can mimic human speech. In the video, TTS is used to generate audio in the user's cloned voice from typed words, so that it sounds as if the user had spoken them.

💡ElevenLabs

ElevenLabs is an application mentioned in the video that offers voice cloning services. It requires a subscription plan to access its voice cloning features. The app has been updated to improve the speed, ease, and quality of its voice cloning AI.

💡Instant Voice Cloning

Instant Voice Cloning refers to the rapid creation of a cloned voice, a feature that ElevenLabs claims to have improved. It lets users quickly generate a voice model from minimal audio input, as opposed to the previous requirement of a longer recording.

💡Descript

Descript is another application mentioned in the video that offers voice cloning services. It has recently introduced a new AI speaker technology that allows users to clone their voice with a shorter recording requirement and claims to have improved voice quality.

💡Audio File

An audio file is a digital file that contains audio data, such as music or voice recordings. In the context of the video, users are required to upload audio files to both ElevenLabs and Descript for the purpose of voice cloning, with specific length requirements to ensure the AI can learn the user's voice accurately.
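
The length requirements can be checked locally before uploading. The following is a minimal Python sketch, not part of either product: it assumes WAV input and reuses the one-minute minimum and the five-minute point of no further benefit quoted in the Q & A above. The function names and the silent demonstration clip are hypothetical helpers introduced only for illustration.

```python
import io
import wave

MIN_SECONDS = 60          # minimum length quoted in the summary
MAX_USEFUL_SECONDS = 300  # beyond this, the summary reports no added benefit


def wav_duration_seconds(source) -> float:
    """Return the duration in seconds of a WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_clone_sample(duration: float) -> str:
    """Classify a sample length against the two thresholds above."""
    if duration < MIN_SECONDS:
        return "too short"
    if duration > MAX_USEFUL_SECONDS:
        return "longer than needed"
    return "ok"


def make_silent_wav(seconds: float, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    clip = make_silent_wav(90)
    duration = wav_duration_seconds(clip)
    print(duration, check_clone_sample(duration))
```

In practice the path of a real recording would be passed to `wav_duration_seconds` in place of the generated clip; other formats such as MP3 would need a different reader.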

💡Speech Synthesis

Speech synthesis is the process of generating human-like speech from text input. It is a core component of TTS technology and is used in the video to demonstrate how the cloned voice can be used to produce audio from typed text.
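
The video performs this step in the web interface. For readers who want to script it, here is a hedged Python sketch that only builds a request and does not send it. It assumes the public ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` authenticated with an `xi-api-key` header; neither appears in the video, so both should be checked against the current official API reference. The voice ID and key shown are placeholders.

```python
import json
import urllib.request

# Assumed base URL; confirm it in the official API reference before use.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(voice_id: str, text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for a cloned voice."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; a real voice ID and key come from the user's account.
    request = build_tts_request("YOUR_VOICE_ID", "Hello from my cloned voice.", "YOUR_API_KEY")
    print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would require a real API key and voice ID, neither of which is given in the source.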

💡Authorization

In the context of the video, authorization refers to the process of verifying and granting permission to use a voice recording for voice cloning. Descript requires users to read a specific script, and that single reading serves as both the authorization and the training data for the AI.

💡Waveform

A waveform is a visual representation of an audio signal, showing the variations in amplitude over time. In the video, the appearance of a waveform in Descript's timeline indicates that the audio is being added to the cloned voice and the process is underway.
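
To make "amplitude over time" concrete, here is a small Python sketch of how a waveform view can be derived from raw samples by keeping one peak value per time slice. It is a generic illustration, not Descript's implementation; `peak_envelope` and `render_bars` are hypothetical names, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def peak_envelope(samples, buckets):
    """Reduce samples to at most `buckets` peak amplitudes, one per time slice."""
    if buckets <= 0 or not samples:
        return []
    size = math.ceil(len(samples) / buckets)
    return [
        max(abs(s) for s in samples[start:start + size])
        for start in range(0, len(samples), size)
    ]


def render_bars(envelope, width=20):
    """Draw each peak as a row of '#' characters, a crude text waveform."""
    return ["#" * round(value * width) for value in envelope]


if __name__ == "__main__":
    # A short tone that fades in and out, sampled at 8 kHz.
    rate = 8000
    tone = [
        math.sin(2 * math.pi * 220 * n / rate) * math.sin(math.pi * n / rate)
        for n in range(rate)
    ]
    for row in render_bars(peak_envelope(tone, 10)):
        print(row)
```

Silence produces empty rows and loud passages produce long ones, which is why a waveform appearing in the timeline is a quick visual confirmation that audio was actually captured.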

💡Subscription Plan

A subscription plan is a payment model where users pay a recurring fee to access a service or product. In the video, the user subscribes to a monthly plan with ElevenLabs to use its voice cloning service, highlighting the commercial aspect of such technologies.

💡Realistic AI Voices

Realistic AI voices are the high-quality, human-like speech generated by artificial intelligence. The video discusses how well ElevenLabs and Descript create AI voices that sound very close to the original speaker's voice, although not perfect.

Highlights

Voice cloning technology allows users to record audio or upload existing recordings for AI to learn their voice.

The AI-generated voice can be used for text-to-speech, making it seem as if the user spoke the words at the time of generation.

ElevenLabs is a popular app offering voice cloning technology, requiring a subscription plan for access.

ElevenLabs has introduced a new feature called 'instant voice cloning' to improve the speed and quality of voice replication.

To use ElevenLabs' voice cloning, users must upload an audio file of at least one minute in length.

After uploading, ElevenLabs' system takes some time to process and create a cloned voice.

The cloned voice can be tested in the speech synthesis section by typing in text and generating audio.

Descript, another platform, has recently announced improvements to its voice cloning AI, making it faster and of better quality.

Descript requires users to read a script for authorization and training purposes.

The script provided by Descript cannot be changed; users must record themselves reading it for voice training.

Descript's voice cloning process involves a short recording and authentication step before the voice can be used.

Both ElevenLabs and Descript offer realistic AI voices, though there may be some differences in the naturalness and emphasis of the speech.

The reviewer found that the gaps between words in the AI-generated speech were slightly too long, affecting the natural flow.

Despite minor issues, both platforms provide useful features such as video editing and realistic voice replication.

The reviewer encourages users to try both platforms and share their thoughts on the technology.

The reviewer is an affiliate for both ElevenLabs and Descript, and may receive a commission from purchases made through their links.