I Challenged My AI Clone to Replace Me for 24 Hours | WSJ

The Wall Street Journal
28 Apr 2023 07:34

TLDR

In a fascinating experiment, Joanna Stern explores the potential of AI to replace her for a day. With the help of Synthesia and ElevenLabs, she creates an AI avatar and voice clone to tackle challenges like phone calls, creating a TikTok, bank biometrics, and video calls. The results show that while AI voices are impressive, video clones still have a way to go. This experiment not only highlights the growing capabilities of AI but also raises concerns about potential misuse and the need for vigilance in distinguishing real from AI.

Takeaways

  • 🎥 Joanna Stern explores the possibility of replacing herself with an AI clone for a day to understand the capabilities of AI-generated voice and video.
  • 🤖 The AI avatar of Joanna was created by Synthesia, a startup that recorded her head movements and voice for training data.
  • 💬 The voice of the AI clone was improved by ElevenLabs, which used two hours of Joanna's previous recordings to enhance the quality.
  • 📞 In the first challenge, AI Joanna successfully passed as the real Joanna in a phone call with Snap CEO Evan Spiegel.
  • 📹 The second challenge involved creating a TikTok video, where the AI avatar's lack of natural movements and limited facial expressions led to its failure.
  • 🔊 AI-generated voices can get past bank voice biometrics, as demonstrated when AI Joanna's voice was enough to reach a service rep at Chase.
  • 👤 Video calls proved to be a challenge for the AI clone; participants could tell it wasn't the real Joanna due to the avatar's unnatural appearance and behavior.
  • 🚫 The experiment showed that while AI voices are becoming quite convincing, video clones still have a way to go before they can fully deceive.
  • 💡 The video highlights the potential for saving time using AI tools, but also the risks of misuse, such as scammers using AI voices to impersonate individuals.
  • 🛑 Synthesia and ElevenLabs have measures in place to prevent misuse of their technology, such as requiring consent, and both say they can identify their AI voices if misused.

Q & A

  • What was Joanna Stern's main objective in the video?

    -Joanna Stern aimed to explore whether she could replace herself with an AI clone for a day, in order to see if the AI could perform tasks and interactions on her behalf.

  • How was Joanna's AI avatar created?

    -Joanna's AI avatar was created by Synthesia, a startup that recorded her performing head movements and reading a script. This data was used to train their AI neural networks to generate the avatar.

  • What was the role of ElevenLabs in Joanna's AI experience?

    -ElevenLabs was used to improve the quality of Joanna's AI voice. After uploading two hours of her previous recordings, ElevenLabs produced a better voice clone than the initial version provided by Synthesia.

  • What were the four challenges Joanna set for her AI clone?

    -The four challenges Joanna set were: making phone calls, creating a TikTok, passing bank biometrics, and conducting video calls.

  • How did Joanna's sister, Julia, react to the AI's call about her dead fish?

    -Initially, Julia was fooled and thought it was Joanna. However, she later realized it wasn't her sister because the AI did not pause to let her talk back.

  • What was the outcome of the TikTok challenge?

    -The TikTok challenge failed because the AI-generated video had limitations such as the avatar not moving its arms, mismatched mouth movements, and lack of facial expressions.

  • How successful was the AI voice in deceiving the bank's voice biometrics system?

    -The AI voice was successful in passing the bank's voice biometrics system, as it confirmed Joanna's identity and transferred her to a customer service representative without additional questions.

  • Why did the video call challenge fail?

    -The video call challenge failed because the participants noticed the AI's limitations, such as the avatar looking like a hologram version of Joanna, poor posture, and lack of jokes, leading them to realize it wasn't the real Joanna.

  • What concerns did Joanna express about the potential misuse of AI clones and voices?

    -Joanna expressed concerns about scammers potentially using AI voices to impersonate individuals to call banks or families, and the need for people to be on high alert to distinguish between real and AI-generated interactions.

  • What measures do Synthesia and ElevenLabs take to prevent misuse of their technology?

    -Synthesia requires verbal consent from those creating avatars, while ElevenLabs requires users to check a box confirming they have permission to use the voice. Both companies claim to be capable of identifying their AI voices if misused.

Outlines

00:00

🎥 Introduction to AI Avatar Creation

The video begins with Joanna Stern's excitement about creating an AI avatar that resembles and behaves like her. She discusses the current state of AI tools that generate text and images, and how AI-generated voice and video will further blur the line between reality and fiction. Joanna outlines four challenges she has devised to test if her AI avatar can replace her for a day, aiming to free up time for her to engage in personal activities. This section also touches on the unsettling feeling of seeing her frozen avatar and the process of getting ready for the challenges.

05:01

🤖 AI Avatar Development and Voice Improvement

Joanna delves into the process of creating her AI avatar with Synthesia, a startup that recorded her head movements and voice in a professional studio. The company used this data to train their AI neural networks. However, the initial voice output wasn't satisfactory, leading to the use of ElevenLabs, which improved the voice clone after analyzing two hours of Joanna's previous recordings. Both Synthesia and ElevenLabs operate on a similar principle where users can type in text for AI clones to recite. Joanna explains the commercial applications of these AI tools, with Synthesia targeting companies for internal video creation and ElevenLabs offering voice cloning services at an affordable monthly fee.

📞 Challenges Involving AI: Phone Calls and TikTok

Joanna describes the first challenge involving phone calls, where she successfully used her AI voice to pass as herself on a call with Evan Spiegel, CEO of Snap, and discuss the impact of AI on communication. The second challenge was creating a TikTok video using a script written by ChatGPT in Joanna's voice, discussing an obscure iOS 16 tip. Despite the initial struggle with ChatGPT fabricating information, a usable script was eventually produced. The resulting TikTok video, however, did not impress the platform's audience due to the avatar's lack of movement and mismatched mouth movements, highlighting areas for improvement in AI avatar technology.

🏦 Bank Biometrics and Video Calls

The third challenge involved using the AI voice for bank biometrics, where the cloned voice successfully authenticated as Joanna and was transferred to a customer service representative at Chase. When an intern tried to impersonate Joanna, the voice biometric system requested further verification. The final challenge was video calls, where Joanna's AI avatar was integrated into Google Meet calls. However, the avatar was quickly identified as fake due to its lack of natural human characteristics such as posture and humor, resulting in a failed challenge.

🚫 Conclusion and Reflection on AI Impersonation

Joanna concludes the video by reflecting on the day's challenges, noting that while AI voices are quite convincing, video clones are not yet capable of fooling people. She expresses her mixed feelings about the potential time-saving benefits of AI clones and the risks of misuse, such as scammers using cloned voices. Joanna emphasizes the importance of vigilance in distinguishing between real and AI-generated content and ends with a humorous reminder to stay human.

Keywords

💡AI Clone

An AI Clone refers to an artificial intelligence-powered replica of a person, capable of mimicking their appearance, voice, and mannerisms. In the context of the video, the AI Clone is used to explore the possibility of replacing the real person for various tasks, such as making phone calls, creating TikTok videos, and participating in video calls. The AI Clone is generated using AI tools like Synthesia and ElevenLabs, which record the person's movements and voice to create a digital double.

💡Synthesia

Synthesia is a startup company that specializes in creating AI avatars, which are digital representations of individuals that can mimic their movements and speech. In the video, Joanna Stern uses Synthesia's services to generate her AI Clone by recording her head movements and voice, which are then used as training data for the AI neural networks. Synthesia's technology is aimed at companies wanting to produce internal videos and charges a custom fee for creating an avatar.

💡ElevenLabs

ElevenLabs is a company that focuses on producing AI-generated voices, or voice clones. In the video, Joanna Stern uses ElevenLabs to improve the voice of her AI Clone by uploading two hours of her previous recordings. The company offers a subscription service for creating custom voice clones, which can be used for various applications, including enhancing AI avatars.

💡Challenges

In the context of the video, challenges refer to the series of tests Joanna Stern sets up to determine if her AI Clone can effectively replace her in various scenarios. These challenges include making phone calls, creating a TikTok video, passing bank biometrics, and participating in video calls. Each challenge is designed to assess different aspects of the AI Clone's capabilities and to explore the potential and limitations of AI technology in mimicking human interactions.

💡Phone Calls

Phone calls are one of the challenges Joanna Stern sets for her AI Clone, aiming to test its ability to communicate effectively and naturally over the phone. In the video, Joanna uses her AI Clone to make a call to Evan Spiegel, the CEO of Snap, and to her sister, Julia, to see if they can distinguish between the AI and the real Joanna. The challenge explores the capabilities of AI-generated voices and how well they can simulate human conversation.

💡TikTok

TikTok is a social media platform where users create and share short videos, often set to music. In the video, Joanna Stern asks ChatGPT to write a script for a TikTok video featuring her AI Clone. The goal is to create content that appears as if it was made by the real Joanna, discussing an obscure iOS 16 tip. However, the AI-generated script initially contains inaccuracies, and the final video has limitations in terms of the avatar's movements and facial expressions, so it fails to fool the TikTok audience.

💡Bank Biometrics

Bank biometrics refer to the use of unique personal characteristics, such as voice or fingerprint, to verify a person's identity for security purposes in banking transactions. In the video, Joanna Stern tests her AI Clone's ability to pass voice biometrics by using it to call her bank's customer service. The challenge demonstrates that while the AI voice clone can sometimes fool the system, additional verification is often required to complete certain requests, highlighting the current limitations and safeguards in place.

💡Video Calls

Video calls are a form of communication where people can see and hear each other in real-time through digital devices. In the video, Joanna Stern uses her AI Clone to participate in Google Meet calls to test if the digital double can maintain the illusion of her presence in a virtual meeting. However, the AI Clone's lack of natural movements and limited facial expressions leads to it being easily identified as fake by the participants, showcasing the current challenges in creating convincing video avatars.

💡AI Voices

AI Voices refer to the artificial intelligence-generated voices that mimic human speech. In the video, AI voices are used to create a voice clone of Joanna Stern and to power her AI Clone's speech. The exploration of AI voices in the video reveals that while they can be quite convincing and are useful for saving time, there is also the potential for misuse, such as scammers using fake voices to impersonate individuals.

💡Misuse

Misuse, in the context of the video, refers to the potential negative uses of AI technology, particularly the creation of AI clones and voice clones. The video highlights concerns about scammers using these technologies to impersonate individuals, for example, by calling banks or family members with the intention to deceive. This emphasizes the need for vigilance and the development of safeguards to prevent such misuse.

💡Authenticity

Authenticity in the video pertains to the genuineness or truthfulness of the AI-generated content and interactions. Joanna Stern's experiment with her AI Clone aims to uncover how close the AI can come to replicating her identity authentically. The challenges she faces, such as the AI Clone's inability to move its arms naturally or the voice clone's initial lack of accuracy, reveal that while AI has made significant advancements, there is still a gap between the digital replicas and the real person in terms of authenticity.

Highlights

Joanna Stern explores the possibility of replacing herself with an AI clone for 24 hours.

AI tools are blurring the lines between real and fake, prompting Joanna to test the limits of AI-generated voice and video.

Joanna's AI avatar is created by Synthesia, a startup that records head movements and voice for training data.

ElevenLabs improves upon Synthesia's voice output, offering a more realistic Joanna Stern voice clone.

Synthesia targets companies wanting to make internal videos, charging at least $1,000 for a custom avatar.

Creating a voice clone with ElevenLabs costs $5 a month, making it accessible for a wider range of applications.

Challenge one involves making phone calls, testing the effectiveness of AI voice in a real-world scenario.

Joanna's AI clone initially fools her sister and passes as Joanna with Evan Spiegel, the CEO of Snap, during phone calls.

Challenge two requires creating a TikTok, showcasing the AI avatar's ability to perform on social media platforms.

Despite initial struggles with ChatGPT fabricating information, a usable script is eventually produced for the AI-generated TikTok video.

TikTok viewers, however, are not impressed by the avatar's lack of movement and mismatched mouth movements, indicating room for improvement.

In challenge three, the AI voice clone passes a bank's voice biometrics test, raising concerns about potential misuse.

The AI clone fails to convince during video calls, as participants notice the lack of natural movements and expressions.

The experiment reveals that while AI voices are becoming increasingly convincing, video clones still have a way to go.

Joanna highlights the need for vigilance to distinguish between real and AI-generated content moving forward.

Companies like Synthesia and ElevenLabs are implementing measures to prevent misuse of their technology.

The video concludes with a call to action for viewers to stay human and be aware of the inevitable rise of AI in our lives.