ChatGPT can now speak and sing in real time | DW News

DW News
14 May 2024 | 07:33

TLDR: OpenAI has unveiled GPT-4o, a new AI model that integrates text, audio, and vision for real-time, natural voice interactions. The technology, which can describe surroundings for visually impaired users and answer questions, is a step towards multimodal AI. It is not a significant technical leap, but integrating these functions makes interactions feel more natural. Concerns include users overestimating the AI's capabilities and relying on it for advice or major decisions. OpenAI faces legal challenges over its use of data, notably a lawsuit from the New York Times. The company relies on user data for future development, so people who use the tools contribute to the next generation of the technology. The long-term impact of AI on society, especially on education and communication, remains a significant consideration.


  • 🚀 OpenAI has released a new model called GPT-4o that can hold real-time, natural voice conversations, incorporating audio and vision along with text.
  • 🧩 GPT-4o is a multimodal AI that connects different functionalities, allowing for faster and richer interactions than previous models.
  • 🎤 The new model's voice sounds almost natural and carries a slight sense of humor, which users have received well.
  • 👀 The AI can aid visually impaired users by providing real-time descriptions of their surroundings, as demonstrated in an OpenAI video.
  • 🤖 Mike Cook, a senior lecturer in computer science and generative AI specialist, explains that while GPT-4o is an improvement, it's not a huge leap forward.
  • 🚫 There are concerns that people will overestimate GPT-4o's capabilities and rely on it for medical advice or significant life decisions.
  • 📚 OpenAI is currently involved in legal disputes, including one with the New York Times, which claims OpenAI is using its data to create a competing product.
  • 📈 The data for AI training comes from various sources, including open datasets, open access data from the internet, and user interactions with the technology.
  • 🔮 The future impact of generative AI on industries, businesses, and society is a topic of concern, especially regarding how it might affect learning and communication in the long term.
  • 🌐 The global AI race is led by the US and China, with both countries having significant public and private investments in AI technology.
  • 🏁 The AI industry is rapidly evolving, and it's crucial to consider both the near-term applications and the long-term implications of integrating AI into various systems.

Q & A

  • What is the new interface developed by OpenAI that works with audio and vision as well as text?

    -The new model developed by OpenAI is called GPT-4o. It moves beyond traditional chatbot features and is capable of real-time, almost natural voice conversation.

  • What is the significance of GPT-4o being a multimodal AI?

    -Being multimodal means GPT-4o can connect different modes of interaction such as text, images, and audio. This allows the system to respond faster and with richer output, making the interaction feel more natural.

  • Why is the emotional aspect of the voice in GPT-4o significant?

    -The emotional aspect of the voice in GPT-4o is significant because it contributes to a more natural and human-like interaction, which people tend to respond positively to.

  • What are some concerns regarding the capabilities and limitations of GPT-4o?

    -One concern is that people might overestimate the capabilities of GPT-4o, assuming it can perform tasks it's not designed for, such as giving medical advice or making life decisions. It's important to remember that GPT-4o, while advanced, is still a simpler piece of technology than it might appear.

  • Why doesn't GPT-4o provide real-time news updates?

    -GPT-4o doesn't provide real-time news updates due to legal issues. OpenAI is currently involved in court cases regarding how it retrieves and uses data, which has made the company more cautious about using live news data.

  • What are the main sources of data for training AI like GPT-4o?

    -The main sources of data for training AI include open datasets created by academics; open access data from the internet, which can involve legal gray areas; and data generated by users of the technology themselves, who consent to their data being used when they use the tools.

  • How does the use of AI technology like GPT-4o affect industries, businesses, and societies?

    -AI technology can have a significant impact on various sectors, influencing how we interact, learn, and use technology. There are concerns about integrating AI too quickly into areas where it may not be suitable, and the long-term effects on learning, communication, and societal norms.

  • What are the potential long-term effects of AI technology on learning and communication?

    -The long-term effects of AI technology on learning and communication include changes in how people acquire skills, such as potentially becoming less proficient in certain areas like spelling due to reliance on autocorrect. There are also concerns about how AI might alter the way future generations interact with each other.

  • Which countries are currently leading the global AI race?

    -The United States and China are the two front runners in the global AI race, with a significant split between public and private investment. However, it's difficult to determine a clear leader as much AI development happens behind closed doors.

  • What are the legal challenges that OpenAI is facing regarding the use of data?

    -OpenAI is facing legal challenges, including a significant court case brought by the New York Times, which alleges that OpenAI is producing a competing product using its data without proper consent or legal clearance.

  • How does the use of AI tools like GPT-4o affect the future development of AI technology?

    -The use of AI tools like GPT-4o directly contributes to the future development of AI technology. As users interact with these tools, they generate data that companies like OpenAI can use to train and improve the next generation of their technology.

  • What are the potential risks of integrating AI technology into critical systems like education, health, and legal systems?

    -The potential risks include making large-scale decisions about AI's role in these systems without fully understanding its long-term capabilities and implications. There is a concern that AI could be integrated too quickly into areas where it may not be appropriate or fully understood, leading to unforeseen consequences.



🤖 Advanced Multimodal AI Interface by OpenAI

OpenAI has introduced a new AI model, GPT-4o, which integrates audio, vision, and text capabilities, extending the user experience beyond traditional chatbot functions. The model is designed to support real-time, almost natural voice conversations. An OpenAI video shows it assisting a blind visitor in London by giving detailed, real-time descriptions of his surroundings. Mike Cook, a senior lecturer in computer science and generative AI specialist, discusses the significance of this development. He explains that while the individual functions are not entirely new, integrating them into one multimodal AI system is a notable advancement. The emotional quality of the AI's voice is highlighted as a key factor in user engagement. Concerns about the technology are also addressed: users should not overestimate its abilities or rely on it for critical decisions such as medical advice or life-altering choices. The limitations of GPT-4o are discussed as well, including its inability to provide real-time news updates because of ongoing legal disputes over data usage and alleged competition with established news outlets like the New York Times.


🌐 Implications of Generative AI on Society and the Global AI Race

The interview turns to the broader implications of generative AI for industries, businesses, and societal interactions. Concerns are raised about the long-term effects of integrating AI into sectors such as education, health, and legal systems. The impact of AI on how future generations learn and communicate is a significant point of discussion. The fear is that rapid integration, without a full understanding of the long-term consequences, could lead to negative outcomes. The interview also touches on the global AI race, naming the US and China as front runners in public and private investment in AI technology. However, it is difficult to say who is actually leading, because much of the development happens behind closed doors. The conversation concludes with a note on the importance of considering the long-term impacts of AI on society and the need for careful, thoughtful integration of these technologies.



💡AI race

The 'AI race' refers to the competition among various entities, such as companies and countries, to advance artificial intelligence technology. In the context of the video, it signifies the ongoing development and innovation in AI capabilities, with OpenAI's new interface being a significant shift in this race.

💡GPT-4o

GPT-4o is the new model introduced by OpenAI. 'GPT' stands for 'Generative Pre-trained Transformer', and the 'o' stands for 'omni', reflecting the model's handling of several input types. It is an upgrade from previous models in that it incorporates multimodal capabilities, allowing it to process not just text but also audio and visual inputs. This advancement is a key focus of the video, which highlights the model's capacity for real-time, natural voice conversation.

💡Multimodal AI

Multimodal AI refers to artificial intelligence systems that can process and understand multiple types of input data, such as text, audio, and visual information. The video emphasizes this feature of GPT-4o, which allows for a more integrated and natural interaction compared to systems limited to a single mode of interaction.

💡Real-time conversation

Real-time conversation refers to the ability of an AI system to engage in immediate and continuous dialogue with humans. The video mentions this capability of GPT-4o, which is a step forward in AI technology and a significant part of the model's appeal.

💡Emotion in the voice

The 'emotion in the voice' is a feature of GPT-4o that allows it to convey emotional nuances in its responses, making interactions with the AI feel more human-like. This is mentioned as a key aspect that has resonated with people, enhancing the naturalness of communication with the AI.

💡Risks of technology

The 'risks of technology' is a broad term that encompasses the potential negative consequences or dangers associated with the use of advanced technologies, such as AI. In the video, concerns are raised about overestimating the capabilities of AI, leading to misuse in areas like medical advice or life decisions.

💡Legal issues

Legal issues in the context of the video pertain to the challenges and disputes that OpenAI is facing regarding data usage and the development of competing products. The New York Times' lawsuit against OpenAI for using their data to create a competing product is highlighted as a significant legal hurdle.

💡Data sources

Data sources refer to the origins of the information used to train AI systems. The video discusses three main sources: open datasets, open access data from the internet, and user-generated data from people using the AI tools. Understanding these sources is crucial for grasping how AI systems like GPT-4o learn and improve.

💡User data

User data is information provided by individuals when they interact with AI systems. In the video, it is mentioned that as OpenAI runs out of data from other sources, it relies on user data to continue improving its technology. This data is essential for the development of future AI models.

💡Impact on society

The 'impact on society' refers to the far-reaching effects that AI technology can have on various aspects of human life, including education, health, and legal systems. The video expresses concerns about the potential long-term consequences of integrating AI prematurely into these areas.

💡Global AI race

The 'global AI race' describes the international competition to lead in AI development. The video mentions the US and China as front runners, with private investment and advancements often happening behind the scenes, making it difficult to predict the future landscape of AI leadership.


AI race advances with OpenAI's new interface that integrates audio and vision with text.

The GPT-4o model introduces real-time, almost natural voice conversation capabilities.

OpenAI aims to increase the user base for its technology.

GPT-4o assists a blind visitor in identifying the presence of the King at Buckingham Palace.

Ducks in St James's Park are described in a natural voice, indicating advancements in AI's sensory description.

Mike Cook, a senior lecturer in computer science, discusses the significance of multimodal AI.

GPT's advancements are more about integration than a huge leap in technology.

Emotion in AI's voice is a notable feature that has been well-received.

Risk of assuming, after seeing AI perform a single task, that it can do more than it actually can.

Concerns about people relying on AI for medical advice or major life decisions.

OpenAI is cautious about using live news data due to ongoing legal issues.

New York Times is suing OpenAI for using their data to create a competing product.

Data for AI training comes from open datasets, open access data, and user interactions.

OpenAI encourages user engagement to gather more data for future technology.

The impact of generative AI on industries, businesses, and societal interactions.

Concerns about the long-term effects of AI on learning and communication.

The US and China are leading the global AI race with significant public and private investments.

The future of AI development is uncertain and much happens behind closed doors.