AI Frontiers: Jesper Hvirring Henriksen (OpenAI DevDay)

OpenAI
15 Nov 2023 · 09:25

TLDR: Be My Eyes, in collaboration with OpenAI, introduces Be My AI on GPT-4V, enhancing visual assistance for blind and low-vision users. The feature gives users AI-generated descriptions of images, improving access to both digital content and physical environments. With over half a million users and more than 7 million volunteers, the app offers independence and choice, and feedback has been overwhelmingly positive, highlighting the technology's potential to transform assistive technologies.

Takeaways

  • 🌟 Be My Eyes, in partnership with OpenAI, has launched a new feature called Be My AI on GPT-4V to assist blind and low-vision individuals.
  • 👥 The Be My Eyes community consists of over half a million blind and low-vision users supported by more than 7 million volunteers.
  • 📞 The traditional model of assistance involves video calls where volunteers lend their eyes to those in need.
  • 🚀 The introduction of Be My AI provides users with a 24/7 AI assistant alternative to human volunteers, enhancing independence and convenience.
  • 🖼️ GPT-4 vision enables computers to 'see' and describe images, both in the physical and digital world, making media and photos accessible.
  • 🌐 The application of GPT-4 vision transcends basic accessibility by providing detailed descriptions of images, including those in group chats and on websites.
  • 🎯 The AI's image descriptions are not only accurate but also witty and human-like, making for a more engaging user experience.
  • 📈 Positive feedback from beta testers like Caroline and Lucy Edwards has been overwhelming, showcasing the life-changing impact of the technology.
  • 📊 Since the launch of the beta in March, Be My AI has provided a million image descriptions per month with a 95% satisfaction rate.
  • 🌐 Language support has been expanded to 36 languages by instructing the model to respond in the user's language, demonstrating its adaptability.
  • 🤖 The integration of Be My AI into enterprise customer support, such as Microsoft's Disability Answer Desks, has significantly reduced the need for calls.

Q & A

  • What is the main purpose of Be My Eyes?

    -The main purpose of Be My Eyes is to provide blind and low-vision people with access to visual assistance through a community of volunteers via an app.

  • How many blind and low-vision users does Be My Eyes currently support?

    -Be My Eyes currently supports over half a million blind and low-vision users.

  • What was the motivation behind developing the Be My AI feature on GPT-4V?

    -The motivation behind developing the Be My AI feature was to provide users with an alternative to human assistance, offering them an AI assistant available 24/7 that can see for them.

  • How does GPT-4V enhance the capabilities of computers in relation to vision?

    -GPT-4V enables computers to see and describe images in both the real and digital worlds, opening up a vast range of applications that were previously not possible.

  • What is a common issue with media on apps and websites for visually impaired users?

    -A common issue is the lack of meaningful alt text for images, making them inaccessible to visually impaired users who rely on screen readers.

  • How does Be My AI assist with inaccessible images on websites?

    -Be My AI provides thorough descriptions of any image encountered online or in an app, making content with missing or insufficient alt text accessible to visually impaired users.

  • What was the reaction of one of the beta testers, Caroline, to the Be My AI feature?

    -Caroline, a beta tester, significantly increased her use of the service, going from making about two calls a year to completing more than 700 image descriptions.

  • How has the Be My AI feature impacted user satisfaction according to the feedback received?

    -The feedback has been overwhelmingly positive, with satisfaction ratings over 95% when excluding downtime and system errors.

  • What is one example of how Be My AI has been integrated into enterprise customer support?

    -Be My AI has been deployed into Microsoft's Disability Answer Desks, where users can start with a chatbot instead of a call, with 9 out of 10 users not escalating to a call.

  • What future improvements in accessibility do AI models that can see and hear hold?

    -AI models that can see, hear, and understand human speech are expected to profoundly improve accessibility and assistive technologies in the future.

Outlines

00:00

🌟 Introduction to Be My AI and Its Impact

This paragraph introduces Jesper from Be My Eyes, a platform that connects blind and low-vision individuals with sighted volunteers via video calls for visual assistance. A new feature, Be My AI on GPT-4V, has been launched in partnership with OpenAI. The platform's mission is to provide visual assistance to those with limited vision, and it has grown to support over half a million users with the help of more than 7 million volunteers. The AI assistant gives users an alternative to human assistance, letting them maintain their independence without feeling like a burden. GPT-4V allows the AI to understand and describe images in both the physical and digital worlds, providing detailed descriptions of images in group chats, on websites, and in other digital media. The introduction of Be My AI has been met with overwhelmingly positive feedback, and the platform continues to innovate and improve accessibility for its users.

05:01

📸 Daily Life Applications of Be My AI

This paragraph showcases the practical applications of Be My AI in the daily lives of blind and low-vision individuals. It follows Lucy Edwards, a beta tester, as she demonstrates how the technology assists her in various tasks such as cooking, doing laundry, and identifying items during a meal. The detailed descriptions provided by Be My AI enable users to understand their surroundings and ensure they are performing tasks correctly, such as checking for eggshells in eggs or reading expiration dates on products. The feature also helps users navigate social media by describing the content of images, thus making it more inclusive. The paragraph emphasizes the positive impact Be My AI has had on users' lives, as it provides a level of independence and convenience that was previously challenging to achieve. The success of the beta testing and the enthusiastic user feedback underscore the transformative potential of this technology in enhancing accessibility and quality of life for the visually impaired community.

Keywords

💡Be My Eyes

Be My Eyes is an app designed to assist blind and low-vision individuals by connecting them with sighted volunteers through video calls. This community-driven platform has over half a million users and more than 7 million volunteers, aiming to enhance independence and accessibility. In the context of the video, Be My Eyes has introduced a new feature called Be My AI, built on GPT-4V in partnership with OpenAI, giving users an AI-powered visual assistance option and further expanding the range of support available to them.

💡GPT-4V

GPT-4V (GPT-4 with Vision) is an advanced AI model developed by OpenAI that can understand and interpret visual input, effectively enabling computers to 'see'. In the video, GPT-4V is integrated into the Be My Eyes app to offer a 24/7 AI assistant that can describe images and assist with navigation in both the physical and digital worlds. This integration has opened up a wide range of possibilities for users with visual impairments, giving them access to images and media that were previously inaccessible.
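The talk doesn't show Be My Eyes' actual integration code, but as a rough sketch, requesting an image description from GPT-4V through OpenAI's public chat completions API might look like the following; the model name, prompt, and image URL are illustrative placeholders:

```python
# Minimal sketch: asking a GPT-4V model to describe an image via
# OpenAI's chat completions API. The prompt and image URL are
# placeholders, not Be My Eyes' actual integration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # vision-capable model at DevDay 2023
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this photo in detail for a blind user.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)  # the generated description
```

The same pattern also accepts base64-encoded data URLs, which suits photos taken inside an app rather than hosted online.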

💡Visual Assistance

Visual assistance refers to the support provided to individuals with visual impairments to help them navigate and understand the world around them. In the video, this concept is central to the mission of Be My Eyes, which initially offered human volunteers to provide real-time visual assistance through video calls. The introduction of GPT-4V has expanded visual assistance to include AI-generated descriptions of images and environments, thus providing users with an additional layer of independence and accessibility. This technology allows users to receive detailed descriptions of their surroundings, objects, and even digital content like images and graphs, enhancing their understanding and engagement with the world.

💡Independence

Independence in the context of the video refers to the ability of blind and low-vision individuals to perform daily tasks and engage with the world without reliance on others. The Be My Eyes app, with its human volunteers and the new AI feature, aims to promote this independence by offering immediate and on-demand visual assistance. The AI component, in particular, provides a constant and reliable source of support, allowing users to maintain their privacy and avoid situations where they might feel like a burden. This empowerment is evident in the video through the experiences of users like Caroline and Lucy, who have leveraged the technology to enhance their daily lives and maintain their autonomy.

💡Accessibility

Accessibility is the concept of ensuring that technology, services, and content are usable by people with a wide range of abilities, including those with visual impairments. In the video, the development and integration of GPT-4V into the Be My Eyes app represent a significant advancement in accessibility. By providing AI-generated descriptions of images and media, the app breaks down barriers that previously prevented blind and low-vision users from fully engaging with digital content. This increased accessibility not only improves the user experience but also promotes inclusivity and equal participation in the digital world.

💡Alt Text

Alt text, short for alternative text, is a description of an image that is used to convey the same information to users who are unable to see the image, typically through screen readers. In the video, the lack of meaningful alt text on many apps and websites is highlighted as a significant barrier for blind and low-vision users. The Be My Eyes app, with its AI-powered feature, addresses this issue by providing thorough descriptions of any image encountered online or in an app, thus making the content more accessible and understandable to users who are visually impaired. This functionality is crucial in bridging the gap between the visual and digital worlds for those with limited vision.

💡Digital Use Cases

Digital use cases refer to the various ways in which technology can be applied to solve problems or meet needs in the digital realm. In the context of the video, GPT-4V's ability to 'see' and interpret visual data opens up a multitude of digital use cases. These range from providing descriptions of images and media to assisting with navigation and understanding complex data presentations, such as graphs. The video emphasizes the transformative impact of these digital use cases on the lives of blind and low-vision individuals, who can now interact with digital content in a more meaningful and comprehensive way.

💡Human-like Responses

Human-like responses are interactions from AI that mimic the natural conversational style and empathy of a human being. In the video, it is mentioned that GPT-4V's descriptions are not only accurate but also exhibit wit and human-like qualities. This characteristic is crucial in making interaction with the AI more engaging and relatable for users: it creates a more comfortable and intuitive experience, as users can perceive the AI as a supportive and understanding entity rather than a cold, impersonal machine.

💡Beta Testing

Beta testing is the phase of software development where the application is tested by end-users to identify and fix any bugs or issues before its official release. In the video, the Be My AI feature underwent beta testing with a group of users, including Caroline and Lucy Edwards, who provided valuable feedback on its functionality and usability. This testing phase was instrumental in refining the AI's performance and ensuring that it met the needs of the visually impaired community, ultimately leading to a more effective and satisfying user experience.

💡Language Support

Language support refers to the ability of a software application to function in multiple languages, catering to a diverse user base. In the video, it is mentioned that the Be My Eyes app, with the integration of GPT-4V, now supports 36 languages. This broad language support is significant because it allows users from different linguistic backgrounds to access the AI's visual assistance in their native language, enhancing the app's inclusivity and accessibility on a global scale.
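The exact prompt isn't shown in the talk, but the approach described, simply instructing the model to respond in the user's language rather than building per-language models, might look something like this hypothetical system message (the wording and the user_language variable are assumptions):

```python
# Hypothetical sketch of the "respond in the user's language" technique
# described in the talk; the exact prompt wording is an assumption.
user_language = "Danish"  # e.g. read from the user's app settings

messages = [
    {
        "role": "system",
        "content": (
            "You describe images for blind and low-vision users. "
            f"Always respond in {user_language}."
        ),
    },
    # ...the user's image message follows, as in the earlier sketch.
]
```

Because the instruction is just text in the prompt, adding a language requires no retraining, which is presumably how 36 languages could be enabled so quickly.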

💡Enterprise Customer Support

Enterprise customer support refers to the assistance provided to businesses and organizations in managing their customer service operations. In the video, it is highlighted that Be My AI has been deployed in Be My Eyes' enterprise customer support product, with Microsoft's Disability Answer Desks as an example. This integration lets users start with a chatbot that can answer their queries effectively, reducing the need for a human call and improving the overall efficiency of the support process. It demonstrates the potential of AI to transform traditional customer support models and enhance the experience for individuals with disabilities.

Highlights

Be My Eyes launched a new feature, Be My AI on GPT-4V, in partnership with OpenAI.

Be My Eyes aims to provide visual assistance to blind and low-vision individuals through a community of volunteers.

The app has over half a million blind and low-vision users supported by more than 7 million volunteers.

Volunteers assist users through video calls, but the new AI feature offers an alternative for users seeking independence.

Users expressed a desire for an AI assistant available 24/7, prompting the development of Be My AI.

GPT-4 vision enables computers to 'see' and interact with the physical world, opening up countless new applications.

Be My AI can describe images, including those on websites and in apps, making content more accessible.

The AI can provide thorough descriptions of images, such as photos in group chats or inaccessible images on websites.

GPT-4V's responses are not only accurate but also display wit and human-like qualities.

Beta tester Caroline increased her usage from two calls a year to over 700 image descriptions with Be My AI.

Lucy Edwards, another beta tester, demonstrated how Be My AI assists in daily life in a video.

Be My AI helps users with tasks like checking eggs for shells, reading expiry dates, and identifying laundry colors.

The AI can analyze and describe the contents of a meal, enhancing the dining experience for users.

Users can have Be My AI describe the scent of a product, such as perfume, by analyzing an image of the packaging.

Be My AI can assist with social media, describing uncaptioned photos so users don't miss out.

Feedback from beta testers has been overwhelmingly positive, with high satisfaction rates reported.

The feature was rolled out to a small beta group in March and later to all iOS users, resulting in a million image descriptions per month.

Support for 36 languages was added by simply instructing the model to respond in the user's language.

Be My AI has been integrated into enterprise customer support products, such as Microsoft's Disability Answer Desks.

9 out of 10 users who start with a chatbot do not escalate to a call, showcasing the effectiveness of Be My AI.