GPT-4 Vision Access in ChatGPT! Full Tour & Impressive Results!

MattVidPro AI
6 Oct 2023 · 21:33

TLDR: In this video, the host explores the new GPT-4 Vision feature in ChatGPT, demonstrating its ability to analyze and describe images. The host tests the feature with various images, including a detailed origami dog, a channel logo, and personal photos. GPT-4 Vision impressively identifies patterns, objects, and even humor in a meme. The video highlights the potential of GPT-4 Vision to enhance user experiences, offering creative and practical applications such as recipe suggestions based on fridge contents.


  • 🚀 OpenAI's recent announcements include DALL-E 3, an advanced AI image generator, and the integration of voice and vision capabilities into ChatGPT.
  • 🔍 The vision feature allows users to upload images into ChatGPT for analysis and detailed descriptions.
  • 📆 The vision feature is gradually being rolled out to ChatGPT Plus subscribers over the next two weeks.
  • 📱 The default section of ChatGPT is the only tab that currently allows image uploading and usage of the GPT-4 vision model.
  • 🎨 The video demonstrates the impressive detailing in image descriptions, such as the accurate depiction of an origami dog image.
  • 👾 The AI's image recognition capabilities are showcased by its correct identification of a stylized lemon character mascot.
  • 🧐 ChatGPT's limitations include not being able to identify real people, even when provided with images of celebrities or the user.
  • 🔎 The AI can analyze and describe visual attributes of people, such as clothing and facial expressions, without making subjective judgments.
  • 🚗 Practical applications of the vision feature are highlighted, including identifying a car's engine type from an image and providing meal ideas from fridge contents.
  • 🤝 A creative experiment is presented, where GPT-4 vision is used to improve DALL-E 3 generated images by providing feedback and refining prompts.
  • 🍲 The AI's ability to generate recipes from available ingredients and adjust for dietary preferences or restrictions is demonstrated.

Q & A

  • What is the main feature of GPT-4 Vision introduced in the video?

    -The main feature of GPT-4 Vision is the ability to upload images into ChatGPT and have it analyze and answer questions about them.

  • Where in ChatGPT Plus is the GPT-4 Vision model available?

    -The GPT-4 Vision model is accessible in the default section, where users can upload photos and use the vision capabilities. The other sections do not currently allow image uploads.

  • What was the result when the video creator uploaded an image of an origami dog?

    -The GPT-4 Vision model provided a detailed description of the image, correctly identifying it as a representation of a lion's head made from folded paper triangles, despite it being labeled as a dog.

  • How did the GPT-4 Vision model describe the channel's profile photo?

    -It described the profile photo as a stylized, animated character with a large yellow head, oversized white glasses or goggles, and a cheerful smile. It also noted the presence of a leaf shape and recognized it as a character or mascot.

  • What was the GPT-4 Vision model's response when asked to identify the person in the uploaded photo of the video creator?

    -The model refused to identify the person, stating that it is programmed not to recognize real people based on images, but it provided a description of the person's physical appearance.

  • How did the GPT-4 Vision model analyze the photo of the car engine?

    -It identified the car as a Volkswagen Golf GTI and suggested that the engine might be reminiscent of the Mark 6 or Mark 7 generations of the vehicle based on the image of the exposed engine with dual carburetors.

  • What was the result when the video creator asked the GPT-4 Vision model to suggest meals based on the items in a messy fridge?

    -The model provided several meal ideas utilizing the items in the fridge, such as pasta salad, stir fry, omelette, salad sandwich, and fruit mix.

  • How did the GPT-4 Vision model respond to the meme about ChatGPT?

    -It recognized the humor in the meme, noting the unexpected context, the meta joke about ChatGPT, and the visual comedy of the melting chocolate gorilla.

  • What was the goal for the image creation when combining GPT-4 Vision with DALL-E 3?

    -The goal was to create a complex image of a band of cats in a school playing instruments, with the band name being a clever play on words.

  • What was the outcome of the collaboration between GPT-4 Vision and DALL-E 3?

    -The collaboration resulted in several images of a cat band with diverse instruments and settings, showing improvement based on feedback from the GPT-4 Vision model, although it noted the challenge of handling many individual characters and their instruments.



🎥 Introduction to AI Image Generators and ChatGPT's New Features

The paragraph introduces the audience to the MattVidPro AI YouTube channel and welcomes new viewers. It discusses the recent announcements from OpenAI, including the introduction of DALL-E 3, an advanced AI image generator, and the upcoming voice and vision features for ChatGPT. The speaker shares their excitement about obtaining access to ChatGPT's new features, specifically the ability to upload and analyze images. The paragraph also mentions the limitations of the new features, such as the inability to upload images in certain sections, and the potential for using these features to create a feedback loop with DALL-E 3.


🔍 Testing ChatGPT's Image Recognition Capabilities

The speaker tests ChatGPT's image recognition capabilities by uploading various images and evaluating the AI's responses. They describe uploading an image of an origami dog and the AI's detailed description of it, highlighting the AI's ability to understand complex patterns and structures. The speaker then uploads the channel's profile photo, which shows a lemon character with a VR headset, and notes the AI's accurate recognition of the character's features. The paragraph also includes a comparison with Google Bard's image recognition capabilities, emphasizing that ChatGPT's responses are more accurate and less prone to hallucination.


🚫 Limitations and Restrictions on Identifying People

The paragraph discusses the limitations of Chat GPT when it comes to identifying people in images. The AI correctly identifies features and attire but refrains from identifying individuals, even when prompted with a photo of a famous person like Taylor Swift. The speaker explores the AI's restrictions, such as not storing or recognizing past images and not speculating on personal characteristics. The paragraph emphasizes the AI's ethical considerations in handling images of people.


📸 Advanced Testing and Practical Applications

The speaker shares more advanced testing of ChatGPT's image recognition capabilities, including identifying car engines from photos and providing meal ideas based on the contents of a fridge. They also discuss the AI's ability to translate non-English text and suggest improvements to images. The paragraph highlights the practical applications of these features, such as assisting with meal planning and understanding car specifications, showcasing the AI's versatility and usefulness in everyday life.


🎨 Combining Chat GPT and Dolly 3 for Enhanced Image Generation

The speaker explores the potential of combining Chat GPT with Dolly 3 for enhanced image generation. They describe the process of creating a complex image of a school band composed of cats playing instruments and the AI's role in refining the image based on feedback. The paragraph details the iterative process of improving the image through multiple generations and the challenges faced in achieving diversity in the depiction of instruments and poses. The speaker concludes by reflecting on the potential of using Chat GPT to enhance Dolly 3's output and the need for a better strategy to maximize their combined capabilities.

🌟 Conclusion and Future Prospects

In the concluding paragraph, the speaker summarizes their experience with Chat GPT's new image recognition features and the potential they hold for practical applications. They reflect on the impressive capabilities of the AI, such as creating detailed recipes from fridge contents and enhancing image generations with feedback. The speaker also discusses the limitations encountered and suggests areas for improvement. They invite viewers to join the community and share their thoughts on the new features, ending the video on a positive note about the future of AI technology.



💡GPT-4 Vision

GPT-4 Vision is a feature that allows users to upload images into ChatGPT, which then analyzes and provides detailed descriptions or answers questions about the content of the images. In the video, the host demonstrates this capability by uploading various images and discussing the impressive results and potential applications of GPT-4 Vision.


💡ChatGPT

ChatGPT is an AI language model developed by OpenAI that is capable of generating human-like text based on the prompts given to it. In the context of the video, ChatGPT is used to not only engage in conversation but also to analyze images through the GPT-4 Vision feature, showcasing its versatility and advanced capabilities.

💡AI Image Generator

An AI image generator is a type of artificial intelligence that can create visual content based on textual descriptions or other input data. In the video, DALL-E 3 is mentioned as an example of an AI image generator, which is praised for its ability to produce high-quality, creative images. The video also explores the integration of image generation with ChatGPT through GPT-4 Vision.

💡DALL-E 3

DALL-E 3 is an AI image generation model developed by OpenAI. It is known for its ability to generate complex and detailed images from textual descriptions. In the video, the host discusses the capabilities of DALL-E 3 and its integration with ChatGPT, allowing for a more interactive and dynamic AI experience.

💡Image Analysis

Image analysis refers to the process of examining and interpreting visual data, such as photos or videos, to extract meaningful information. In the video, GPT-4 Vision is demonstrated to perform image analysis by providing detailed descriptions of uploaded images, showcasing its ability to understand and interpret visual content effectively.

💡MattVidPro AI

MattVidPro AI is the YouTube channel of the video's host, who creates content related to artificial intelligence, technology, and their applications. The channel is mentioned in the script as the platform where the host shares his experiences and insights about the latest AI advancements, including GPT-4 Vision.

💡ChatGPT Plus Subscribers

ChatGPT Plus subscribers are users who pay for the premium version of ChatGPT, which provides additional features and benefits, such as early access to new functionality like GPT-4 Vision. In the video, the host mentions that GPT-4 Vision is being rolled out to these subscribers over a period of two weeks.

💡Facial Recognition

Facial recognition is a technology that identifies or verifies the identity of a person by analyzing their facial features. In the context of the video, it is mentioned that GPT-4 Vision can provide general descriptions about visual attributes of people but is programmed not to perform facial recognition or identify real individuals, ensuring privacy and ethical use of the technology.

💡Image Uploading

Image uploading is the process of transferring image files from a local device to a remote server or platform. In the video, the host demonstrates the image uploading feature in ChatGPT, which enables users to share visual content for analysis or discussion with the AI, enhancing the interactive capabilities of the platform.

💡Digital Art

Digital art refers to artistic works created using digital technology, often involving software, digital tools, and electronic devices. In the video, the host discusses the creation of digital art using AI models like DALL-E 3 and the potential for AI to revolutionize the field of art and design through its ability to generate complex and detailed images.


💡Meme

A meme is a concept, behavior, or idea that spreads from person to person within a culture, often through imitation or viral content on the internet. In the video, the host tests GPT-4 Vision's ability to understand and interpret humor in memes, demonstrating the AI's advanced comprehension of cultural phenomena and its potential for engaging with users in a more relatable and entertaining way.


OpenAI's GPT-4 introduces vision access in ChatGPT, allowing users to upload images for analysis and detailed description.

The feature is being rolled out to ChatGPT Plus subscribers over the next two weeks.

GPT-4's vision model can analyze images and answer questions about them with impressive accuracy.

The AI can describe complex images, such as an origami dog, with remarkable detail.

ChatGPT's vision model can identify and describe logos and icons, like the channel's lemon character with VR goggles.

The AI maintains privacy by not identifying real people in uploaded images.

GPT-4 can analyze images of famous individuals but does not reveal their identity.

The AI can compare two images and identify similarities in visual attributes.

GPT-4 can recognize and describe specific objects, such as a car's engine, based on images.

The AI can translate non-English text from images, such as labels on food items.

ChatGPT can generate meal ideas and recipes based on the contents of a fridge.

GPT-4 can understand and comment on humor found in memes.

The AI can provide feedback to improve AI-generated images, such as enhancing a scene of a cat school band.

ChatGPT and DALL-E 3 can work in tandem to iteratively improve AI-generated images.

The AI demonstrates the ability to understand and generate complex prompts for image creation.

GPT-4's vision model showcases the potential to assist in various practical applications, from art creation to meal planning.

The video provides a comprehensive tour of GPT-4's vision capabilities, demonstrating its impressive results in diverse tasks.