# Exploring the New Capabilities of ChatGPT Vision: A Game Changer
Chapter 1: Introduction to ChatGPT Vision
This week, OpenAI unveiled a remarkable enhancement to its widely used ChatGPT platform: the ability to analyze and engage with images. This is a significant development because, by some estimates, roughly 70% of online content is visual. With the proliferation of platforms like YouTube, TikTok, and Instagram, users are increasingly inclined to interact with technology through visuals.
ChatGPT Vision is being gradually rolled out to selected ChatGPT Plus users. I was fortunate enough to gain early access. Below is an overview of the impressive capabilities of this innovative AI feature.
Chapter 2: Engaging Visually with ChatGPT
Using ChatGPT Vision is straightforward. Users with access will find a small image icon next to the message input area in the ChatGPT interface. Clicking it lets you upload an image alongside a text prompt, so you can ask questions about the image or discuss what it shows.
Section 2.1: Crafting Alt Text
One of my first experiments with ChatGPT Vision involved generating alt text for an image. Alt text is essential for individuals with visual impairments who rely on screen readers while browsing the internet. It describes an image's content, allowing those who cannot view it to grasp its significance.
Creating effective alt text can be challenging and often costly, which leads many website owners to neglect it. To assess ChatGPT Vision's capabilities, I uploaded an image and asked it to produce high-quality alt text.
The result was impressive: “A wall-mounted dispensing system labeled ‘SIGNET’ featuring three translucent containers filled with cleaning solutions. From left to right, the containers are labeled: ‘Hard Surface Sanitizer’ in red, ‘Glass & Multi-Surface Cleaner’ in blue, and ‘Heavy Duty Non-Chlor Washroom Disinfectant’ in green. Above the dispensers are instructional stickers with diagrams indicating proper use. Each container has a measurement scale on its side and is connected to a tubing system for distribution.”
This detailed description not only covered the basics but also provided specific visual elements, allowing visually impaired users to understand the image's content accurately. Remarkably, ChatGPT Vision generated this alt text in about 10 seconds, making it a significant resource for enhancing web accessibility.
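To put a generated description to use, it still has to end up in the page markup. Below is a minimal sketch of that step in Python, assuming you have copied the description out of ChatGPT by hand. The function name `build_img_tag` and the file path are hypothetical and used only for illustration; the escaping relies on the standard library's `html.escape`.

```python
from html import escape


def build_img_tag(src: str, alt_text: str) -> str:
    """Return an <img> element with the alt text safely escaped."""
    # Collapse stray whitespace and line breaks that a pasted reply may contain.
    cleaned = " ".join(alt_text.split())
    # quote=True also escapes single and double quotes, so the description
    # cannot break out of the attribute value.
    return (
        f'<img src="{escape(src, quote=True)}" '
        f'alt="{escape(cleaned, quote=True)}">'
    )


if __name__ == "__main__":
    # Shortened version of the description quoted above.
    description = (
        "A wall-mounted dispensing system labeled 'SIGNET' featuring three "
        "translucent containers filled with cleaning solutions."
    )
    # "images/dispenser.jpg" is a placeholder path, not a real file.
    print(build_img_tag("images/dispenser.jpg", description))
```

Escaping matters here because descriptions like the one above contain quotation marks and ampersands ("Glass & Multi-Surface Cleaner"), which would otherwise produce invalid HTML.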
Section 2.2: Analyzing Everyday Items
For my next test, I uploaded a photo of my spice rack and asked ChatGPT Vision to identify the spices and tell me which ones were running low.
ChatGPT Vision not only listed the spices but also recognized their brand names:
- Sesame Seeds (Quantity appears good)
- Morton Sea Salt (Cannot determine quantity from the image)
- 365 Ground Ginger (Quantity appears good)
- 365 Cinnamon (Quantity appears good)
- McCormick Chili Powder (Quantity appears good)
- McCormick Coarse Ground Black Pepper (Cannot determine quantity from the image)
- 365 Chili Powder (Quantity appears good)
- McCormick Perfect Pinch Italian (Cannot determine quantity from the image)
- Ground Turmeric (Cannot determine quantity from the image)
- 365 Ground White Pepper (Quantity appears good)
- Ground Cumin (Cannot determine quantity from the image)
Although it struggled with precise quantities, it went a step further by suggesting recipes I could prepare with the spices available.
This illustrates the power of ChatGPT's visual capabilities. It allows for ongoing dialogue about the image, enabling users to ask follow-up questions that expand on the initial responses.
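If you wanted to act on a reply like the spice list above, for example to build a list of jars to check by hand, you could parse the text yourself. The sketch below is one hedged way to do that in Python. It assumes the reply keeps the "- Name (Note)" shape shown above, which ChatGPT does not guarantee, and the function names `parse_inventory` and `needs_manual_check` are hypothetical.

```python
import re

# Matches lines shaped like "- Name (Note)". This format is an assumption
# based on the reply quoted in this article, not a guaranteed output format.
LINE_PATTERN = re.compile(r"^-\s*(?P<name>.+?)\s*\((?P<note>[^()]*)\)\s*$")


def parse_inventory(reply: str) -> list[tuple[str, str]]:
    """Extract (item, note) pairs from a bulleted reply, skipping other lines."""
    items = []
    for line in reply.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            items.append((match.group("name"), match.group("note")))
    return items


def needs_manual_check(items: list[tuple[str, str]]) -> list[str]:
    """Return the items whose quantity the model said it could not determine."""
    return [
        name
        for name, note in items
        if note.lower().startswith("cannot determine")
    ]


if __name__ == "__main__":
    # A few lines copied from the reply above.
    reply = """\
- Sesame Seeds (Quantity appears good)
- Morton Sea Salt (Cannot determine quantity from the image)
- 365 Ground Ginger (Quantity appears good)
- Ground Turmeric (Cannot determine quantity from the image)
"""
    for name in needs_manual_check(parse_inventory(reply)):
        print(f"Check by hand: {name}")
```

Lines that do not match the pattern are skipped rather than raising an error, since the wording of a chat reply can vary from one run to the next.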
Chapter 3: Expanding the Scope of Analysis
ChatGPT Vision is not limited to simple images; it can also interpret graphs, infographics, and other visual data. For instance, I shared a graph from Google Search Console that tracked traffic to one of my websites.
From this graph, it accurately extracted key metrics and analyzed traffic trends.
It even offered suggestions for improving my SEO based on the visual data provided. This ability to analyze various types of visuals will be invaluable for users seeking to understand complex data or verify research results.
Section 3.1: Identifying Film Types
As a photographer who often works with analog film, I was curious to see if ChatGPT Vision could identify the film type used for one of my photos.
The result was accurate: it identified the photo as shot on Kodak Tri-X at ISO 400, which shows how fine-grained its analysis can be.
Section 3.2: Recognizing Dog Breeds
I also uploaded an image of my Bichon Frise, Lance, to see if ChatGPT Vision could determine his breed.
The response stated, “Based on the image provided, your dog appears to have the physical characteristics of a curly-coated breed, potentially a Bichon Frise, Poodle, or a mix of such breeds.” It hedged its answer by including “Poodle,” which may not sit well with Bichon enthusiasts, but it named the correct breed, Bichon Frise, as its first candidate and based its answer on physical characteristics such as the curly coat.
Chapter 4: Limitations and Future Potential
While ChatGPT Vision is a powerful tool, it does have limitations. The system is designed to avoid processing images of people, a wise decision given the potential biases inherent in visual AI. Additionally, it refrains from providing medical advice, ensuring that users are not misled by potentially inaccurate information.
In the future, specialized versions of ChatGPT Vision could be developed for specific applications, such as interpreting X-ray results or analyzing other medical images. For now, maintaining safeguards around sensitive topics is crucial.
Chapter 5: The Future of Multimodal AI
The launch of ChatGPT Vision marks a pivotal moment in the generative AI landscape. It is one of the first instances in which a truly multimodal AI is accessible to everyday users. While Google Bard offers some visual capabilities, it relies heavily on existing systems like Google Lens. In contrast, ChatGPT Vision appears to possess a deeper understanding of visual content and its implications.
I have explored numerous ChatGPT prompts over the past year, and I look forward to continuing my experiments with ChatGPT Vision. As a full-time creator, I have compiled some of my most useful prompts into a free guide titled "7 Enormously Useful ChatGPT Prompts For Creators." Feel free to download a copy today!