The merger between ChatGPT and GPT-Vision
Table of Contents
The joint launch of ChatGPT and GPT-Vision marks a major breakthrough in the field of artificial intelligence. These two technologies, which combine natural language processing and computer vision, open up numerous possibilities for innovative applications. Discover how they are transforming the way we interact with visual and textual data.
Captivating applications
The synergy between ChatGPT and GPT-Vision unlocks new features. Here are some examples illustrating the diversity of possible applications:
- Modeling from an image
A simple image can be transformed into an impressive 3D model, as shown in this example:
ChatGPT Vision starting to write Gcode (for a Haas) from prints pic.twitter.com/IgXeMEAS8e
— Aaron Slodov (@aphysicist) October 10, 2023
- Personalized strength training program according to your equipment
Thanks to ChatGPT Vision, it is possible to obtain a tailor-made strength training program based on the equipment you have, as shown in this example:
ChatGPT Vision turned a picture of my home gym equipment into a full 8-week workout program.
This is better than 99% of any programs I’ve ever bought. pic.twitter.com/ToACYgzTyf
— Rowan Cheung (@rowancheung) October 11, 2023
- Analysis and decoding of blurred documents
In-depth analysis of a blurred document allows its secrets to be revealed, as demonstrated by this example:
ChatGPT-4V Multimodal decodes a Redacted government document on a UFO sighting released by NASA.
I have tested this on 100s of redacted documents and I can say we are in a new world. pic.twitter.com/aCKOm577TO
— Brian Roemmele (@BrianRoemmele) October 6, 2023
- Converting photos to text for a complex letter
Using this technology, a letter image can be transformed into editable text, as shown in this example:
???? ChatGPT Vision is fk’in nuts lol pic.twitter.com/Ccsl7tFgkD
– to fart! ???? (@pwang_szn) October 4, 2023
- Retrieving complex objects in an image
The technology makes it possible to identify and recover complex objects in an image, as shown in this example:
Power of ChatGPT vision capability ???? pic.twitter.com/cr1izVP9df
— Kashan Ahmed???????????? (@KashanAhmed) October 6, 2023
- Detection of images from Google Street View or satellites
This demonstration shows the accuracy of detecting images from Google Street View or satellites:
ChatGPT Vision pic.twitter.com/X619nlCdBW
— Anu Aakash (@anukaakash) October 11, 2023
- Detailed analysis of an x-ray
Thanks to ChatGPT, it is possible to obtain a detailed analysis of an x-ray in a few seconds:
ChatGPT: The doctor in your pocket ????
ChatGPT can now look at X-rays, prescriptions, or medical reports and answer any question in a matter of seconds.
Future of health talk – simple, snappy, and AI! pic.twitter.com/nXgEfEvEsn
— Shubham Saboo (@Saboo_Shubham_) October 6, 2023
- Complex image analysis
Dive into the analysis of a highly complex image:
ChatGPT-4V Multimodal please decode this.
Thank you. pic.twitter.com/seOuma96QO
— Brian Roemmele (@BrianRoemmele) October 2, 2023
- Creation of scenarios from the analysis of several images
Four separate images can be used to create a cohesive storyline, as shown in this example:
I gave GPT-4V four “movie stills” I generated with Midjourney and asked it to construct a plotline tying them together.
A good example of how AI is more “creative” and surprising when given constraints, much like humans. Its not as creative as the best people, but interesting. pic.twitter.com/tzYJmMChsn
— Ethan Mollick (@emollick) October 2, 2023
- Analysis of a car engine
A careful analysis of a car engine is possible, but it is recommended to consult a professional:
6. Car Maintenance
Prompt: “Analyze the issue shown in this car photo, explain likely causes, and provide actionable DIY repairs or professional servicing recommendations.” pic.twitter.com/mSfUTp0j5n
— Bryan Marley (@_bryanmarley) October 9, 2023
The technology can also be used to optimize code, as this example shows:
8. Code Optimization
Prompt: “Analyze this code and suggest ways to improve performance, efficiency, conciseness, and adherence to best practices.” pic.twitter.com/4leeDoVf53
— Bryan Marley (@_bryanmarley) October 9, 2023
Limitations to consider
Despite the progress made, certain limitations persist. It is important to note that reading QR Codes and sharing conversations is not yet possible.
If you don’t see the new features, try refreshing the page or logging out/login again. If the problem persists, you can try clearing the cache related to openai.com.
Here is a screenshot of one of the user interfaces for these new features:
GPT-Vision video
I would like to credit Emile Dev’s YouTube channel, which inspired this article. Here is the presentation video: