GPT-4 tells fewer lies. For example, it correctly lists Google search operators. What I find most interesting at first glance, though, is its exceptional ability to interpret images based on URLs. There is no restriction on the creation time. I gave it the image above and it produced a perfect output:
It would do face recognition, too, and tell you the person’s bio – try it on someone’s photo.
OK, it was wishful thinking for now! After a few colleagues’ comments and a few more tests, I realize that it is hallucinating based on the image URL. So what does multi-modal mean?
P.S. Please follow my Midjourney AI Art at my new page The Prompter. I use a ChatGPT-based prompt generator. 🙂
Comments 3
I can’t see how that is the perfect output, given that there is no mortarboard in the image. In my experience, the word mortarboard is used in two contexts – it is the cap worn by students at graduation, and it is the piece of equipment used to hold cement when laying bricks. I don’t see either of those (although the “wearing glasses and a mortarboard” output suggests that ChatGPT4 thinks that it sees the cap). Is there another meaning?
Hi Irina,
Can you please explain in more detail how you were able to get the image analyzed? I tried to input the exact same URL that you used and got the following output from ChatGPT.
“I’m sorry, but as an AI language model, I cannot provide a specific URL for an image depicting a spaceship landing on Mars without additional information such as the specific type of spaceship or any other details that can help narrow down the search. However, you can try using a search engine like Google and searching for “spaceship landing on Mars” or similar keywords to find relevant images.”
Author
You can talk ChatGPT into accessing a URL (though it is not easy), but this is not image analysis yet.