GPT-4: Game-Changer for Image-to-Text?

GPT-4 tells fewer lies. For example, it correctly lists Google search operators. What I find most interesting at first glance, though, is its exceptional ability to interpret images from their URLs. There is no restriction on when the image was created. I gave it the image above and it produced a perfect output:

It would do face recognition, too, and tell you the person’s bio – try it on someone’s photo.

OK, that was wishful thinking for now! After comments from a few colleagues and a few more tests, I realized that it is hallucinating based on the text of the image URL, not the image itself. So what does multi-modal mean?

P.S. Please follow my Midjourney AI Art at my new page The Prompter. I use a ChatGPT-based prompt generator. 🙂
