One Thing Holding AI Back

At least until it doesn’t

Today’s business headlines are full of stories in the vein of “AI has much promise, but business leaders are slow to adopt.”  They go on to cite security issues, lack of understanding, and a shortage of talent.  These are all real factors, but there’s something else that almost no one talks about: AI’s inability to deal with images.

This may come as a surprise to many of you.  You’ve seen AI generate art, plug into self-driving cars, and name the flower in a picture.  Those demonstrations belie the actual state of the art in the relationship between AI and images.  Let me explain.

A separate technology known as Computer Vision (CV) is largely responsible for the advancement of image recognition and analysis, and it has undergone a similar, parallel advancement over the last five years.  Yet CV is not AI.  Almost every AI platform uses text (language) as its fundamental building block.  It is actually incredibly difficult to get AI alone to process a diagram or recognize an object in an image.  If you’ve seen claims of AI doing this, the system usually uses CV to break the image into text metadata first, then processes the resulting information, giving you the impression that AI “understands” images.
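To make that two-stage hand-off concrete, here is a toy sketch in Python.  Everything in it is a stand-in I made up for illustration: `DetectedObject`, `describe_image`, and `answer_from_text` are hypothetical names, the detections are hard-coded rather than produced by a real CV model, and the "AI" stage is just a keyword match.  The only point is the shape of the pipeline: the second stage never sees pixels, only the text the first stage hands it.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """One object a (hypothetical) CV stage claims to have found."""
    label: str
    confidence: float


def describe_image(detections, min_confidence=0.5):
    """Stage 1 stand-in: flatten CV detections into plain text metadata."""
    kept = [d.label for d in detections if d.confidence >= min_confidence]
    if not kept:
        return "An image with no recognizable objects."
    return "An image containing: " + ", ".join(kept) + "."


def answer_from_text(description, keyword):
    """Stage 2 stand-in: a text-only step that sees the description, never the image."""
    if keyword.lower() in description.lower():
        return f"Yes, the image appears to contain a {keyword}."
    return f"No {keyword} was mentioned in the description."


# Hard-coded detections, assumed for illustration (no real model is run).
detections = [
    DetectedObject("whiteboard", 0.92),
    DetectedObject("marker", 0.81),
    DetectedObject("coffee cup", 0.30),  # below threshold, so it is dropped
]

description = describe_image(detections)
print(description)
print(answer_from_text(description, "whiteboard"))
print(answer_from_text(description, "coffee cup"))
```

Notice that the coffee cup disappears at the hand-off.  Whatever the first stage fails to put into words is simply invisible to the second stage, no matter how capable that stage is.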

My point is: images are not a “first class citizen” in the world of AI.  Only text holds that position at the moment.  That’s a shame.  When we as humans are trying to describe some complex thing to each other, we often “go to the whiteboard” and draw a diagram to convey our ideas efficiently.  Why can’t we teach an AI using diagrams?  So much of our content is in this form, and AI is simply unprepared to use it directly without a lot of other technological gymnastics.

I firmly believe that the research community is working on this as we speak.  One day we will be able to hand AI a schematic of a machine, a road network, or a building blueprint and have it completely understand the domain we want to study.  Similarly, we will be able to ask it to generate diagrams, images, or even videos from a small collection of data about a system we wish to improve, like a business.  Until then, AI will “fake” the digestion of images, and it will keep doing so until the technology recognizes that all kinds of media make up our collective intelligence about the world around us.  When that happens, the advancements that amaze us today will seem like primitive child’s play.  Stay tuned.