Computer Vision Explosion

We are about to see an explosion in the use of computer vision systems. If you thought Kinect was cool or you think Creepy Cameraman is scary, the technology right around the corner, and its impact on our lives, will blow you away.

We’ve all dreamt of the day when natural user interface (NUI) systems would be “real”. For example, in 1984 I built, as a high school project, a system that allowed my school to hold a mock Presidential election…by voting via speech. I wish I could find the specs on the voice recognition card I used for the Apple ][ (or even the code I wrote <sad face>), but suffice it to say the promise was big, the results…not so much.

I sincerely believe (again?) that we are finally, really, truly, on the cusp of a NUI explosion. We’ve seen massive improvements in the real-world usage of touch (iPhone), voice (Siri), and computer vision (Kinect) in the last few years. I think this is just the beginning.

There will be huge strides made in voice- and touch-based input, but in my view, the area where our world will be rocked the most is computer vision. Cameras are everywhere. They are dirt cheap. They can see things we can’t. And as amazing as the tech in Kinect is at decoding all those signals, interpreting them, and figuring out your body’s intent, you haven’t seen anything yet.

I had the chance to visit Israel in 2011. I met with several companies in the computer vision space and visited several of the top Israeli university research groups working on computer vision. I was under NDA so I can’t discuss details, but I’m sure you are aware that Israel has been leading the way in computer vision technology.

I found it amusing that the Creepy Cameraman story and this story on a new Microsoft patent came across my feed at about the same time. I also recently upgraded the CCTV system in my house from analog cameras circa 2002 to modern IP-based digital cameras (I use a GeoVision-based CCTV DVR system that is functional but very haphazardly implemented).

These modern cameras all record 1080p in real time with audio. The software I have is just OK, but is nowhere near state of the art.

Another example: sports cameras such as GoPro and Contour. Next time you are at a bike event, out on the lake, or skiing, notice how many people are wearing these cams. The quality is fantastic and they are getting dirt cheap.

Remember that, thanks to networks, we can combine camera inputs from multiple sources, meaning that future computer vision systems will not be self-contained, single-device units the way Kinect is today.

Some scenarios where I see breakthroughs coming:

  • Detecting and tracking people’s emotional state. Imagine your TV being able to sense whether you are happy, scared, sad, or mad and adjusting the content to either amplify that state or change it. This could be used for good (making a game even more immersive) or bad (adjusting advertising).
  • Predicting intent. By understanding ‘normal’ behavior, games, user interfaces, and other systems will be able to predict what you are going to do before you do it.
  • Tele-presence. Kinect shows how easy (ha!) it currently is to allow a computer to build a 3D model of human bodies in real time and do intelligent things with it (control a game). We also know it’s easy (ha!) to map photorealistic imagery onto 3D models with Google/Bing/Apple Maps. Combine these technologies and it’s not a stretch to see Princess Leia floating in front of R2-D2.
  • Augmented Reality. The work Google is doing on Glasses is a great example. I can imagine combining the three other examples above with not only a head-mounted camera, but also a more direct input into the human vision system (a tiny monitor you wear like glasses is actually pretty lame; I’m much more excited about research going on regarding directly inserting imagery into the brain).

Most important, I think, is the impact these breakthroughs will have on mobile. I joke that “Mobile is Dead”. What I really mean is that mobile is now ubiquitous and that it’s high time we stop thinking about it as some discrete ‘space’.

What do you think? What scenarios do you see coming? What are the risks to society and industry?


  1. Nat Brown says:

    for Apple II in ’84 it might have been the Lis’ner 1000 for speech-to-text? (I had a Sweet Talker II – it only did text-to-speech). for PCs at the time Dialogic was doing speech-to-text and text-to-speech – I think these guys are still around and power lots of touch-tone phone systems.

    I completely agree that computer vision is on the cusp – another enabler is the incredibly intense console+ quality low-power GPUs in smart-phones – check out and try it in some mobile apps, it’s fun.

  2. I agree with you that NUI systems and computer vision systems are the future. What worries me is these systems might get too personal (I’m not sure I want my TV always monitoring me).

    I assume these systems will have lots of military applications (something like Google Glass).

  3. Dave lincoln says:

    I agree totally, though to use the Star Trek analogy, there will be the Enterprise (the traditional computer (no pun intended)) and the space shuttle (mobile).
    To my understanding, we are talking big data for many reasons.
    Let me explain.
    My background is chemistry and I worked at Kodak for over thirty years, ending with technology development so that prints will last multiple generations.
    If it's not fade, there are disasters that can result in the loss of your photographs; it’s widely known that people run back into burning homes to save people and photographs.
    Interestingly enough, not many have had a disaster resulting in the loss of digital photographs.
    So this problem is all-important for computer vision and image understanding, so that pictures can be organized and prioritized for safekeeping in both an analog sense and a digital sense.
    Another issue is psychological, in that digital overwhelms our brain: we took more pictures last year than in the history of photography!!!

    This brings me to the final point: we need another Steve Jobs to capture these inventions in computer vision and image understanding for many applications; my interest is only one of them.

  4. mtcoder says:

    the next step I see happening is better tracking of super small details. Things like the camera built into your monitor tracking your eye position and auto-scrolling text when you get to the bottom of the screen. Or using combinations of blink patterns to control clicking, etc. After that I see more natural integration with interactive devices. Stumble into the bathroom between 6am and 7am, and the shower says, “Shall I warm up?” You say sure, and it turns on and sets itself to a comfortable temperature based on the room’s temperature: cooler during summer, warmer during winter. It knows the dad likes the temp at an average of x degrees, and auto-adjusts to that temperature.
    A long time ago some TV show did a tour of one of Bill Gates’s houses. The paintings on the wall, the wall color, and the room temperature all changed based on who walked into the room. Systems are being built to run with that type of technology, and they just need to get the vision and body-sensing technologies in place to manage it.

  5. Abram Abram says:

    I like how the writer organized his thoughts, in addition to the visual part.
