The Patent

http://www.freepatentsonline.com/20100199232.pdf

I was somewhat surprised to come across a patent for Sixth Sense, given the initial declaration that the code would be open sourced. I was even more surprised, reading the contents of the patent, at how general it is... but let's not go there, beyond saying I'm not a fan of broad patents. I bring it up because 1) it is the only "detailed" source of information on the implementation of Sixth Sense, and 2) it's worth acknowledging up front since I don't particularly want to be "trolled" over this research.

So yes, it's out there and it's worth a quick skim through (or not, since a "clean room" implementation might be advisable, but too late for me!). It tells us that the source for Sixth Sense is largely based on several open source projects (see paragraph 0120): Touchless is used for fiducial recognition, the $1 Unistroke Recogniser algorithm for gesture commands, and ARToolkit for the output. OpenCV is mentioned and possibly does some of the heavy lifting for object recognition (possibly via HMMs?). I also just realised that the microphone works in tandem with the camera when the system is used on paper, probably using the sound of contact on the paper to indicate when the destination surface has been touched, since a single camera by itself is insufficient to determine when contact occurs.
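The patent doesn't spell out how that audio/video fusion works, so the following is purely my guess at the mechanism, not the actual Sixth Sense implementation: treat a sharp spike in microphone energy as the moment of contact. A minimal sketch in Python (the threshold value is a placeholder that would need tuning per microphone):

```python
import numpy as np
import sounddevice as sd  # pip install sounddevice

RMS_THRESHOLD = 0.05  # placeholder; depends entirely on mic gain and surface

def audio_callback(indata, frames, time_info, status):
    # Root-mean-square energy of the current audio block; a sharp spike
    # is treated as the finger/pen making contact with the paper.
    rms = float(np.sqrt(np.mean(indata ** 2)))
    if rms > RMS_THRESHOLD:
        print("contact? rms =", rms)

# Listen in ~30 ms blocks (1323 samples at 44.1 kHz, mono) for 10 seconds
with sd.InputStream(channels=1, samplerate=44100,
                    blocksize=1323, callback=audio_callback):
    sd.sleep(10000)
```

In practice you'd want to fuse this with the camera (only accept a spike when a fingertip is near the surface in the image), but the idea is that simple.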

So what do we do with this knowledge?

I've played with OpenCV and HandVu in the past and found them (for hand tracking at least) not that great, since neither really solves the problem of reliable background segmentation in complex environments. Hence I can see the logic in using fiducials, although a brief play with Touchless suggests that even a fiducial-based recognition system is unlikely to be perfect (at least with a single unmodified webcam). This leads to an important set of requirements for me (there's a rough OpenCV sketch after the two lists below):

CVFB-R1: The computer vision system must be able to reliably determine fiducial positions in complex background images.
CVFB-R2: The computer vision system must be able to reliably determine fiducial positions in varied background images.
CVFB-R3: The computer vision system must be able to reliably determine fiducial positions with varying lighting conditions.
CVFB-R4: CVFB-R1 - CVFB-R3 must be met for 4 fiducial markers, each of a distinct colour.

and should it be possible to work without fiducial markers:

CVSB-R1: The computer vision system must be able to reliably determine hand shape in complex background images.
CVSB-R2: The computer vision system must be able to reliably determine hand shape in varied background images.
CVSB-R3: The computer vision system must be able to reliably determine hand shape with varying lighting conditions.
CVSB-R4: The computer vision system must be able to reliably discriminate between left and right hands.
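To give these requirements something concrete to be tested against, the obvious place to start is plain HSV colour segmentation. A minimal OpenCV sketch in Python; the hue ranges are placeholders that would need calibrating per camera and lighting condition (and real red wraps around H=0, which I've ignored here), which is exactly where I expect CVFB-R1 to CVFB-R3 to bite:

```python
import cv2
import numpy as np

# Placeholder HSV ranges for the four marker colours (OpenCV scale:
# H 0-179, S and V 0-255). These need calibrating per camera/lighting.
MARKERS = {
    "red":    ((0, 120, 70),   (10, 255, 255)),
    "green":  ((40, 80, 70),   (80, 255, 255)),
    "blue":   ((100, 120, 70), (130, 255, 255)),
    "yellow": ((20, 120, 70),  (35, 255, 255)),
}

def find_markers(frame_bgr):
    """Return {name: (x, y)} centroids for each marker visible in a BGR frame."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    found = {}
    for name, (lo, hi) in MARKERS.items():
        mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
        # Morphological opening to suppress speckle from complex backgrounds
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        m = cv2.moments(mask)
        if m["m00"] > 0:  # did any pixels survive the mask?
            found[name] = (m["m10"] / m["m00"], m["m01"] / m["m00"])
    return found

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    print(find_markers(frame))
    cv2.imshow("frame", frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
```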


I rather suspect I'm going to have to be flexible with the tests/thresholds used to determine whether these requirements are met. It should also be noted that no single computer vision technique has been found to work for all applications or environments (Wachs et al., 2011, p. 60), so there may be some opportunity to improve on the generic libraries/algorithms it would seem natural to apply (e.g. Touchless, cvBlob).
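Attaching a number to "reliably" probably means hand-labelling marker positions in a sample of frames and scoring the detector against them. A sketch of the kind of scoring I have in mind; the JSON label format and the pixel tolerance are both invented for illustration:

```python
import json
import math

def detection_rate(labels_path, detections, tol=15):
    """Fraction of hand-labelled frames in which the detector reported the
    marker within tol pixels of the labelled position."""
    # labels file: {"frame_index": [x, y], ...} -- an invented format
    with open(labels_path) as f:
        labels = json.load(f)
    hits = 0
    for idx, (lx, ly) in labels.items():
        det = detections.get(int(idx))  # detections: {frame_index: (x, y)}
        if det is not None and math.dist(det, (lx, ly)) <= tol:
            hits += 1
    return hits / len(labels)
```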

Moving on: for those who haven't played with the $1 Unistroke Recogniser (Wobbrock et al., 2007), it's impressive. Based on the published test results for this algorithm, I'd be reasonably confident in its reliability and robustness, IF the above requirements can be met.
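For anyone who hasn't seen the internals, the core of the algorithm is short enough to sketch: resample the stroke to a fixed number of points, rotate/scale/translate it into a canonical frame, then pick the template with the smallest average point-to-point distance. The full algorithm also runs a golden-section search over candidate rotations, which I've dropped here for brevity; templates are assumed to have been pushed through the same resample/normalise pipeline:

```python
import math

N = 64  # resample count used in the paper

def resample(pts, n=N):
    """Resample a stroke (list of (x, y), at least two points) to n
    equidistantly spaced points."""
    pts = list(pts)
    path = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    interval, d, out = path / (n - 1), 0.0, [pts[0]]
    i = 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if d + seg >= interval:
            t = (interval - d) / seg
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q becomes the next segment's start point
            d = 0.0
        else:
            d += seg
        i += 1
    while len(out) < n:  # rounding safety
        out.append(pts[-1])
    return out

def normalise(pts):
    """Rotate to zero indicative angle, scale to a unit box, centre on origin."""
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    theta = math.atan2(cy - pts[0][1], cx - pts[0][0])
    c, s = math.cos(-theta), math.sin(-theta)
    pts = [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c)
           for x, y in pts]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w, h = (max(xs) - min(xs)) or 1, (max(ys) - min(ys)) or 1
    return [(x / w, y / h) for x, y in pts]

def recognise(stroke, templates):
    """templates: {name: normalised point list}. Returns the closest name."""
    candidate = normalise(resample(stroke))
    def score(name):
        return sum(math.dist(a, b)
                   for a, b in zip(candidate, templates[name])) / N
    return min(templates, key=score)
```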

Keeping to the KISS principle, I'm going to use this as the basis of my first experiments (and code, woo-hoo!), which are going to be:

1) Capture short (<5 minute) segments of video with a worn webcam (in my case I have a Logitech C910 handy; not the most discreet of cameras, but sadly my Microsoft LifeCam Show broke, grrrr) in a variety of environments while wearing fiducial markers on 4 fingers (a capture sketch follows this list).

2) Capture short (<5 minute) segments of video with a worn webcam in a variety of environments without markers.

3) Using these exemplary videos, test various recognition techniques from OpenCV to determine the optimal technique that meets the above requirements.

4) Apply and test sample gestures against the $1 Unistroke Recogniser (Python implementation).
4.1) (Optional) Determine whether there are any differences in performance/reliability between the available Python versions.
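The capture side of experiments 1 and 2 is straightforward with OpenCV; a sketch, where the device index, codec, and frame rate are assumptions for my particular setup:

```python
import time
import cv2

DEVICE = 0            # webcam index; 0 happens to be right on my machine
MAX_SECONDS = 5 * 60  # keep each clip under five minutes

cap = cv2.VideoCapture(DEVICE)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*"XVID")  # codec choice is platform-dependent
out = cv2.VideoWriter("capture.avi", fourcc, 25.0, (w, h))

start = time.time()
while time.time() - start < MAX_SECONDS:
    ok, frame = cap.read()
    if not ok:
        break
    out.write(frame)

cap.release()
out.release()
```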

Okay, that's my week planned then. Comments?

REFERENCES


Wachs, J., Kölsch, M., Stern, H. & Edan, Y. 2011, 'Vision-based hand-gesture applications', Communications of the ACM, vol. 54, no. 2, pp. 60-71.

Wobbrock, J.O., Wilson, A.D. & Li, Y. 2007, '$1 Unistroke Recognizer', http://depts.washington.edu/aimgroup/proj/dollar/
