There are a lot of impressive videos on YouTube showing hand-tracking. However, when you throw in a moving background, varying lighting and a video stream that 99.999% of the time won't contain the object you want to track (but may contain 100,000s of similar objects), well, things don't tend to work as well as we would like.
The last time I looked at hand tracking was probably 5 years ago. I wasn't too impressed by the reliability or robustness of the various techniques on offer then, and I can't say I'm any more impressed today. I recently had a quick look at various rapid approaches: touchless works but is marker-based; TLD works but loses the target in long video and ends up learning a different target (though it might be applicable if some form of re-initialisation was employed); and HandVu (which I had high hopes for) was too sensitive to lighting. As I said, these were quick looks, and I will revisit TLD at least in the near future.
[Image: MIT budget "data-glove"]
While I don't want to use fiducial markers, when MIT are proposing the use of gloves that even my wife would be ashamed to wear (and she loves colourful things) in order to improve recognition accuracy, well, one has to realise that we just haven't solved this problem yet.
Just how bad is it though? Well, there have been multiple studies [cite] investigating skin detection and proposing values for HSV skin segmentation, and the theory behind a lot of this work looks solid (e.g. HS values for skin fall in a narrow range because skin pigmentation is the result of blood colour and the amount of melanin [cite]). But throwing my training samples at them (incredibly cluttered background, variable lighting conditions) produces far too many false positives and false negatives to be of any practical value. Looking at the underlying RGB and HSV values also suggests that this approach is going to be of little practical use in "everyday" scenarios, hence I'll be moving on to fiducial markers for today.
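For anyone who wants to see what this kind of test looks like, here is a minimal sketch of HS thresholding using only Python's standard library. The threshold values and the function names (`is_skin`, `error_rates`) are my own assumptions for illustration; they are not the values proposed in the studies mentioned above, and the sample pixels are made up rather than taken from my training data.

```python
import colorsys

# Hypothetical thresholds, for illustration only (not taken from any study).
H_MAX = 50 / 360.0          # upper hue bound, as a fraction of the colour wheel
S_MIN, S_MAX = 0.2, 0.7     # assumed saturation range for skin


def is_skin(r, g, b):
    """Return True if an 8-bit RGB pixel falls inside the assumed HS range."""
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h <= H_MAX and S_MIN <= s <= S_MAX


def error_rates(labelled_pixels):
    """Take ((r, g, b), really_skin) pairs; return (false_pos_rate, false_neg_rate)."""
    fp = fn = pos = neg = 0
    for rgb, truth in labelled_pixels:
        pred = is_skin(*rgb)
        if truth:
            pos += 1
            fn += not pred      # skin pixel rejected by the threshold
        else:
            neg += 1
            fp += pred          # background pixel accepted as skin
    return (fp / neg if neg else 0.0, fn / pos if pos else 0.0)


# Made-up sample pixels: two skin, two background.
samples = [
    ((224, 172, 105), True),    # evenly lit skin tone
    ((255, 235, 225), True),    # overexposed skin, saturation too low
    ((181, 136, 99), False),    # wood/cardboard tone, lands in the skin range
    ((0, 0, 255), False),       # pure blue background
]
fp_rate, fn_rate = error_rates(samples)
```

Even in this toy case the overexposed skin pixel is rejected and the wood-coloured pixel is accepted, which is the same failure pattern in miniature: a fixed HS box has no way of telling skin from similarly coloured clutter, or of following skin as the lighting changes.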