Sunday, 21 August 2011

Enabling External Screen

Another slow week ~ parenting, employment, school holidays and academia do not mix well.

I have the hardware up and running how I want now. My initial build, with the projector on the right and camera on the left, had 2 problems. Firstly, as previously noted, it's preferable to have the camera on the side of the user's dominant hand, since this hand will be used the most. Secondly...the PK301 has a small fan for cooling, and having it on the right meant that the exhaust heat was blowing onto my neck and was hot enough to be very uncomfortable. So today I've switched things around, tied things off, and added extra washers (oh, the technology!) for stability and, apart from the need for a more stable joint to hold the projector level with the user's view, I am reasonably satisfied with my budget implementation. Not $350...about £350, but then I could have bought a cheaper projector (and may well pick up a ShowWX for testing) and come under budget. I still need to add a reflector to allow for quick repositioning of the output, but I'm getting close.

However, I am rambling. I really just wanted to make a note of what I had to do to enable my laptop's lid to be closed and still have the external screen function:

gconftool-2 --type string --set /apps/gnome-power-manager/buttons/lid_battery "nothing"

gconftool-2 --type string --set /apps/gnome-power-manager/buttons/lid_ac "nothing"

This is rather hackish, since the laptop screen stays on when closed, but I couldn't find any other way to do this - I'm reasonably sure I don't have this issue in Windows - so suggestions for an alternative are welcome.

Friday, 12 August 2011

Fixing "Corrupt JPEG data: n extraneous bytes before marker" error

jdmarker.c lives in the OpenCV 3rdparty/libjpeg directory; edit out the appropriate line (search for "EXTRANEOUS"), then make and install...happy days.

Edit: Hmmmm, actually the above doesn't work, since OpenCV is linking against the installed system library rather than the 3rdparty version, and I can't seem to convince CMake to use the local one...ah well, one for another day, since redirecting standard error to /dev/null works just as well.
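For completeness, the redirection workaround is just shell plumbing: libjpeg emits the warning on stderr (fd 2), so discarding fd 2 silences it without touching stdout. A sketch with a stand-in command (`myapp` is a placeholder, not the real binary name):

```shell
# Stand-in for the real capture program: one line of real output,
# plus one libjpeg-style warning on stderr.
myapp() {
  echo "frame 1"
  echo "Corrupt JPEG data: 7 extraneous bytes before marker 0xd5" >&2
}

# Discard everything on stderr; stdout is unaffected.
myapp 2>/dev/null
```

The obvious downside is that genuine errors get discarded along with the noise.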

Sunday, 7 August 2011

Hardware Design

I finally took the plunge and did the epoxy welds on the prototype rig using some Bondloc titanium epoxy, and it works really well, setting hard in just a few minutes and rock solid after 15. While the rig is incredibly primitive, it does allow me to shoulder mount both the projector and camera so that they are stable. I just need to make a final decision regarding placement (left vs right for camera/projector) ~ my initial take was to place the projector on the right so as to be in line with my dominant eye, but I think it is more important, and has more potential, to have a better correlation between the forward-facing camera and the dominant hand for pointing...which should also reduce occlusion of the projection.

I'm slowly uncovering papers in this area and found another one today, Designing a Miniature Wearable Visual Robot, which details the design rationale behind a robotised wearable camera. Mayol et al. (2002) use a 3D human model to examine different frames of reference and requirements for the device, identifying 3 frames (the wearer's body and active task; alignment to the static surroundings; the wearer's position relative to independent objects). They also identify 2 requirements: decoupling of the wearer's motion from the motion of the sensor, and the provision of a wide field of view. Since we are dealing with a static rather than motorised sensor, only the first frame is of particular relevance; however, it is interesting to note how a robotised system would enable these different frames.

They also note that, given the proximity of the device to other humans:

"a sensor able to indicate where it is looking (and hence where it is not looking) is more socially acceptable than using or wearing wholly passive sensors" (P1)

This is a very interesting point since the social acceptance of a wearable system is a major factor influencing the usability of "always on" wearable systems.

They go on to examine the 3 factors used in their analysis of the optimal location to wear the robot, detailing FOV, user motion and view of the "handling space", which they define and whose importance they stress via the following statement:

"The area immediately in front of the chest is the region in which the majority of manipulation occurs, based on data from biomechanical analysis" (P2 cites [2])

Of final relevance to us is their discussion of the results from fusing these criteria. The forehead is identified as the optimal position but discounted due to the "importance of decoupling the sensors attention from the user's attention", and alternate positions are considered. Their analysis concludes that if maximal FOV and minimal motion are the most important factors, then the shoulder is the optimal alternative.

Phew. And I want one.

Mayol's Robot [1]

Along with the papers I've read on projector positioning, it seems that shoulder mounting wins for both projector and camera ~ happy happy joy joy!

[1] W. Mayol, B. Tordoff, and D. Murray. Designing a miniature wearable visual robot. In IEEE Int. Conf. on Robotics and Automation, Washington DC, USA, 2002.

[2] W.S. Marras. Biomechanics of the human body. In G. Salvendy (ed.), Handbook of Human Factors and Ergonomics, 2nd ed., John Wiley, 1997.

Machine Learning & OpenCV

Slow progress. I've spent the last couple of days generally looking around at neural networks and OpenCV's machine learning code. My aim was to find some relatively simple-to-implement examples of NNs and form a basic impression of the applicability of these techniques to the task of skin and marker recognition in a video stream.

Now, I should point out that my understanding of most ML techniques and NNs is rudimentary at best and, at this point, that's fine, since I am looking for practical techniques that can be employed at a rapid prototyping stage (hence the cost of experimenting with a proposed solution to a given problem should be measured in hours rather than days); so I am not going to go into too much depth on the AI side of things here.

What I do need to do, though, is define the methodology and results of the experiments I've been running so far, since a write-up at this stage will form a useful appendix to my thesis and demonstrate the design process and rationale behind the "vision" system of the WGI.

Briefly though, my method has been to construct a small application that allows the generation of "samples" from a video, where a mouse click on a frame of the video stream creates an N*N sized image surrounding the centre of the click. This allows the generation of a set of images which represent objects of a given type (e.g. skin, not skin (!), blue marker, red marker, etc.). A separate application is then used to create a dataset suitable for training the ML technique: each image is transformed into an array of either BGR or HSV values and saved to a file. I then concatenate and randomise the files into one single dataset, with a label to identify each sample, so I end up with something along the lines of:

SN, 1.0, 2.0, 6.0, 1.0, 2.0, 5.0, 1.0, 5.0, 3.0 
NS, 6.0, 6.0, 6.0, 1.0, 6.0, 5.0, 1.0, 6.0, 3.0 
BM, 2.0, 2.0, 6.0, 1.0, 6.0, 5.0, 4.0, 6.0, 3.0 

for a 3x3 sample. I should note that I need to add an additional step to the above to verify the dataset (via visual inspection) to ensure that "mis-clicks" are not adding erroneous samples.
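A minimal sketch of that sampling step in Python/numpy (function names are mine, the real tool is OpenCV-based, and note that flattening all three channels gives 3*N*N values per row rather than the one-value-per-pixel rows shown above):

```python
import numpy as np

def sample_patch(frame, x, y, n):
    """Return the flattened n*n patch of `frame` centred on (x, y).

    `frame` is an H x W x 3 array (BGR or HSV). The caller is assumed to
    have clicked well inside the image, so no border handling is done.
    """
    half = n // 2
    patch = frame[y - half:y + half + 1, x - half:x + half + 1]
    return patch.reshape(-1).astype(float)

def to_csv_row(label, patch_values):
    """One labelled training row, e.g. 'SN, 1.0, 2.0, ...'."""
    return ", ".join([label] + ["%.1f" % v for v in patch_values])
```

The visual-inspection pass mentioned above would slot in between sampling the patches and writing the rows.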

I then run this through a modified version of the OpenCV letter_recog example (both the C and Python versions - which is interesting in its own right, since the same techniques give differing results [implementation differences?]) in order to obtain some basic data on the validity of the technique, and yesterday I finally managed to write some code to visualise the results on a live or recorded stream.
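The letter_recog-style evaluation boils down to: split the labelled vectors into train and test portions, train a classifier, and report the misclassification rate. A toy pure-Python 1-nearest-neighbour stand-in for that loop, just to pin the methodology down (the real code uses OpenCV's ML classes):

```python
import numpy as np

def nn_error_rate(samples, labels, train_frac=0.8):
    """Train/test split + 1-nearest-neighbour; returns the test error rate.

    `samples` is an (n_samples, n_features) float array, `labels` a
    parallel sequence of class labels (e.g. "SN", "NS", "BM").
    """
    n_train = int(len(samples) * train_frac)
    train_x, test_x = samples[:n_train], samples[n_train:]
    train_y, test_y = labels[:n_train], labels[n_train:]
    errors = 0
    for x, y in zip(test_x, test_y):
        # Nearest training sample by Euclidean distance.
        d = np.linalg.norm(train_x - x, axis=1)
        if train_y[int(np.argmin(d))] != y:
            errors += 1
    return errors / max(len(test_x), 1)
```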

The above needs formalising, and I still have some work to do: the Boost code is set up for too many classes and falls over if I reduce the number below 10; there's no Python version of mlp or nbayes; and I also need to actually look at NNs properly.

And of course I need to formalise the results. My early impression is that the ML techniques are promising but too slow; however, this is quite possibly an implementation issue (Python boost takes minutes vs seconds for the C++ version). I also need to look in more detail at differing sample dimensions and sizes, and at the effects of differing colour spaces, since I've only tested with HSV so far and would like to see the results with grey, BGR, HS only, etc. Currently there are too many false positives and false negatives, e.g.
[Image: rtrees 5x5 result]

[Image: knearest 5x5 result]

The above is a typical "noisy" frame (PS3eye this time); the chair and table in the image are coloured very similarly to areas of skin, hence all the noise. The larger areas of skin that have not been correctly classified are overexposed, and I have made a deliberate choice not to classify these areas. That, of course, is primarily a hardware issue, demonstrating the importance of the camera system's colour accuracy in variable lighting conditions.

Oh and of course, all of this is a side project and NOT the main focus of my thesis...all very interesting though :)

Wednesday, 3 August 2011

Handwriting Recognition

More of a note post (I've been busy with work/children this week). I noted that one omission in the demonstrations of SixthSense was any form of text input. I want to address this by providing an "AirWriting" interface, but had been struggling to find any form of decent handwriting recognition software for Linux. However, Bret Comstock Waldow (now there's a name for the 21st century!) has come to the rescue with ink2text and SHIP, which, as far as I can tell, run the MS tablet handwriting recognition DLLs under Wine and provide access to the routines via a service. Brilliant!

I am getting quite excited at what may be possible by gluing these bits together!

As a side note, I also acquired a PS3eye and hacked it apart last night, and was pleased to find support for it in Ubuntu 10.10 out of the box (so to speak). Colours seemed somewhat subdued in comparison to the c910; however, the effects of the superior frame rate are obvious.

Roll on weekend!

Monday, 1 August 2011


Playing with markers today yielded some interesting results ~ I'm using a rather brute-force approach in OpenCV, running InRangeS against an HSV image, having pulled the ranges via an app that obtains a sample of representative pixels. Quick and dirty, but I'm encouraged by the results. False positives are high and it's not robust across lighting conditions, but in part the problem is the choice of material for the markers (electrical masking tape), which suggests:

CVFB-R5: Markers must be composed of a material which is minimally reflective.

I need to define "minimally" with more precision, obviously, but I think the above would improve detection in varied lighting conditions and possibly reduce false positives if I resample. And yes, I know this should have been obvious, but I have to harken back to my comments about a lack of design rationale...
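For the record, the brute-force InRangeS check is conceptually just a per-channel bounds test. A numpy stand-in (the HSV bounds here are made up for illustration; the real ones came from the pixel-sampling app):

```python
import numpy as np

# Hypothetical HSV bounds for a "blue marker" - illustrative values only.
LOWER = np.array([100, 120, 80], dtype=np.uint8)
UPPER = np.array([130, 255, 255], dtype=np.uint8)

def in_range(hsv, lower, upper):
    """Binary mask: 255 where every channel lies within [lower, upper].

    Equivalent in spirit to cvInRangeS on an HSV image: a pixel passes
    only if H, S and V all fall inside their respective ranges.
    """
    mask = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return (mask * 255).astype(np.uint8)
```

With OpenCV proper this is a single call to cvInRangeS (or cv2.inRange in the Python bindings).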

I'd be reasonably confident that, with a few improvements, this would roughly match the functionality of the SixthSense system, so anything over and above this is an improvement.

Next steps are to look at cvblob and then move on to neural networks.

Some notes: Wang & Popović ~ MIT data glove.

Kakumanu et al. ~ A survey of skin-color modeling and detection methods.