Machine Learning & OpenCV

Slow progress. I've spent the last couple of days looking around at Neural Networks and OpenCV's machine learning code. My aim was to find some relatively simple-to-implement examples of NNs and to form a basic impression of the applicability of the techniques to the task of skin and marker recognition in a video stream.

Now, I should point out that my understanding of most ML techniques and NNs is rudimentary at best, and at this point that's fine: I am looking for practical techniques that can be employed at a rapid prototyping stage (so the cost of experimenting with a proposed solution to a given problem should be measured in hours rather than days), and I am not going to go into too much depth on the AI side of things here.

What I do need to do, though, is document the methodology and results of the experiments I've been running so far, since a write-up at this stage will form a useful appendix to my thesis and demonstrate the design process and rationale behind the "vision" system of the WGI.

Briefly, though, my method has been to construct a small application that allows the generation of "samples" from a video: a mouse click on a frame of the video stream creates an N*N image centred on the click, which lets me build up a set of images representing objects of a given type (e.g. skin, not skin (!), blue marker, red marker, etc.). A separate application is then used to create a dataset suitable for training the ML technique: each image is transformed into an array of either BGR or HSV values and saved to a file. I then concatenate and randomise the files into a single dataset, with a label identifying each sample, so I end up with something along the lines of:

SN, 1.0, 2.0, 6.0, 1.0, 2.0, 5.0, 1.0, 5.0, 3.0 
NS, 6.0, 6.0, 6.0, 1.0, 6.0, 5.0, 1.0, 6.0, 3.0 
BM, 2.0, 2.0, 6.0, 1.0, 6.0, 5.0, 4.0, 6.0, 3.0 

for a 3x3 sample. I should note that I need to add an additional step to the above to verify the dataset (via visual inspection) and ensure that "mis-clicks" have not introduced erroneous samples.
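Roughly, the sampling step boils down to something like the sketch below. This is a simplified illustration rather than my actual tool: it assumes the cv2 Python bindings, collapses the two applications (sample capture and dataset generation) into one script, flattens all three HSV channels, and the patch size, label and filename are placeholders.

import csv
import cv2
import numpy as np

N = 3                      # patch size (N x N), e.g. 3 or 5
LABEL = "SN"               # label for the current object type, e.g. skin
rows = []                  # collected dataset rows

def on_mouse(event, x, y, flags, param):
    # On left click, grab an N x N patch centred on the click,
    # convert it to HSV and store the flattened values with a label
    if event != cv2.EVENT_LBUTTONDOWN:
        return
    frame = param["frame"]
    half = N // 2
    patch = frame[y - half:y + half + 1, x - half:x + half + 1]
    if patch.shape[:2] != (N, N):      # ignore clicks too close to the border
        return
    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    rows.append([LABEL] + hsv.reshape(-1).astype(float).tolist())

cap = cv2.VideoCapture(0)
state = {"frame": None}
cv2.namedWindow("sampler")
cv2.setMouseCallback("sampler", on_mouse, state)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    state["frame"] = frame
    cv2.imshow("sampler", frame)
    if cv2.waitKey(30) & 0xFF == 27:   # Esc to finish sampling
        break

cap.release()
cv2.destroyAllWindows()

# Write the labelled, flattened patches out as CSV rows
with open("samples_sn.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)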

I then run this through a modified version of the OpenCV letter_recog sample code (both the C and Python versions, which is interesting in its own right since the same techniques give differing results [implementation differences?]) in order to obtain some basic data on the validity of the technique, and yesterday I finally managed to write some code to visualise the results on a live or recorded stream.
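To give a flavour of that step, here is a minimal sketch of the training/testing side using the newer cv2.ml Python API (the actual letter_recog samples use an older interface, and the dataset filename, the 80/20 split and the k value are assumptions on my part):

import cv2
import numpy as np

# Load the concatenated, randomised dataset: label in column 0, pixel values after
raw = np.loadtxt("dataset.csv", delimiter=",", dtype=str)
labels = raw[:, 0]
classes, responses = np.unique(labels, return_inverse=True)  # map SN/NS/BM/... to 0..K-1
samples = raw[:, 1:].astype(np.float32)
responses = responses.astype(np.int32)

# Simple train/test split
n_train = int(0.8 * len(samples))

# Random trees
rtrees = cv2.ml.RTrees_create()
rtrees.train(samples[:n_train], cv2.ml.ROW_SAMPLE, responses[:n_train])
_, pred = rtrees.predict(samples[n_train:])
print("rtrees accuracy:", np.mean(pred.ravel().astype(np.int32) == responses[n_train:]))

# k-nearest neighbours
knn = cv2.ml.KNearest_create()
knn.train(samples[:n_train], cv2.ml.ROW_SAMPLE, responses[:n_train])
_, pred, _, _ = knn.findNearest(samples[n_train:], k=5)
print("knearest accuracy:", np.mean(pred.ravel().astype(np.int32) == responses[n_train:]))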

The above needs formalising, and I still have some work to do: the Boost code is set up for too many classes and falls over if I reduce the number below 10; there's no Python version of mlp or nbayes; and I also need to actually look at NNs properly.
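For the missing Python mlp, something along these lines ought to work against the same dataset (again just a sketch with the newer cv2.ml API, reusing the samples/responses/classes arrays from the previous sketch; the single hidden layer and its size are arbitrary guesses):

import cv2
import numpy as np

# Reusing samples, responses, classes and n_train from the previous sketch
n_classes = len(classes)
n_features = samples.shape[1]

# ANN_MLP wants one-hot float targets, one output per class
targets = np.zeros((len(responses), n_classes), dtype=np.float32)
targets[np.arange(len(responses)), responses] = 1.0

mlp = cv2.ml.ANN_MLP_create()
mlp.setLayerSizes(np.array([n_features, 16, n_classes], dtype=np.int32))  # hidden layer size is a guess
mlp.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
mlp.setTrainMethod(cv2.ml.ANN_MLP_BACKPROP)
mlp.setTermCriteria((cv2.TERM_CRITERIA_MAX_ITER | cv2.TERM_CRITERIA_EPS, 300, 0.01))
mlp.train(samples[:n_train], cv2.ml.ROW_SAMPLE, targets[:n_train])

_, out = mlp.predict(samples[n_train:])
pred = out.argmax(axis=1)
print("mlp accuracy:", np.mean(pred == responses[n_train:]))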

And of course I need to formalise the results. My early impression is that the ML techniques are promising but too slow; however, this is quite possibly an implementation issue (the Python boost version takes minutes versus seconds for the C++ version). I also need to look in more detail at different sample dimensions and sizes, and at the effects of different colour spaces: I've only tested with HSV so far and would like to see the results with grey, BGR, HS only, etc. Currently there are too many false positives and false negatives, e.g.:
[Image: rtrees, 5x5 samples]

[Image: knearest, 5x5 samples]

The above is a typical "noisy" frame (a PS3 Eye this time); the chair and table in the image are coloured very similarly to areas of skin, hence all the noise. The larger areas of skin that have not been correctly classified are overexposed, and I made a deliberate choice not to classify those areas. That, of course, is primarily a hardware issue, demonstrating the importance of the camera system's colour accuracy in variable lighting conditions.
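For reference, the visualisation code that produces frames like the ones above amounts to classifying a small patch around each (sub-sampled) pixel and overlaying the result, roughly along these lines (reusing the trained model and patch size from the earlier sketches, with the grid stride an arbitrary choice; the nested Python loops are a big part of why this is currently slow):

import cv2
import numpy as np

N = 5          # patch size used when training
STRIDE = 5     # classify on a coarse grid to keep the overlay interactive

def overlay_classification(frame, model, target_class):
    # Mark grid positions whose surrounding N x N HSV patch the model
    # assigns to target_class (e.g. the index of the skin label)
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    half = N // 2
    out = frame.copy()
    for y in range(half, frame.shape[0] - half, STRIDE):
        for x in range(half, frame.shape[1] - half, STRIDE):
            patch = hsv[y - half:y + half + 1, x - half:x + half + 1]
            sample = patch.reshape(1, -1).astype(np.float32)
            _, pred = model.predict(sample)
            if int(pred.ravel()[0]) == target_class:
                cv2.circle(out, (x, y), 2, (0, 255, 0), -1)
    return out

# e.g. cv2.imshow("classified", overlay_classification(frame, rtrees, 0))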

Oh and of course, all of this is a side project and NOT the main focus of my thesis...all very interesting though :)
