Skip to main content

Machine Learning & OpenCV

Slow progress. I've spent the last couple of days generally looking around at Neural Networks and OpenCVs machine learning code. My aim was to find some relatively simple to implement examples of NNs and form a basic impression of the applicability of the techniques to the task of skin and marker recognition in a video stream.

Now I should point out that my understanding of most ML techniques and NNs is rudimentary at best and, at this point, that's fine since I am looking for practical techniques that can be employed at a rapid prototyping stage (hence the cost of experimenting with a proposed solution to a given problem should measure in the hours rather than days); so I am not going to go into too much depth on the AI part of things here.

What I do need to do though is define the methodology and results of the experiments I've been running so far since a write-up at this stage will form a useful appendix to my thesis and demonstrate the design process and rationale behind the "vision" system of the WGI.

Briefly though my method has been to construct a small application that allows the generation of "samples" of an video, where mouse clicks on a frame of the video stream creates an N*N sized image surrounding the centre of the mouse click, which allows the generation of a set of images which represent objects of a given type (e.g. skin, not skin (!), blue marker, red marker etc).++ A separate application is then used to create a dataset suitable for training the ML technique. This means that an image is transformed into an array of either BGR or HSV values and saved to a file. I then concatenate and randomise the files into one single dataset with a label to identify the sample so end up with something along the lines of :

SN, 1.0, 2.0, 6.0, 1.0, 2.0, 5.0, 1.0, 5.0, 3.0 
NS, 6.0, 6.0, 6.0, 1.0, 6.0, 5.0, 1.0, 6.0, 3.0 
BM, 2.0, 2.0, 6.0, 1.0, 6.0, 5.0, 4.0, 6.0, 3.0 

for a 3x3 sample. I should note that I need to add an additional step into the above to verify the data set (via visual inspection) to ensure that "mis-clicks" are not adding in erroneous samples.

I then run this through a modified version of  the OpenCV code for letter_recog (both the C and python versions - which is interesting in its right since the same techniques give differing results [implementation differences?]) in order to obtain a some basic data on the validity of the technique and yesterday finally managed to write some code to visualise the results on a live or recorded stream.

The above needs formalising, and I still have to do some work in that the Boost code is set for too many classes and falls over if I reduce the number < 10; there's no python version of mlp or nbayes and I also need to actually look at NNs properly.

And of course I need to formalise the results. My early impression is that the ML techniques are promising but too slow, however this is quite possibly an implementation issue (python boost takes minutes vs seconds for the C++ version). I also need to look in more detail at differing sample dimensions and sizes, the effects of differing colour spaces since I've only tested with HSV so far and would like to see the results with grey, BGR, HS only etc....currently too many FP and NFPs e.g.
rtrees 5x5

knearest 5x5

The above is a typical "noisy" frame (PS3eye this time) and the chair and table in the image are coloured very similar to areas of skin hence all the noise. The larger areas of skin that have not been correctly classified are overexposed and I have made a deliberate choice not to classify these areas. That of course is primarily a hardware issue displaying the importance of the camera system colour accuracy in variable lighting conditions.

Oh and of course, all of this is a side project and NOT the main focus of my thesis...all very interesting though :)

Comments

Popular posts from this blog

I know I should move on and start a new blog but I'm keeping this my temporary home. New project, massive overkill in website creation. I've a simple project to put up a four page website which was already somewhat over specified in being hosted on AWS and S3. This isn't quite ridiculous enough though so I am using puppet to manage an EC2 instance (it will eventually need some server side work) and making it available in multiple regions. That would almost have been enough but I'm currently working on being able to provision an instance either in AWS or Rackspace because...well...Amazon might totally go down one day! Yes, its over-the-top but I needed something simple to help me climb up the devops and cloud learning curve. So off the bat - puppet installation. I've an older 10.04 Ubuntu virtual server which has been somewhat under-taxed so I've set that up as a puppet master. First lesson - always use the latest version from a tarball unless you have kept t

Camshift Tracker v0.1 up

https://code.google.com/p/os6sense/downloads/list I thought I'd upload my tracker, watch the video from yesterday for an example of the sort of performance to expect under optimal conditions ! Optimal conditions means stable lighting, and removing elements of a similar colour to that which you wish to track. Performance is probably a little worse, (and at best similar to) the touchless SDK. Under suboptimal conditions...well its useless but then so are most trackers which is a real source of complaint about most of the computer vision research out there.....not that they perform poorly but rather that there is far too little honesty in just how poorly various algorithms perform under non-laboratory conditions. I've a few revisions to make to improve performance and stability and I'm not proud of the code. It's been...8 years since I last did anything with C++ and to be frank I'd describe this more as a hack. Once this masters is out of the way I plan to look a

More Observations

After this post I AM going to make videos ;) I spent some time doing some basic tests last night under non optimal (but good) conditions: 1) Double click/single click/long tap/short tap These all can be supported using in air interactions and pinch gestures. I'd estimate I had +90% accuracy in detection rate for everything apart from single click. Single click is harder to do since it can only be flagged after the delay for detecting a double click has expired and this leads to some lag in the responsiveness of the application. 2) The predator/planetary cursor design. In order to increase the stability of my primary marker when only looking at a single point e.g. when air drawing, I decided to modify my cursor design. I feel that both fiducial points should be visible to the user but it didn't quite "feel" right to me using either the upper or lower fiducial when concentrating on a single point hence I've introduced a mid-point cursor that is always 1/2 wa