
First captures done

I made my first set of captures today, though I'm not sure they are usable, since the act of role-playing the system's use brought up a number of interesting issues:

Worn Camera Position
Mistry using Sixth Sense[1]
One option for the camera position is simply to place it in the same position Mistry used. However, my concerns here are twofold:

1) Design rationale. There is no published design rationale for the placement of the camera. I don't mean an extended examination of the rationale, just a simple overview of the thinking behind the placement.

2) Ergonomics. I can't help but think how uncomfortable that position is going to be to use for protracted periods of time (the gorilla-arm effect) or after a long day. Also, in that position, what about people with a high BMI - isn't the camera angle going to be greater than 90 degrees?

EDIT: Another couple of concerns in the back of my mind seem pertinent:

3) Social constraints. Having one's arms so far forward is going to draw attention to system usage and, quite frankly, looks peculiar. We don't tend to gesture in such an exaggerated fashion unless we are agitated or upset, and I suspect people would feel uncomfortable using such a gestural style in public spaces.

4) Situational awareness. One of the advantages of an "eyes-free" gestural system is the potential to improve situational awareness, but concentrating on performing a gesture out in front of the body requires attention to be placed on the gesture itself.

So I'm not convinced that a single forward-facing camera is the best option for a WGI...or even that a monocular camera system is viable for the range of applications that have been demonstrated with Sixth Sense. While the position might be pragmatic, giving an evenly distributed view of both the user's hands, the usability looks to be less than optimal in that position if we consider a wider range of users than your skinny MIT students and failed bodybuilders (moi)!

Field Of View
A "Natural" gesture pose?
I've thought for a while now that a serious drawback to the use of computer vision for WGIs is the limited FOV available for gestural input. If I could do some blue-sky research alongside this project, it would probably be to use two cameras on motorised heads, each dedicated to locating and tracking one hand, but I just don't have the resources for that. I DO have the resources to look at the use of two cameras, though: one angled down to capture gestures in a more natural position (hands at the sides) and one in the forward position - a rough sketch of that setup follows below. Of course, a fish-eye lens is another option...(*sigh* oh for funding).
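For what it's worth, the software side of the two-camera idea is cheap to prototype. A minimal sketch, assuming OpenCV's Python bindings and assuming the forward and downward cameras enumerate as device indices 0 and 1 (those indices are a guess - adjust to suit), might look something like this:

import cv2

# Forward camera in the Mistry-style chest position; second camera angled
# down at the hands-at-the-sides region. Indices 0 and 1 are assumptions.
forward_cam = cv2.VideoCapture(0)
downward_cam = cv2.VideoCapture(1)

if not (forward_cam.isOpened() and downward_cam.isOpened()):
    raise RuntimeError("Could not open both cameras")

while True:
    ok_f, frame_forward = forward_cam.read()
    ok_d, frame_down = downward_cam.read()
    if not (ok_f and ok_d):
        break

    # Each stream would eventually feed its own hand tracker; here the frames
    # are just displayed side by side to eyeball coverage of the two regions.
    cv2.imshow("forward", frame_forward)
    cv2.imshow("down", frame_down)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

forward_cam.release()
downward_cam.release()
cv2.destroyAllWindows()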

This, of course, led me to thinking about camera angles. As said, the lack of design rationale means we don't really know whether there were particular benefits behind the camera Mistry selected ~ personally I'm working (initially) with the c910 for the widescreen angle and HD capture @ 30fps, but I have a growing inclination to look at IR...but I digress. To help consider FOV and camera angles, I put together a couple of illustrations.
Illustrations: camera FOV for the forward-facing position vs. a natural hands-at-the-sides gesture position
It's quite interesting to have a visual representation of just what angle (160 degrees) is necessary to capture the hands in a natural gesture position, and also to observe the limitations of the forward-facing camera in capturing anything more than the extremities of the hands. Another observation is that the hand seldom has more than the fingers over the 90-degree line, which might be useful....
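Out of curiosity I did a quick back-of-the-envelope check on that figure. The little sketch below uses made-up body measurements (the 0.30 m and 0.05 m offsets are assumptions, not taken from the illustration) and just applies basic trigonometry: the closer the hands sit to the camera's own plane, the wider the required FOV, and with those numbers it lands at roughly the 160 degrees shown above.

import math

# Model: camera on the chest looking straight ahead, each hand hanging
# naturally at the side. Both distances below are assumed, not measured.
lateral_offset = 0.30   # metres from the camera axis out to a relaxed hand
forward_offset = 0.05   # metres the hand sits in front of the camera's plane

# Horizontal FOV needed to get both hands in frame.
half_angle = math.degrees(math.atan2(lateral_offset, forward_offset))
required_hfov = 2 * half_angle
print(f"required horizontal FOV ~ {required_hfov:.0f} degrees")  # ~161 for these numbers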

Frame Rate
Going back to the difficulties I've been having with OpenCV, frame rate is yet another issue. I know a lot of computer vision projects have been using the PS3Eye due to its ability to capture at 120 FPS ~ obviously this provides a far clearer image of the region of interest in any given frame, but the trade-off is, of course, that the resolution drops to 320x240. Still, it's one to be considered.
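One thing worth doing before committing to any camera is checking what frame rate actually comes back, since the FPS you request through OpenCV and the FPS the driver delivers are often two different things. A rough sketch, again assuming the Python bindings; the device index and the 1280x720 @ 30fps request stand in for the c910 (a PS3Eye-style test would be 320x240 @ 120):

import cv2
import time

# Request a capture mode, then time how fast frames actually arrive.
cap = cv2.VideoCapture(0)                       # device index is an assumption
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 30)

n_frames = 120
start = time.time()
for _ in range(n_frames):
    ok, frame = cap.read()
    if not ok:
        break
elapsed = time.time() - start

print(f"reported FPS: {cap.get(cv2.CAP_PROP_FPS):.0f}, "
      f"measured FPS: {n_frames / elapsed:.1f}")
cap.release()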

Anyways, quite a lot of food for thought....

1. Image source: http://www.technologyreview.com/files/32476/mistry_portrait_x220.jpg
2. Thanks to fuzzimo for the protractor.
