Monday, 24 October 2011

Final survey now up

Now the hard part, I have to find some people to participate! I've vastly over stated the amount of time needed to take the surveys since I know everyone is different in how they approach these. If there's any incentive, you get to see me performing gestures...more likely to be a deterrent given I didn't even shave before hand!

If you wander across this blog before December 1st, if you could please take 20 minutes to participate in one of the surveys it would be a huge help.

Added some anotations to the video...

that is all :)

Camshift Tracker v0.1 up

I thought I'd upload my tracker, watch the video from yesterday for an example of the sort of performance to expect under optimal conditions! Optimal conditions means stable lighting, and removing elements of a similar colour to that which you wish to track. Performance is probably a little worse, (and at best similar to) the touchless SDK.

Under suboptimal conditions...well its useless but then so are most trackers which is a real source of complaint about most of the computer vision research out there.....not that they perform poorly but rather that there is far too little honesty in just how poorly various algorithms perform under non-laboratory conditions.

I've a few revisions to make to improve performance and stability and I'm not proud of the code. It's been...8 years since I last did anything with C++ and to be frank I'd describe this more as a hack. Once this masters is out of the way I plan to look at this again and try out some of my ideas but I really see any approach which relies on colour based segmentation and standard webcams as having limited applicability.

So its taken a while but I hope this proves of use to someone someone some day.

Survey 1

Well with my current need to avoid people as much as possible I've had to make a last minute change to my methodology for data gathering. Hopefully I'll be able to mingle with the general populace again next week and do a user study but this week at least I'm in exile! Hence I have put together 3 surveys of which the first is online. The first ones quite lengthy but it would be a huge help if anyone who wanders across this would take 20 minutes to participate.

Gesture Survey 1

Sunday, 23 October 2011

Video 2!

Took me forever to get around to that one but I've been trying to solve lots of little problems. There's no sound so please read my comments on the video for an explanation of what you're seeing.

The main issue I'm now having is with the fiducial tracking in that the distance between the centroids of each fiducial is important in recognising when a pinch gesture is made, however due to the factors of distance from the camera causing the area of the fiducial to vary and, at the same time, the often poor quality of the bounding area for each fiducial causing the area to vary, I cant get the pinch point to the level where it provides "natural feedback" to the user i.e. the obvious feedback point where the systems perception and the users perception should agree is when the user can feel that they have touched their fingers together.

As it stands, due to the computer vision problems my system can be off by as much as 1cm :(

I should actually say that it IS possible to reduce this however then tracking suffers and the systems state (which is really limited to engaged/unengaged) varies wildly meaning that dynamic gestures are poorly recognised.


I could go back to the beginning and do another iteration of the basic marker tracking code - I've mentioned one option (laplacian) with my current hardware that I think would enhance the performance (and allow me to get rid of the makers!) and I could also do some basic contour detection within the current code which might enhance things...but this is NOT a computer science thesis I'm working on and feel I've treked further along that road than I had intended already.

Hence any additional code is going to focus specifically on making the interaction with the air-writing interface as fluid as possible. Before that though - SURVEY time!

Friday, 21 October 2011

More Observations

After this post I AM going to make videos ;)

I spent some time doing some basic tests last night under non optimal (but good) conditions:

1) Double click/single click/long tap/short tap
These all can be supported using in air interactions and pinch gestures. I'd estimate I had +90% accuracy in detection rate for everything apart from single click. Single click is harder to do since it can only be flagged after the delay for detecting a double click has expired and this leads to some lag in the responsiveness of the application.

2) The predator/planetary cursor design.
In order to increase the stability of my primary marker when only looking at a single point e.g. when air drawing, I decided to modify my cursor design. I feel that both fiducial points should be visible to the user but it didn't quite "feel" right to me using either the upper or lower fiducial when concentrating on a single point hence I've introduced a mid-point cursor that is always 1/2 way between the 2 fiducial. The "feel" when interacting is now much better since the "pinch point" is where we would normally naturally expect a pen to be. 

3) Pinch movement
In relation to the above though the fact that pinching/unpinching moves the points is causing me some issues with accuracy and extraneous points being add to any drawing. I'm hoping to overcome this by better accuracy of pinch/unpinch events however THAT is tied back to accuracy on the fiducial positioning/area detection.

4) Kalman filtering
I'm not too sure how happy I am with the Kalman filtering on the input. While it increases stability it creates a more "fluid" movement of the marker which isn't good for tight changes in direction. That said it makes the air-writing feel very smooth - I wish I could increase the FPS...which I may attempt to do by making the markermonitor use sockets rather than pipes. However I feel like I've spent enough time on the technical details of the prototype and am loathe to spend more at this point.

5) Breathing.
I was surprised at how much impact breathing makes  when sitting down. Depending on the distance between the fiducial and camera this can be massive during a deep breath and is enough to cause any gestures to be poorly recognised. A more advanced system would have to compensate for this and also any movement involved in walking so giros/accelerometers are a must in the longer term. This was already a known requirement for any projection system and has been looked at in some papers (see Murata & Fujinami 2011) hence I expect that any "real world" system would have access to this data. Not too much I can do about this at this point.

6) Right arm/lower right quadrant block when sitting down
This one surprised me. When sitting down and using the right arm for movement, the lower right hand quadrant nearest the body is essentially "blocked" for use in my system since its difficult to move the arm back to this position. Not an issue when standing.

I plan on making some tweaks in terms of the pinch/unpinch detection to see if I can improve the accuracy of that and some UI changes to support it but the next step with the prototype is to take some empirical measurements on the systems performance,

Right, time to make some videos.

Murata, Satoshi;   Fujinami, Kaori; 2011 stabilization of Projected Image for Wearable Walking Support System Using Pico-projector

Thursday, 20 October 2011

So Wheres the Survey/Video?

I've had a very unexpected event happen in that my little one has come down with mumps (who has already mostly recovered from it) and its something I've never had or been immunised against, hence I've had to cancel the study I had organised for this weekend (it obviously would not be ethical for me to be in close contact with people while I might have a serious communicable illness...I just wish others would take a similar attitude when ill). And I may have to avoid contact with people for up-to 3 weeks since the contagious period is 5 days before developing symptoms and 9 days afterwards which rather puts the dampers on my plans for a user study...3 weeks from now I had planned to be writing up my analysis NOT still analysing my results. PANIC!

Hence, I've adapted my research plan - I'm going to be putting up a survey this weekend which I'll run for 3 weeks, run a limited (5 uesrs! lol) user study of the prototype just after that and have to base my results/discussion/conclusion on that. Hence video up tomorrow (promise) with survey up on Saturday/Sunday. Best laid plans and all that :)


Well there's obviously going to be a flurry of interest in WGIs given the publishing of the Omnitouch paper. Brilliant stuff, anyone want to fund me to buy a PrimeSense camera? Seriously though, ToF cameras solve a lot of the computer vision problems I have been experiencing and I was very tempted to work with a Kinect , the problem being that the Kinects depth perception doesn't work below 50cm and that would have lead to an interaction style similar to Mistrys, one which I have discounted due to various ergonomic and social acceptance factors.

If I had access to this technology I would be VERY interested in applying it to the non-touch gestural interaction style I've been working on since I see the near term potential of the combined projection/WGI in enabling efficient micro-interactions (interactions which take less time to perform than it does to take a mobile phone from your pocket/bag).

Anyways, good stuff and its nice to see an implementation demonstrating some of the potential of the technology without the sing-and-dance that accompanied SixthSense.

(Harrison et al, 2011)

Monday, 17 October 2011

Have we got a video (2)?

Yes, but I'm not posting it yet *grin*

A very frustrating bug cropped up when I tried tying the camshift based detector into the marker tracking service - only 1/3 of the marker co-ordinate updates were being processed! Sure, my code is ugly, inefficient, leaking memory left right and centre BUT thats no reason to just silently discard the data I'm generating (and yes, I am generating what I think I'm generating). I strongly suspect the culprit is asyncproc - I've had some experience before with trying to parse data via pipes and hence know its....not the preferred way to do things, however proof of concept wise I hoped it would save the hassle of having to get processes talking to each other. *sigh* "If its worth doing once, its worth doing right."

Anyways, I've worked around it, and have the basics up and running. What are the basics?

- Basic gmail reader. Main purpose here is to look at pinch scrolling.

- Basic notifier. Shows new mails as they arrive. Purpose is to examine if the pico-projector provides efficient support for micro-interactions.

- Basic music player. Main purpose here is to test simple gesture control.

- Basic text input area. Main purpose is to test the air-writing concept.

The basic test run I've done so far suggests that there's work to be done, but the basic concept is sound and the hardware/software is sufficient for a small scale study.  Some thoughts :

- dwell regions need to be active rather than passive. By this I mean that its too easy to enter a region and unintentional execute the associated behaviour. Requiring that the markers are engaged when within the dwell region will satisfy this.

- Engaged distance needs to be a function of the area of the fiducials otherwise the engaged state is entered when the markers are at different widths depending on the distance from the camera. 

- if the application supports any form of position based input (whether it be dwell regions, hover buttons, clicks etc) use positions substantially away from the edge of the "screen".

- Marker trails; I'm actually finding these confusing, in part since there are two of them. I had thought a while ago about drawing a "pointer" that is at the mean distance between the two fiducials when in the "engaged" state. I think I need to experiment with that.

- I need to work with this concept of "command mode" a bit more. The basic concept I've been working on is that the user draws a circle to execute a command, and then ether a word or symbol (e.g. music note) to execute commands. "commands" in this context are things such as switching between "applications". While it works reasonably well, I have the main module interpreting gestures and then passing a text string to the application, due to the way that the gesture recogniser works performance would be improved if the unistroke recogniser was using a reduced set of gestures which are applicable to each module. This would improve gesture recognition substantially I think but I need to get some metrics to test that.

- Performance: Performance is DIRE! I've managed to do some rather nifty things with Python that I had thought would require C++ but the slapdash approach that I've taken to this prototype framework has things running at about 10% of the speed that it should.

Another long night ahead of me *sigh*

UPDATE: Frustratingly I've found that the camshift tracker is also very lighting dependant. What was a reasonably positive experience under daytime lighting conditions deteriorated rapidly once night fell and I attempted to use the code under artificial lighting. No amount of tweaking of parameters or changing marker colour would rectify things :
- the yellow marker was reasonably stable, as good as the light green marker I had been using for the hsv tracker. That said, under my homes lighting white takes on a yellowish hue and hence the track would get confused on areas of white.
- red/orange/pink were all terrible, frequently becoming confused with either skin tones , areas of my carpet or lights on my laptop.
- dark green was not detected at all.
- dark blue frequently became confused with my laptop screen, laptop keyboard or areas of my t-shirt (black).

So the biggest problem that the project has is that the implementation of a robust system is VERY difficult which is going to make user testing potentially tricky.

THANKFULLY this has all framed my user study very tightly - within group study, look at natural gesture drawing, compared to use of the system, command gestures, limited number of letters, words and sentences; some basic interface command and control tasks; and videos on public/private perception of WGI usage, one set using an "exagerated" UI (e.g. SixthSense) compared to the discrete "OS6Sense" style (do people even perceive it as being discrete?).

Only 4 weeks later than I had initially intended doing it *sigh*

Oh video? Hmmmm not today, there's still a few tweaks I need to do before putting it online (e.g. putting something other than Jakalope in my media dir...I like Jakalope but its their latest album and I cant help but feel that I'm listening to Brittany Spears! The shame!)

Sunday, 16 October 2011

New Detector Done

Much better but I'm still not happy with it - camshift + backproj + kalman means that the marker coordinates are a lot smoother with far less noise (obviously) but the nature of detecting markers in segmented video still leads to a less than robust implementation. There's room for improvement and I still need to add in some form of input dialog for naming markers (and I must confess I am CLUELESS on the c++ side for that.....wxwidgets? Qt?) but I'm that little bit happier.

As per usual I had hoped for a video, but the lack of a dialog makes configuring things into a manual process (I've got basic save/load support working but given how sensitive this is to lighting still its a lot of messing around) hence I'm delaying yet again. Given my page views though I don't think I will be disappointing many people.

What is frustrating is the amount of time I've had to spend on basic work with computer vision rather than looking at the actual interactions for this technology. While I may NOT be the greatest coder ever, or even 1/2 as clever as I once thought I was (about 25 years ago), with the number of truly great coders and minds who have worked on computer vision I'm still somewhat disappointed that there's nothing really magnitudes more robust out there than what I'm doing. That said of course, if I was able to work with a kinect or similar tech I would expect something far more impressive but the 50cm limit for the depth sensing renders that point moot (by about 25cm). And I still think that some of the AI techniques could pay off dividends....and there are still a number of basic tricks I could apply (e.g. laplace to build a feature module for sift/surf/hmm detection of hands and finger pose detection - I cant help but think that would work really well) but I have to get away from the computer vision research sadly.

Anyways - I do think I'm at the last fence; video definitely over the next few days.

Saturday, 15 October 2011

Rewrite of fiducial detector

Its the last thing I want to do - I've roughed out code for most of the UI elements, the plumbing for the back-end works (although you can hear it rattle in places and there is considerable scope for improvement) but the marker detection code just isn't up for to the job and is getting a rewrite to use camshift and a Kalman filter. I tried the Kalman on the current code and its effective in smoothing the jitter caused by variations in centroid position but the continual loss of marker and the extreme numbers I am having to use to sense when the markers are engaged/unengaged is making it a frustrating experience.

I MUST come up with something working by Monday so that I can do something with this and was hoping to be tweaking various parameters of the interaction today but I'm going right back to stage one. Very frustrating but I ran a few experiments with the camshift algorithm and feel its required to make the air-writing implementation flow smoothly.

All nighter it looks like then :(

Friday, 14 October 2011

Drag Drop - a gesture design delema!

So I've run into an interesting interaction design problem. I've implemented some very basic list interface elements and initially supported the scrolling interaction via dwell regions. I'm unhappy with this for a number or reasons :

1) Dwell regions are not obvious to a user since there is no visual feedback to the user as to their presence. While I can provide feedback, there are times where I may choose not to do so (e.g. where the dwell region overlaps with the list).
2) Dwell regions when combined with other UI elements can hinder users interaction - e.g. if a user wishes to select an item that is within the dwell region and the dwell region initiates the scrolling behaviour causing the users selected item to move.
3) Interaction is very basic and I dont really want to implement any more support for these.

The obvious alternative to a dwell region though is drag and drop (or in the case of OS6Sense, pinch and unpinch) however since these are gestures, there's a possibility that the gestures will be interpreted as a command interaction.

But I want to be able to support pinch and unpinch...I think this is one that I will just have to try and see how it works out, but I think I have found a flaw in the interaction style.

Sunday, 9 October 2011

Another couple of observations

Schwaller in his reflections noted that developing an easy way to calibrate the marker tracking was important. I've observed that for application development, providing alternate input methods is equally important...quick a general usability principle of course, and all harkens back to providing multiple redundant modalities but....

My framework is about 50% of the way there, I'm becoming VERY tempted to look at a native android client but on x86 since I have the horsepower to drive things. If I had more time I'd go for it but...

Friday, 7 October 2011

Some observations/questions

Study delayed since I think I can make progress with the prototype and answer some of my questions while opening up new ones :/ I'm glad I know this sort of last minute thing is quite common in research or I might be panicking (3 months to go, omg!).

I'm still having problems with marker tracking due to varying lighting conditions.  At home, my "reliable" green marker doesn't like my bedroom but is great downstairs and in my office. Blue/red/yellow - all tend to suffer from background noise. I may have to try pink! Basically I know that colour based segmentation and blob tracking is a quick and easy way of prototyping this, but real world? Terrible!

If using dynamic gestures what are the best symbols to use? In fact is any semiotic system useful for gesture interaction? One could also ask are symbolic gestures really that useful for a wearable system....

Where should a camera point? i.e. where should its focus be?
I've found myself starting gestures slightly left of any central line, so primarily using the right side of my body - how true does this hold for other people? Is there a specific camera angle that is useful? These ones are for the study.

I was trying out a mockup of the interface on my palm and while I know the "natural" interaction style would be to make any projected UI elements into touch elements, my tracking just cant support it (in fact I have to wonder how anyone has done that since it sends histogram based trackers e.g. camshift insane). Hence there is a gulf between the input modalities location and that of the visual output modality which I don't believe represents an effect interaction paradigm. I've not tried it on other surfaces, I want to get a little further with the UI first.

Still hopeful of putting together another demo video by Monday.

Wednesday, 5 October 2011

Another Quick Update

I've been very busy putting together a framework to support a number of small applications for implementation - the apps are intended to be nothing more than proof of concept and to explore some of the interaction issue e.g. are dwell regions a better option than selectable areas (we're in eye tracking territory now)?; can these be applied to navigation?; How do we implement a mobile projected UI (terra incognita I believe)?

The framework is largely event/message driven since it affords us with loose coupling and dynamic run-time binding for both messages and classes ~ if I wasn't farting around with abstraction of the services (useful in the longer become both sinks and producers of events) it would probably come in at < 200 lines of code...

The point being while I'm not supposed to be writing code at this stage I am and hope to have at least a video by the end of the weekend (yes a week late).

Saturday, 1 October 2011

VERY excited!

Back to the research today and today is the day where I had set myself the serious goal of knuckling down and rewriting my introduction and literature review because I am VERY unhappy with both. I'd finished up my study definition document for my exploratory study next week and was doing some research into social acceptability and gestures...when it hit me. Most of the research suggests that only discrete gestures are socially acceptable (thus sixthsense/minority report style interaction is unlikely to be accepted by users in many social situations) so I asked myself :

1) Why look at how users naturally perform the gestures? Good question....and to be honest, because I honestly don't KNOW what I will find out. I *think* I know, but theres a huge gulf there!
2) How do I make a discrete gesture based system?

and I had also been asking myself :

3) How do I expand the number of states that I can represent using my current implementation?

And it hit me like an express train.

If I'm designing an air-writing gesture based interaction, design a system that recognises users gestures as they would naturally write!

Most technical and interaction issues solved in one fell swoop! Hopefully demo with another video tomorrow!