

Showing posts from 2011

Getting the design right vs getting the right design

So today I've been looking at ways to make the text input interaction more fluid and investigating if I can enable some level of error correction (i.e. a spell checker). 1) 2 hours of use off and on and my arm aches, my back aches, my wrist aches. Even with the minimal arm movement that is required, my forearm still needs supporting, and to spell a 5-letter word I am still moving the focal point about....6 inches across, which is enough to induce fatigue over time. If I had more time I would do an about turn and look at just using a single finger (although that reintroduces the segmentation issue which the pinch technique solves).... 2) Writing without visual feedback is more taxing than I had thought. I originally did an experiment which involved people walking and writing at the same time as a "proof of concept" exploration. I suspect I had the task wrong though - I should have had them do it blindfolded! Luckily I've still time to repeat that experiment. 3) The

Recording/Playback

I've been busy whipping my literature review and methodology sections together for the last few weeks (with the occasional diversion to tout the surveys, still a very low response so far *sadface*) and I'm heading towards crunch time now, where I'm going to have to bring everything together for a draft version early next month. Since I'm now more in a documenting than a development phase I've done little work on the prototype apart from adding recording/playback capabilities, so that a session can be "recorded" and I can then explore whether changes to the gestural interface improve recognition (although that isn't a major aim at this point). Again, a quick plea to anyone reading: just a few more responses to the gesture and display surveys and I'll be able to start my analysis of that data, so if you have 5 minutes it would be greatly appreciated.
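
The record/replay idea needs little more than timestamped marker events serialised to disk; a minimal sketch of that approach in Python (the event fields and class names are illustrative, not the prototype's actual code):

    import json, time

    class SessionRecorder:
        """Append timestamped marker updates to a file for later replay."""
        def __init__(self, path):
            self.f = open(path, "w")
            self.t0 = time.time()

        def record(self, x, y, pinched):
            event = {"t": time.time() - self.t0, "x": x, "y": y, "pinched": pinched}
            self.f.write(json.dumps(event) + "\n")

    def replay(path, handler):
        """Feed recorded events back to the gesture recogniser at their original pace."""
        last_t = 0.0
        for line in open(path):
            event = json.loads(line)
            time.sleep(event["t"] - last_t)
            last_t = event["t"]
            handler(event)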

Final survey now up

Now the hard part: I have to find some people to participate! I've vastly overstated the amount of time needed to take the surveys since I know everyone is different in how they approach these. If there's any incentive, you get to see me performing gestures...more likely to be a deterrent given I didn't even shave beforehand! If you wander across this blog before December 1st, please take 20 minutes to participate in one of the surveys - it would be a huge help.

Camshift Tracker v0.1 up

https://code.google.com/p/os6sense/downloads/list I thought I'd upload my tracker; watch the video from yesterday for an example of the sort of performance to expect under optimal conditions! Optimal conditions means stable lighting and removing elements of a similar colour to that which you wish to track. Performance is probably a little worse than (and at best similar to) the Touchless SDK. Under suboptimal conditions...well, it's useless, but then so are most trackers, which is a real source of complaint about most of the computer vision research out there.....not that they perform poorly but rather that there is far too little honesty in just how poorly various algorithms perform under non-laboratory conditions. I've a few revisions to make to improve performance and stability and I'm not proud of the code. It's been...8 years since I last did anything with C++ and to be frank I'd describe this more as a hack. Once this Masters is out of the way I plan to look a
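
For anyone wondering what such a tracker does, the core of camshift tracking is only a handful of OpenCV calls; a rough Python sketch of the general approach (the C++ tracker differs in detail, and the initial window and histogram thresholds here are illustrative):

    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    x, y, w, h = 300, 200, 40, 40                     # initial window over the marker (illustrative)
    hsv_roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    # hue histogram of the marker, ignoring dark/unsaturated pixels
    mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
    hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    track_window = (x, y, w, h)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        rot_rect, track_window = cv2.CamShift(backproj, track_window, term)
        cv2.ellipse(frame, rot_rect, (0, 255, 0), 2)  # draw the tracked marker
        cv2.imshow("camshift", frame)
        if cv2.waitKey(1) == 27:                      # Esc to quit
            break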

Survey 1

Well, with my current need to avoid people as much as possible I've had to make a last-minute change to my methodology for data gathering. Hopefully I'll be able to mingle with the general populace again next week and do a user study, but this week at least I'm in exile! Hence I have put together 3 surveys, of which the first is online. The first one's quite lengthy, but it would be a huge help if anyone who wanders across this would take 20 minutes to participate. Gesture Survey 1

Video 2!

http://www.youtube.com/watch?v=v_cb4PQ6oRs Took me forever to get around to that one but I've been trying to solve lots of little problems. There's no sound so please read my comments on the video for an explanation of what you're seeing. The main issue I'm now having is with the fiducial tracking: the distance between the centroids of the fiducials is what the system uses to recognise when a pinch gesture is made, but the measured area of each fiducial varies both with distance from the camera and with the often poor quality of its bounding region, so I can't get the pinch point to the level where it provides "natural feedback" to the user - i.e. the obvious point where the system's perception and the user's perception should agree, namely when the user can feel that they have touched their fingers together. As it stands, due to the computer vision problems my system can be off by as much
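
One obvious mitigation is to normalise the centroid separation by the apparent size of the fiducials so that the pinch threshold scales with distance from the camera; a rough sketch of that idea (the threshold and the use of the square root of blob area as a scale are illustrative assumptions, not a validated fix):

    import math

    def is_pinched(c1, c2, area1, area2, threshold=1.5):
        """Treat the fingers as pinched when centroid separation is small
        relative to the apparent size of the fiducials."""
        dist = math.hypot(c1[0] - c2[0], c1[1] - c2[1])
        # sqrt(area) approximates the fiducial's side length in pixels,
        # so the ratio is roughly independent of distance from the camera
        scale = (math.sqrt(area1) + math.sqrt(area2)) / 2.0
        return dist < threshold * scale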

More Observations

After this post I AM going to make videos ;) I spent some time doing some basic tests last night under non-optimal (but good) conditions: 1) Double click/single click/long tap/short tap. These can all be supported using in-air interactions and pinch gestures. I'd estimate I had over 90% accuracy in detection rate for everything apart from single click. Single click is harder to do since it can only be flagged after the delay for detecting a double click has expired, and this leads to some lag in the responsiveness of the application. 2) The predator/planetary cursor design. In order to increase the stability of my primary marker when only looking at a single point, e.g. when air drawing, I decided to modify my cursor design. I feel that both fiducial points should be visible to the user, but it didn't quite "feel" right to me using either the upper or lower fiducial when concentrating on a single point, hence I've introduced a mid-point cursor that is always 1/2 wa
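
The single-click lag falls straight out of the disambiguation logic: a single click can only be confirmed once the double-click window has expired. A minimal sketch of that timing (the 0.3 s window and class shape are illustrative):

    import time

    DOUBLE_CLICK_WINDOW = 0.3   # seconds; illustrative value

    class ClickDetector:
        def __init__(self):
            self.last_pinch = None

        def on_pinch(self, now=None):
            """Call when a pinch/unpinch pair is detected; returns an event, if any."""
            now = now or time.time()
            if self.last_pinch and now - self.last_pinch <= DOUBLE_CLICK_WINDOW:
                self.last_pinch = None
                return "double_click"
            self.last_pinch = now
            return None

        def poll(self, now=None):
            """Call regularly; a single click is only reported after the window expires."""
            now = now or time.time()
            if self.last_pinch and now - self.last_pinch > DOUBLE_CLICK_WINDOW:
                self.last_pinch = None
                return "single_click"
            return None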

So Where's the Survey/Video?

I've had a very unexpected event happen in that my little one (who has already mostly recovered) has come down with mumps, and it's something I've never had or been immunised against, hence I've had to cancel the study I had organised for this weekend (it obviously would not be ethical for me to be in close contact with people while I might have a serious communicable illness...I just wish others would take a similar attitude when ill). And I may have to avoid contact with people for up to 3 weeks since the contagious period is 5 days before developing symptoms and 9 days afterwards, which rather puts the dampers on my plans for a user study...3 weeks from now I had planned to be writing up my analysis NOT still analysing my results. PANIC! Hence, I've adapted my research plan - I'm going to be putting up a survey this weekend which I'll run for 3 weeks, run a limited (5 users! lol) user study of the prototype just after that and have to base my results/d

OmniTouch

Well, there's obviously going to be a flurry of interest in WGIs given the publication of the OmniTouch paper. Brilliant stuff; anyone want to fund me to buy a PrimeSense camera? Seriously though, ToF cameras solve a lot of the computer vision problems I have been experiencing and I was very tempted to work with a Kinect, the problem being that the Kinect's depth perception doesn't work below 50cm, which would have led to an interaction style similar to Mistry's, one which I have discounted due to various ergonomic and social acceptance factors. If I had access to this technology I would be VERY interested in applying it to the non-touch gestural interaction style I've been working on, since I see the near-term potential of the combined projection/WGI in enabling efficient micro-interactions (interactions which take less time to perform than it does to take a mobile phone from your pocket/bag). Anyways, good stuff and it's nice to see an implementation demonstrating some

Have we got a video (2)?

Yes, but I'm not posting it yet *grin* A very frustrating bug cropped up when I tried tying the camshift-based detector into the marker tracking service - only 1/3 of the marker co-ordinate updates were being processed! Sure, my code is ugly, inefficient, leaking memory left, right and centre, BUT that's no reason to just silently discard the data I'm generating (and yes, I am generating what I think I'm generating). I strongly suspect the culprit is asyncproc - I've had some experience before with trying to parse data via pipes and hence know it's....not the preferred way to do things; however, proof-of-concept wise I hoped it would save the hassle of having to get processes talking to each other. *sigh* "If it's worth doing once, it's worth doing right." Anyways, I've worked around it, and have the basics up and running. What are the basics? - Basic gmail reader. Main purpose here is to look at pinch scrolling. - Basic notifier. Shows new mails as they
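
For reference, a plain line-buffered subprocess pipe is one alternative to asyncproc; something along these lines (the "x,y per line" output format, the tracker binary name and the handle_marker_update callback are placeholders for illustration):

    import subprocess

    def handle_marker_update(x, y):
        # placeholder for handing the co-ordinates to the marker tracking service
        print("marker at", x, y)

    # Hypothetical: the C++ tracker prints one "x,y" pair per line on stdout
    proc = subprocess.Popen(["./camshift_tracker"], stdout=subprocess.PIPE,
                            bufsize=1, universal_newlines=True)

    for line in proc.stdout:          # blocks until a full line arrives; nothing gets dropped
        try:
            x, y = (float(v) for v in line.strip().split(","))
        except ValueError:
            continue                  # ignore malformed lines
        handle_marker_update(x, y)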

New Detector Done

Much better, but I'm still not happy with it - camshift + backprojection + Kalman means that the marker coordinates are a lot smoother with far less noise (obviously), but the nature of detecting markers in segmented video still leads to a less than robust implementation. There's room for improvement and I still need to add in some form of input dialog for naming markers (and I must confess I am CLUELESS on the C++ side for that.....wxWidgets? Qt?) but I'm that little bit happier. As per usual I had hoped for a video, but the lack of a dialog makes configuring things a manual process (I've got basic save/load support working, but given how sensitive this still is to lighting it's a lot of messing around), hence I'm delaying yet again. Given my page views though I don't think I will be disappointing many people. What is frustrating is the amount of time I've had to spend on basic work with computer vision rather than looking at the actual interactions for this
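
The Kalman part is essentially a constant-velocity filter over the centroid that camshift reports each frame; a rough Python/OpenCV sketch of that smoothing step (the noise covariances are illustrative and would need tuning):

    import cv2
    import numpy as np

    # Constant-velocity Kalman filter over the (x, y) centroid reported by camshift
    kf = cv2.KalmanFilter(4, 2)                  # state: x, y, dx, dy; measurement: x, y
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3      # illustrative tuning
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

    def smooth(cx, cy):
        """Feed a raw centroid in, get the filtered estimate back."""
        kf.predict()
        est = kf.correct(np.array([[np.float32(cx)], [np.float32(cy)]]))
        return float(est[0]), float(est[1])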

Rewrite of fiducial detector

It's the last thing I want to do - I've roughed out code for most of the UI elements, the plumbing for the back-end works (although you can hear it rattle in places and there is considerable scope for improvement), but the marker detection code just isn't up to the job and is getting a rewrite to use camshift and a Kalman filter. I tried the Kalman filter on the current code and it's effective in smoothing the jitter caused by variations in centroid position, but the continual loss of the marker and the extreme numbers I am having to use to sense when the markers are engaged/unengaged are making it a frustrating experience. I MUST come up with something working by Monday so that I can do something with this, and I was hoping to be tweaking various parameters of the interaction today, but I'm going right back to stage one. Very frustrating, but I ran a few experiments with the camshift algorithm and feel it's required to make the air-writing implementation flow smoothly. All nighter it lo

Drag and Drop - a gesture design dilemma!

So I've run into an interesting interaction design problem. I've implemented some very basic list interface elements and initially supported the scrolling interaction via dwell regions. I'm unhappy with this for a number of reasons: 1) Dwell regions are not obvious to a user since there is no visual feedback as to their presence. While I can provide feedback, there are times where I may choose not to do so (e.g. where the dwell region overlaps with the list). 2) Dwell regions, when combined with other UI elements, can hinder the user's interaction - e.g. if a user wishes to select an item that is within the dwell region and the dwell region initiates the scrolling behaviour, causing the user's selected item to move. 3) Interaction is very basic and I don't really want to implement any more support for these. The obvious alternative to a dwell region though is drag and drop (or in the case of OS6Sense, pinch and unpinch), however since these are gestures, there's
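
For context, a dwell region itself is trivial to implement, which is part of its appeal despite the problems above; roughly (the 0.8 s dwell time and class shape are illustrative):

    import time

    class DwellRegion:
        """Fires a callback when the cursor has stayed inside the region long enough."""
        def __init__(self, rect, dwell_time=0.8, on_dwell=None):
            self.rect = rect                   # (x, y, w, h)
            self.dwell_time = dwell_time
            self.on_dwell = on_dwell
            self.entered_at = None

        def update(self, cx, cy):
            x, y, w, h = self.rect
            inside = x <= cx <= x + w and y <= cy <= y + h
            if not inside:
                self.entered_at = None
                return
            if self.entered_at is None:
                self.entered_at = time.time()
            elif time.time() - self.entered_at >= self.dwell_time:
                if self.on_dwell:
                    self.on_dwell()            # e.g. start scrolling the list
                self.entered_at = time.time()  # repeat while the cursor stays inside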

Another couple of observations

Schwaller in his reflections noted that developing an easy way to calibrate the marker tracking was important. I've observed that for application development, providing alternate input methods is equally important...quite a general usability principle of course, and it all harkens back to providing multiple redundant modalities but.... My framework is about 50% of the way there; I'm becoming VERY tempted to look at a native Android client but on x86, since I have the horsepower to drive things. If I had more time I'd go for it but...

Some observations/questions

Study delayed, since I think I can make progress with the prototype and answer some of my questions while opening up new ones :/ I'm glad I know this sort of last-minute thing is quite common in research or I might be panicking (3 months to go, omg!). I'm still having problems with marker tracking due to varying lighting conditions. At home, my "reliable" green marker doesn't like my bedroom but is great downstairs and in my office. Blue/red/yellow - all tend to suffer from background noise. I may have to try pink! Basically I know that colour-based segmentation and blob tracking is a quick and easy way of prototyping this, but real world? Terrible! If using dynamic gestures, what are the best symbols to use? In fact, is any semiotic system useful for gesture interaction? One could also ask whether symbolic gestures are really that useful for a wearable system.... Where should a camera point? i.e. where should its focus be? I've found myself starting gestures sli

Another Quick Update

I've been very busy putting together a framework to support a number of small applications for implementation - the apps are intended to be nothing more than proof of concept and to explore some of the interaction issues, e.g. are dwell regions a better option than selectable areas (we're in eye tracking territory now)? Can these be applied to navigation? How do we implement a mobile projected UI (terra incognita, I believe)? The framework is largely event/message driven since it affords loose coupling and dynamic run-time binding for both messages and classes ~ if I wasn't farting around with abstraction of the services (useful in the longer term...services become both sinks and producers of events) it would probably come in at < 200 lines of code... The point being, while I'm not supposed to be writing code at this stage I am, and I hope to have at least a video by the end of the weekend (yes, a week late).
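
Stripped to its essentials, a publish/subscribe core of this kind is little more than the sketch below (the names are illustrative, not the framework's actual classes):

    from collections import defaultdict

    class EventBus:
        """Minimal publish/subscribe hub: services register interest in message
        types and post events without knowing who consumes them."""
        def __init__(self):
            self.handlers = defaultdict(list)

        def subscribe(self, event_type, handler):
            self.handlers[event_type].append(handler)

        def publish(self, event_type, **payload):
            for handler in self.handlers[event_type]:
                handler(**payload)

    # Usage: the marker tracker acts as a producer, the cursor as a sink
    bus = EventBus()
    bus.subscribe("marker_moved", lambda x, y: print("cursor ->", x, y))
    bus.publish("marker_moved", x=120, y=45)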

VERY excited!

Back to the research today, and today is the day where I had set myself the serious goal of knuckling down and rewriting my introduction and literature review, because I am VERY unhappy with both. I'd finished up my study definition document for my exploratory study next week and was doing some research into social acceptability and gestures...when it hit me. Most of the research suggests that only discrete gestures are socially acceptable (thus SixthSense/Minority Report-style interaction is unlikely to be accepted by users in many social situations), so I asked myself: 1) Why look at how users naturally perform the gestures? Good question....and to be honest, because I honestly don't KNOW what I will find out. I *think* I know, but there's a huge gulf there! 2) How do I make a discrete gesture-based system? And I had also been asking myself: 3) How do I expand the number of states that I can represent using my current implementation? And it hit me like an express train

Do we have a video?

Yes, we have a video! http://www.youtube.com/watch?v=GXlmus93o68 I wasn't intending to work on any code this weekend but I felt compelled to try out the recognition server and run another set of tests, but with the Logitech C900 in place. Results were an improvement on the PS3 Eye, in part due to the better low-light capabilities, in part due to the camera placement, and in part due to the wider angle. Some anecdotal notes: The recognition server provided seems to perform better than the unistroke implementation - I still need to sit down and do the numbers, but I wouldn't be surprised if it were significantly better. I suspect recall for all but the most basic figures/shapes provided via the default unistroke implementation will be poor amongst users. On the flip side, most of us know the alphabet! Big problem with the use of fiducials on the end of the fingers - they become obscured during natural hand movements! I ended up cupping the marker in my hand and s

Update.

Just an update - while I had originally approached this with the intention of releasing the code as open source, my findings regarding....well, various aspects of this project, but most relevant to this aim the code itself, mean that I'm putting any software development on the back burner for the next few weeks while I perform a study into how people naturally perform gestures. I'm also looking at some options to improve certain show-stopping issues with the system (primarily the limited FOV of the webcam). Any code that does emerge for the project, at least for version 0.1, is unlikely to be very robust but I think that can be overcome: I'm currently thinking that broad colour segmentation followed by some form of object matching technique (e.g. SIFT/SURF) should make quite a robust and reasonably fast algorithm for marker detection; however, if the FOV problem can't be solved, I actually think that ANY vision-based system is inappropriate for this sort of interaction style. Y

Finally...

Children back at school and I'm back off my hols (a rather interesting time in Estonia, if you're interested). I've spent most of the last week becoming increasingly frustrated with my attempts at image segmentation. I've moved to a C++ implementation for speed and, while the VERY simplistic HSV segmentation technique I am using works, the problem is that I cannot get it to work robustly and doubt that it ever will. I've now covered the range of available techniques and even tried to plumb the depths of just-emerging ones, and it seems that every computer vision based object tracking implementation or algorithm suffers from the same issue with robustness (OpenTLD, camshift, Touchless, HSV segmentation, cvBlob etc etc). YES, it can be made to work, but issues include (depending on the algorithm): - Object drift: over time the target marker will cease to be recognised and other objects will become the target focus. - Multiple objects: During segments

Enabling External Screen

Another slow week ~ parenting, employment, school holidays and academia do not mix well. I have the hardware up and running how I want now. My initial build, with the projector on the right and camera on the left, had 2 problems. Firstly, as previously noted, it's preferable to have the camera on the side of the user's dominant hand since this will be used the most. Secondly...the PK301 has a small fan for cooling and having this on the right meant that the exhaust heat was blowing onto my neck and was hot enough to be very uncomfortable. So today I've switched things around, tied things off, added extra washers (oh, the technology!) for stability and, apart from the need for a more stable joint to hold the projector in a position so that it is level with the user's view, I am reasonably satisfied with my budget implementation. Not $350...about £350, but then I could have bought a cheaper projector (and may well pick up a ShowWX for testing) and come under budget. I still need to add a

Fixing "Corrupt JPEG data: n extraneous bytes before marker" error

http://www-personal.umd.umich.edu/~dennismv/corruptjpeg.html jdmarker.c lives in the OpenCV 3rdparty/libjpeg directory; edit out the appropriate line (search for "EXTRANEOUS"), make and install....happy days. Edit: Hmmmm, actually the above doesn't work since OpenCV is linking the installed library rather than the 3rdparty version and I can't seem to convince CMake to use the local one...ahhh well, one for another day, since redirecting standard error to /dev/null works just as well.
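
Since the warning is written by libjpeg straight to the C-level stderr, it can also be silenced from inside a Python script rather than on the command line; a small sketch of that trick:

    import os
    import sys

    # Redirect the process-level stderr (fd 2) to /dev/null so that warnings
    # printed by C libraries such as libjpeg (via OpenCV) are discarded.
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stderr.fileno())
    os.close(devnull)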

Hardware Design

I finally took the plunge and did the epoxy welds on the prototype rig using some Bondloc titanium epoxy and it works really well, setting hard in just a few minutes and rock solid after 15. While the rig is incredibly primitive it does allow me to shoulder mount both the projector and camera, and to do so in a way that they are stable. I just need to make a final decision re placement (left vs right for camera/projector) ~ my initial take was to place the projector on the right so as to be in line with my dominant eye, but I think of more importance/potential is to have a better correlation between the forward-facing camera and the dominant hand for pointing...which should also reduce occlusion of the projection. I'm slowly uncovering papers in this area and found another one today, Designing a Miniature Wearable Visual Robot, which details the design rationale behind a robotised wearable camera. Mayol et al. (2002) use a 3D human model to examine different frames of reference and requirem

Machine Learning & OpenCV

Slow progress. I've spent the last couple of days generally looking around at neural networks and OpenCV's machine learning code. My aim was to find some relatively simple-to-implement examples of NNs and form a basic impression of the applicability of the techniques to the task of skin and marker recognition in a video stream. Now, I should point out that my understanding of most ML techniques and NNs is rudimentary at best and, at this point, that's fine since I am looking for practical techniques that can be employed at a rapid prototyping stage (hence the cost of experimenting with a proposed solution to a given problem should be measured in hours rather than days); so I am not going to go into too much depth on the AI part of things here. What I do need to do though is define the methodology and results of the experiments I've been running so far, since a write-up at this stage will form a useful appendix to my thesis and demonstrate the design process and rationale b

Handwriting Recognition

More of a note post (I've been busy with work/children this week). I noted that one omission in the demonstrations of SixthSense was any form of text input. I want to address this by the provision of an "AirWriting" interface, but had been struggling to find any form of decent handwriting recognition software for Linux. However, Bret Comstock Waldow (now there's a name for the 21st Century!) has come to the rescue with ink2text and SHIP, which, as far as I can tell, runs the MS Tablet handwriting recognition DLLs under Wine and provides access to the routines via a service. Brilliant! I am getting quite excited at what may be possible by gluing these bits together! As a side note, I also acquired a PS3 Eye and hacked it apart last night, and was pleased to find support in Ubuntu 10.10 for it out of the box (so to speak). Colours seemed somewhat subdued in comparison to the C910; however, the effects of the superior frame rate are obvious. Roll on weekend!

Markers

Playing with markers today yielded some interesting results ~ I'm using a rather brute-force approach in OpenCV by using InRangeS against an HSV image, and have pulled the ranges via an app to obtain a sample of representative pixels. Quick and dirty, but I'm encouraged by the results. False positives are high and it's not robust in terms of working across lighting conditions, but in part the problem is the choice of material for the markers (electrical masking tape), which suggests: CVFB-R5: Markers must be composed of a material which is minimally reflective. I need to define "minimally" with more precision obviously, but I think the above would improve the detection in varied lighting conditions and possibly reduce false positives if I resample. And yes, I know this should have been obvious, but I have to harken back to my comments about a lack of design rationale... I'd be reasonably confident that with a few improvements this would roughly match the fu
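
For context, this kind of brute-force detection amounts to thresholding the HSV frame with the sampled ranges and taking the largest surviving blob; in Python/OpenCV terms roughly the following (the HSV bounds are placeholders for whatever the sampling app produces):

    import cv2
    import numpy as np

    # Placeholder HSV range produced by the pixel-sampling app (e.g. a green marker)
    LOWER = np.array([40, 80, 80], np.uint8)
    UPPER = np.array([80, 255, 255], np.uint8)

    def find_marker(frame_bgr):
        """Return the centroid of the largest blob within the sampled HSV range, or None."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, LOWER, UPPER)
        mask = cv2.erode(mask, None, iterations=2)    # knock out small false positives
        mask = cv2.dilate(mask, None, iterations=2)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
        if not contours:
            return None
        largest = max(contours, key=cv2.contourArea)
        m = cv2.moments(largest)
        if m["m00"] == 0:
            return None
        return (m["m10"] / m["m00"], m["m01"] / m["m00"])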

git repository, google project

First code commit! I've not used git before for source control (my programming over the last 5 years has been a solo effort, so no need for anything beyond tarball source control), hence I just wanted to make some notes for myself on how to use it. Firstly, code is hosted on Google at https://code.google.com/p/os6sense/. To check out a copy:

git clone https://morphemass@code.google.com/p/os6sense/ os6sense

There's very little there yet unless you are interested in just how badly Python can be mangled. And for my own notes, to add a file, commit changes and update the repository:

git add filename.ext
git commit -m "Insert Comment"
git push origin master

I'll put together a download at some point in the future.

Hand tracking

My initial attempt at bolting all the bits together has run into a delay since I've decided to try and epoxy weld some parts together and don't have anything suitable, so yesterday my focus was on marker-less hand tracking. There are a lot of impressive videos on YouTube showing hand-tracking, but when you throw in a moving background, varying lighting and a video stream that 99.999% of the time won't contain the object you want to track (but may contain 100,000s of similar objects), well, things don't tend to work as well as we would like. The last time I looked at hand tracking was probably 5 years ago and I wasn't too impressed by the reliability or robustness of the various techniques on offer, and I can't say I'm any more impressed today. I recently had a quick look at various rapid approaches - Touchless works but is marker based; TLD works but loses the target in long video and ends up learning a different target (but might be applicable if some form of re-initiali

Prototype Hardware Rig

It's late here so this will be a brief post. As I've said, hardware-wise there are a lot of options as to where to place the various components for a WGI, and a few studies out there have looked at the various components in some depth (TODO refs). However, an obvious constraint for this project is available resources (hardware, time, money etc) and when it comes to the act of physically assembling things, and doing so quickly, for my first build it is almost a matter of just making the best out of things. Hence I present the rigging for OS6Sense FG v0.01, which has been inspired by observing the trend towards the wearing of large closed-cup headphones and the opportunities this opens in having a "socially acceptable" wearable prototype. What is it? It's a pair of cheap headphones with the cups cut off and 2 Rode pop-shield mounts fitted into the rear wire loops. I can't help but feel some providence at work here since not only is the rear arm a snug fit, the screw holes are the ex

FOV - camera options

So, continuing to look at cameras: firstly, let me be clear that I have a VERY limited budget for this project, having already pushed the boat out to buy an Optoma PK301 (I'll cover pico-projectors at a later date), hence commercial options such as this HQ lens and pre-modded IR cameras are just out of my price bracket. Hence the PS3 camera is looking very tempting given they can be picked up on eBay for less than £15 and a large range of hacks have already been done for them. I wanted to document my comparison of the various options I have considered though:

Name         FOV (degrees)        fps@320  fps@640  fps@1280  Cost
PS3 Eye      75/56                120      60       NA        £15
C910         83H                  60       60       30        £70
Kinect       58H (IR)/63H (RGB)   30       30       NA        £100
Xtion        58H                  -        -        -         £120
Samsung SII  75???                ??       ??       30        NA

The above table is incomplete obviously ~ I've thrown in the SII since I have one available but I can't find any specifications on the camera, even from the datasheet, hence the numbers are a guesstimate based on a comparison with the

First captures done

I made my first set of captures today - I'm not sure they are usable though, since the act of role-playing the system use brought up a number of interesting issues: Worn Camera Position (image: Mistry using Sixth Sense [1]). An option in regards to the camera position is to simply place it in the same position as used by Mistry. However, my concerns here are 2-fold: 1) Design rationale. There is no design rationale as to the placement of the camera. I don't mean an extended examination of the rationale, just a simple overview of the thinking behind the placement. 2) Ergonomics. I can't help but think how uncomfortable that position is going to be to use for protracted periods of time (gorilla arm effect) or after a long day. Also, in that position what about people with a high BMI - isn't the camera angle going to be greater than 90 degrees? EDIT: Another couple of concerns in the back of my mind seem pertinent: 3) Social constraints. Having one's arms so far forwards is

Some notes - OpenCV

Just quickly throwing a recording app together yesterday I found that the video size wasn't being changed - a little digging suggests that the reliance on icv_SetPropertyCAM_V4L is always going to fail if the change in resolution between the width and height calls results in an unsupported resolution on the first call. Why isn't a simple call to set the video size with both height and width parameters, by exposing icvSetVideoSize, supported? It's not my aim to patch OpenCV though, so for my purposes I've updated the values for DEFAULT_V4L_WIDTH and DEFAULT_V4L_HEIGHT in highgui/src/cap_v4l.cpp to 1280x720 and rebuilt. Yes, it's a fudge, and if I remember I'll have to file a bug for it. But with that fixed I have a little recorder application ready to go, with the only issue left to be solved being, well, the usual fun and games of open source politics. I get the following error when recording: Corrupt JPEG data: 1 extraneous bytes before marker 0xd0 Reading this http://forums.quickcamt
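
For anyone hitting the same issue, the usual way to request a capture size is through the capture properties, and it is exactly this two-call width/height path that falls over on the V4L backend when the intermediate combination is unsupported; e.g. (property names as in the newer OpenCV Python bindings):

    import cv2

    cap = cv2.VideoCapture(0)
    # Width and height are set by two separate calls; on the V4L backend the
    # first call can fail outright if width-at-old-height is not a mode the
    # camera supports, which is the behaviour described above.
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
    print(cap.get(cv2.CAP_PROP_FRAME_WIDTH), cap.get(cv2.CAP_PROP_FRAME_HEIGHT))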

The Patent

http://www.freepatentsonline.com/20100199232.pdf I was somewhat surprised to come across a patent for Sixth Sense given the initial declaration that the code would be open sourced. I was even more surprised, reading the contents of the patent, at how general it is...oh hum, let's not go there apart from to say I'm not a fan of broad patents, but I wanted to bring it up since 1) it is the only "detailed" source of information on the implementation of Sixth Sense and 2) it's useful to acknowledge and recognise it, since I don't particularly want to be "trolled" in this research. So yes, it's out there and it's worth a quick skim through (or not, since a "clean room" implementation might be advisable, but too late for me!) since it tells us that the source for Sixth Sense is largely based on several open source projects (see 0120). Touchless is used for fiducial recognition, the $1 Unistroke Recogniser algorithm for gesture commands, and ARToolkit for the out

Welcome to the Open Source Sixth Sense WGI Project!

It's been over two years since Pranav Mistry announced that his Sixth Sense project would be made open source. Since then we've heard little about this technology or the project and, like many AR point-technology research examples, it appears to have become abandonware. So when it came around for me to pick a topic for my Masters thesis I couldn't help but think it would be a great opportunity to do a design and build project for a similar system, investigating the HCI aspects of this novel technology as the focus...and that's what I'm doing. First though I need to build one, and along the way I couldn't help but think 2 things: 1) This is also a good opportunity to build my first open source project so that, if other researchers want to explore this technology, an artefact exists allowing for rapid development of a research system. 2) There's also an opportunity to examine an interesting concept of "The Inventor as the User" as a UCD perspecti