encryptio.com

This sentence is very meta.

Driver Thoughts

The driver is the single most CPU-intensive piece of the project, moreso even than the window manager.

Quick overview of the driver: it takes the images from the camera and turns them into touch events, such as “new touch,” “moved,” “dropped touch,” etc. It doesnt directly deal with gestures or window interaction at all, its only care is getting camera images and turning them into touch information.

Problems to be Solved By the Driver

The driver has two major problem parts: turning camera images into (uncorrelated) touch data per frame, and relating that data to previous frames to know which touches are the “same touch.” The latter is the harder one.

In this document, the uncorrelated touch data is called “tap info,” and the correlated touch data is called “touch info.”

Camera Images to Tap Info

My implementation of this is farily simple. When starting, take a few images and average them to get a “background” image. Then, for each frame after that, we subtract the background, then threshold the image to get a binary touch map. We go through that touch map and use the algorithm described here to join the binary data into touches, keeping track of both size (amount of binary “yes” in the touch) and power (amount of brightness in the “yes” area). Then, we iterate through this and throw away the small size and small brightness touches.

Tap Info to Touch Info

This is the tough one. A naïve implementation, in psuedocode:

olditems = []
infinite loop:
    newitems = get_tap_data()
    replacement = []
    for each ni in newitems:
        for each no in olditems:
            if they<span class="ap">’</span>re really close:
                remove the item from no
                update it
                put the item in the replacement array
        # if we get here, then it<span class="ap">’</span>s a new touch, not an existing one
        create a new touch and put it in the replacement array.
    filter out items from olditems that are older than MAX_DEAD_TIME seconds.
    append replacement to olditems.
    do socket send stuff.

There are a few problems with this:

There is another relatively obvious solution to the third item in this list: Instead of looking at each point individually, group them into sections (that are close together) and for each section look for the one change that will cause the least movement.

Another quality improvement, is to instead look at the ones that would take the least amount of effort (or, “least turbulence”), rather than the least amount of movement. That is, if we see two horizontal points close together, and the next frame theyre both moved over about the distance we saw them at, then we assume that instead of a switch, die, and create (as given by the “least movement” criteria) we move them all (as given by the “least effort” criteria).

The least effort algorithm lets us increase the minimum distance farily safely, which increases our maximum movement speed.

But, even this algorithm has a downside: if you hold two points, drop the first, and add another in the same frame, theres a strong likelyhood of that gesture being taken as movement. Luckily, this gesture is rare and would require precise timing, so the problem is minimized.

I havent thought up a decent algorithm to calculate “least effort” yet.

Optimization

Identifying Connected Components

The algorithm for indentifying connected components makes a few very wrong assumptions. Its trivially possible to track the objects iterating only once. And in fact, very quickly. This bad assumption is in the “join” operation, when two unconnected blobs are connected by a corner point. The optimization is that you dont need to create two data sets, only one. When you find that two need to be joined, you simply iterate over the previously processed pixels and replace the number on the fly.

Even then, theres still room for improvement. If your algorithm doesnt care about reading the map afterwards, only about keeping track of your own seperate data during the scan, such as my tap algorithm, you dont need to iterate over *all* the previous pixels. You only need to iterate over the past ROW_SIZE+1 pixels. This is because the others arent looked at during the scan. In real-time, this is a massive speed-up.