- Total Cost: $4.73
- 40 HITs, including both text transcription and voting HITs.
- TurKit code for this experiment: code.js, all files
- NOTE: The experiment was stopped prematurely, so you probably can’t run code.js against the provided database: the program will try to make calls to MTurk for HITs under my account. However, all of the partially completed HITs have been written out to a file called output.txt.
- TurKit Version: 0.1.37
We are looking for a task that lends itself to iteration: one where it is easier to understand previous people’s work than to redo that work. Before this blog existed, we ran an experiment where people attempted to decipher extremely poor handwriting (page 6). This handwriting seemed impossible for anyone to decipher alone, but by building on each other’s work, turkers were able to transcribe it almost verbatim.
Of course, we didn’t have a control in that experiment, so we don’t actually know that no one could have deciphered it alone. So we are now running a set of experiments comparing iterative and non-iterative approaches to handwriting recognition.
Well, almost. It is difficult to write consistently badly by hand, so instead I typed passages with the text tool in GIMP and obscured them with a distortion filter. To be precise, I used the “Sans” font at 22 pixels, then distorted each passage with Filters > Noise > Spread, using a spread of 9 pixels both horizontally and vertically. Here is a result:
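For readers who want to reproduce this kind of distortion without GIMP, here is a rough JavaScript sketch of a spread-style filter, assuming a grayscale image stored as an array of pixel rows. Each output pixel is copied from a randomly chosen pixel at most `dx` columns and `dy` rows away. The names `spread` and `mulberry32` are ours, and GIMP’s actual Spread implementation may differ in its details (for example, in how it treats the image edges).

```javascript
// Seeded pseudo-random generator (mulberry32) so results are reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Approximate "spread": each output pixel copies a random source pixel
// within dx columns and dy rows, clamped to the image bounds.
// This is a sketch, not GIMP's implementation.
function spread(image, dx, dy, rng) {
  const height = image.length;
  const width = height > 0 ? image[0].length : 0;
  const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);
  const offset = (d) => Math.floor(rng() * (2 * d + 1)) - d; // integer in [-d, d]
  const out = [];
  for (let y = 0; y < height; y++) {
    const row = [];
    for (let x = 0; x < width; x++) {
      const sx = clamp(x + offset(dx), 0, width - 1);
      const sy = clamp(y + offset(dy), 0, height - 1);
      row.push(image[sy][sx]);
    }
    out.push(row);
  }
  return out;
}

// Usage: a 60x30 white image with a 4-pixel-wide black vertical stroke,
// spread by 9 pixels in each direction as in the experiment.
const img = Array.from({ length: 30 }, () =>
  Array.from({ length: 60 }, (_, x) => (x >= 20 && x < 24 ? 0 : 255))
);
const blurred = spread(img, 9, 9, mulberry32(42));
```

Because the filter only moves existing pixels around, no new gray levels appear: the strokes keep their ink but lose their crisp outlines, scattering up to 9 pixels in every direction.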
We then adopted the experimental design of a previous blog post comparing iterative and non-iterative writing of image descriptions, except that we used 8 iterations for both conditions instead of 6. We also paid $0.10 per iteration instead of $0.02; we tried $0.02, but we didn’t seem to be getting any takers.
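The two conditions can be sketched as a toy model. This is not the TurKit code from code.js: `askWorker` stands in for a transcription HIT, `votersPrefer` stands in for a voting HIT, and we assume that each new transcription replaces the current one only when voters prefer it. The mock worker simply deciphers one more word than it is shown, which exaggerates the effect but illustrates why iterative workers can build on each other while independent workers always start from scratch.

```javascript
// Toy model of the two conditions. askWorker(previousText) stands in for a
// transcription HIT; votersPrefer(oldText, newText) stands in for a voting HIT.
// Neither is a TurKit API.
function runIterative(askWorker, votersPrefer, iterations) {
  let best = "";
  const history = [];
  for (let i = 0; i < iterations; i++) {
    const candidate = askWorker(best); // each worker sees the current best text
    if (votersPrefer(best, candidate)) best = candidate; // keep it only if voters agree
    history.push(best);
  }
  return history;
}

function runNonIterative(askWorker, responses) {
  const results = [];
  for (let i = 0; i < responses; i++) {
    results.push(askWorker("")); // every worker starts from a blank slate
  }
  return results;
}

// Mock worker: deciphers one more word of the target than it is shown.
const target = "I had intended to hit the nail".split(" ");
const mockWorker = (prev) => {
  const words = prev ? prev.split(" ") : [];
  if (words.length < target.length) words.push(target[words.length]);
  return words.join(" ");
};
// Mock voters: prefer the longer transcription.
const mockVoters = (oldText, newText) => newText.length > oldText.length;

const iterative = runIterative(mockWorker, mockVoters, 8);
const independent = runNonIterative(mockWorker, 8);
console.log(iterative[7]); // prints: I had intended to hit the nail
console.log(independent); // eight copies of "I"
```

Real workers are of course far less predictable; the point is only the structural difference between passing the previous best text forward and collecting independent responses.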
Even at $0.10, the experiment ran for a few days without completing, and I could already see room for improvement, so I stopped it early.
Here is the final iterative version of the passage shown above:
I had intended to hit the nail, but I’m not a very good aim it seems and I ended up hitting my thumb. This is a common occurence I know, but it doesn’t make me feel any less ridiculous having done it myself. My new strategy will involve lightly tapping the nail while holding it until it is embedded into the wood enough that the wood itself is holding it straight and then I’ll remove my hand and pound carefully away. We’ll see how this goes.
The highlighted word, “embedded”, should have been “wedged”. This transcription also fixes, or nearly fixes, a couple of spelling errors in the original passage.
The following pages show the iterative and non-iterative submissions for each of the three blurry passages. Note that the iterative submissions build on each other, while the non-iterative submissions are all independent.
Nail Passage: After eight iterations, turkers transcribe the text with only one error, shown above. We have only 5 non-iterative responses, and all of them essentially say that the text is unreadable.
Boat Passage: The first seven iterations don’t transcribe any words. The eighth and final iteration is a promising start. We got 7 non-iterative responses before terminating the experiment, and one of these is even more promising than the last iterative response.
Babysitting Passage: For this passage, we solicited the non-iterative responses before the iterative ones. One of the responses is very good, missing only about seven words. The iterative process completed only two iterations before the experiment was stopped.
We stopped the experiment because it was taking a while, and many people were submitting responses that essentially said “this text is unreadable.” Empirically, though, the text appears to be mostly readable by several people: the major contributors in each experiment were different turkers.
I hypothesize that the real problem is convincing turkers that progress is possible when the task looks impossible. An early iteration on the Nail Passage made a good effort and laid the groundwork for future iterations. None of the subsequent turkers iterating on this passage complained about readability (at least not in an addendum to the transcribed passage), which suggests that people are more comfortable with this task once it has been broken down a little.
A related problem is convincing turkers, when they are faced with the entire passage, that it’s OK to do just a little bit. Most turkers either did none of it or attempted most of it. One counterexample is the very first turker on the Nail Passage, who attempted to transcribe only the first line or so, but the voters voted against that contribution. So even the voters need to be convinced that it’s OK to do just a little bit.
The plan for version 2.0 of this experiment is to put a text box beneath each word, with instructions saying it’s OK to contribute only 1 or 2 words. The hope is to make turkers more comfortable contributing just what they are able to. This should also make for a fairer comparison between the iterative and non-iterative conditions, since non-iterative contributors will be able to guess at individual words without feeling that they need to transcribe the entire text. These guesses can then be combined programmatically, similar to how we combine tags from non-iterative responses in the tag cloud experiment.
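One way to combine per-word guesses is a plurality vote at each word position, loosely analogous to merging tags from independent responses. The sketch below is hypothetical: the function `combineGuesses` and its input format are our assumptions, not part of the experiment code. Each response is an array with one entry per text box, blank boxes are ignored, and guesses are compared case-insensitively.

```javascript
// Hypothetical sketch: combine independent per-word guesses by plurality.
// responses[r][i] is worker r's guess for word slot i, or "" if left blank.
function combineGuesses(responses, wordCount) {
  const combined = [];
  for (let i = 0; i < wordCount; i++) {
    const counts = new Map();
    for (const guesses of responses) {
      const g = (guesses[i] || "").trim().toLowerCase();
      if (g === "") continue; // blank box: this worker skipped the word
      counts.set(g, (counts.get(g) || 0) + 1);
    }
    let best = null;
    let bestCount = 0;
    for (const [word, count] of counts) {
      // strict ">" means ties go to the guess that appeared first
      if (count > bestCount) {
        best = word;
        bestCount = count;
      }
    }
    combined.push(best); // null means nobody guessed this slot
  }
  return combined;
}

// Usage: three workers, each filling in only some of four boxes.
const guesses = [
  ["I", "had", "", "to"],
  ["", "had", "intended", "do"],
  ["i", "", "intended", "to"],
];
console.log(combineGuesses(guesses, 4)); // → ["i", "had", "intended", "to"]
```

A slot that nobody filled in comes back as `null`, which makes it easy to see which words are still undeciphered.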