ProPublica’s Guide to Mechanical Turk
A guide for journalists looking to use Mechanical Turk in their data projects.
Amazon Mechanical Turk – or mTurk – is an online marketplace, set up by the online shopping site Amazon, where anyone can hire workers to complete short, simple tasks over the Internet. Amazon originally developed it as an in-house tool, and commercialized it in 2005. The mTurk workforce now numbers more than 100,000 workers in 200 countries, according to Amazon. At ProPublica, we use it for tasks like collecting, reformatting, and de-duplicating data.
This guide is for journalists who already use Mechanical Turk in their data projects and want to improve their results. Readers who are new to Mechanical Turk should start with Amazon’s mTurk Resource Center.
Much of this document is based on the research and advice of Panos Ipeirotis, a computer scientist at NYU’s Stern School of Business, who maintains a blog of his research into Mechanical Turk.
Before you start putting projects up, there are some fundamentals to get right.
Get the lingo down. Mechanical Turk is great for delegating lots of simple, well-defined requests, which Amazon refers to as “Human Intelligence Tasks,” or “HITs.” People who complete tasks are called “workers” and those who post them are “requesters.” In this guide, the word “project” refers to a batch of similar tasks. “Spammers” are workers who try to make money without actually doing the work by answering with random data.
Make sure your project is suitable for mTurk. Can you easily deconstruct your project into tasks that can be completed independently? Use mTurk. Do these tasks require specialized knowledge? Don’t use mTurk. Are the tasks simple and quick to finish? Use mTurk. Are there multiple correct answers for each task? Don’t use mTurk.
For example, ProPublica uses mTurk to help figure out which companies are getting stimulus money for our Recovery Tracker. We get the data from the government, but in at least 400 cases last quarter, the data identified some companies only by their DUNS number instead of the company’s name. And because the online DUNS database uses a CAPTCHA to prevent scraping, we used mTurk workers to find company names by their DUNS numbers. Each task is simple; a worker plugs a DUNS number into a field on the DUNS website and then copies and pastes relevant company information into fields in mTurk.
Be Careful! Mechanical Turk is a Public Site. Amazon doesn’t allow search engines to index its content, nor does it maintain a public archive of tasks or projects. However, third-party programs can still collect information about the projects you’re running, and workers are under no obligation to keep their work secret. To date, we’ve run all of our projects under the ProPublica name because it makes our work more interesting for potential workers. If you’re uncomfortable with someone accessing your work, and if your ethics rules are OK with it, you can run sensitive work under an assumed name. Or, you can follow our lead and make it hard for anyone to scrape your work by allowing only approved workers to see your HITs (just use Amazon’s built-in qualification tests to screen workers based on accuracy rate, location and experience).
Ready, Set, Turk
mTurk is a big marketplace, and you’ll attract workers’ attention by project type, work quantity, pay, and your organization’s reputation. Here are some tips for when you’re ready to assign mTurk workers to a project:
Design Your HITs so they’re hits. Think through the best approach to designing your projects and describing your tasks. Do you have one batch of tasks or are there several to be done in sequence? What information do you need to provide so people can complete a task? We’ve found it’s best to keep tasks as simple as possible. You’ll want to experiment with breaking down projects into different kinds of tasks to see which ones work best for you.
Write crystal-clear and detailed instructions. Workers won’t be able to ask you questions as they complete your tasks, so you need to make sure your instructions are well-written and precise. Any ambiguity will increase your error rate. It will also frustrate workers because they’ll make unnecessary mistakes.
We recommend giving workers as much information about projects as possible, including how you’ll review their work. Newsrooms and nonprofits should underscore their mission, as many workers respond favorably to mission-driven work. Always clue workers in on your project’s goals so they’re better positioned to troubleshoot any problems.
Don’t be the judge of your own instructions: ask someone who knows little about your subject to complete a task before you launch your project. Amanda’s favorite test subject is her sister, who has no problem voicing complaints.
Error-proof your project beforehand. Here are a few best practices to minimize the chances that your projects will yield useless results:
Use redundancy. You should never trust the response of a single worker. Instead, delegate each task to at least two but sometimes as many as five workers. The number of workers you should use depends on the task’s difficulty. At ProPublica we often delegate each task to four workers and trust responses when three or more of the workers agree. Our use of several workers for each task is based on Professor Ipeirotis’ research on the subject.
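The agreement rule described above (four workers per task, trust an answer when three or more agree) can be sketched in a few lines of Python. This is only an illustration, not ProPublica's actual code: the function name `consensus`, the `min_agree` parameter and the light normalization (trimming whitespace, ignoring case) are all assumptions made for the example.

```python
from collections import Counter

def consensus(answers, min_agree=3):
    """Return the agreed answer if at least min_agree workers gave it, else None."""
    # Assumption: trim whitespace and ignore case so that "Acme Inc" and
    # " acme inc " count as the same answer. Blank answers are dropped.
    normalized = [a.strip().lower() for a in answers if a and a.strip()]
    if not normalized:
        return None
    answer, count = Counter(normalized).most_common(1)[0]
    return answer if count >= min_agree else None

# Three of four workers agree, so the answer is trusted.
print(consensus(["Acme Inc", "acme inc", "ACME INC ", "Acme Incorporated"]))
# Only two of four agree, so the task needs a human look.
print(consensus(["Acme Inc", "Acme Corp", "Acme Inc", "Acme LLC"]))
```

Tasks that come back as `None` are the ones worth checking by hand or re-posting.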
Troubleshoot any complications a worker may face, and adjust accordingly. For instance, for our DUNS project, we originally forgot to plan for situations in which the DUNS number could not be found in the database. This led to workers entering responses like “error” and “no zip code listed” into the “company name” field. The next time we ran this task, we gave workers the option to note that a DUNS number returned an error.
Price your HITs appropriately. Pricing on mTurk is more art than science, unfortunately. Pay too little, and workers won’t complete your tasks. Pay too much and you’ll attract more spammers. At ProPublica, we pay between 1 and 10 cents per task. Professor Ipeirotis recently surveyed tasks on mTurk and found that 90 percent of them were priced within this range. Requesters can also reward workers with bonuses after the project has been completed. But don’t expect the promise of a bonus to work in your favor. Workers rarely trust new requesters who guarantee bonuses for good work.
Use “code words” to ward off spammers. Spammers love multiple choice questions because they can complete tasks quickly by clicking randomly. Avoid this problem by asking workers to obtain an arbitrary piece of data on the Internet, even if you won’t use it. You can use this data to verify that workers actually completed the task and didn’t simply provide answers at random.
Spot-check results. Before we launch a project on mTurk, we try to complete 50 or so tasks on our own, and then compare the results to mTurk’s. The first time we ran a project on mTurk, we ran an old data set we’d completed ourselves and then compared the results. The good news: In 524 cases, data from mTurk matched our own results – and mTurk actually caught 10 errors we made in-house.
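A spot check like this comes down to comparing two sets of answers keyed by task. The sketch below is one hypothetical way to do it in Python; the `spot_check` name, the dictionary layout and the sample data are invented for illustration.

```python
def spot_check(in_house, turk):
    """Compare two {task_id: answer} dicts on the tasks they share.

    Returns (number of matching tasks, list of task ids that differ).
    """
    shared = sorted(set(in_house) & set(turk))
    mismatches = [
        task for task in shared
        # Assumption: ignore case and surrounding whitespace when comparing.
        if in_house[task].strip().lower() != turk[task].strip().lower()
    ]
    return len(shared) - len(mismatches), mismatches

# Made-up sample data; "d4" is skipped because it was not done in-house.
in_house = {"d1": "Acme Inc", "d2": "Bolt LLC", "d3": "Cargo Co"}
turk = {"d1": "acme inc", "d2": "Bolt Ltd", "d3": "Cargo Co", "d4": "Delta"}

matches, mismatches = spot_check(in_house, turk)
print(matches, mismatches)
```

A mismatch is not automatically a worker's mistake. Check each one by hand, since the in-house answer may be the one that is wrong.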
Be good to workers. Workers like requesters who treat them with respect. Develop a reputation for prompt payment, clear instructions and fair judgment, and you will attract high-quality workers who will do quick and accurate work.
Get to know your workers, and give them a heads-up when you post to mTurk. mTurk assigns every worker a persistent ID, but doesn’t show you their name or provide contact information. We’ve worked around this by adding a text field to our HITs that reads, “If you would be interested in receiving email updates about upcoming HITs from ProPublica, enter your email address here. Don't worry, we will only send you messages related to Amazon Mechanical Turk.” Our plan is to send our best workers an e-mail alert before uploading a new project.
Test massive projects. Before posting a massive project, post a batch of 100-200 HITs as a test. Testing has helped us catch possible complications and errors. A $3 test can save you from running $100 of flawed results.
You can see some of these tips in practice in our DUNS project. In the HIT template, we showed workers a DUNS Number, and asked them to retrieve the corresponding company's name and ZIP code. We assigned each task to four workers to ensure accuracy. Since the tasks involved copying text, an arbitrary code word was not necessary. We also asked workers to note when a DUNS number was not listed in the database, or if a company did not have a U.S. ZIP code. We told workers that we would reject their work only if there was evidence of cheating, and we gave them an opportunity to sign up for ProPublica's mTurk team. We paid workers one cent per task. This is on the low side of our pay range, as we underestimated the time it takes to run DUNS numbers through the database. We’ll pay three cents next time.
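Because every task goes to several workers, the payout grows quickly with redundancy and price, so it helps to do the arithmetic before launching. Here is a rough Python sketch. The function names are hypothetical, and the figures cover worker pay only: Amazon's own fee is not discussed in this guide and is left out. The 401-task figure is the size of the DUNS project mentioned elsewhere in this guide.

```python
def worker_pay_cents(num_tasks, workers_per_task, cents_per_task):
    """Total paid to workers, in whole cents (Amazon's fee is NOT included)."""
    return num_tasks * workers_per_task * cents_per_task

def as_dollars(cents):
    """Format a whole number of cents as a dollar string."""
    return f"${cents / 100:.2f}"

# 401 tasks, four workers each, at one cent per task.
print(as_dollars(worker_pay_cents(401, 4, 1)))
# The same project at three cents per task.
print(as_dollars(worker_pay_cents(401, 4, 3)))
```

Working in whole cents keeps the totals exact; dollars appear only when the result is formatted for display.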
Uploading Your Project to mTurk
When you’re actually filling in the details of your HITs, there are some good tips to keep in mind:
Don’t worry too much about your task’s expiration date or the “time allotment per HIT.” Workers can’t search for HITs based on either of these factors, so they don’t generally have an effect on the speed or accuracy of your project. Our recommendation: Just keep the default values.
Always use your company name as a keyword. Add a few more that describe your work. We use “data,” “fast,” and “easy” a lot. Professor Ipeirotis has assembled a list of the top keywords.
We recommend you allow only workers with a 95 percent accuracy rate or higher to view your HITs. If your tasks require knowledge of American culture or slang, limit your project to American workers.
Believe it or not, the bigger your job, the faster it will complete. If your project includes more than 200 HITs and is simple, we’ve found that it will almost always finish within 12 hours. Smaller projects take longer – in one case, a job with 18 HITs took four days to complete and a similar job with 155 HITs finished in less than an hour.
Run your project over a weekend or at night. We’ve found that projects run over the weekend finish faster (very few HITs are loaded up on the weekend). If you want to run your projects during the week, launch them in the evening, U.S. Eastern Time, when American workers are getting home from their day jobs, and the Indian workday is just beginning. We published our DUNS project on a Friday afternoon at 5:30 p.m., and all 401 tasks were completed within six hours.
Higher-paying work doesn’t get prioritized. People favor simple tasks and large projects. One-cent jobs often complete faster than two- or three-cent jobs. That doesn’t mean that under-priced HITs attract quality workers; workers tend to search for easy tasks to complete, and lower-priced tasks tend to be easier.
Project Completed, Now the Review Begins
Unless you ran your project over the weekend, don’t let more than a day pass before you review the results.
Make sure you examine the data closely. If there are a lot of HITs in which workers disagreed about the correct answer, you should check if one worker was responsible for all the errors. You can also check for spammers by calculating whether one worker completed tasks much faster than others. You can see how we analyzed the results of our DUNS project.
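Both checks described above (one worker behind most of the disagreements, one worker finishing much faster than everyone else) can be run on the results file. The Python sketch below is illustrative only: the row layout, the `flag_workers` name and both thresholds (`max_disagree`, `speed_ratio`) are assumptions, and you should tune them to your own project.

```python
from collections import defaultdict
from statistics import median

def flag_workers(rows, consensus, max_disagree=0.5, speed_ratio=0.25):
    """Flag workers for manual review.

    rows: iterable of (worker_id, task_id, answer, seconds_taken)
    consensus: {task_id: agreed_answer} for tasks where workers agreed
    """
    rows = list(rows)
    stats = defaultdict(lambda: {"n": 0, "wrong": 0, "secs": []})
    for worker, task, answer, secs in rows:
        s = stats[worker]
        s["n"] += 1
        s["secs"].append(secs)
        if task in consensus and answer != consensus[task]:
            s["wrong"] += 1

    # Typical time per task across all workers.
    overall = median(secs for _, _, _, secs in rows)
    flagged = {}
    for worker, s in stats.items():
        reasons = []
        if s["wrong"] / s["n"] > max_disagree:
            reasons.append("disagrees often")
        if median(s["secs"]) < speed_ratio * overall:
            reasons.append("unusually fast")
        if reasons:
            flagged[worker] = reasons
    return flagged

# Made-up sample data: w3 answers at random in a few seconds.
rows = [
    ("w1", "t1", "Acme", 40), ("w2", "t1", "Acme", 35), ("w3", "t1", "zzz", 3),
    ("w1", "t2", "Bolt", 50), ("w2", "t2", "Bolt", 45), ("w3", "t2", "asdf", 4),
]
print(flag_workers(rows, {"t1": "Acme", "t2": "Bolt"}))
```

A flag is a prompt to look at that worker's submissions by hand, not grounds for automatic rejection.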
Pay workers promptly. mTurk workers are paid only once you’ve approved their work. You should pay workers within one business day of posting your project. If there is no evidence of spam, payment should happen immediately. While Mechanical Turk gives you the ability to reject submissions and refuse payment to workers, we highly recommend against doing so unless there is clear evidence that workers are providing spam responses. Many projects won’t accept workers who have had less than 95 percent of their work accepted, so workers are understandably protective of their accuracy rates. Rejecting work that was completed with good intentions is not only unfair, it will also get your company a reputation for being inhospitable to workers.