Update, April 16, 2020: This methodology describes a previous version of the Political Ad Collector project. That project is now maintained by Quartz.
Political ads on Facebook have come under scrutiny since it was revealed that Russia used such messages to try to influence the 2016 U.S. election. But online political ads are often seen only by a small target audience, which makes it difficult for the public to check them for accuracy. To shine a light on political advertising on Facebook, we built a tool that allows Facebook users to automatically send us the political ads that were displayed on their news feeds. (You can install the tool, known as a web browser extension, for Chrome or Firefox.)
The extension, which we call the Political Ad Collector, is a small piece of software that users can add to their web browsers. When a user logs into Facebook, the extension collects the ads displayed on the user’s news feed and guesses which ones are political, using an algorithm built by ProPublica. Ads judged likely to be political are made public in a searchable database.
Our tool collects basic information about each ad, such as the Facebook ad identification number and the dates we saw the ad in our system. However, to protect the privacy of users, we automatically remove any personally identifiable information from the ads we collect, including Facebook ID numbers and tracking identifiers, which are tiny images that can be used to identify users. We also remove the names and profile links of the user's friends who have liked the ad and any comments on the ad.
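As a rough illustration of that scrubbing step, here is a minimal sketch that drops identifying fields from a collected ad and strips 1x1 tracking images and numeric user IDs from its markup. The field names (`viewer_facebook_id`, `liked_by_friends`, `comments`), the `html` key, the URL parameter names and the regex heuristics are all hypothetical; Facebook's real markup and the extension's actual data model are not described here and would differ.

```python
import re

# Hypothetical field names for a collected ad; the real extension's data model may differ.
PII_FIELDS = {"viewer_facebook_id", "liked_by_friends", "comments"}

# An <img> tag sized 1x1 (attributes in any order): a common shape for a tracking pixel.
TRACKING_PIXEL = re.compile(
    r'<img\b(?=[^>]*\bwidth="1")(?=[^>]*\bheight="1")[^>]*>', re.IGNORECASE
)

# A numeric user identifier in a URL query string, e.g. "?user_id=42" (assumed parameter names).
ID_PARAM = re.compile(r"([?&](?:id|user_id)=)\d+", re.IGNORECASE)


def scrub_ad(ad: dict) -> dict:
    """Return a copy of the ad with identifying fields and markup removed."""
    # Drop fields that identify the viewer, their friends, or commenters.
    clean = {key: value for key, value in ad.items() if key not in PII_FIELDS}
    html = clean.get("html", "")
    # Delete tracking pixels outright.
    html = TRACKING_PIXEL.sub("", html)
    # Keep the parameter name but redact its numeric value.
    html = ID_PARAM.sub(r"\1REDACTED", html)
    clean["html"] = html
    return clean


if __name__ == "__main__":
    sample = {
        "ad_id": "123",
        "viewer_facebook_id": "999",
        "comments": ["Great ad!"],
        "html": '<p>Vote!</p><img src="t.gif" height="1" width="1">'
                '<a href="https://example.com/?user_id=42">link</a>',
    }
    print(scrub_ad(sample))
```

Regular expressions are a blunt tool for HTML; a production scrubber would more likely walk a parsed DOM, but the idea (delete identifying fields, remove tracking images, redact IDs) is the same.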
We collect targeting information that Facebook provides with the ad, but we do not connect that information to any data that could be used to identify a user. The targeting data tells users some of the criteria used to decide which ads to display to which user, such as age and location. Facebook users can see targeting information if they click the dots at the top right corner of any Facebook ad and select “Why am I seeing this?”
To determine which ads are political, ProPublica built a machine-learning algorithm to calculate the statistical likelihood that an ad contains political content. This algorithm, called a Naive Bayes classifier, has long been used for identifying spam emails. It works particularly well at classifying text into one of two groups, such as spam or not spam, or, in this case, political or not political.
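For reference, a standard multinomial Naive Bayes classifier scores an ad whose words are w_1 through w_n as shown below, where c is either "political" or "not political" and V is the vocabulary seen in training. The article does not say which Naive Bayes variant or smoothing ProPublica used, so the add-one (Laplace) smoothing here is an assumption.

```latex
P(c \mid w_1,\dots,w_n) \;\propto\; P(c)\prod_{i=1}^{n} P(w_i \mid c),
\qquad
P(w \mid c) = \frac{\operatorname{count}(w, c) + 1}{\sum_{w' \in V} \operatorname{count}(w', c) + |V|}
```

The ad is assigned to whichever class gives the larger product. The "naive" part is the assumption that words occur independently given the class, which is rarely true but works well enough for spam-style text classification. In practice the logarithms of these terms are summed to avoid numerical underflow.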
Before we launched the tool, we trained this algorithm on a list of Facebook posts that we knew were political — posts from parties and candidates, and posts about political issues — and a list of posts that weren’t political — published by big stores and other companies.
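To make the training step concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier trained on a handful of invented example posts. The class name `NaiveBayes`, the tokenizer, the label strings and the training sentences are all assumptions for illustration; ProPublica's actual training data and code are not reproduced here.

```python
import math
import re
from collections import Counter

LABELS = ("political", "not_political")


def tokenize(text: str) -> list[str]:
    # Lowercase and split into runs of letters/apostrophes; a deliberately simple tokenizer.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, words: list[str], label: str, vocab: set[str]) -> float:
        # log P(label) + sum of log P(word | label), skipping words never seen in training.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total_words = sum(self.word_counts[label].values())
        for word in words:
            if word in vocab:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def probability_political(self, text: str) -> float:
        words = tokenize(text)
        vocab = set().union(*(self.word_counts[label] for label in LABELS))
        pol = self._log_score(words, "political", vocab)
        non = self._log_score(words, "not_political", vocab)
        # Normalize the two log scores into a probability; subtracting the max avoids underflow.
        top = max(pol, non)
        return math.exp(pol - top) / (math.exp(pol - top) + math.exp(non - top))


# Invented stand-ins for hand-picked posts from campaigns and issue groups, and from retailers.
clf = NaiveBayes()
for post in ["Vote for our candidate on election day",
             "Tell Congress to protect health care",
             "Join the rally for our candidate"]:
    clf.train(post, "political")
for post in ["Big sale on shoes this weekend",
             "New phone deals at our store",
             "Free shipping on all orders"]:
    clf.train(post, "not_political")

print(clf.probability_political("Call Congress before the election"))
print(clf.probability_political("Weekend sale on phones"))
```

Because training only updates word counts, new labeled examples can be added at any time without rebuilding the model from scratch.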
If we had relied solely on these initial hand-selected posts, our classifier would have been able to reliably find ads published by the Democratic or Republican parties, but it would have missed ads from groups that we didn’t include in our training data. It also would have missed how politically charged language and subjects change over time.
So, for our algorithm to distinguish more accurately between political and non-political ads, our tool regularly shows users a selection of ads and asks them to identify which ones they think are political and which ones they think aren’t. These include the ads that appeared on the user’s own feed as well as ads that were shown to other people. Just because a single user tells us an ad is non-political doesn’t mean it will get dropped from our database of political ads, but the algorithm will take that vote into account.
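The article doesn't say exactly how votes are weighed. One hypothetical way to let a single vote count without letting it override the classifier is to treat the classifier's probability as a fixed number of "pseudo-votes" and blend it with the real votes. `AdRecord`, `combined_score`, `prior_weight` and the 0.5 threshold below are all invented for illustration and are not ProPublica's method.

```python
from dataclasses import dataclass


@dataclass
class AdRecord:
    """Hypothetical record pairing a classifier score with crowdsourced votes."""
    ad_id: str
    classifier_prob: float  # P(political) from the text classifier, assumed precomputed
    political_votes: int = 0
    not_political_votes: int = 0

    def record_vote(self, is_political: bool) -> None:
        if is_political:
            self.political_votes += 1
        else:
            self.not_political_votes += 1

    def combined_score(self, prior_weight: float = 5.0) -> float:
        # Treat the classifier's probability as `prior_weight` pseudo-votes,
        # so each real vote shifts the score without overriding it.
        total_votes = self.political_votes + self.not_political_votes
        return (self.classifier_prob * prior_weight + self.political_votes) / (
            prior_weight + total_votes
        )

    def is_political(self, threshold: float = 0.5) -> bool:
        return self.combined_score() >= threshold


ad = AdRecord(ad_id="example-1", classifier_prob=0.8)
ad.record_vote(is_political=False)
print(ad.combined_score(), ad.is_political())  # one dissenting vote: still political
for _ in range(4):
    ad.record_vote(is_political=False)
print(ad.combined_score(), ad.is_political())  # five dissenting votes: no longer political
```

Another common approach is to feed each voted ad back into the classifier as a new labeled training example, which also lets the model pick up new political vocabulary over time.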
The tool is already being used in several countries, including Germany, Italy, Australia, Austria and the U.S. For each country, we customize the algorithm to identify political content. The open source code behind our project is available to the public.