Breaking the Black Box

When Machines Learn by Experimenting on Us

by Julia Angwin, Terry Parris Jr., Surya Mattu and Seongtaek Lim, ProPublica, October 12, 2016

As we enter the era of artificial intelligence, machines are constantly trying to predict human behavior. Google predicts traffic patterns based on motion sensors in our phones. Spotify anticipates the music we might want to listen to. Amazon guesses what books we want to read next.

Machines learn to make these predictions by analyzing patterns in huge amounts of data. Some patterns that machines find can be nonsensical, such as analyses that have found that divorce rates in Maine go down when margarine consumption decreases. But other patterns can be extremely useful: For instance, Google uses machine learning to understand how to optimize energy use at its data centers.

Depending on what data they are trained on, machines can “learn” to be biased. That’s what happened in the fall of 2012, when Google’s machines “learned” in the run-up to the presidential election that people who searched for President Obama wanted more Obama news in subsequent searches, but people who searched for Republican nominee Mitt Romney did not. Google said the bias in its search results was an inadvertent result of machine learning.

Sometimes machines build their predictions by conducting experiments on us, through what is known as A/B testing. This is when a website will randomly show different headlines or different photos to different people. The website can then track which option is more popular, by counting how many users click on the different choices.
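
As a rough illustration of the mechanics described above, here is a minimal Python sketch of a simulated A/B test. It is not Optimizely's code or any news site's actual system. The headlines, the click probabilities and the function name `run_ab_test` are all made up for the example. Each simulated visitor is randomly shown one headline, and the script counts how often each one is clicked.

```python
import random
from collections import Counter


def run_ab_test(headlines, click_rates, visitors, seed=0):
    """Show one randomly chosen headline to each simulated visitor.

    Returns a dict mapping each headline to its click-through rate
    (clicks divided by the number of times it was shown).
    """
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    shown = Counter()
    clicked = Counter()
    for _ in range(visitors):
        headline = rng.choice(headlines)  # random assignment to a variant
        shown[headline] += 1
        # Simulated reader behavior: click with a made-up probability.
        if rng.random() < click_rates[headline]:
            clicked[headline] += 1
    return {
        h: (clicked[h] / shown[h] if shown[h] else 0.0) for h in headlines
    }


# Hypothetical headlines and invented click probabilities.
headlines = ["Headline A", "Headline B"]
click_rates = {"Headline A": 0.02, "Headline B": 0.05}

results = run_ab_test(headlines, click_rates, visitors=20000)
winner = max(results, key=results.get)
print(results)
print("Most clicked:", winner)
```

On a real website the clicks come from actual readers rather than a random number generator, but the bookkeeping is the same: random assignment, then a tally of clicks per option.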

A particular type of A/B testing software — called Optimizely — is quite common. Earlier this year, Princeton researchers found Optimizely code on 3,306 websites among 100,000 sites visited. (Optimizely says that its “experimentation platform” has been used to deliver more than 700 billion “experiences.”)

The Princeton researchers found the Jawbone fitness tracker website was using Optimizely to target a specific message to users at six geographic locations, and that one software company, Connectify, was using Optimizely to vary the discounts it offered to visitors. During the presidential primaries, the candidates used Optimizely to vary their website colors and photos, according to a study by the news outlet Fusion.

“People should be cognizant that what they see on the web is not set in stone,” said Princeton researcher Dillon Reisman.

Many news sites, including The New York Times and the New York Post, use Optimizely to evaluate different headlines for news articles. Remy Stern, chief digital officer of the New York Post, said that the website has been using Optimizely to test headlines for several years. Two to five headlines will be randomly shown until the system can determine the most popular headline.
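
How a system decides that one headline is "the most popular" is not spelled out here, and the sketch below is not a description of Optimizely's method. It shows one common textbook approach, a two-proportion z-test comparing the leading headline with the runner-up. The function name `leader_is_clear`, the 1.96 threshold and the sample counts are assumptions chosen for illustration.

```python
import math


def leader_is_clear(stats, z_threshold=1.96):
    """Decide whether the top headline is clearly ahead of the runner-up.

    stats maps headline -> (impressions, clicks); it needs at least two
    headlines, each with impressions > 0.
    Returns (leading_headline, True/False).
    """
    ranked = sorted(
        stats, key=lambda h: stats[h][1] / stats[h][0], reverse=True
    )
    best, runner_up = ranked[0], ranked[1]
    n1, c1 = stats[best]
    n2, c2 = stats[runner_up]
    p1, p2 = c1 / n1, c2 / n2
    # Pooled click rate under the assumption that both headlines perform alike.
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return best, False  # no clicks at all (or all clicks): cannot tell
    z = (p1 - p2) / se
    return best, z >= z_threshold


# Invented counts: early in a test, then after much more traffic.
early = {"A": (100, 5), "B": (100, 6)}
later = {"A": (10000, 200), "B": (10000, 500), "C": (10000, 210)}

print(leader_is_clear(early))
print(leader_is_clear(later))
```

With only a few hundred views, a one-click lead is not enough to call a winner; with thousands of views and a large gap, the same check passes. Checking repeatedly as data arrives makes false alarms more likely, so production systems generally use more careful stopping rules than this.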

“In the old days, editors thought they knew what people wanted to read,” Stern said. “Now we can test out different headlines to see what angle is most interesting to readers.”

The Post’s online headlines are totally different from the ones crafted each evening for the next morning’s newspaper, he said. The print headlines use a lot of idioms, such as calling the New York City Mayor “Hizzoner,” that don’t work online, Stern said.

The New York Times just began testing web headlines on its homepage late last year, said senior editor Mark Bulik. “We can tell which stories on the homepage are not meeting our expectations on readership, so we try to come up with alternatives for those headlines,” he said.

Bulik said the winning headline is sometimes obvious within minutes; other times it takes as long as an hour for test results to become clear. A good headline can increase readership dramatically. For example, he said, the headline “Thirteen of his family died from Ebola. He lived.” increased readership by 1,006 percent over “Life after a plague destroyed his world.”
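
For a sense of scale, the arithmetic behind a percentage increase can be sketched in a few lines of Python. Read literally, an increase of 1,006 percent means the winning headline drew roughly 11 times the readership of the other one. The reader counts below are invented, since the actual figures are not given, and the function names are just for this example.

```python
def percent_increase(baseline, variant):
    """Percentage by which variant exceeds baseline."""
    return (variant - baseline) / baseline * 100


def multiplier_from_percent(pct):
    """Convert a percentage increase into a 'times as many' multiplier."""
    return 1 + pct / 100


# A 1,006 percent increase expressed as a multiplier (about 11x).
times_as_many = multiplier_from_percent(1006)
print(round(times_as_many, 2))

# Invented reader counts that would produce the same percentage.
print(round(percent_increase(baseline=100, variant=1106)))
```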

The winning New York Times headlines are used on the homepage, and increasingly inform editors’ choices for the final headlines for the online article and in the print newspaper, said Carla Correa, social strategy editor for the Times.

Correa said that the Times tries to avoid one of the perils of optimizing headlines — the “clickbait” headline that promises more than it delivers. “If we see a headline that we think is misleading to readers, we push back,” she said.

To show you how A/B testing works, we’ve gathered headline tests that have run on the websites of The New York Times and the New York Post. And because New York Post headlines are so much fun, we also built a Twitter bot that automatically tweets out all the headlines that the Post is testing on its stories. Follow it here.

Additional design and production by Rob Weychert and David Sleight.