In May, ProPublica published an article that took a close look at a computer program used in many jurisdictions to forecast whether criminal defendants are likely to commit crimes if released.

To test its accuracy, we obtained the risk scores for thousands of defendants in Broward County, Florida, and tracked down how many were arrested again within two years. We found that about 60 percent of those classified as higher risk went on to commit new crimes, a rate that was the same for both black and white defendants.

When we analyzed the 40 percent of predictions that were incorrect, we found a significant racial disparity. Black defendants were twice as likely to be rated as higher risk but not re-offend. And white defendants were twice as likely to be charged with new crimes after being classified as lower risk.

The company that makes the software we analyzed, Northpointe, has released a paper defending its approach, arguing that its test is unbiased because it is equally predictive for black and white defendants. The company said the inaccurate predictions were irrelevant because they were of “no practical use to a practitioner in a criminal justice agency.”

We have reviewed Northpointe’s claims and stand by our conclusions and findings.

In its paper, Northpointe dismissed the racial disparities we detected by saying “this pattern does not show evidence of bias, but rather is a natural consequence of using unbiased scoring rules for groups that happen to have different distributions of scores.” In simple terms, the company is arguing that one would expect more black defendants to be classified as higher risk because they are as a group more likely to be arrested for new crimes.

That is true, but not the whole story.

To understand what’s really going on, think of the Northpointe software as a tool that sorts defendants into two groups: those deemed at higher risk of committing new crimes and those deemed less likely to do so.

The customers for this product — mainly judges — use its predictions to help make pivotal decisions about people’s lives. Who can be diverted into drug treatment programs? Who can be released before trial? Who should be sent to prison for the longest possible time? Who should get a lenient sentence?

Northpointe says the test serves customers well since it is both informative (60 percent accuracy is better than a coin flip) and unbiased (it correctly sorts black and white defendants at roughly the same rate).

But things look very different when analyzed from the perspective of the defendants, particularly those wrongly classified as future criminals.

Here’s how that played out in the real world of Broward County’s courts: When the algorithm crunched data on black defendants, it placed 59 percent of them in the more likely to re-offend category, a larger group than the 51 percent who actually did commit crimes.

There were too many people in that high-risk group, which meant that the most likely error for black defendants was to be wrongly classified as higher risk.

By contrast, the software underestimated the number of white defendants who would commit new crimes, classifying only 35 percent as higher risk, short of the 39 percent who actually went on to be arrested. This meant the most likely error for white defendants was to be incorrectly judged as low risk.
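
The over- and under-prediction described above can be checked with back-of-the-envelope arithmetic using only the percentages reported in this article:

```python
# Rates reported in the article, as fractions of each group's defendants
flagged = {"black": 0.59, "white": 0.35}     # scored higher risk by the software
reoffended = {"black": 0.51, "white": 0.39}  # actually arrested again within two years

for group in flagged:
    gap = flagged[group] - reoffended[group]
    # A positive gap means the software flagged more people than re-offended,
    # guaranteeing some defendants were wrongly labeled higher risk.
    print(f"{group}: over-prediction gap of {gap:+.0%}")
```

The gap is positive for black defendants (+8 points) and negative for white defendants (−4 points), which is why the most likely error runs in opposite directions for the two groups.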

Court records show that 805 of the 1,795 black defendants who did not commit future crimes were deemed higher risk by Northpointe. The test was much more accurate for white defendants: of the 1,488 who were not arrested on new charges over the next two years, only 349 had been categorized as higher risk.

When we calculated those percentages, it turned out that 45 percent of black defendants who did not re-offend had been misclassified as higher risk, compared with 23 percent of white defendants.
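
Those percentages follow directly from the court-record counts cited above:

```python
# Counts from Broward County court records cited in this article:
# defendants who did NOT commit future crimes, by race
did_not_reoffend = {"black": 1795, "white": 1488}
# ...of whom this many were nonetheless scored higher risk
flagged_higher_risk = {"black": 805, "white": 349}

for group in did_not_reoffend:
    rate = flagged_higher_risk[group] / did_not_reoffend[group]
    print(f"{group}: {rate:.0%} wrongly classified as higher risk")
```

This prints 45% for black defendants and 23% for white defendants, the disparity at the heart of the dispute.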

An important question is: Are black people still disproportionately getting higher scores even after taking into account differences in the rates of re-offending, criminal history, age and gender? The answer is yes. We used a statistical technique called logistic regression to control for all those variables, and found black defendants were still 45 percent more likely to get a higher score.
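
The control technique can be sketched in a few lines. The data below is synthetic and the column names and coefficients are illustrative assumptions, not ProPublica’s actual dataset or model; the point is only to show how logistic regression isolates the effect of race while holding the other factors fixed.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (NOT the real Broward County records)
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "black": rng.integers(0, 2, n),        # 1 = black defendant
    "priors": rng.poisson(2, n),           # prior offense count
    "age": rng.integers(18, 70, n),
    "male": rng.integers(0, 2, n),
    "recidivated": rng.integers(0, 2, n),  # re-arrested within two years
})
# Build a high-score outcome that depends on race even after the controls,
# mimicking the kind of disparity the article reports (coefficients are made up)
logit_p = -1.0 + 0.37 * df["black"] + 0.2 * df["priors"] + 0.5 * df["recidivated"]
df["score_high"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

# Regress the score on race, controlling for the other variables
model = smf.logit("score_high ~ black + priors + age + male + recidivated",
                  data=df).fit(disp=0)
odds_ratio = np.exp(model.params["black"])
print(f"odds ratio for black defendants, controls held fixed: {odds_ratio:.2f}")
```

An odds ratio near 1.45 corresponds to the “45 percent more likely” finding: race still shifts the odds of a high score even when re-offending, criminal history, age and gender are held constant.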

Andrew Gelman, professor of statistics and political science at Columbia University, said the issues raised by the Northpointe algorithm are not uncommon in statistics.

“This is a situation where even if the system could be calibrated correctly” — meaning, it’s equally accurate between racial groups — “it can be unfair to different groups,” Gelman said.

“From the perspective of the sentencer it might be unbiased,” he said. “But from the perspective of a criminal defendant it could be biased.”

Bias against defendants is what the U.S. legal system is designed to prevent. “The whole point of due process is accuracy, to prevent people from being falsely accused,” said Danielle Citron, law professor at the University of Maryland. “The idea that we are going to live with a 40% inaccurate result, that is skewed against a subordinated group, to me is a mind-boggling way to think about accuracy.”

If you are interested, read our more technical response to Northpointe’s criticisms. You can also read our annotated responses to an academic paper that defended Northpointe’s approach.