ProPublica

Journalism in the Public Interest

The ProPublica Nerd Blog

Nonprofit Explorer Update: Full Text of 1.9 Million Records

(Rob Weychert/ProPublica)

We have updated our Nonprofit Explorer news application, adding raw data from more than 1.9 million electronically filed Form 990 documents dating back to 2010. This new trove includes the full text of more than 132,000 forms for which we did not previously have complete data.

In addition to making the machine-readable XML files available to download, we are publishing the full text of many of these documents as human-readable web pages. These look similar to the PDFs that have appeared on Nonprofit Explorer in the past, but their text can be copied and pasted, and they are easier to browse and analyze.

You can find the HTML and XML versions of e-filed returns by clicking the buttons labeled “Full Text” and “Raw XML,” respectively, which appear on a nonprofit organization’s page under each year for which the data is available.

The release of the XML documents was made possible by a 2015 lawsuit brought by Public.Resource.Org, a nonprofit organization that makes government documents available to the public. The suit compelled the IRS to fulfill Freedom of Information Act requests for electronically filed Form 990 documents in “Modernized e-File” XML format. The IRS began sharing the XML versions of e-filed forms as a public dataset in 2016.

For several years, Public.Resource.Org and its founder, Carl Malamud, have helped ProPublica acquire the page-image versions of Form 990 documents from the IRS. These documents make up the bulk of Nonprofit Explorer.

Malamud sees the release of XML data as a huge improvement.

“XML data is machine-processable,” Malamud wrote in an email to ProPublica. “You can instantly access the value of any specific field in a Form 990 (such as CEO compensation) from a computer program.”

To illustrate the advantage of XML over a page image, Malamud offered an analogy. The raw XML data is like a spreadsheet, from which you can extract data easily. As for a page image, it’s as if “you make a printout of the spreadsheet, take a picture on your cellphone of the printout, and post the picture on Instagram.”

“Releasing the e-file data instead is vastly superior and will make the Form 990 a much more useful tool.”
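
To make Malamud's point concrete, here is a minimal sketch of reading compensation figures out of an e-filed return with Python's standard library. The sample document is invented, and the element names (`Form990PartVIISectionAGrp`, `PersonNm`, `TitleTxt`, `ReportableCompFromOrgAmt`) and the namespace are assumptions modeled on the IRS e-file schema for recent tax years. Check them against a real "Raw XML" file before relying on them, since field names have varied between schema versions.

```python
import xml.etree.ElementTree as ET

# Invented sample data. Element names are modeled on the IRS e-file schema
# for recent tax years; treat them as assumptions, not a specification.
SAMPLE_RETURN = """
<Return xmlns="http://www.irs.gov/efile">
  <ReturnData>
    <IRS990>
      <Form990PartVIISectionAGrp>
        <PersonNm>Jane Example</PersonNm>
        <TitleTxt>CEO</TitleTxt>
        <ReportableCompFromOrgAmt>150000</ReportableCompFromOrgAmt>
      </Form990PartVIISectionAGrp>
      <Form990PartVIISectionAGrp>
        <PersonNm>John Sample</PersonNm>
        <TitleTxt>Treasurer</TitleTxt>
      </Form990PartVIISectionAGrp>
    </IRS990>
  </ReturnData>
</Return>
"""

# Every element in the sample lives in this namespace, so lookups must use it.
NS = {"efile": "http://www.irs.gov/efile"}


def list_compensation(xml_text):
    """Return (name, title, amount) for each listed officer or employee."""
    root = ET.fromstring(xml_text)
    people = []
    for group in root.iterfind(".//efile:Form990PartVIISectionAGrp", NS):
        name = group.findtext("efile:PersonNm", default="", namespaces=NS)
        title = group.findtext("efile:TitleTxt", default="", namespaces=NS)
        # A missing amount element is treated as zero in this sketch.
        amount = int(group.findtext(
            "efile:ReportableCompFromOrgAmt", default="0", namespaces=NS))
        people.append((name, title, amount))
    return people


for name, title, amount in list_compensation(SAMPLE_RETURN):
    print(f"{name} ({title}): ${amount:,}")
```

Getting the same numbers out of a page image would require optical character recognition and a fair amount of cleanup.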

While the XML files provide the most complete and useful data possible for e-filed Form 990 documents, they’re formatted for computer programs to understand, not humans. So the IRS provides stylesheets that a programmer can use to make the documents look more like the paper forms that make up a Form 990 tax return. We adapted open-source code based on those IRS stylesheets to make cosmetic transformations for Form 990 documents from 2013 and later.
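
The general idea of a cosmetic transformation can be sketched in a few lines of Python. This is not the IRS stylesheets or the open-source code we adapted. It is a hand-written stand-in that turns every filled-in field of a return into a row of an HTML table, using only the standard library. The sample document and its element names are invented for illustration.

```python
import html
import xml.etree.ElementTree as ET

# Invented sample data; element names are assumptions for illustration only.
SAMPLE_RETURN = """
<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <BusinessNameLine1Txt>Example Charity &amp; Friends</BusinessNameLine1Txt>
  </ReturnHeader>
  <ReturnData>
    <IRS990>
      <CYTotalRevenueAmt>250000</CYTotalRevenueAmt>
    </IRS990>
  </ReturnData>
</Return>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.split("}", 1)[-1]


def render_html(xml_text):
    """Render each leaf element that has text as one HTML table row."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.iter():
        is_leaf = len(element) == 0
        text = (element.text or "").strip()
        if is_leaf and text:
            # Escape both label and value so the output is valid HTML.
            label = html.escape(local_name(element.tag))
            value = html.escape(text)
            rows.append(f"<tr><th>{label}</th><td>{value}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


print(render_html(SAMPLE_RETURN))
```

A real stylesheet does much more, arranging fields into the parts, lines and schedules of the paper form, but the principle is the same: the data stays in the XML and the presentation is layered on top.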

Most nonprofits file their tax documents electronically. However, there are still thousands of nonprofit organizations that file them on paper. We will continue to provide PDF versions of these documents in order to make sure we’re providing information for as many nonprofit organizations as possible.

Our work on the XML-based data is just beginning. In the coming months, we will continue to improve Nonprofit Explorer and the Nonprofit Explorer API, providing users with new ways to explore and analyze tax-exempt organizations.