We work with a lot of data at ProPublica. It's a big part of almost everything we do — from data-driven stories to graphics to interactive news applications. Today we're launching the ProPublica Data Store, a new way for us to share our datasets and for them to help sustain our work.
Like most newsrooms, we make extensive use of government data, some downloaded from "open data" sites and some obtained through Freedom of Information Act requests. But much of our data comes from our developers spending months scraping and assembling material from websites and out of Acrobat documents. Some data requires months of labor to clean, or requires combining datasets from different sources in a way that's never been done before.
For datasets that are the result of significant expenditures of our time and effort, we're charging a reasonable one-time fee: In most cases, it's $200 for journalists and $2,000 for academic researchers. Those wanting to use data commercially should reach out to us to discuss pricing. If you're unsure whether a premium dataset will suit your purposes, you can try a sample first. It's a free download of a small sample of the data and a readme file explaining how to use it.
The datasets contain a wealth of information for researchers and journalists. The premium datasets are cleaned and ready for analysis. They will save you months of work preparing the data. Each one comes with documentation, including a data dictionary, a list of caveats, and details about how we have used the data here at ProPublica.
We've long worked informally with people interested in purchasing our datasets; some of our apps have provided downloads of the data used to build them. We hope that providing a clearinghouse for all of our datasets will help this material reach a broader community and will support, in spirit and financially, our journalistic mission.
The Data Store is a bit of an experiment. We don't know for sure how much interest there is in the data. For now, there are only a few datasets available, and buying them is a manual process. We'll add more data over time; you can see some of the datasets we'll be releasing in the next few weeks under Coming Soon. We're paying close attention and expect to learn a lot in the first few weeks after launch.
If you have suggestions for datasets we should make available, or features we should add, please don't hesitate to contact us at [email protected].