ProPublica

Journalism in the Public Interest


The ProPublica Nerd Blog

Data-Driven Journalism’s Secrets


Hassel Fallas is a data journalist at La Nación in Costa Rica. She’s visiting ProPublica as a 2013 Douglas Tweedale Fellow from the International Center for Journalists and as ProPublica’s October 2013 P5 Resident. (Krista Kjellman Schmidt/ProPublica)


I’ve spent the last few weeks in the U.S. on a Douglas Tweedale Memorial Fellowship with the International Center for Journalists, talking to some American newsrooms about how they approach data-driven journalism. Here’s a bit about what I’ve learned.

The best way to start doing data-driven journalism is simply to start. When you’re just beginning, you really have nothing to lose. With every mistake, you gain experience and knowledge that help you grow and improve the quality of the journalism you practice.

It’s easy to say, simply, “I am not good at math,” and decide that data-driven journalism isn’t for you. Well, I wasn’t good at math either until I decided to tear down that barrier, and I found that learning is easier and more effective when you apply theory to real-life projects.

In 2008, before I even knew there was such a thing as “data-driven journalism,” I proposed a small, simple analysis of a data set of international tourists visiting Costa Rica. Along the way, I started learning Excel formulas to calculate variations and to analyze totals by year, season, quarter and so on.
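The spreadsheet work described above, summing totals by period and calculating the variation between periods, can be sketched in a few lines of Python. This is a minimal illustration, not the author’s original Excel workbook: the monthly arrival figures are hypothetical placeholders, not real Costa Rican tourism data, and the helper names are invented for the example.

```python
from collections import defaultdict

# Hypothetical monthly tourist arrivals (illustrative numbers only, NOT real data).
# Each record is (year, month, arrivals).
records = [
    (2007, 1, 150), (2007, 2, 140), (2007, 3, 160),
    (2007, 4, 120), (2007, 5, 100), (2007, 6, 110),
    (2008, 1, 165), (2008, 2, 147), (2008, 3, 176),
    (2008, 4, 126), (2008, 5, 95), (2008, 6, 121),
]

def quarter(month):
    """Map a month number (1-12) to its calendar quarter (1-4)."""
    return (month - 1) // 3 + 1

def totals_by(records, key):
    """Sum arrivals grouped by key(year, month), like a pivot table."""
    totals = defaultdict(int)
    for year, month, count in records:
        totals[key(year, month)] += count
    return dict(totals)

def pct_change(old, new):
    """Percentage variation from old to new, like =(new-old)/old*100 in Excel."""
    return (new - old) / old * 100

yearly = totals_by(records, lambda y, m: y)
quarterly = totals_by(records, lambda y, m: (y, quarter(m)))

print(yearly)     # {2007: 780, 2008: 830}
print(quarterly)  # {(2007, 1): 450, (2007, 2): 330, (2008, 1): 488, (2008, 2): 342}
print(round(pct_change(yearly[2007], yearly[2008]), 1))  # 6.4
```

Comparing the same quarter across years (for example, 2007 Q1 against 2008 Q1) uses the same `pct_change` helper and avoids mixing seasonal swings into the year-over-year variation.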

Gradually, my skill at correlating data began to increase, as did the complexity of my projects. These days I’m working in the Data Unit of La Nación in Costa Rica.

Here are three things you can start today to increase your data-driven journalism skills:

First, talk to developers and engineers about how to approach a data analysis project: which software to use, and which formulas and methodologies suit the goals of your project.

Second, stick with it! Don’t give up too easily when learning a new technique or software such as OpenRefine, Tableau, Tabula or others to clean, analyze or visualize data.

Finally, make sure to share what you’re learning with others. Very often the questions people ask you reveal challenges you hadn’t thought of and push you to search for the right answers, which increases your knowledge and encourages you to try different approaches.

What I want to say is: If you want to do data-driven journalism, go ahead and start. Good ways to start learning include online courses, books and tutorials.

If you live in Latin America, you can take advantage of projects like Chicas Poderosas (“Powerful Girls”), which promotes the development of data-driven journalism skills through workshops that connect journalists, developers, designers, animators and storytellers and get them to work together on storytelling projects.

I also recommend global initiatives like Hacks/Hackers, which hosts meetups in many countries inside and outside Latin America.

You must also commit to learning continuously. Even after you have developed advanced skills and a deep understanding of the techniques, tools and methodologies of analysis and visualization, there will always be a bigger challenge ahead: bigger datasets, new software to test, new techniques to try and new ways to engage the people for whom your story matters.

Why Data-Driven Journalism Matters

You may ask, “Why does data-driven journalism matter to me?” A good answer comes from Sisi Wei, a data journalist at ProPublica, whom I talked to during my two-week assignment in that newsroom.

“It allows me to scrutinize and examine information better than ever before. For example, when I was in college, I thought that you interview experts, believe in what they say, then quote them in your articles. But why not take the data results of their research and analyze it yourself? I can check what they are saying. If I analyze the data before I do an interview, I can ask questions about the anomalies that I found or questions on procedures that I understand better than before.”

I know just what she means – it’s very useful and satisfying to be able to scrutinize and examine the information in a better and new way.

Every good story starts with an idea, a question or an observation, said Sarah Cohen, who has been editor of computer-assisted reporting at The New York Times for about a year.

“And then, we look for data or documents that help us to extend the impact of these observations,” she added. We spoke in the Times’s employee cafeteria earlier this week.

That’s how a Pulitzer Prize-winning series reported by Cohen and her colleagues at The Washington Post was born.

The series exposed the District of Columbia's culpability in the deaths of 229 children placed in protective care between 1993 and 2000.

“I feel that those stories have more impact than traditional investigations, because they show that this is not an isolated incident, that this was not just a one-time problem,” concluded Cohen.

Another great benefit of data-driven journalism is that it improves the quality of journalism and engagement with audiences through visualizations and interactive databases.

In Argentina, La Nación in Buenos Aires understands this benefit very well. “Putting data in contact with citizens through news apps or interactive visualizations is the way to activate citizen participation in the mobile and on-demand era,” said Angelica “Momi” Peralta, La Nación’s Multimedia and Interactive Development Manager.

Argentina doesn’t have a Freedom of Information Act, and its government open-data portals launched only recently, so the data journalists at La Nación build their own datasets from scratch by scraping PDFs, among other techniques. They also publish the data they build in open formats so anyone interested can access it.

“We believe that sharing data is a must. Media must be in front this time, wake up, make useful data sets ‘famous’ and available for others to reuse. Bring data to life, because darkness and corruption is killing people in places like ours, and data that you open comes to light, so once it reaches the hands of citizens (through mobile or visualizations), it will be more difficult for those who want to make it disappear,” Peralta told me via email.

The Idea and the Process

In my experience, succeeding at data-driven journalism is a matter of patience, lots of teamwork and refusing to give up. Here’s one story from my newsroom about how a project came together.

Last March, during a meeting with my team, we came up with the idea to investigate the recycling practices of households in the 81 regions of Costa Rica, and to see how local governments are supporting these efforts. Immediately, I began to dig into the issue by reading up on the relevant laws and regulations, as well as academic studies on the subject. I also created two databases.

The first was built by extracting data from Costa Rica’s 2011 Census. For the very first time, that Census asked whether households were separating plastic, paper, aluminum and glass from ordinary trash.

I assembled the second dataset by calling and requesting information from all 81 local governments.

I knew from the census that 40 percent of Costa Rican households were separating recyclable waste, but beyond that, what I wanted to know was: In what regions are those recycling practices the most extensive? What do people do with the stuff they are separating? Do the local governments handle the recycled waste properly, or is the effort in vain? How many tons are being collected every month, and how much of it is being recycled?

The Census had answers to the first two questions, but not the last two. Those questions could only be answered by asking the local governments. The second database was gathered from that reporting. I took to the streets to confirm my findings, talking with experts, government officials, associations and companies in the communities.

This is an important step, advises the Times’s Cohen. “Data-driven journalists need to spend some time on the street seeing how the data works in the three-dimensional world. The same happens with reporters that work on street. They need to spend some time with the data to see how it is represented. Any record is actually part of something that is happening. And without any of those perspectives, I think you can lose a lot.”

In the end, with help from some of my colleagues, we merged the two databases so we could determine the exact number and location of households separating recyclables from ordinary trash, and identify places where the local government wasn’t collecting those recyclables separately but was simply mixing the recycling bags with garbage.
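Merging the two databases essentially means joining them on a shared key, the local government each household belongs to. Here is a minimal Python sketch, assuming each dataset has one row per region. The region names, field names and numbers are hypothetical, and the actual La Nación analysis may have been done differently (for example, in a spreadsheet or database).

```python
# Hypothetical census-derived data: households, and households separating
# recyclables, per region (illustrative numbers only, NOT real data).
census = {
    "Region A": {"households": 20000, "separating": 9000},
    "Region B": {"households": 15000, "separating": 4500},
    "Region C": {"households": 8000, "separating": 4000},
}

# Hypothetical results of calling each local government: does it collect
# recyclables separately from ordinary trash?
local_gov = {
    "Region A": {"separate_collection": True},
    "Region B": {"separate_collection": False},
    # Region C did not respond, so it is missing here.
}

def merge(census, local_gov):
    """Inner-join the two datasets on region name; list regions with no match."""
    merged, unmatched = {}, []
    for region, row in census.items():
        gov = local_gov.get(region)
        if gov is None:
            unmatched.append(region)  # flag for follow-up reporting
            continue
        merged[region] = {**row, **gov}
    return merged, unmatched

merged, unmatched = merge(census, local_gov)

# Households that separate recyclables in regions where the local government
# mixes them back in with ordinary garbage.
wasted_effort = {
    region: row["separating"]
    for region, row in merged.items()
    if not row["separate_collection"]
}

print(wasted_effort)  # {'Region B': 4500}
print(unmatched)      # ['Region C']
```

Keeping the unmatched regions in a separate list, rather than silently dropping them, shows which local governments still need a follow-up call before the numbers can be published.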

We now also had a data set showing the towns with the highest recycling rates, and which of those recycling efforts were actually supported by their local governments.

In parallel with the data analysis, we developed an interactive database that let each reader look up their own local information and so “tell their own story” using the data.

The Secret Is: Don’t Keep Secrets

Finally, if you are going to start in data-driven journalism, you should know a big secret: Do not keep secrets from the team you’re working with. Share all your findings, drafts and data.

Any idea that is not shared with others is destined to die, because the oxygen a data-driven project needs to live depends on how much you share it and how much you nurture it with others’ feedback.

Don’t keep information to yourself. Share all your data and findings at the very beginning. Keep your notes in a place others can access, like a wiki or shared network drive. From the very start, engage developers, designers and multimedia experts with all aspects of your story. They’ll enrich your perspectives and boost the quality of your questions.

You may also find new sources (both data and human) that perhaps you wouldn’t have otherwise thought of, as well as new tools and methods to extract and analyze data. It will help you do your job better.

Talk as much as you can about your idea, even when it is only a project idea. Keep talking when it turns into a project and when it’s a draft and when you’re really far into the project.

And discard the notion that journalists and engineers and graphic designers should work separately. In a newsroom everybody can be doing the work of journalism.

On that point, I like a phrase that I heard during my stay here at ProPublica. It was said by Scott Klein, Senior Editor of News Applications. “Forget about saying to developers: OK, here's the data, work with that. My part is done.” Your project can only be successful if it is really done as a team.

I confirmed this myself at La Nación (Costa Rica) while developing a database that reveals the identities of more than 100,000 offshore entities in tax havens. It was a global project led by the International Consortium of Investigative Journalists, in which La Nación (Costa Rica) worked with newsrooms in different parts of the world to produce an investigation with global impact.

It would not have been possible if the newspaper hadn’t assembled a multidisciplinary team that was willing to communicate and work together in whatever way we needed to.

My next challenge will be to learn the basics of software development. I know there are opposing views on whether a journalist should learn to code, but during the past two weeks in the ProPublica newsroom, I have seen why it is important for journalists to have at least basic programming skills.

If you are working with large volumes of data, even basic coding skills will help you, if only to communicate better with your engineers, get the most out of your data and analyze it as well as possible. Understanding how code works will also help you choose the most suitable visualization or interactive technique.

Again, I know it seems daunting, but the most important advice I have for journalists is to at least try data-driven journalism, and as you master it, keep learning.

