Journalism in the Public Interest

The ProPublica Nerd Blog

Introducing DocDiver


The "pages" tab in DocDiver's version of the DocumentCloud DocumentViewer shows how many findings were left on each page.

Today we’re launching a new feature that lets readers work alongside ProPublica reporters—and each other—to identify key bits of information in documents, and to share what they’ve found. We call it DocDiver.

Here’s how it works:

DocDiver is built on top of DocumentViewer from DocumentCloud. It frames the DocumentViewer embed and adds a new right-hand sidebar with options for readers to browse findings and to add their own. The “overview” tab shows, at a glance, who is talking about this document and “key findings”—ones that our editors find especially illuminating or noteworthy. The “findings” tab shows all reader findings to the right of each page near where readers found interesting bits.

We’re inaugurating DocDiver with a set of previously unreleased government audit reports of GMAC, the nation’s fifth largest servicer of home mortgages. The documents, obtained by ProPublica’s Paul Kiel, show weak oversight of the administration’s main foreclosure prevention program.

As you scroll through the document, the DocDiver sidebar shows findings as yellow “tabs” near where the finding itself can be found. Findings with red backgrounds are from ProPublica reporters or editors. Click on the “pages” tab immediately above the page image and you can see the number of findings on each page—a quick way to find the areas of the document that others have found the most interesting.

To submit a finding, click the button to sign in with Facebook, and then click anywhere on the document page where you see something interesting. Enter a finding in the window that pops up and hit “submit.” The finding will instantly show up in the sidebar near the spot where you clicked. If you clicked near an area others have already annotated, DocDiver may group your finding with others’ in a single tab.

The DocDiver sidebar

In addition to taking part in document dives, you can vote up other findings you think are especially noteworthy. When you post a finding, DocDiver will also give you a chance to post it to your Facebook wall.

DocDiver builds on the work done by TPM, the Guardian’s MP Expenses Project and the New York Times’ recent project that solicited reader findings in the Sarah Palin email trove. More important, DocDiver is the brainchild of our Director of Distributed Reporting Amanda Michel, who collaborated with the news apps team to design it.

We’ve got big plans for future document dives, so stay tuned.

Nerdy Details

To build DocDiver, we combined what we learned about using Facebook as an identity management system for our “Opportunity Gap” news application with the excellent (albeit slightly hidden) JavaScript API for DocumentCloud’s DocumentViewer to create a “layer” that sits on top of, and interacts with, the embedded DocumentViewer.

Using the DocumentViewer JS API is easy. Say you’ve embedded a document using code like this:

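// Base URL and slug of the document to embed
var dcBaseUrl = '';
var dcSlug    = '231997-clive-goodman-letter-submitted-by-news-corp';
// Load the viewer into the #doc element and keep a reference to it
var currentDocument = DV.load(dcBaseUrl + dcSlug + '.js', {
   container : '#doc',
   embedded  : true
});
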
Since the document viewer is assigned to currentDocument, you can access DocumentViewer API methods through currentDocument.api. There are a host of API methods, for example to get the current tab, title and zoom of the embed, and to set functions that fire when the current page or tab is changed. We needed a few new API features, such as setting the current page, manipulating the URL fragment and working with internal DocumentCloud annotations, which DocumentCloud helped us get into their API. These new API calls are now live on every DocumentViewer embed across the web, so other news organizations can build on them, too.
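To give a feel for what working with the api object looks like, here is a minimal sketch. The method names it calls (getTitle, currentPage, currentZoom, onPageChange, setCurrentPage) are assumptions based on the description above and should be checked against the DocumentViewer source before use. The describeViewer helper and the stub object are hypothetical, included only so the snippet runs on its own outside a browser.

```javascript
// Hypothetical helper: summarizes a viewer's state through its api object.
// The api method names are assumptions, not confirmed DocumentViewer signatures.
function describeViewer(api) {
  return api.getTitle() + ' (page ' + api.currentPage() +
         ', zoom ' + api.currentZoom() + ')';
}

// Hypothetical stand-in for currentDocument.api, so the sketch runs anywhere.
var stubApi = {
  _page: 1,
  _listeners: [],
  getTitle: function () { return 'Example document'; },
  currentPage: function () { return this._page; },
  currentZoom: function () { return 700; },
  onPageChange: function (callback) { this._listeners.push(callback); },
  setCurrentPage: function (page) {
    this._page = page;
    // The stub notifies listeners itself; a real viewer may behave differently.
    for (var i = 0; i < this._listeners.length; i++) {
      this._listeners[i]();
    }
  }
};

// Record a description of the viewer every time the page changes.
var pageLog = [];
stubApi.onPageChange(function () {
  pageLog.push(describeViewer(stubApi));
});
stubApi.setCurrentPage(3);
```

In a real embed you would pass currentDocument.api to describeViewer in place of the stub.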

When users submit findings in DocDiver, we store them, along with their page numbers and coordinates, in our own database. They never get attached to the document itself or become actual DocumentCloud annotations.
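A stored finding might look something like the record below. This is only a sketch: the field names, the makeFinding helper and the choice to store coordinates as fractions of the page size are all assumptions, not DocDiver's actual schema.

```javascript
// Hypothetical: build a finding record from a click on a page image.
// clickX/clickY are pixel offsets within the page; pageWidth/pageHeight
// are the page image's size in pixels at the moment of the click.
function makeFinding(pageNumber, clickX, clickY, pageWidth, pageHeight, text) {
  return {
    page: pageNumber,
    x: clickX / pageWidth,   // 0 = left edge, 1 = right edge
    y: clickY / pageHeight,  // 0 = top edge, 1 = bottom edge
    text: text
  };
}

var exampleFinding = makeFinding(4, 350, 250, 700, 1000, 'Example finding text');
```

Storing positions as fractions rather than pixels is one way to keep a finding in the same spot regardless of the zoom level the reader was using when they clicked.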


As more people submit findings, the sidebar fills up with little tabs near the interesting bits. This comes with a slight complication: What if two people add a finding in the same place? How do the tabs fit without becoming a confusing mess?

We came up with the idea of slicing the document into multiple “buckets” of vertical space. When a reader clicks somewhere on the page to leave an annotation, their click will fall into one of those buckets. Every time a page within the viewer is loaded or changed, we “bucketize” all the findings on the page—organizing all the current findings into one bucket or another.
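The bucketing step can be sketched roughly as follows. This is an illustration under stated assumptions, not DocDiver's code: it assumes each finding carries a y value between 0 (top of the page) and 1 (bottom), and that the page is cut into a fixed number of equal-height buckets. The bucketize function and its bucketCount parameter are hypothetical names.

```javascript
// Hypothetical sketch: sort a page's findings into equal-height vertical buckets.
// Assumes finding.y is a fraction of the page height, from 0 (top) to 1 (bottom).
function bucketize(findings, bucketCount) {
  var buckets = [];
  for (var i = 0; i < bucketCount; i++) {
    buckets.push([]);
  }
  findings.forEach(function (finding) {
    // A click at the very bottom (y === 1) belongs in the last bucket.
    var index = Math.min(bucketCount - 1, Math.floor(finding.y * bucketCount));
    buckets[index].push(finding);
  });
  return buckets;
}

var pageBuckets = bucketize([
  { id: 'a', y: 0.05 },
  { id: 'b', y: 0.08 },
  { id: 'c', y: 0.95 },
  { id: 'd', y: 1 }
], 10);
// Findings a and b share bucket 0; c and d share bucket 9.
```

Running this again whenever a page loads or changes keeps the buckets in step with whatever findings currently exist for that page.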

As more findings are added to the document, they fall into buckets. Buckets then combine to form threads

Here’s where the fun comes in: If there is more than one finding in a bucket, DocDiver will automatically turn that bucket into a thread. If there are two findings in each of two adjoining buckets, those two buckets will collapse into one big thread, and so on. By bucketing nearby findings, we think DocDiver will consolidate and amplify interest around sections of a page. We’re hoping that will help spur a discussion among people who find similar things without needing a “reply” button on tabs, and that it will help avoid atomized and redundant findings.
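One way to read the threading rule is sketched below. It is an interpretation, not DocDiver's implementation: it assumes a bucket with two or more findings becomes a thread, that a run of adjoining buckets with two or more findings each merges into a single thread, and that a bucket with exactly one finding stays a lone tab. The buildTabs function and the shape of the objects it returns are hypothetical.

```javascript
// Hypothetical sketch: turn an ordered list of buckets (top of page first)
// into sidebar tabs, merging adjoining busy buckets into one thread.
function buildTabs(buckets) {
  var tabs = [];
  var openThread = null; // thread still being extended by adjoining busy buckets

  buckets.forEach(function (bucket, index) {
    if (bucket.length > 1) {
      if (openThread) {
        // The previous bucket was also busy: fold this one into the same thread.
        openThread.findings = openThread.findings.concat(bucket);
        openThread.lastBucket = index;
      } else {
        openThread = {
          type: 'thread',
          firstBucket: index,
          lastBucket: index,
          findings: bucket.slice()
        };
        tabs.push(openThread);
      }
    } else {
      // An empty or single-finding bucket ends any run of busy buckets.
      openThread = null;
      if (bucket.length === 1) {
        tabs.push({
          type: 'single',
          firstBucket: index,
          lastBucket: index,
          findings: bucket.slice()
        });
      }
    }
  });
  return tabs;
}

var exampleTabs = buildTabs([['a', 'b'], ['c', 'd'], [], ['e'], ['f', 'g', 'h']]);
```

In this example the first two buckets merge into one thread of four findings, the lone finding in the fourth bucket stays a single tab, and the last bucket becomes a thread of its own.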

DocDiver is an experiment that begins today. We’ve got big plans for it. We’re excited to see what readers make of it.

Looks like a great start! How does this fit in with DocumentCloud’s recent Knight News Challenge grant to add reader annotations?

I like it!  I don’t know if I’ll have time to go through the documents in significant detail, but the bucketing and thread-building are great ideas.  I

Are there plans to release the software side?  I don’t need it in my current work, but it’s head and shoulders above any sort of groupware I’ve been forced to look at previously.

Hopefully, moderation has been made simple.  I can easily see these things getting out of hand, especially from companies trying to invoke copyrights and other legal threats to suppress information, not to mention the run of the mill spammers and other unsavory elements.  Or are they not open to public comment?  I can’t see how to contribute (not that I have something TO contribute, mind you).

I like the concept, and hopefully can doc along with you.
I have interesting observations as I move through this maze myself, AND am going through the HUD Certification process @ the same time!

I cannot wait any longer. When and how can I use it?

Throw anything in this tool and I’ll look at it (sadly, time permitting).

Robert Berkman

Nov. 4, 2011, 11:03 a.m.

Hello—As a communications professor and great admirer of ProPublica (just finished showing Page One in my class that includes an interview with your founder), and editor of a publication for researchers, I was stunned and disappointed at the reaction of your staff and your Director of Communication to refuse to speak about this project for an article I’d like to do for my readers.

After sending two unreturned emails and making two unreturned calls to try to find a source, I was finally abruptly told by the Communications Dept. that DocDiver is “not interested”.(?)  When I expressed surprise that this seems to fly in the face of ProPublica’s mission of transparency and openness, the person hung up the phone.

I am mystified. I have been a teacher and editor for 25 years, and this has never happened before, and certainly not from a fellow media organization.
