The "pages" tab in DocDiver's version of the DocumentCloud DocumentViewer shows how many findings were left on each page.

Introducing DocDiver

by Al Shaw

October 4, 2011, 5:30 am

Today we’re launching a new feature that lets readers work alongside ProPublica reporters—and each other—to identify key bits of information in documents, and to share what they’ve found. We call it DocDiver.

Here’s how it works:

DocDiver is built on top of DocumentViewer from DocumentCloud. It frames the DocumentViewer embed and adds a new right-hand sidebar with options for readers to browse findings and to add their own. The “overview” tab shows, at a glance, who is talking about this document and “key findings”—ones that our editors find especially illuminating or noteworthy. The “findings” tab shows all reader findings to the right of each page near where readers found interesting bits.

We’re inaugurating DocDiver with a set of previously unreleased government audit reports of GMAC, the nation’s fifth largest servicer of home mortgages. The documents, obtained by ProPublica’s Paul Kiel, show weak oversight of the administration’s main foreclosure prevention program.

As you scroll through the document, the DocDiver sidebar shows findings as yellow “tabs” near where the finding itself can be found. Findings with red backgrounds are from ProPublica reporters or editors. Click on the “pages” tab immediately above the page image and you can see the number of findings on each page—a quick way to find the areas of the document that others have found the most interesting.

To submit a finding, click the button to sign in with Facebook, and then click anywhere on the document page where you see something interesting. Enter a finding in the window that pops up and hit “submit.” The finding will instantly show up in the sidebar near the spot where you clicked. If you clicked near an area others have already annotated, DocDiver may group your finding with others’ in a single tab.

The DocDiver sidebar

In addition to taking part in document dives, you can vote up other readers' findings that you think are especially noteworthy. When you post a finding, DocDiver will also give you a chance to post it to your Facebook wall.

DocDiver builds on the work done by TPM, the Guardian’s MP Expenses Project, as well as the New York Times’ recent project that solicited reader findings in the Sarah Palin email trove. More important, DocDiver is the brainchild of our Director of Distributed Reporting Amanda Michel, who collaborated with the news apps team to design it.

We’ve got big plans for future document dives, so stay tuned.

Nerdy Details

To build DocDiver, we combined what we learned about using Facebook as an identity management system for our “Opportunity Gap” news application with the excellent (albeit slightly hidden) JavaScript API for DocumentCloud’s DocumentViewer to create a “layer” that sits on top of, and interacts with, the embedded DocumentViewer.

Using the DocumentViewer JS API is easy. Say you’ve embedded a document using code like this:

var dcBaseUrl = 'https://www.documentcloud.org/documents/';
var dcSlug    = '231997-clive-goodman-letter-submitted-by-news-corp';
var currentDocument = DV.load(dcBaseUrl + dcSlug + '.js', {
  container : '#doc',
  embedded  : true
});

Since the document viewer is assigned to currentDocument, you can access the DocumentViewer API methods through currentDocument.api. There are a host of API methods, for example to get the current tab, title and zoom of the embed, and to set functions that fire when the current page or tab is changed. We needed a few new API features, such as setting the current page, manipulating the URL fragment and working with internal DocumentCloud annotations, which DocumentCloud helped us get into their API. These new API calls are now live on every DocumentViewer embed across the web, so other news organizations can build on them, too.
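As a rough illustration of how a layer like this can follow the reader around the document, here is a minimal sketch. It assumes the API object exposes currentPage() and onPageChange(callback), which is how we recall the DocumentViewer API being named; check the viewer source before relying on them. The names watchPages and loadFindingsForPage are hypothetical and exist only for this example.

```javascript
// Sketch only: react to page changes in an embedded DocumentViewer.
// Assumes viewer.api has currentPage() and onPageChange(callback).
// loadFindingsForPage is a hypothetical function supplied by the caller.
function watchPages(viewer, loadFindingsForPage) {
  var api = viewer.api;

  // Handle the page that is showing when the layer starts up.
  loadFindingsForPage(api.currentPage());

  // Ask the viewer to call us back on every later page change,
  // then read the new page number from the API.
  api.onPageChange(function () {
    loadFindingsForPage(api.currentPage());
  });
}
```

With the embed above, this would be called as watchPages(currentDocument, someFunction), where someFunction redraws the sidebar for the page number it receives.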

When users submit findings in DocDiver, we store them, along with their page numbers and coordinates, in our own database. They never get attached to the document itself or become actual DocumentCloud annotations.

Buckets

As more people submit findings, the sidebar fills up with little tabs near the interesting bits. This comes with a slight complication: What if two people add a finding in the same place? How do the tabs fit without becoming a confusing mess?

We came up with the idea of slicing the document into multiple “buckets” of vertical space. When a reader clicks somewhere on the page to leave an annotation, their click will fall into one of those buckets. Every time a page within the viewer is loaded or changed, we “bucketize” all the findings on the page—organizing all the current findings into one bucket or another.
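The bucketizing step can be sketched roughly as follows. This is not DocDiver's actual code: the bucket height, the function name bucketize and the finding fields (page, y) are all assumptions made for illustration.

```javascript
// Sketch only: sort the findings on one page into fixed-height
// vertical buckets. BUCKET_HEIGHT and the finding fields (page, y)
// are hypothetical, chosen for this example.
var BUCKET_HEIGHT = 50; // pixels of page height per bucket (assumed value)

function bucketize(findings, pageNumber) {
  var buckets = {}; // bucket index -> array of findings
  findings.forEach(function (finding) {
    if (finding.page !== pageNumber) return; // only the page being drawn
    // The vertical position of the click decides which bucket it lands in.
    var index = Math.floor(finding.y / BUCKET_HEIGHT);
    if (!buckets[index]) buckets[index] = [];
    buckets[index].push(finding);
  });
  return buckets;
}
```

A fixed bucket height keeps the mapping from click position to bucket trivial, at the cost of occasionally splitting two nearby findings that straddle a bucket boundary.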

As more findings are added to the document, they fall into buckets. Buckets then combine to form threads.

Here’s where the fun comes in: If there is more than one finding in a bucket, DocDiver will automatically turn that bucket into a thread. If two adjoining buckets each hold two findings, those buckets will collapse into one big thread, and so on. By bucketing nearby findings, we think DocDiver will consolidate and amplify interest around sections of a page. We’re hoping that will help spur a discussion among people who find similar things without needing a “reply” button on tabs, and that it will help avoid atomized and redundant findings.
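Here is one way the thread-building rule could look in code. It is a sketch under stated assumptions, not our production logic: it takes an object that maps a bucket index to an array of findings (an assumed shape), treats any bucket with two or more findings as a thread, and merges runs of adjoining thread buckets. The name buildThreads and the returned fields are hypothetical.

```javascript
// Sketch only: turn buckets into sidebar tabs and threads.
// Input: object mapping bucket index -> array of findings (assumed shape).
// Assumed rule: a bucket with 2+ findings is a thread, and adjoining
// thread buckets merge into one larger thread.
function buildThreads(buckets) {
  var indexes = Object.keys(buckets).map(Number).sort(function (a, b) {
    return a - b;
  });
  var groups = [];

  indexes.forEach(function (index) {
    var findings = buckets[index];
    var isThread = findings.length > 1;
    var last = groups[groups.length - 1];

    if (isThread && last && last.isThread && last.lastBucket === index - 1) {
      // The previous group is a thread in the adjoining bucket: merge.
      last.findings = last.findings.concat(findings);
      last.lastBucket = index;
    } else {
      // Otherwise start a new group (a single tab or a fresh thread).
      groups.push({
        firstBucket: index,
        lastBucket: index,
        isThread: isThread,
        findings: findings.slice()
      });
    }
  });
  return groups;
}
```

Each returned group corresponds to one tab in the sidebar. Buckets holding a single finding stay separate here, even when they sit next to a thread.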

DocDiver is an experiment that begins today. We’ve got big plans for it. We’re excited to see what readers make of it.