For our series, Policing Patient Privacy, ProPublica analyzed data on privacy violations from three main sources: the U.S. Department of Health and Human Services’ Office for Civil Rights, the U.S. Department of Veterans Affairs, and the California Department of Public Health. The primary goal of our analysis was to find the medical facilities and health care companies with the most violations and understand what kinds of repercussions repeat offenders faced.

We also used the data to create a searchable database of privacy violation reports called HIPAA Helper.

HHS Office for Civil Rights

Under the Health Insurance Portability and Accountability Act, the federal privacy law known as HIPAA, it is generally illegal for a company or a health care provider to share personal health information without consent. If someone believes their confidential data has been accessed or shared inappropriately, they can file a complaint with the Office for Civil Rights at the U.S. Department of Health and Human Services. The office is responsible for investigating possible violations of HIPAA and can order offenders to take corrective action and impose penalties.

We examined two sets of data kept by the agency: breaches affecting 500 or more people, and privacy complaints and their outcomes.

Information on large data breaches is kept on a public website, which industry insiders call “the Wall of Shame.” We analyzed data on more than 1,100 breaches reported by health care providers from January 2011 to November 2015. We manually cleaned provider names and attempted to match them to those from the Office for Civil Rights privacy complaint database (see below).

Data on HIPAA complaints is not available online. Under the Freedom of Information Act, we requested all closed HIPAA investigations conducted by the Office for Civil Rights since January 2008. We sought the name of each institution or person who was the subject of a complaint, as well as the date opened, date closed, how the case was resolved, and a description of the complaint. We chose to focus our analysis on a subset of these complaints — those closed from 2011 to 2014. The data was contained in multiple PDFs, totaling more than 5,000 pages. A sizable portion of the data was redacted, particularly complaints that referred to individual practitioners. (HHS took the position that the names of health facilities could be disclosed but the names of doctors or other providers could not. We are appealing that.)

We scanned and parsed the text using optical character recognition and Tabula, and then checked the results to ensure accuracy. We counted 31,310 complaints.

For our analysis, we wanted to focus only on complaints that resulted in providers submitting corrective action plans or receiving technical assistance from the Office. We therefore omitted complaints in which the Office found no violation of HIPAA or determined that it had no jurisdiction, as well as those in which the people who filed the complaints did not cooperate in the investigation.

In some cases, the outcome was not known. We kept those complaints in our analysis.

We also omitted any complaint in which a provider’s name was redacted or the entity’s name was too general (for example, “Hospital” or “Pharmacy,” without any specific identifying information).
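The omissions described above amount to a record-level filter. The following is a minimal sketch of how such a filter could look in Python, not the code used for the analysis. The field names (`entity`, `outcome`), the outcome labels and the sample records are all hypothetical; the actual wording in the Office for Civil Rights data may differ.

```python
# Hypothetical outcome labels that lead to a complaint being dropped.
EXCLUDED_OUTCOMES = {
    "no violation",
    "no jurisdiction",
    "complainant did not cooperate",
}

# Entity names too general to identify a specific provider.
GENERIC_NAMES = {"hospital", "pharmacy"}


def keep_complaint(record):
    """Return True if a complaint survives the omissions described above."""
    name = (record.get("entity") or "").strip()
    outcome = (record.get("outcome") or "").strip().lower()
    if not name or name.lower() == "redacted":
        return False  # provider name missing or redacted
    if name.lower() in GENERIC_NAMES:
        return False  # name too general to attribute
    if outcome in EXCLUDED_OUTCOMES:
        return False  # no violation, no jurisdiction, or no cooperation
    return True  # unknown (blank) outcomes are kept


# Made-up sample records for illustration only.
complaints = [
    {"entity": "Walgreens", "outcome": "Corrective action"},
    {"entity": "Hospital", "outcome": "Technical assistance"},
    {"entity": "REDACTED", "outcome": "Corrective action"},
    {"entity": "Example Clinic", "outcome": "No violation"},
    {"entity": "Example Clinic", "outcome": ""},
]

kept = [c for c in complaints if keep_complaint(c)]
```

In this sketch a blank outcome passes the filter, mirroring the decision to keep complaints whose outcome was not known.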

After the omissions, about 13,200 complaints remained. We corrected and standardized the names of the entities by hand (for example, “Wal greens” became “Walgreens” and “Wal-Mart” became “Walmart”) so that we could accurately find the entities with the greatest number of privacy violations. In HIPAA Helper, we include both the names as they appear in the data and our standardized names.
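The name standardization was done by hand, but the idea can be illustrated with a lookup table that maps raw spellings to one standardized name before counting. This is a sketch under that assumption; the `NAME_FIXES` table, the `standardize` helper and the sample list are hypothetical and contain only the two examples mentioned above.

```python
from collections import Counter

# Hand-built mapping from raw spellings to standardized names.
NAME_FIXES = {
    "Wal greens": "Walgreens",
    "Wal-Mart": "Walmart",
}


def standardize(raw_name):
    """Collapse stray whitespace, then apply the hand-built corrections."""
    cleaned = " ".join(raw_name.split())
    return NAME_FIXES.get(cleaned, cleaned)


# Made-up sample of entity names as they might appear in the raw data.
raw_names = ["Wal greens", "Walgreens", "Wal-Mart", "Walmart", "Wal  greens"]

# Count complaints per standardized name.
counts = Counter(standardize(name) for name in raw_names)
```

Counting on the standardized name rather than the raw one keeps variant spellings of the same entity from being split into separate totals.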

Finally, to learn more about the complaints, we requested hundreds of “closure” letters from the Office for Civil Rights under the Freedom of Information Act. To date, we have received more than 150. These provide additional detail on the complaints and their resolution, but do not identify patients. In situations in which we have these records, we are making them available within HIPAA Helper.

U.S. Department of Veterans Affairs

The entity with the highest number of privacy violations, according to our analysis of data from the Office for Civil Rights, was the Department of Veterans Affairs. There were more than 300 complaints involving VA facilities from 2011 to 2014. Some 220 of those resulted in corrective-action plans or technical assistance to help resolve the issue. The VA also has its own internal process for reviewing and investigating privacy complaints.

Through a separate Freedom of Information Act request, we received a listing of privacy incident reports at VA facilities from 2010 to August 2015. The data came from the VA’s Office of Information and Technology, Office of Information Security, Risk Management and Incident Response Team/Incident Resolution Team. This included all alleged privacy violations committed by VA employees, consultants or representatives.

We sought the following fields: the name of the facility/institution, the names of VA employees involved, the date of the privacy violation, any information pertaining to follow-up investigations, whether the investigations are still open or closed, the outcome of any investigations and the type of privacy violation. There were nearly 11,000 privacy reports, affecting thousands of veterans and employees at the VA. We’ve included data from 2011 to 2015 in HIPAA Helper.

We received the data in two parts, each in a different format. The first was a PDF of an online database that had been provided to the Pittsburgh Tribune-Review for a story in 2013. The second was an Excel spreadsheet of more recent reports. We scanned and parsed the text of the PDF and manually checked it to ensure the data’s integrity.

In our dataset, the name of each entity appeared as a numeric VA facility code, so we matched the codes with facility names or locations.
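Matching codes to names is a simple lookup. The sketch below assumes a code-to-name table is available; the codes and facility names shown are placeholders invented for illustration, not real VA facility codes, and `facility_label` is a hypothetical helper.

```python
# Placeholder lookup table; real VA facility codes and names are not shown here.
FACILITY_NAMES = {
    "901": "Example VA Medical Center",
    "902": "Sample VA Clinic",
}


def facility_label(code):
    """Return the facility name for a code, flagging codes with no match."""
    key = str(code).strip()
    return FACILITY_NAMES.get(key, f"Unknown facility ({key})")


# Codes may arrive as numbers or as text with stray whitespace.
labels = [facility_label(c) for c in [901, " 902 ", "999"]]
```

Flagging unmatched codes instead of dropping them makes it easy to see which records still need a manual lookup.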

California Department of Public Health

California has some of the toughest laws in the country for health privacy breaches that take place in medical facilities. Under the state’s Public Records Act, we requested data on privacy violations cited by state health inspectors at hospitals since Jan. 1, 2012. The data included more than 3,700 privacy deficiencies. Some took place before 2012, but were only cited later. We sought the following fields: hospital name, hospital ID, start date of survey, end date of survey, survey ID, and details of any deficiencies.

The date that appears in HIPAA Helper is the survey start date. We only included data for inspections that began after Jan. 1, 2011.

Sometimes, inspectors found violations related to quality of care while reviewing privacy lapses or data breaches. Where possible, we’ve omitted those quality-of-care reports. Finally, we found that a single privacy event sometimes resulted in multiple deficiencies. We attempted to note such situations in the database.
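One way to surface candidates for such multi-deficiency events is to group deficiencies by hospital and survey. This is only a sketch: treating deficiencies from the same survey as possibly stemming from one event is an assumption made here for illustration, not the method used for the database, and the field names and sample rows are hypothetical.

```python
from collections import defaultdict

# Made-up sample rows using hypothetical field names.
deficiencies = [
    {"hospital_id": "H1", "survey_id": "S10", "detail": "records left unsecured"},
    {"hospital_id": "H1", "survey_id": "S10", "detail": "late breach notification"},
    {"hospital_id": "H2", "survey_id": "S11", "detail": "unauthorized access"},
]

# Group deficiencies by (hospital, survey).
by_survey = defaultdict(list)
for row in deficiencies:
    by_survey[(row["hospital_id"], row["survey_id"])].append(row)

# Surveys with more than one deficiency are candidates for manual review.
multi = {key: rows for key, rows in by_survey.items() if len(rows) > 1}
```

Groups flagged this way would still need a manual read of the deficiency details to confirm that they describe the same event.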