This dataset contains the demographic breakdowns of participants in clinical trials for FDA-approved drugs between January 2015 and June 2018. The FDA has been providing demographic reports for each approved drug since January 2015. While the FDA provides summary reports by year, sometimes in PDF format only, this dataset was compiled to include all available data across years in an easily usable format.

The columns of the dataset include: brand name; drug indication; percentage of women in the clinical trials; percentage of participants by race: white, black or African American, Asian, and other; percentage of participants of Hispanic ethnicity; percentage of participants who are age 65 and older; and year. 

The "Other" race category was used as a catch-all for any of these categories: American Indian/Alaska Native (AI/AN), Native Hawaiian or Other Pacific Islander (NH/OPI), mixed race, multiple races, Unknown, Unreported, and Other. The FDA also provides demographic breakdowns in its reports for individual drugs, which contain more detailed information, raw patient counts, and occasionally disaggregated "Other" categories. We did not include this information here. The disaggregated "Other" categories are not consistent from one individual drug report to another.

For drugs approved in 2015 and 2016, percentages for the "Other" category were provided in FDA summary reports. For 2017 drugs, we calculated this percentage by subtracting the other categories from 100%. For 2018 drugs, we manually compiled these percentages from the reports for each individual drug.
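The subtraction used for 2017 drugs can be illustrated with a minimal sketch. The function name, the argument names, and the sample figures below are hypothetical and do not come from the dataset or from any FDA report. Clamping the result at zero is also an assumption, added here only to guard against rounded inputs that sum to slightly more than 100.

```python
def other_race_pct(white, black, asian):
    """Return the residual "Other" percentage: 100 minus the named race categories."""
    named_total = white + black + asian
    # Assumption: rounding in published figures can push the sum slightly
    # past 100, so clamp at zero rather than return a negative share.
    return max(0.0, round(100.0 - named_total, 1))


# Made-up figures for illustration only; not taken from any FDA report.
print(other_race_pct(white=76.0, black=7.0, asian=11.0))  # 6.0
```

A residual computed this way absorbs any rounding error in the named categories, so it may differ slightly from a figure the FDA reports directly.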

The "Hispanic" ethnicity category was not included in the yearly summary reports for 2015 and 2016, although it is sometimes included in individual drug reports. Note that this percentage is one category out of the following: Hispanic, Not Hispanic, and Unknown/Unreported. Note also that some drug reports use "Hispanic or Latino," whereas others use only "Hispanic."

ProPublica used this data in our piece about racial representation in cancer clinical trials. We analyzed this data to determine the race distribution of patients in clinical trials for cancer drugs. We also compiled a more detailed dataset, including disaggregated "Other" categories, using the FDA demographic reports specifically for drugs indicated to treat cancer.


