While many people and organizations have long called for data from randomized clinical trials (RCTs) and other clinical research to be shared with other researchers for re-analysis and other re-use, the impetus for it accelerated about a year ago with two publications. One was a call by the International Committee of Medical Journal Editors (ICMJE) for de-identified data from RCTs to be shared as a condition of publication [1]. The other was an editorial in the New England Journal of Medicine wondering whether those who do secondary analysis of such data were “research parasites” [2]. The latter set off a fury of debate across the spectrum, e.g., [3], from those who argued that primary researchers labored hard to devise experiments and collect their data, thus having a claim to control over it, to those who argued that since most research is government-funded, the taxpayers deserve to have access to that data. (Some of those in the latter group proudly adopted the “research parasite” tag.)
Many groups and initiatives have advocated for the potential value of wider re-use of data from clinical research. The cancer genomics community has long seen the value of a data commons to facilitate sharing among researchers [4]. Recent US federal research initiatives, such as the Precision Medicine Initiative [5] and the 21st Century Cures program [6], envision an important role for large repositories of data to accompany patients in cutting-edge research. A number of large-scale efforts in clinical data collection are also beginning to accumulate substantial amounts of data, such as the National Patient-Centered Clinical Research Network (PCORNet) and the Observational Health Data Sciences and Informatics (OHDSI) initiative.
As with many contentious debates, there are valid points on both sides. The case for requiring publication of data is strong. As most research is taxpayer-funded, it only seems fair that those who paid are entitled to all the data for which they paid. Likewise, all of the subjects were real people who potentially took risks to participate in the research, and their data should be used for discovery of knowledge to the fullest extent possible. And finally, new discoveries may emerge from re-analysis of data. This was actually the case that prompted the Longo “research parasites” editorial, which was praising the “right way” to do secondary analysis, including working with the original researchers. The paper that the editorial described had discovered that the lack of expression of a gene (CDX2) was associated with benefit from adjuvant chemotherapy [7].
Some researchers, however, are pushing back. They argue that those who carry out the work of designing, implementing, and evaluating experiments certainly have some exclusive rights to the data generated by their work. Some also question whether the cost is a good expenditure of limited research dollars, especially since the demand for such data sets may be modest and the benefit is not clear. One group of 282 researchers in 33 countries, the International Consortium of Investigators for Fairness in Trial Data Sharing, notes that there are risks, such as misleading or inaccurate analyses as well as efforts aimed at discrediting or undermining the original research [8]. They also express concern about the costs, given that there are over 27,000 RCTs performed each year. As such, this group calls for an embargo on reuse of data for two years plus another half-year for each year of the length of the RCT. Even those who support data sharing point out the requirement for proper curation, wide availability to all researchers, and appropriate credit to and involvement of those who originally obtained the data [9].
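The embargo rule proposed by the consortium is simple arithmetic, and a short sketch makes its effect on trials of different lengths concrete. The function name `embargo_years` is hypothetical, and the sketch assumes the rule applies proportionally to fractional years of trial length, which the proposal as summarized here does not specify.

```python
def embargo_years(trial_length_years: float) -> float:
    """Embargo on data re-use as summarized in the text:
    two years, plus half a year for each year the RCT lasted.

    Assumption (not stated in the source): fractional trial
    lengths are prorated rather than rounded.
    """
    if trial_length_years < 0:
        raise ValueError("trial length cannot be negative")
    return 2.0 + 0.5 * trial_length_years


if __name__ == "__main__":
    # Illustrative trial lengths only; these are not taken from any study.
    for length in (1, 3, 5):
        print(f"{length}-year trial: {embargo_years(length)} years of embargo")
```

Under this reading, a five-year trial would carry a 4.5-year embargo, so data from long trials would stay restricted for nearly as long as the trial itself ran.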
There are a number of challenges to more widespread dissemination of RCT data for re-use. A number of pharmaceutical companies have begun making such data available over the last few years. Their experience has shown that the costs are not insignificant (estimated to be about $30,000-$50,000 per RCT) and that a scientific review process is essential [10]. Another analysis found that the time to re-analyze data sets can be long, and so far the number of resulting publications has been small [11]. An additional study found that identifiable data sets were explicitly visible for only 12% of all clinical research funded by the National Institutes of Health in 2011 [12]. This means that from 2011 alone, there are possibly more than 200,000 data sets that could be made publicly available, indicating that some type of prioritization might be required.
There are also a number of informatics-related issues to be addressed. These include not only adherence to standards and interoperability [13], but also attention to workflows, integration with other data, such as that from electronic health records (EHRs), and consumer/patient engagement [14]. Clearly the trialists who generate the data must be given incentives for their data to be re-used [15]. My own work assessing the caveats of re-using EHR data is somewhat applicable here too, in that even RCT data may not have the breadth of data or cover sufficient periods of time for additional analyses [16].
There is definitely great potential for re-use of RCT and other clinical research data to advance research and ultimately health and clinical care for the population. However, it must be done in ways that represent an appropriate use of resources and result in data that truly advances research, clinical care, and ultimately individual health.
1. Taichman, DB, Backus, J, et al. (2016). Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors. New England Journal of Medicine. 374: 384-386.
2. Longo, DL and Drazen, JM (2016). Data sharing. New England Journal of Medicine. 374: 276-277.
3. Berger, B, Gaasterland, T, et al. (2016). ISCB’s initial reaction to The New England Journal of Medicine Editorial on data sharing. PLoS Computational Biology. 12(3): e1004816. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004816.
4. Grossman, RL, Heath, AP, et al. (2016). Toward a shared vision for cancer genomic data. New England Journal of Medicine. 375: 1109-1112.
5. Collins, FS and Varmus, H (2015). A new initiative on precision medicine. New England Journal of Medicine. 372: 793-795.
6. Kesselheim, AS and Avorn, J (2017). New "21st Century Cures" legislation: speed and ease vs science. Journal of the American Medical Association. Epub ahead of print.
7. Dalerba, P, Sahoo, D, et al. (2016). CDX2 as a prognostic biomarker in stage II and stage III colon cancer. New England Journal of Medicine. 374: 211-222.
8. Anonymous (2016). Toward fairness in data sharing. New England Journal of Medicine. 375: 405-407.
9. Merson, L, Gaye, O, et al. (2016). Avoiding data dumpsters — toward equitable and useful data sharing. New England Journal of Medicine. 374: 2414-2415.
10. Rockhold, F, Nisen, P, et al. (2016). Data sharing at a crossroads. New England Journal of Medicine. 375: 1115-1117.
11. Strom, BL, Buyse, ME, et al. (2016). Data sharing — is the juice worth the squeeze? New England Journal of Medicine. 375: 1608-1609.
12. Read, KB, Sheehan, JR, et al. (2015). Sizing the problem of improving discovery and access to NIH-funded data: a preliminary study. PLoS ONE. 10(7): e0132735. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132735.
13. Kush, R and Goldman, M (2014). Fostering responsible data sharing through standards. New England Journal of Medicine. 370: 2163-2165.
14. Tenenbaum, JD, Avillach, P, et al. (2016). An informatics research agenda to support precision medicine: seven key areas. Journal of the American Medical Informatics Association. 23: 791-795.
15. Lo, B and DeMets, DL (2016). Incentives for clinical trialists to share data. New England Journal of Medicine. 375: 1112-1115.
16. Hersh, WR, Weiner, MG, et al. (2013). Caveats for the use of operational electronic health record data in comparative effectiveness research. Medical Care. 51(Suppl 3): S30-S37.