1. Promising directions for new data science research in the context of health and biomedicine. Input might address such topics as Data Driven Discovery and Data Driven Health Improvement.
The scientific literature is increasingly filled with papers describing novel and exciting applications of data science, such as improving clinical diagnosis and delivering safer and more efficient healthcare. But there is more to impactful data science than the data and tools. We need studies that demonstrate real impact in improving patient and system outcomes. We also need to assess the impact of efforts to improve data standards and data quality.
One way to look at this is to consider the growing area of data analytics, which may be thought of as applied data science. Data analytics is commonly classified into three levels:
- Descriptive - describing what the data say about what has happened
- Predictive - using the data to predict what might happen going forward
- Prescriptive - deciding on actions based on the data to improve outcomes
Predictive analytics applications are important, but equally important is research into how they can best be applied. Attention to hospital readmissions has somewhat lowered their rate, but the problem is far from solved. We need not only to predict who these patients will be, but also to devise programs that enable action on those predictions.
Likewise, as we learn to improve diagnosis and treatment of disease through predictive analytics, we will need to determine ways to make actions on those predictions possible, both for clinical researchers who discover new diagnostic tests and treatments and for clinicians who apply the new, complex information in patient care. This will require both clinical decision support from machines and new organizational structures to conduct research and apply its results optimally in clinical care.
As such, a new thread of research in prescriptive analytics, i.e., applying the outcomes of data science research, is critical for realizing the value of biomedical science. The NLM should be at the forefront of thought leadership and funding of that research. Such research can build on its uniquely strong portfolio of existing research in biomedical informatics (of which some of us consider data science to be a part).
2. Promising directions for new initiatives relating to open science and research reproducibility. Input might address such topics as Advanced Data Management and Intelligent and Learning Systems for Health.
Open science and reproducibility of research are critical for the transition of data science from predictive to prescriptive analytics. Since the value of data science comes from understanding large populations of patients, it is only fair that all who contribute their data benefit from research using it. Therefore, we must devise methods to allow appropriate access to that data while still protecting the privacy of the individuals who have contributed it. We also need to devise approaches that give appropriate scientific credit to those who collect the data, along with a short, time-limited window for them to achieve the first publication of results from it.
Open science should not, however, just be thought of as open data. The models and algorithms that process such data are also increasingly complex. We need more research into understanding how such systems work, how different methods compare with each other, and where biases and other problems may be introduced. As such, the algorithms used must be open so they can be understood and improved.
3. Promising directions for workforce development and new partnerships. Input might address such topics as Workforce Development and Diversity and New Stakeholder Partnerships.
New directions in data science must take into account the human workforce needed to lead discovery as well as apply it to achieve value. The best known data analytics workforce analyses, from McKinsey and IDC, are a few years old now, but both make the consistent point that we need not only a focused cadre of quantitative experts, but also 5- to 10-fold more professionals who can contribute to the design of analyses and apply their results in ways that improve patient and system outcomes. In other words, we need not only individuals who know the optimal methods for predictive uses, but also domain experts and applications specialists who can collaborate with the quantitative experts to achieve the best outcomes of data science.
In conclusion, there are many opportunities to put data science and data analytics to work for advancing health and healthcare. This work must not only build on past work done in biomedical informatics and other disciplines but also look to the future to best apply prediction in ways that improve maintenance of health and treatment of disease.
1. Davenport, TH (2015). Big Data at Work: Dispelling the Myths, Uncovering the Opportunities. Cambridge, MA, Harvard Business Review.
2. Amarasingham, R, Moore, BJ, et al. (2010). An automated model to identify heart failure patients at risk for 30-day readmission or death using electronic medical record data. Medical Care. 48: 981-988.
3. Futoma, J, Morris, J, et al. (2015). A comparison of models for predicting early hospital readmissions. Journal of Biomedical Informatics. 56: 229-238.
4. Oakden-Rayner, L, Carneiro, G, et al. (2017). Precision radiology: predicting longevity using feature engineering and deep learning methods in a radiomics framework. Scientific Reports. 7: 1648. https://www.nature.com/articles/s41598-017-01931-w.
5. Rajpurkar, P, Irvin, J, et al. (2017). CheXNet: radiologist-level pneumonia detection on chest x-rays with deep learning. https://arxiv.org/abs/1711.05225.
6. Liu, Y, Gadepalli, K, et al. (2017). Detecting cancer metastases on gigapixel pathology images. https://arxiv.org/abs/1703.02442.
7. Esteva, A, Kuprel, B, et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature. 542: 115-118.
8. Price, ND, Magis, AT, et al. (2017). A wellness study of 108 individuals using personal, dense, dynamic data clouds. Nature Biotechnology. 35: 747-756.
9. Collins, FS and Varmus, H (2015). A new initiative on precision medicine. New England Journal of Medicine. 372: 793-795.
10. Manyika, J, Chui, M, et al. (2011). Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute. http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation.
11. Anonymous (2014). IDC Reveals Worldwide Big Data and Analytics Predictions for 2015. Framingham, MA, International Data Corporation. http://bit.ly/IDCBigDataFutureScape2015.