Statistics has of course been around for centuries, although this author drew from an event 50 years ago, a lecture by John Tukey. The informatics field has not been in existence for as many centuries, but one summary of its history by Fourman credits the origin of the term to Philippe Dreyfus in 1962. However, the Wikipedia entry for informatics attributes the term to the German computer scientist Karl Steinbuch in 1956. Fourman also notes that the heaviest use of the term informatics comes from its attachment to various biomedical and health terms.
If the informatics field is indeed 60 years old, I have been working in it for about half of its existence, since I started my National Library of Medicine (NLM) medical informatics fellowship in 1987. I have certainly devoted a part of my career to raising awareness of the term informatics, making the case for it as a discipline. Clearly the discipline has become recognized, with many academic departments, mostly in health science universities, and a new physician subspecialty devoted to it.
And now comes data science. What are we in informatics to make of this new field? Is it the same as informatics? If not, how does it differ? I have written about this before.
Donoho’s paper does offer some interesting insights. I get a kick out of one tongue-in-cheek definition he gives of a data scientist, whom he defines as a “person who is better at statistics than any software engineer and better at software engineering than any statistician.” Perhaps we could substitute informatician for software engineer, i.e., a data scientist is someone who is better at statistics than any informatician and is better at informatics than any statistician?
Donoho later provides a more serious definition of data science, which is that it is “the science of learning from data; it studies the methods involved in the analysis and processing of data and proposes technology to improve methods in an evidence-based manner.” He further notes, “the scope and impact of this science will expand enormously in coming decades as scientific data and data about science itself become ubiquitously available.”
Donoho goes on to note six key aspects (he calls them “divisions” of “greater data science”) that I believe further serve to define the work of the field:
- Data Exploration and Preparation
- Data Representation and Transformation
- Computing with Data
- Data Modeling
- Data Visualization and Presentation
- Science about Data Science
Like many informaticians, I remain enthusiastic about the prospect that the growing quantity of data will advance our understanding of human health and disease, and how to treat the latter better. But I also have some caveats. I am concerned that some data scientists read too much into correlations and associations, especially when so much captured medical data is imprecise, when standards are not widely adopted, and when data that is not well structured is hard to access (which can lead us to try to “unscramble eggs”).
It is clear that informatics cannot ignore data science, but our field must also be among the leaders in determining its proper place and usage, especially in health-related areas. We must recognize the overlap as well as appreciate the areas where informatics can be synergistic with data science.
1. Donoho, D (2015). 50 years of Data Science. Princeton NJ, Tukey Centennial Workshop. https://dl.dropboxusercontent.com/u/23421017/50YearsDataScience.pdf.
2. Fourman, M (2002). Informatics. In International Encyclopedia of Information and Library Science, 2nd Edition. J. Feather and P. Sturges. London, England, Routledge: 237-244.
3. Hersh, W (2009). A stimulus to define informatics and health information technology. BMC Medical Informatics & Decision Making. 9: 24. http://www.biomedcentral.com/1472-6947/9/24/.
4. Detmer, DE and Shortliffe, EH (2014). Clinical informatics: prospects for a new medical subspecialty. Journal of the American Medical Association. 311: 2067-2068.
5. Rosenbaum, L (2015). Transitional chaos or enduring harm? The EHR and the disruption of medicine. New England Journal of Medicine. 373: 1585-1588.
6. Koster, J, Stewart, E, et al. (2016). Health care transformation: a strategy rooted in data and analytics. Academic Medicine. Epub ahead of print.