Friday, July 10, 2015

What is the Difference (If Any) Between Informatics and Data Science?

I am increasingly asked to describe the difference between data science and biomedical informatics. Distinguishing these disciplines takes on added importance with the recent publication of the report on the future of the National Library of Medicine (NLM) by the NLM Working Group of the NIH Advisory Committee to the Director, which calls for NLM to become a leader in data science at NIH. NLM has of course historically been a leader in research and training in biomedical informatics.

What, if anything, is the difference between informatics and data science? Let me start with definitions. I have written my own definitions of biomedical informatics [1], but for the sake of the community, let me quote the latest consensus definition from our professional association, the American Medical Informatics Association (AMIA) [2]: "Biomedical informatics (BMI) is the interdisciplinary field that studies and pursues the effective uses of biomedical data, information, and knowledge for scientific inquiry, problem solving and decision making, motivated by efforts to improve human health."

How is data science defined? It is not as easy to find an "official" definition of data science, but a good starting point might be the definition from Wikipedia, which is the "extraction of knowledge from large volumes of data that are structured or unstructured." The Wikipedia article attributes that definition to a paper by Vasant Dhar [3] and a blog posting by Jeff Leek. A Google search also turns up some highly cited sources from O’Reilly Media and Forbes Magazine. The Forbes article quotes the famous information scientist Hal Varian, who has noted, "The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that’s going to be a hugely important skill in the next decades." I myself have written that the core competencies of data science are statistics, especially machine learning; data-oriented computer programming, especially querying databases; understanding of the domain where the analysis and interpretation will be applied; business processes; and communications.

I believe that whether data science is distinct from, partially overlapping with, or a subset of informatics is defined by how broadly one defines informatics. I tend to take a very broad definition of informatics, because I understand that its "sociotechnical" nature [4] covers many facets of data, information, and knowledge, including their technological as well as social context. Informatics recognizes that many aspects of data influence its use, aggregation, and interpretation. I have expressed concern in the past that data scientists need to understand research methodology, in particular how we distinguish cause-and-effect from correlation.

One possible way to answer the question of what distinguishes data science from informatics is to think about areas of informatics that are not ordinarily considered part of data science. I can think of (at least) several. One is usability. We know from the recent massive adoption of the electronic health record (EHR) that current EHRs pose significant usability challenges [5]. These not only adversely impact workflow, another important informatics topic, but may also compromise safety. Another area of informatics we have come to recognize as critical is adherence to standards so that we may achieve better system interoperability. Finally, we also know that informatics is riddled with challenging "people and organizational issues" as information systems profoundly impact healthcare and individual health in many ways [6].

There is no question that what we can do with data is important for informatics, healthcare at large, and society as a whole. Informatics has recognized this for decades, but it also knows that there is much context beyond the data itself. To this end, we are best served by viewing data science as a proper subset of informatics, certainly in the biomedical and health domain.


1. Hersh, W (2009). A stimulus to define informatics and health information technology. BMC Medical Informatics & Decision Making. 9: 24.
2. Kulikowski, CA, Shortliffe, EH, et al. (2012). AMIA Board white paper: definition of biomedical informatics and specification of core competencies for graduate education in the discipline. Journal of the American Medical Informatics Association. 19: 931-938.
3. Dhar, V (2013). Data science and prediction. Communications of the ACM. 56(12): 64-73.
4. Coiera, E (2007). Putting the technical back into socio-technical systems research. International Journal of Medical Informatics. 76(Supp 1): 98-103.
5. Zhang, J and Walji, M, Eds. (2014). Better EHR - Usability, workflow & cognitive support in electronic health records. Houston, TX, National Center for Cognitive Informatics & Decision Making in Healthcare.
6. Ash, JS, Berg, M, et al. (2004). Some unintended consequences of information technology in health care: the nature of patient care information system related errors. Journal of the American Medical Informatics Association. 11: 104-112.


  1. Well said Bill. In particular I agree that while it certainly may be that BMI involves lots of "data science", it is broader. As a biomedical informatician that spends essentially all of my time working on health care information standards and the implication of data capture on workflow (usability), if those are not informatics I'm without a home...

  2. I agree that the overlap between Data Science and Informatics depends on their definition. As an engineer turned research scientist turned data scientist, I recognize the peculiarity of Data Science in a unique intersection of skills (technical, communication, business) and cutting-edge approaches (the latest in machine learning, distributed computing, and storage) that may not be equally and simultaneously required in Informatics. And yes, every data scientist has to be at least familiar with research methodology.