Thursday, March 10, 2011

Natural Language Processing: A Dream That Won't Die … and Shouldn't

One of the longest-standing dreams of informatics, dating back to the early (i.e., 1960s) era of artificial intelligence, is the use of natural language processing (NLP) to extract data about patients from clinical narratives (e.g., progress notes and discharge summaries) in the electronic health record (EHR). The notion that we can take the narrative language of clinicians and turn it into concrete facts for clinical decision support, clinical research, quality measurement, surveillance, and other purposes is immensely appealing.

Alas, that dream, at least in a generalizable form, remains a dream. A number of my own published papers over the years count among the many valiant efforts. Unfortunately, the variability of clinical language (some might say its mangling, especially by physicians), along with the hidden context and meaning "between the lines," makes NLP a very difficult task for a computer.

Some efforts, however, have succeeded in focused ways. This mirrors the history of clinical decision support: generalizable decision support never succeeded either, but focused decision support has been found to work quite well and is used in EHRs daily. Likewise, there have been a number of focused areas where NLP has provided useful data for clinical processes.

It is in this context that I am pleased to report on another contribution to the literature of clinical NLP, which is a paper that appeared in a recent issue of the Journal of the American Medical Informatics Association (JAMIA) and is lead-authored by a former student, Mary Stanfill [1]. I am a co-author. It is always a thrill to see a student publish a peer-reviewed paper, especially one that started as a term paper in one of my courses, advanced to a capstone project in a master's degree, and ultimately ended up in one of the leading journals in our field.

This paper makes a valuable contribution as a systematic review of all studies that report results of "automated coding and classification." The analysis shows that there have been many efforts, using many methods in a variety of clinical domains, with a wide range of results. Of course, this gets to a gripe I have had with clinical NLP and related text-mining researchers over the years: evaluation studies have not advanced much beyond measuring the accuracy (e.g., recall, precision, sensitivity, and specificity) with which systems identify concepts in the text [2]. I would prefer to see systems evaluated at the next step, namely how well NLP improves the tasks it might be used for, such as quality measurement programs or clinical research studies. This would be akin to the "task-oriented" studies of information retrieval systems I performed years ago, which focused on how well searchers completed tasks using retrieval systems rather than just measuring how many relevant articles they retrieved [3].
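To make concrete what "measuring accuracy" usually involves, here is a minimal Python sketch that computes these measures from counts obtained by comparing a system's output against a human-annotated gold standard. The names `ConfusionCounts`, `safe_ratio`, and `accuracy_measures` are hypothetical, chosen for illustration, and the counts in the example are made up rather than taken from any study. Note that recall and sensitivity are the same quantity under different names.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical counts from comparing system output to a gold standard."""
    tp: int  # true positives: concepts found by the system and in the gold standard
    fp: int  # false positives: concepts found by the system but not in the gold standard
    fn: int  # false negatives: gold-standard concepts the system missed
    tn: int  # true negatives: items correctly left unlabeled


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def accuracy_measures(c: ConfusionCounts) -> dict[str, float]:
    """Compute the common concept-identification accuracy measures."""
    recall = safe_ratio(c.tp, c.tp + c.fn)
    return {
        "recall": recall,
        "sensitivity": recall,  # same quantity as recall, under its clinical name
        "precision": safe_ratio(c.tp, c.tp + c.fp),
        "specificity": safe_ratio(c.tn, c.tn + c.fp),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from any study.
    counts = ConfusionCounts(tp=80, fp=20, fn=10, tn=890)
    for name, value in accuracy_measures(counts).items():
        print(f"{name}: {value:.3f}")
```

Measures like these describe how well a system finds concepts in text; they say nothing about whether its output improves a downstream task such as a quality measure, which is the kind of evaluation argued for above.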

The good news is that systems using NLP are starting to be deployed in operational clinical settings and in clinical and translational research programs, and there is an ever-increasing amount of real data in electronic form for them to use. In addition to a growing number of individual studies, there are also large-scale projects in which NLP plays a significant part. These include:
  • Informatics for Integrating Biology and the Bedside (i2b2) - a long-standing project to facilitate the use of clinical data for genomic and clinical research. One of its activities is a yearly challenge evaluation that allows researchers to compare systems and results on a common task. The i2b2 challenges have addressed automatic de-identification [4], identification of smoking status [5], recognition of obesity and co-morbidities [6], and extraction of medication information [7].
  • Electronic Medical Records and Genomics (eMERGE) Network - a multi-center project focused on using data in EHRs to study how genetic variability contributes to health and disease [8]. One of its foci is the use of NLP to extract data from clinical narratives and integrate it with other data in the clinical record. One accomplishment of this research to date has been the replication of four of seven known gene-disease associations [9].
  • SHARP Area 4 (Strategic Health IT Advanced Research Projects) - one of the four collaborative research centers being funded under the HITECH Program to facilitate meaningful use of EHRs, with a focus on secondary use of EHR data.

Another development is the launch this year of a medical records track in the Text Retrieval Conference (TREC), the annual information retrieval challenge evaluation. The track will use de-identified records developed by Wendy Chapman and colleagues.

It is also impossible to discuss this topic without acknowledging the attention surrounding IBM's Watson question-answering system, which recently proved its mettle in a televised match on the game show Jeopardy! [10]. IBM has announced research partnerships that will apply Watson to medical data. This is an interesting research area, but we will need to see real research results to back up the hype [11].

While challenges remain for clinical NLP, I believe we are seeing a convergence of new methods with a growing need to make use of the increasing volume of clinical data, as well as a desire to re-use that data for many purposes, such as clinical decision support, quality measurement and improvement, clinical research, and public health reporting and surveillance. While generalizable approaches may yet be discovered, I suspect the evolution of clinical NLP will resemble that of clinical decision support, which has been more successful when engineered for specific domains. But as we have also seen with clinical decision support, the ability to perform those specific tasks successfully will be highly valuable to healthcare.


1. Stanfill, M., Williams, M., et al. (2010). A systematic literature review of automated clinical coding and classification systems. Journal of the American Medical Informatics Association, 17: 646-651.
2. Hersh, W. (2005). Evaluation of biomedical text mining systems: lessons learned from information retrieval. Briefings in Bioinformatics, 6: 344-356.
3. Hersh, W., Crabtree, M., et al. (2002). Factors associated with success for searching MEDLINE and applying evidence to answer clinical questions. Journal of the American Medical Informatics Association, 9: 283-293.
4. Uzuner, O., Luo, Y., et al. (2007). Evaluating the state-of-the-art in automatic de-identification. Journal of the American Medical Informatics Association, 14: 550-563.
5. Uzuner, O., Goldstein, I., et al. (2008). Identifying patient smoking status from medical discharge records. Journal of the American Medical Informatics Association, 15: 14-24.
6. Uzuner, O. (2009). Recognizing obesity and comorbidities in sparse data. Journal of the American Medical Informatics Association, 16: 561-570.
7. Uzuner, O., Solti, I., et al. (2010). Extracting medication information from clinical text. Journal of the American Medical Informatics Association, 17: 514-518.
8. McCarty, C., Chisholm, R., et al. (2011). The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Medical Genomics, 4: 13.
9. Denny, J., Ritchie, M., et al. (2010). PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics, 26: 1205-1210.
10. Ferrucci, D., Brown, E., et al. (2010). Building Watson: an overview of the DeepQA Project. AI Magazine, 31(3): 59-79.
11. Anonymous (2011). IBM to Collaborate with Nuance to Apply IBM’s "Watson" Analytics Technology to Healthcare. Armonk, NY, IBM.
