Everyone, including this blog writer, has been touting the virtues of the vast troves of data already or soon to be available in the electronic health record (EHR), which will usher in the learning healthcare system [1, 2]. There is sometimes unbridled enthusiasm that the data captured in clinical systems, perhaps combined with research data such as gene sequencing, will effortlessly provide us with knowledge of what works in healthcare and how new treatments can be developed [3, 4]. The data is unstructured? No problem, just apply natural language processing [5].
I honestly share this enthusiasm, but I also realize that it needs to be tempered, or at least given a dose of reality. In particular, we must remember that our great data analytics and algorithms will only get us so far. If the underlying data is poor, the analyses may end up misleading us. We must be wary of problems of data incompleteness and incorrectness.
There are all sorts of reasons for inadequate data in EHR systems. Probably the main one is that those who enter data, i.e., physicians and other clinicians, are usually doing so for reasons other than data analysis. I have often said that clinical documentation can be what stands between a busy clinician and going home for dinner, i.e., he or she has to finish charting before ending the work day.
I also know of many clinicians whose enthusiasm for entering correct and complete data is tempered by their view of the EHR as a data black hole. That is, they put data in but never derive any benefit from it. I like to think that most clinicians would relish the opportunity to look at aggregate views of the patients in their practices and/or to identify patients who are outliers on one measure or another. Yet a common complaint I hear from clinicians is that data capture priorities are driven more by the hospital's or clinic's efforts to maximize reimbursement than by the goal of helping clinicians provide better patient care.
Another challenge for clinicians is the time required for electronic data entry. There is no question that the 20th-century means of clinical documentation, which mostly consisted of scribbling illegible notes on paper, was much easier and faster than typing and/or clicking. While I think that few clinicians want to go back to handwritten notes, their ease of use still has an appeal, at least for the person doing the entry.
Related to the time for electronic data entry is the "tension" between structured data, which makes aggregation and analysis easier, and "flexible" (or narrative) data, which allows the clinician to tell the story of the patient [6]. Many clinicians report that excessive structuring of data (i.e., pointing and clicking) loses the story of the patient, although those who process the data know that structured data is easier to analyze.
An additional challenge of electronic data entry for clinicians is the shift of focus from the patient to the computer. This was exemplified in a cartoon published earlier this year in JAMA that showed a 7-year-old's sketch of an exam room with the physician hunched over the computer, his back turned to the patient and her family [7] (the sketch is viewable at http://jama.jamanetwork.com/article.aspx?articleid=1187932).
An excellent example of the promise, but also the limitations, of current data entry systems was recently documented by Parsons et al. [8], who found, in a broad sample of primary care EHRs in New York City, that the accuracy of data for measuring breast cancer screening quality measures was highly variable due to differing practices in documentation, workflow, and related factors. While some physicians had the quality of their care measured accurately, for many others it was underestimated because of data limitations rather than the care they provided.
I cannot claim to have easy answers to this grand challenge, but two related aspects of it sit in front of us:
- We need to find better and faster ways for clinicians to enter data into the EHR, yielding data of high enough quality to be reused for other purposes, such as research, quality measurement and improvement, and public health.
- We must reward clinicians for their efforts in entering high-quality data. We must allow them to see aggregate views of the patients in their practices and to identify outliers. We must also engage them in research, quality improvement, and other system-level uses of their data.
In short, "garbage in, garbage out" remains a problem for computers and information technology nearly a half-century after the phrase was coined. In healthcare, we must give clinicians the best tools and incentives to participate in the learning healthcare system. For informatics, the problem of data entry is a grand challenge every bit as important as making use of the data's growing quantity, since the knowledge derived from that data will only be as good as the quality of what is input.
1. Friedman, C., Wong, A., et al. (2010). Achieving a nationwide learning health system. Science Translational Medicine, 2(57): 57cm29. http://stm.sciencemag.org/content/2/57/57cm29.full.
2. Greene, S., Reid, R., et al. (2012). Implementing the learning health system: from concept to action. Annals of Internal Medicine, 157: 207-210.
3. McCarty, C., Chisholm, R., et al. (2011). The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Medical Genomics, 4(1): 13. http://www.biomedcentral.com/1755-8794/4/13.
4. Rea, S., Pathak, J., et al. (2012). Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: The SHARPn project. Journal of Biomedical Informatics, 45: 763-771.
5. Nadkarni, P., Ohno-Machado, L., et al. (2011). Natural language processing: an introduction. Journal of the American Medical Informatics Association, 18: 544-551.
6. Rosenbloom, S., Denny, J., et al. (2011). Data from clinical notes: a perspective on the tension between structure and flexible documentation. Journal of the American Medical Informatics Association, 18: 181-186.
7. Toll, E. (2012). The cost of technology. Journal of the American Medical Association, 307: 2497-2498.
8. Parsons, A., McCullough, C., et al. (2012). Validity of electronic health record-derived quality measurement for performance monitoring. Journal of the American Medical Informatics Association, 19: 604-609.