Wednesday, September 12, 2018

Artificial Intelligence in Medicine: 21st Century Resurgence

I first entered the informatics field in the late 1980s, at the tail end of the first era of artificial intelligence (AI) in medicine. Initial systems focused on making medical diagnoses using symbolic processing, which was appropriate for a time of underpowered hardware and relatively little digital data, both for individual patients and for healthcare as a whole. Systems like MYCIN [2], INTERNIST-1/QMR [1], and DXplain [3] provided relatively accurate diagnostic performance, but were slow and difficult to use. They also provided a single likely diagnosis, which was not really what clinicians needed. Because of these shortcomings, they never achieved significant real-world adoption, and their "Greek Oracle" style of approach was abandoned [4]. There was also some early enthusiasm for neural networks around that time [5], although in retrospect those systems were hampered by lack of data and computing power.

Into the 1990s, informatics moved on to other areas, such as information retrieval (search) from the newly evolving World Wide Web and more focused (rule-based) decision support. At the start of the new century, I started to wonder whether I should still even cover those early AI systems in my well-known introductory informatics course. I kept them included, mainly out of a sense of historical perspective, since those systems were a major focus of work in the field in its early days. However, the term "AI" almost seemed to disappear from informatics jargon.

In recent years, however, AI in medicine (and beyond) has re-emerged. Driven by much larger quantities of data (from electronic health records; curated data sets, mainly of images; and personal tracking devices) and much more powerful hardware (mainly networked clusters of low-cost computers and hard disks as well as mobile devices), there has been a resurgence of AI, although with a somewhat different focus from the original era. There has also been a maturing of machine learning techniques, most prominently neural networks applied in complex formats known as deep learning [6, 7].

The greatest success for deep learning has come in image processing. The well-known researcher and author Dr. Eric Topol keeps an ever-growing list of systems for diagnosis and their comparison with humans (to which I have contributed a few, and to which I add studies that have only been published as preprints on arXiv.org):
  • Radiology - diagnosis comparable to radiologists for pneumonia [8], tuberculosis [9], intracranial hemorrhage [10]
  • Dermatology - detecting skin cancer from images [11-13]
  • Ophthalmology - detecting diabetic retinopathy from fundal images [14-15]; predicting cardiovascular risk factors from retinal fundus photographs [16]; diagnosis of congenital cataract [17], age-related macular degeneration [18], plus disease [19]; and diagnoses of retinal diseases [20] and macular diseases [21]
  • Pathology - classifying various forms of cancer from histopathology images [22-25], detecting lymph node metastases [26]
  • Cardiology - cardiac arrhythmia detection comparable to cardiologists [27] and classification of views in echocardiography [28]
  • Gastroenterology - analysis of endocytoscope images to support a diagnose-and-leave strategy for diminutive, nonneoplastic, rectosigmoid polyps [29]
Organized medicine has taken notice of AI. The Journal of the American Medical Association recently published two perspective pieces [30, 31] as well as an editorial [32] on how AI and machine learning will impact medical practice. I have heard anecdotally that some of the most heavily attended sessions at radiology meetings are those devoted to AI. I am sure the interest reflects intellectual excitement tinged with some fear for future livelihood.

The success of these systems and the technology underlying them is exciting, but I also would tell any thoughtful radiologist (or pathologist, dermatologist, or ophthalmologist) not to fear for his or her livelihood. Yes, these tools will change practice, maybe sooner than we realize. However, I always think that high-tech medicine of the future will look like how it is used by the doctors of Star Trek. Yes, those physicians have immense technology at their disposal, not only for diagnosis but also for treatment. But those tools do not remove the human element of caring for people. Explaining to patients their disease process, describing the prognosis as we know it, and shared decision-making among the diagnostic and treatment options are all important in applying advanced technology in medicine.

I also recognize we have a ways to go before this technology truly changes medicine. For several years running, I have expressed my intellectual excitement at predictive data science while also noting that prediction is not enough: we must demonstrate that what is predicted can be applied to improve the delivery of care and patient health.

This notion is best elaborated by some discussion of another deep learning paper focused on a non-image domain, namely the prediction of in-hospital mortality, 30-day unplanned readmission, prolonged length of stay, and the entirety of a patient’s final diagnoses [33]. The paper demonstrates the value of deep learning, the application of Fast Healthcare Interoperability Resources (FHIR) for data points, and efforts for the neural network to explain itself along its processing path. I do not doubt the veracity of what the authors have accomplished. Clearly, deep learning techniques will play a significant role as described above. These methods scale with large quantities of data and will likely improve over time with even better algorithms and better data.

But taking off my computer science hat and replacing it with my informatics one, I have a couple of concerns. My first and major concern is whether this prediction can be turned into information that can improve patient outcomes. Just because we can predict mortality or prolonged length of stay, does that mean we can do anything about it? Second, while there is value to predicting across the entire population of patients, it would be interesting to focus in on patients we know are more likely to need closer attention. Can we focus in and intervene for those patients who matter?

Dr. Topol recently co-authored an accompanying editorial describing a study that adheres to the kind of methods that are truly needed to evaluate modern AI in clinical settings [34]. The study itself is to be commended; it actually tests an application of an AI system for detection of diabetic retinopathy in primary care settings [35]. The system worked effectively, though it was not flawless, and other issues common to real-world medicine emerged, such as some patients being non-imageable and others having different eye diseases. Nonetheless, I agree with Dr. Topol that this study sets the bar for how AI needs to be evaluated before its widespread adoption in routine clinical practice.

All of this AI in medicine research is impressive. But its advocates will need to continue the perhaps more mundane research of how we make this data actionable and actually act on it in ways that improve patient outcomes. I personally find that kind of research more interesting and exciting anyway.

References

1. Miller, RA (2010). A history of the INTERNIST-1 and Quick Medical Reference (QMR) computer-assisted diagnosis projects, with lessons learned. Yearbook of Medical Informatics. Stuttgart, Germany: 121-136.
2. Shortliffe, EH, Davis, R, et al. (1975). Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system. Computers and Biomedical Research. 8: 303-320.
3. Barnett, GO, Cimino, JJ, et al. (1987). DXplain: an evolving diagnostic decision-support system. Journal of the American Medical Association. 258: 67-74.
4. Miller, RA and Masarie, FE (1990). The demise of the "Greek Oracle" model for medical diagnostic systems. Methods of Information in Medicine. 29: 1-2.
5. Rumelhart, DE and McClelland, JL (1986). Parallel Distributed Processing: Foundations. Cambridge, MA, MIT Press.
6. Alpaydin, E (2016). Machine Learning: The New AI. Cambridge, MA, MIT Press.
7. Kelleher, JD and Tierney, B (2018). Data Science. Cambridge, MA, MIT Press.
8. Rajpurkar, P, Irvin, J, et al. (2017). CheXNet: radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv.org: arXiv:1711.05225. https://arxiv.org/abs/1711.05225.
9. Lakhani, P and Sundaram, B (2017). Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology. 284: 574-582.
10. Arbabshirani, MR, Fornwalt, BK, et al. (2018). Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. npj Digital Medicine. 1: 9. https://www.nature.com/articles/s41746-017-0015-z.
11. Esteva, A, Kuprel, B, et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature. 542: 115-118.
12. Haenssle, HA, Fink, C, et al. (2018). Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Annals of Oncology. 29: 1836-1842.
13. Han, SS, Kim, MS, et al. (2018). Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. Journal of Investigative Dermatology. 138: 1529-1538.
14. Gulshan, V, Peng, L, et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Journal of the American Medical Association. 316: 2402-2410.
15. Ting, DSW, Cheung, CYL, et al. (2017). Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. Journal of the American Medical Association. 318: 2211-2223.
16. Poplin, R, Varadarajan, AV, et al. (2017). Predicting cardiovascular risk factors from retinal fundus photographs using deep learning. arXiv.org: arXiv:1708.09843. https://arxiv.org/abs/1708.09843.
17. Long, E, Lin, H, et al. (2017). An artificial intelligence platform for the multihospital collaborative management of congenital cataracts. Nature Biomedical Engineering. 1: 0024. https://www.nature.com/articles/s41551-016-0024.
18. Burlina, PM, Joshi, N, et al. (2017). Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmology. 135: 1170-1176.
19. Brown, JM, Campbell, JP, et al. (2018). Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmology. 136: 803-810.
20. DeFauw, J, Ledsam, JR, et al. (2018). Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature Medicine: Epub ahead of print. https://www.nature.com/articles/s41591-018-0107-6.
21. Kermany, DS, Goldbaum, M, et al. (2018). Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 172: 1122-1131.E1129.
22. Bejnordi, BE, Zuidhof, G, et al. (2017). Context-aware stacked convolutional neural networks for classification of breast carcinomas in whole-slide histopathology images. Journal of Medical Imaging. 4(4): 044504. https://www.spiedigitallibrary.org/journals/journal-of-medical-imaging/volume-4/issue-04/044504/Context-aware-stacked-convolutional-neural-networks-for-classification-of-breast/10.1117/1.JMI.4.4.044504.full.
23. Liu, Y, Gadepalli, K, et al. (2017). Detecting cancer metastases on gigapixel pathology images. arXiv.org: arXiv:1703.02442. https://arxiv.org/abs/1703.02442.
24. Yu, KH, Zhang, C, et al. (2016). Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature Communications. 7: 12474. https://www.nature.com/articles/ncomms12474.
25. Capper, D, Jones, DTW, et al. (2018). DNA methylation-based classification of central nervous system tumours. Nature. 555: 469-474.
26. Bejnordi, BE, Veta, M, et al. (2017). Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. Journal of the American Medical Association. 318: 2199-2210.
27. Rajpurkar, P, Hannun, AY, et al. (2017). Cardiologist-level arrhythmia detection with convolutional neural networks. arXiv.org: arXiv:1707.01836. https://arxiv.org/abs/1707.01836.
28. Madani, A, Arnaout, R, et al. (2018). Fast and accurate view classification of echocardiograms using deep learning. npj Digital Medicine. 1: 6. https://www.nature.com/articles/s41746-017-0013-1.
29. Mori, Y, Kudo, SE, et al. (2018). Real-time use of artificial intelligence in identification of diminutive polyps during colonoscopy: a prospective study. Annals of Internal Medicine: Epub ahead of print.
30. Hinton, G (2018). Deep learning—a technology with the potential to transform health care. Journal of the American Medical Association: Epub ahead of print.
31. Naylor, CD (2018). On the prospects for a (deep) learning health care system. Journal of the American Medical Association: Epub ahead of print.
32. Stead, WW (2018). Clinical implications and challenges of artificial intelligence and deep learning. Journal of the American Medical Association: Epub ahead of print.
33. Rajkomar, A, Oren, E, et al. (2018). Scalable and accurate deep learning for electronic health records. npj Digital Medicine. 1: 18. https://www.nature.com/articles/s41746-018-0029-1.
34. Keane, PA and Topol, EJ (2018). With an eye to AI and autonomous diagnosis. npj Digital Medicine. 1: 40. https://www.nature.com/articles/s41746-018-0048-y.
35. Abràmoff, MD, Lavin, PT, et al. (2018). Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. npj Digital Medicine. 1: 39. https://www.nature.com/articles/s41746-018-0040-6.