Wednesday, November 24, 2021

A Part of Informatics History

Although I am no historian, I have always enjoyed reading history, which often provides insights into why the world is the way it is today. Even though society in the 21st century is changing rapidly, particularly with regard to technology, we can still learn from what happened in the past, both to appreciate what we have now and to understand how we got here.

To this end, I am delighted to see publication of a new volume devoted to the history of biomedical and health informatics. Although the book provides some historical overview of the field, its main content consists of about 160 personal stories of how many current senior leaders came to work in this field. I am honored that my own story is among those in the collection and am also pleased that the book is published in an open-access manner and is freely available as a PDF on the website of the International Medical Informatics Association (IMIA).

The citation for the book is: International Medical Informatics and the Transformation of Healthcare. Casimir A. Kulikowski, Editor-in-Chief; George I. Mihalas, Associate Editor-in-Chief; Robert A. Greenes, Editor; Hyeoun-Ae Park, Editor; Valerio Yácubsohn, Editor. ISSN 1485-7375. Copyright 2021 by Healthcare Computing & Communications Canada & IMIA. The cover is shown below.

History book cover

Because the book is open access, I am taking the liberty of copying the text of my chapter into this posting verbatim (resisting the temptation to edit or update):

A Passion and a Calling

My interest in biomedical and health informatics goes back to my high school days in the northern suburbs of Chicago in the early 1970s. I was introduced to computers when my school acquired a Hewlett-Packard 9830A, which was the size of a suitcase and had a built-in single-line LED display, thermal printer on top, and cassette tape storage unit. I learned how to program it in BASIC. I was also a cross-country and track runner in high school, which led to my interest in health and medicine. Running also taught me self-discipline, which helped me achieve goals later in life.

I went off to college at the University of Illinois at Urbana-Champaign, where I intended to major in computer science (CS). However, I found CS to be very different in college than it had been in high school. My first courses, using punch cards and the PL/I programming language, did not excite me. I did, however, enjoy working with PLATO, a networked system with (primitive, by today’s standards) bit-mapped graphics. Two years into college, I left CS to pursue a medical career. My interest in health and preventive medicine, together with youthful rebellion, provided the foundation for my interest in evidence-based medicine.

In medical school, also at the University of Illinois, I met my first informatics faculty member, Dr. Allan Levy, who nurtured my interests. Of all my education, medical school was the least enjoyable. I did not like the massive amount of rote memorization, which contributed to my later attraction to informatics. In my third year of medical school in 1983, I purchased my first computer, a Commodore 64: I hooked it up to my television as a monitor and to my phone via a 300-baud modem, and connected to CompuServe, which had a medical bulletin board called MedSIG. There I met Col. Gordon Black, who encouraged a number of us early informaticians, including long-time colleague Rob McClure.

The reigniting of my interest in computers continued to grow as I started an internal medicine residency in 1984 at University of Illinois Hospital in Chicago. During my residency, it became apparent to me that I wanted to combine medicine and computers in my career. I came to learn about a field called “informatics,” but without Google or other search engines, there was no easy way to find more information. This led me to write letters and make phone calls to people like Ted Shortliffe, Bob Greenes, Clem McDonald, Perry Miller, and Scott Blois. I ultimately learned about National Library of Medicine (NLM)–funded informatics fellowships and chose to pursue one in the Harvard program under Bob Greenes at Brigham and Women’s Hospital in Boston. In 1987, after having lived my whole life in Illinois, I headed off with my wife to start my informatics fellowship. It was quite a change for me, with my previous daytime focus on medicine and intermittent nights-and-weekends focus on computing now flipped. In the fellowship, I could do computing almost all the time and practice medicine on the allowed one day per week. During this fellowship, it quickly became clear that informatics would become my life’s calling.

Like many working in informatics in the 1980s, I initially tried to find a research interest and niche in artificial intelligence (AI) systems of the day. One early attraction was knowledge representation, and this led to Bob involving me in his work on the Unified Medical Language System (UMLS) project that had been launched by the NLM in 1986. But the progress of the first generation of AI was sputtering by then, and almost by accident I came across a report on the topic of information retrieval (IR) authored by Bruce Croft, a computer science professor at the University of Massachusetts at Amherst. There was very little research going on in IR in medical informatics, and the main work emanated from the development of MEDLINE (although Mark Frisse had done some important work during that time in applying IR to the emerging world of hypertext). Croft’s report steered me to the most prolific researcher and author in the IR field, Gerard Salton. Many current senior leaders in IR trained as PhD students under Salton, and I was also profoundly influenced by his work. I had the chance to meet Salton when he came to give a talk at Harvard. He was intrigued by my interest in IR applied in the medical domain. I have always thought it was most unfortunate that Salton never lived to see the wide reach and impact of his ideas and work in modern search engines, as he passed away in 1995.

My clinical background dampened my enthusiasm for the relatively clunky and time-consuming AI systems of the 1980s and heightened it for IR. I was intrigued by the idea of physicians and others being able to access knowledge at the point of care. My perception of IR systems at the time was that they were limited, doing just word-based searching on text or requiring complex Boolean queries over human-assigned indexing terms. My interest in IR, combined with the advancing UMLS project, led me to pursue a line of research that combined concept-based automated indexing with the statistical retrieval approaches developed by Salton and others. This led me to develop and implement a system called SAPHIRE, which was the focus of my early research.

During my fellowship, I was also briefly involved with a project that would later become a highly successful commercial product. Bob had been visited by Burton Rose, a nephrologist at Brigham and Women’s who was enamored with a new tool called HyperCard that shipped with the Mac. He believed that small chunks of information on each “card” in a HyperCard “stack” could be highly useful to physicians. But as the quantity of information grew, he needed a search capability better than the one that shipped with HyperCard. I programmed the search capability for the first version of what would later be called UpToDate, which ultimately achieved great commercial success. At the end of my fellowship, I handed this project off to another fellow, Joseph Rush, who continued to work on UpToDate for many years.

As my fellowship was ending, I knew that I wanted to pursue a career in academic medical informatics. One person I came to know was Bob Beck, who at the time was heading the informatics program at Dartmouth College. By the fall of my last year of fellowship, Bob had moved to Oregon Health Sciences University (OHSU) to start a new program there funded by the NLM IAIMS program, bringing with him another faculty member, Kent Spackman.

While I had some other job possibilities, my wife and I, now with a one-year-old daughter, packed up and moved to Portland in July 1990. My first activity in the new job was to submit an NIH R29 proposal that I had been working on in the latter months of my fellowship. Also called a FIRST Award, this type of grant was a common pathway for new researchers to launch their careers. Several months later, I was notified that it would be funded, which jump-started my academic career.

In 1990, Oregon voters passed a property tax limitation measure which ultimately led to Bob Beck losing resources and leaving in 1992. This left behind a very junior faculty, led by Kent, but as Kent wasn’t interested in building a program, he devolved the leadership to me. By 1996, our young academic group was starting to achieve sustained success. This led the Dean of the OHSU School of Medicine at the time, Joseph Bloom, to encourage our unit to become more visible on campus. The usual way of doing this at OHSU was establishing a so-called free-standing division, which was the path to establishing a department. This also provided me a seat at the table of clinical department chairs, which I maintain to this day.

I was interested in teaching from the beginning of my faculty career, and when Kent asked me to organize the introductory informatics course—something I still teach to this day—it led to many others, like the one I teach in the 10x10 program. When I started my fellowship, and then my faculty position at OHSU, I never realized how much of a passion teaching would become for me. I always enjoyed teaching because it gave me a chance to learn as well as develop a coherent organization for various topics. My path down the road to my current leadership in education was also greatly influenced by those I taught. In particular, while I assumed that our educational program would be small and aim to produce researchers like myself, there were a number of students who were interested in more varied careers, such as the small but growing number of professional positions in healthcare settings or industry. This resulted in our new Master of Science program taking on a more practical orientation. But that was fine, as the research of many of our faculty, such as Paul Gorman and Joan Ash, was motivated by real-world concerns in the application of informatics.

Even with my growing interest in education and my leadership responsibilities in our emerging program, I still maintained my interest in research. While it became more difficult to develop new IR systems when giants like Google and PubMed emerged, my interest in evaluating how well people used IR systems for health and medical reasons became the main focus of my research. In 1996, I published the first edition of my book, Information Retrieval: A Health Care Perspective.

By 1999, as I was contemplating ways to expand our educational program, a number of people had asked if we planned to offer our courses via distance learning. I decided to offer my now-mature introductory course in this manner, which was quite successful. There was an untapped market for distance learning in informatics, and the success of my initial course led me to convince the faculty to add this format to the program. This foray into distance learning distracted us from another goal we had in the late 1990s, which was to establish a PhD program. We finally accomplished this when our NLM training grant was renewed in 2002. At this point I became PI of the training grant.

Another pivotal career event for me came when Charlie Safran was President of AMIA (back in the days when the AMIA President was an elected position). He was convinced that the US needed more professionals, especially physicians and nurses, trained in informatics. Charlie believed the US needed at least one physician and one nurse trained in informatics in each of the nearly 6000 hospitals in the US. Also at this time, AMIA was looking to develop some sort of introductory course in biomedical informatics. However, the prices quoted to them by vendors were beyond their means. As I already had my introductory course from our graduate program, I proposed to AMIA that we repackage my online course. I came up with a name, 10x10 (pronounced “ten by ten”), based on Charlie’s vision of one physician and one nurse in each of those nearly 6000 hospitals, and set a goal of accomplishing this by 2010. Because the course already existed, we were able to put in place a Memorandum of Understanding between OHSU and AMIA and launch the first offering of the course in just a few months. The next President of AMIA, Don Detmer, called 10x10 one of the association’s most successful programs ever.

My involvement in education and training spurred an interest in workforce development for the field. In 2006, I was invited to organize the surprise retirement event for the long-time academic leader, originally from Germany and later from Victoria, Canada, Jochen Moehr. I gave a talk entitled Who Are the Informaticians? What We Know and Should Know, which I later published in JAMIA. This interest was fortuitous, since the US economy would soon enter free fall, leading to the American Recovery and Reinvestment Act (ARRA), the economic stimulus bill that included the Health Information Technology for Economic and Clinical Health (HITECH) Act. While HITECH was best known for its $30 billion “meaningful use” program of incentives for EHR adoption, it also included $118 million for workforce development, motivated in part by some research I published showing a need for more informatics professionals. I played a large role in the grants that were competitively awarded by the HITECH Workforce Development Program, including being funded as the National Coordination and Dissemination Center for the health IT curriculum developed through the program.

During and after HITECH, I continued to provide leadership for informatics education and its relationship to other careers in the field. I was also a leader in the new clinical informatics physician subspecialty, being appointed by AMIA to direct the Clinical Informatics Board Review Course (CIBRC), which was offered in time for the first board examination in 2013. The next year I laid the groundwork at OHSU to establish one of the first four Accreditation Council for Graduate Medical Education (ACGME)–accredited fellowships for the new subspecialty, which launched in 2015. Around this time, I also had the opportunity to develop informatics education for non-informaticians, namely medical students. My colleagues at OHSU and I began to implement informatics education in the MD curriculum (just in time for my younger daughter to become a medical student!).

I have now been at OHSU for nearly 30 years, where I have had the opportunity to continue my research and teaching, and lead my department. Another critical activity of mine now is to mentor young faculty, who one day will sustain and lead our program.

Monday, November 15, 2021

A New Systematic Review Highlights the Current State and Limitations of Clinical AI Use and Efficacy

When I teach about the features of search engines like PubMed, I often quip that if you use the limit function to narrow your search to randomized controlled trials (RCTs), which are the best evidence for medical and health interventions, and you still have many retrievals, there is probably some enterprising researcher who has done a systematic review on the topic. Some actually worry that we have too many systematic reviews these days, not always of the greatest quality.(1) But such reviews, especially when done well, can not only be important catalogs of research on a given topic but also provide an overview of the breadth and quality of studies done.
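
As an aside for readers who want to reproduce this kind of search programmatically, below is a minimal sketch using NCBI's public E-utilities esearch API, which applies the same publication-type filter as the PubMed limit function. The query string and the use of the third-party requests package are illustrative assumptions of mine, not anything from the review discussed below.

    # Minimal sketch: counting PubMed results with and without the RCT
    # publication-type filter, via NCBI's E-utilities esearch API.
    # Assumes the third-party "requests" package; the query is illustrative.
    import requests

    ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    def count_pubmed_results(term: str) -> int:
        """Return the number of PubMed records matching a query."""
        params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
        resp = requests.get(ESEARCH, params=params, timeout=30)
        resp.raise_for_status()
        # esearch returns the total hit count as a string in its JSON envelope
        return int(resp.json()["esearchresult"]["count"])

    topic = '"artificial intelligence"'
    all_hits = count_pubmed_results(topic)
    # "[pt]" restricts by publication type, as the PubMed limit function does
    rct_hits = count_pubmed_results(topic + ' AND "randomized controlled trial"[pt]')
    print(f"All results: {all_hits}; RCTs only: {rct_hits}")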

Sure enough, we have started to see systematic reviews on artificial intelligence (AI) and machine learning (ML) applications. A new systematic review covers the RCTs of interventions based on AI applications.(2) I hope the authors will keep the review up to date, as one limitation of systematic reviews published in journals is that they become out of date quickly, especially in rapidly moving areas such as AI.

As we know from evidence-based medicine (EBM), the best evidence for the efficacy of interventions (treatment or prevention) comes from RCTs. Ideally, these trials are well-conducted, generalizable, and well-reported. EBM defines four categories of questions that clinicians ask: intervention, diagnosis, harm, and prognosis. As such, there are other clinical questions that can be answered about AI beyond those about interventions. For example, can AI methods improve the ability to diagnose disease? Can AI identify harms from the environment, medical care, and other sources? And finally, can AI inform the prognosis of health and disease? Ultimately, however, AI interventions must be demonstrated experimentally to benefit patients, clinicians, and populations. There are of course some instances when RCTs are infeasible, so observational studies may be justified.

In this context, we can examine the recently published systematic review by Zhou et al. of interventions using AI clinical prediction tools.(2) This systematic review categorized AI methods into three groups: traditional statistical (TS), mostly regression; machine learning (ML), all ML methods other than deep learning; and deep learning (DL), i.e., applications using multi-layered "deep" neural networks. TS and ML tools were found to be used for three functions: assistive treatment decisions, assistive diagnosis, and risk stratification, whereas DL tools were only assessed for assistive diagnosis.

As typically happens in systematic reviews, the authors' broad MEDLINE search retrieved over 26,000 published papers, but of those, only 65 RCTs were identified. Once identified, the 65 trials were reviewed for a number of characteristics. One important characteristic was whether or not studies demonstrated a benefit for AI, i.e., had a positive result. Of course, counting numbers of positive vs. negative results is not necessarily an indicator of the value or generalizability of a particular method of AI, or of any other clinical intervention for that matter. Nonetheless, the authors did find that 61.5% of the RCTs (40 of 65) had positive results and 38.5% (25 of 65) had negative results.

As AI can be used for many conditions and functions in medicine, it is important to get a sense of what was studied and what tools were used. The authors found AI used in a variety of disease categories: acute disease (29%), non-cancer chronic disease (28%), cancer (17%), primary care (14%), and other conditions (12%). The predictive tools were used most often for assistive treatment decisions (54%), followed by assistive diagnosis (25%) and risk stratification (19%). The most studies used TS methods (57%), followed by ML (26%) and DL (17%). These differences may reflect the more recent development and use of ML and especially DL. The rates of positive studies by tool type were highest for DL (82%), followed by ML (71%) and TS (51%), although it should be noted that the rate of positive results was also inversely related to the number of trials for each tool type.

A table in the paper shows that there were differences by tool category. TS tools were most likely to be used with clinical quantitative data (97%), applied in acute disease (43%) and primary care (24%), and used for assistive treatment decisions (60%) followed by risk stratification (30%). ML tools were also most likely to be used with clinical quantitative data (94%), applied in chronic disease (77%), and used for assistive treatment decisions (77%). DL tools were most likely to be used with imaging data (91%), applied in cancer (91%), and used exclusively for assistive diagnosis (100%). In particular, the DL studies almost exclusively evaluated assistance for gastrointestinal endoscopy, with all nine such RCTs showing positive results and the two trials of other applications and diseases having negative results. Also of note, only two of the 65 RCTs made use of natural language data for input, one ML and one DL.

Systematic reviews typically appraise included studies for risk of bias, or in other words, the quality of their methods for producing an unbiased result. This appraisal gives readers confidence that the research methods were robust and well-reported and that the results obtained are true. Unfortunately, there were a number of concerns that led to 48 (74%) of the trials being classified as having high or indeterminate risk of bias. This was due to a number of factors:

  • One-third of the trials carried out no sample size estimation to determine the number of subjects needed to detect a statistically significant benefit (a minimal sketch of such a calculation appears after this list)
  • Three-fourths of the trials were open-label, so had no masking of the AI system from its users
  • Three-fourths did not reference the CONSORT statement, a 37-item checklist widely used for reporting the details of RCTs and recently extended for AI trials
  • Three-fifths did not apply an intent-to-treat analysis, which analyzes subjects according to the study groups to which they were originally assigned
  • Three-fourths did not provide reference to a study protocol for the trial
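
To make the first of those concerns concrete, here is a minimal sketch of the standard normal-approximation sample size calculation for comparing two proportions; the event rates, alpha, and power are illustrative values of mine, not numbers from any of the reviewed trials.

    # Minimal sketch: subjects needed per arm to compare two proportions,
    # using the standard normal-approximation formula. All numbers below
    # are illustrative, not taken from any particular trial.
    import math
    from scipy.stats import norm

    def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
        """Sample size per arm to detect a difference between rates p1 and p2."""
        z_alpha = norm.ppf(1 - alpha / 2)  # two-sided significance threshold
        z_beta = norm.ppf(power)           # corresponds to the desired power
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

    # e.g., detecting an improvement in an outcome rate from 10% to 15%
    print(n_per_arm(0.10, 0.15))  # roughly 683 subjects per arm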

Among the trials rated as having a low risk of bias, the rates of positive outcomes were somewhat comparable to the overall rates for TS and DL, though notably lower for ML: positive outcomes in 63% of TS, 25% of ML, and 80% of DL trials.

What can be concluded from this systematic review? We certainly know from the vast amount of other literature that a large number of predictive models have been built using AI techniques and shown to function well for a wide variety of clinical conditions and situations. We probably cannot do an RCT of every last application of AI. But at this point in time, the number and variety of RCTs assessing benefit for interventions of AI is modest and uneven. While a number of positive results have been demonstrated, the published studies are not dispersed across all of the possible clinical applications of AI, and three-fourths of the trial reports show indeterminate or high risk of bias. DL methods in particular must be assessed in the myriad areas in which data sets have been developed and models trained.

There are some problems with the systematic review itself that hinder a complete understanding of the work. Table 2, covering DL interventions, is missing data in its leftmost column that connects each row to its original reference. This table also does not include a recent paper by Yao et al.,(3) which was likely published after the review was completed. It is also difficult to use the data in Supplementary Table 4, covering ML interventions, which is provided in a PDF file that is difficult to read or browse. In addition, while the paper references a high-profile study by Wijnberge et al.,(4) that study is not listed in the ML table. It may well have been classified as TS, but this points to another limitation of the systematic review: there is no table or data that details the TS interventions. The authors were kind enough to provide Excel files of the DL and ML tables, but these really should be part of the online materials for the systematic review. I do hope they or someone else will keep the review up to date.

As it stands, this systematic review does give us a big-picture view of the clinical use and benefit of AI at this point in time, which is modest, unevenly distributed, and based on studies using suboptimal methods. We can conclude for now that AI predictive tools show great promise in improving clinical decisions for diagnosis, treatment, and risk stratification, but comprehensive evidence of that benefit is still lacking.

This systematic review also highlights a point I have written about in this blog before, which is that AI interventions need translation from basic science to clinical value. In particular, we need clinically driven applications of AI that are assessed in robust clinical trials. There must also be attention to patient safety and to clinician workflow. In general, we need robust AI and RCT methods that are replicable and generalizable, and we must conduct implementation and trials from a health equity standpoint.

References

1.     Ioannidis JPA. The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews and Meta-analyses. Milbank Q. 2016 Sep;94(3):485–514.
2.     Zhou Q, Chen Z-H, Cao Y-H, Peng S. Clinical impact and quality of randomized controlled trials involving interventions evaluating artificial intelligence prediction tools: a systematic review. NPJ Digit Med. 2021 Oct 28;4(1):154.
3.     Yao X, Rushlow DR, Inselman JW, McCoy RG, Thacher TD, Behnken EM, et al. Artificial intelligence-enabled electrocardiograms for identification of patients with low ejection fraction: a pragmatic, randomized clinical trial. Nat Med. 2021 May;27(5):815–9.
4.     Wijnberge M, Geerts BF, Hol L, Lemmers N, Mulder MP, Berge P, et al. Effect of a Machine Learning-Derived Early Warning System for Intraoperative Hypotension vs Standard Care on Depth and Duration of Intraoperative Hypotension During Elective Noncardiac Surgery: The HYPE Randomized Clinical Trial. JAMA. 2020 Mar 17;323(11):1052–60.

Saturday, November 13, 2021

This Year's Eco-Event: An E-Bike

If last year's eco-event for me was installing solar panels on the roof of my house, this year's event was the purchase of an electric bike (e-bike). I have to admit I am not completely virtuous when it comes to my carbon footprint. While my time spent in airplanes has decreased dramatically during the pandemic, it is likely to increase, although probably not to pre-pandemic levels, as we all return to travel.

But one thing I do hope to do as we eventually return to working in the office is to commute by e-bike in all but the worst weather. One of the challenges of bicycle commuting for me is that although I live only 4-5 miles (depending on the route) from my office, I must ascend over 500 feet vertically to get between my home and office. (Those in Portland know the Fairmount Loop just below the Council Crest hill that sits above Oregon Health & Science University.)

Likewise, a similar hill lies between my home and the short ride to downtown and other parts of Portland that would be wonderful to reach leisurely by bicycle. I sometimes do this ride on a regular bicycle, but it is a real workout. There are other times when I would just rather get somewhere without breaking a major sweat.

After doing much online research, I decided to test-ride, and eventually purchase, an Aventon Pace 500 Step-Through e-bike. There are many e-bike options, and the market for these products is not yet fully mature. But I have enjoyed my e-bike, and I have to say it is actually quite fun to ride. There is nothing like a little electric assist when riding a bike, especially in the hills of the west side of Portland.

My e-bike

Another aspect of the Pace 500 I enjoy is its throttle. While I use the throttle sparingly, I do find it of great value when I need to accelerate quickly, such as when a traffic light turns green or when starting up a hill. As a Class 3 e-bike, the Pace 500 can also provide pedal assist up to 28 mph at its highest assist level (5), although I try to ride at no higher than pedal assist 2, which gives me plenty of power even up hills.