Wednesday, August 21, 2019

An Information Retrieval Researcher’s Peer Review of Recent Studies of Search Engine Influence on Voting Behavior

A good part of my informatics research work over three decades has focused on the evaluation of search, also called information retrieval or IR. I have been amazed as search systems have become a mainstream part of our society, especially given that when I started, IR systems were only used by those who had computers with accounts from companies offering subscription search services.

Now, however, searching is ubiquitous. Indeed, it is almost impossible not to search, as it is offered in the address bar of most Web browsers. In addition, the name of one famous search engine, Google, has become a verb synonymous with searching, i.e., Googling. Few of us can imagine a world without information on almost any topic being available nearly instantaneously.

It was therefore of interest this week when the President of the United States latched on to some research purporting that manipulation of Google was responsible for shifting three or more million votes to Hillary Clinton, which happens to be the margin of popular votes that she received over Donald Trump in the 2016 election (despite his narrow victory in the Electoral College).

This research has been put forth by Robert Epstein, PhD, who claims to be a liberal Democrat, as if that somehow indicates his analysis is not biased. Of course, one’s political views should not have any influence over the outcomes of their research.

Let’s look at Epstein’s multifaceted claims and the evidence supporting them from the standpoint of an IR researcher. First is the “finding” that Google manipulated search results to retrieve information “biased” toward Clinton. And second is that the retrieval of this information resulted in shifting of votes from Trump to Clinton.

The finding of manipulated search results comes from a paper posted as a PDF to Epstein’s Web site. As such, it is not peer-reviewed. The paper claims to show that in the run-up to the 2016 election and afterwards, 95 individuals, 21 of whom designated themselves as “undecided,” had their Google searches tracked and sent to a crowdsourcing site, Mechanical Turk, for rating as to whether they were biased toward Clinton or Trump. The researchers eliminated searches from people who had Gmail accounts due to an unsubstantiated assertion that Google provided such users different results (which the company denies).

If I were sent this paper for peer review by an IR journal, I would ask the following: How did the researchers choose the individuals for the study? What evidence supports excluding those who had Gmail accounts? Who were the people on Mechanical Turk who did the ratings for the study? How were they instructed by the researchers to determine “bias”? I would certainly demand answers to questions like these before I would recommend acceptance for publication. The Methods section of the paper would need to be substantially expanded.

Let’s say, however, that the authors came back with acceptable answers to my questions, and the study were published. What about the second claim, that this bias could lead to “manipulating” anywhere from 2.6 to 10.4 million votes in Clinton’s favor? The evidence for this comes from a paper that was published in a peer-reviewed journal, and a prestigious one at that, the Proceedings of the National Academy of Sciences (PNAS). That study, published in 2015, looked at five randomized trials assessing the “search engine manipulation effect” (SEME).

These studies may be credible, but it is dubious whether they can be used to claim that biased search results impacted voting in the 2016 US election. The first three experiments in the PNAS paper recruited individuals in the San Diego, CA area to rate which of the two candidates in an Australian Prime Minister election they might vote for (an election chosen because most people in San Diego would be unlikely to have prior knowledge of it). A fourth experiment replicated the first three with a national audience of individuals recruited from Mechanical Turk, while a fifth experiment recruited undecided voters to assess information about candidates in a local election in India. There should be no question that any kind of exposure to information can influence one’s decision about voting, but it is questionable whether these sorts of results can be applied to a national US election in which the same people would be bombarded by articles, reports, advertising, and other sorts of information, perhaps even Fake News promulgated by foreign entities on Facebook or Twitter.

Epstein fused the results of this research together to claim that biased search results moved several million votes in the direction of Clinton in the 2016 election. He took this conclusion to a receptive audience of Republicans in the US Senate. The outcome was predictable, with no skepticism whatsoever. And then came the crowning glory of it all, a Presidential tweet.

The mainstream fact checkers had a field day with these claims. Clearly one incompletely reported and probably highly flawed study, fused with another one showing that in some instances, search results can influence voting behavior, is hardly evidence that alleged bias by Google moved votes to Clinton in 2016. Here are some of their assessments:
The results of all this remind me of the famous joke by comedian Stephen Colbert, who once noted that reality has a liberal bias. I do believe it is important to keep political biases out of research, and I strive to do so in my own. Even in my teaching, I aim to present opposing points of view, although without aiming to give equivalence to all points of view. But research like this needs to be called out for its thinly veiled political goals, and I suppose on that front, its “results” can be called successful.

Sunday, August 4, 2019

Living the Asynchronous Life

I have always been highly productive in my academic career, finding time to teach, carry out research, write, manage my department, and mentor faculty and students. I believe that one of the reasons for my productivity is my leading a relatively asynchronous life.

What do I mean by asynchronous? In essence, most of my work is done by myself, wherever and whenever. And if I hit a block (or get tired of something), I can leave one task and move to another.

A good part of the asynchronous nature of my work comes from my teaching, which these days is mostly online. Most of my courses involve recorded (though frequently updated) lectures delivered online, followed by online discussion through the Oregon Health & Science University (OHSU) learning management system. I also interact with students by email and even sometimes by phone.

This mostly asynchronous work is also amenable to my lifestyle. Being a (very early) morning person, I can get most of my important and creative work done early in the day. A typical day for me involves 2-4 hours of work, followed by exercise and arrival at the office between 9 and 10 am. (This leads some to believe I just roll into the office late, when I have in fact already been working for several hours.)

Another advantage of my mostly asynchronous work is that it allows me to pursue other aspects of my job that I enjoy, such as travel. I have often noted that I get some of my best work done in hotel rooms during travels. In almost every corner of the world, I can get access to the Internet, all my internal OHSU resources, and OHSU's learning management system. Having 3-4 hours uninterrupted in a hotel room can result in completing a day’s worth of work. And while not all of my work is amenable to being done on airplanes, I can choose work that is when flying. (It does not hurt to have status that gives me more roomy seating and even sometimes upgrades.)

There are some downsides to the asynchronous life. One is that the capacity for switching across tasks is not infinite. As with multitasking computers, too many tasks, and switching between them too frequently, can lead to so-called thrashing, where more effort goes into the switching than into the tasks themselves.

Another downside is that it can sometimes seem that work never ends. Even when a project is completed, there are others still demanding attention.

Nonetheless, given the combination of the satisfaction I get from creating academic papers, talks, and more, plus my enjoyment of getting to see the world, I will always enjoy living the asynchronous life.

Tuesday, July 16, 2019

The Next Chapter in Continuing Education for Informatics

This week, the Department of Medical Informatics and Clinical Epidemiology (DMICE) of Oregon Health & Science University (OHSU) launched a new annual continuing education (CE) activity in clinical informatics. With the first offering of the OHSU Annual Update in Clinical Informatics, a selection of important topics will be covered to provide an update for all clinical informatics professionals. For physicians, the course will provide continuing medical education (CME) credit. For physicians certified in the clinical informatics subspecialty, the course will provide MOC-II/LLSA credits.

The field of clinical informatics (and the closely related health informatics) is a growing profession that plays an important role in healthcare and other health-related areas [1]. Informatics professionals ensure that data and information are used most effectively to improve healthcare, public health, individual health, and research. The certification, initially of physicians [2] and soon of others in the field [3], requires that all informatics professionals maintain and expand their knowledge and skills.

This course builds off the extensive informatics education offerings of DMICE, from our biomedical informatics graduate program that has awarded 831 degrees and certificates over more than 20 years to our other innovative activities such as the AMIA 10×10 (“ten by ten”) program, the development of online learning in informatics, and launching one of the first clinical informatics subspecialty fellowships for physicians [4].

The learning activity consists of 7 modules that are estimated to take a total of 8 hours to complete. The activity is completely online, and consists of lectures and self-assessment quizzes. The topics for the 2019 annual update were selected by DMICE faculty. Topics for future annual updates will be chosen with input from those who completed previous annual update courses.

After taking this learning activity, clinical informatics professionals will be able to (1) be aware of current advances in clinical informatics, (2) apply these advances to their professional practice, and (3) meet required competencies that are related to the domain of clinical informatics in the practice of their profession.

The activity will consist of a number of talks given by DMICE faculty that will focus on recent developments in the field. The activity will be hosted on OHSU’s Sakai learning management system as enduring learning material. Once learners enroll in the activity, they will have access to Sakai and be able to complete the activity and evaluations at their own pace. Each talk will be accompanied by a post-test (multiple choice), and learners will also need to complete a course evaluation at the end of their learning. The 2019 course activities must be completed by June 30, 2020.

The topics covered in this year’s offering of the course include:
  • Operational Clinical Informatics
  • Organizational Behavior
  • Data Science and Machine Learning
  • Clinical Research Informatics
  • Informatics Education
  • Nursing Informatics
Details of this online CE experience are available at:

This is not the only continuing education activity in clinical informatics that will be offered by OHSU. In the coming year, we will also offer for CME and ABPM MOC-II credit our monthly clinical informatics journal clubs and grand rounds.


1. Fridsma, D. (2019). Strengthening our profession by defining clinical and health informatics practice. Journal of the American Medical Informatics Association, Epub ahead of print.
2. Detmer, D., & Shortliffe, E. (2014). Clinical informatics: prospects for a new medical subspecialty. Journal of the American Medical Association, 311, 2067-2068.
3. Gadd, C., Williamson, J., Steen, E., & Fridsma, D. (2016). Creating advanced health informatics certification. Journal of the American Medical Informatics Association, 23, 848-850.
4. Longhurst, C., Pageler, N., Palma, J., Finnell, J., Levy, B., Yackel, T., . . . Hersh, W. (2016). Early experiences of accredited clinical informatics fellowships. Journal of the American Medical Informatics Association, 23, 829-834.

Monday, July 8, 2019

Kudos for the Informatics Professor - Winter/Spring 2019 Update

I have had a busy but productive early 2019, with invited talks, publications, and other happenings.

I gave a few invited talks:
The latter was a real honor, as it took place at my medical school alma mater, University of Illinois Chicago, and it was fun to see both informatics colleagues as well as some former classmates who attended the lecture.

I also had the opportunity to attend the inaugural induction of Fellows of the American Medical Informatics Association (FAMIA) at the AMIA Clinical Informatics Conference in Atlanta, GA on May 1, 2019. The initial group of FAMIA included 15 alumni and faculty of the OHSU Biomedical Informatics Graduate Program, comprising over 11% of the inaugural fellows. Below is a picture of OHSU alumni and myself who attended the induction ceremony.

I continue to serve on several scientific advisory boards:
  • Pan African Bioinformatics Network for H3Africa (H3ABionet), which provides bioinformatics support for the Human Heredity and Health in Africa Project (H3Africa). I will be attending my second meeting of the board in Cape Town, South Africa in July.
  • RCMI Multidisciplinary And Translational Research Infrastructure EXpansion (RMATRIX), a translational research center of the John A. Burns School of Medicine of the University of Hawaii. The grant funding of this is ending, so this board will also be ending.
  • ECRI Guidelines Trust (EGT) Technical Advisory Panel (TAP), a publicly available web-based repository of objective, evidence-based clinical practice guideline content that succeeds the Agency for Healthcare Quality & Research (AHRQ) National Guidelines Clearinghouse. (I also served on the Technical Expert Panel of National Practice Guidelines Clearinghouse when it was developed and launched by AHRQ from 1998-2002.)
I am also co-author on a couple papers that were published:
I also had the pleasure, as I do every June, of seeing a new group of graduates from the OHSU Biomedical Informatics Graduate Program. This year saw 45 new alumni of the program honored. With these new graduates, the program has now awarded a total of 831 degrees and certificates dating back 22 years to the first graduates in 1998 (who started when the program launched in 1996). As some have completed more than one program degree or certificate (e.g., the Graduate Certificate and Master’s or the Master’s and PhD; one person has done all three!), the program has a total of 746 alumni.

Finally, AMIA has produced and posted videos for several of the courses in the 10x10 program. A video of myself describing the OHSU course has been posted on the main page for the program on the AMIA Web site.

Monday, June 24, 2019

Introducing Informatics.Health

About a year ago, the Internet Assigned Numbers Authority (IANA) launched a new set of top-level Internet domains. One of these was .health. When, to my surprise, the domain name informatics.health was still available, I immediately grabbed it. I am now pleased to launch my first use of the domain name, which is a re-direct to my well-known site, What is Biomedical and Health Informatics?

In honor of this launch, I have completely updated the "What is...?" site, which I use to both provide an overview of the field to those interested and also demonstrate the online learning technologies that we use in our Biomedical Informatics Graduate Program at OHSU.

The main part of the site consists of the following lecture segments (time in parentheses):
  • What is Biomedical and Health Informatics (1)? (23:30)
  • What is Biomedical and Health Informatics (2)? (17:41)
  • A Short History of Biomedical and Health Informatics (21:33)
  • Resources for Field: Organizations, Information, Education (24:37)
  • Examples of the Electronic Health Record (EHR) (24:08)
  • Data Science and Machine Learning (1) (14:26)
  • Data Science and Machine Learning (2) (20:09)
  • Information Retrieval (Search) (24:05)
  • Information Retrieval Content (29:26)
The site also contains links to books, articles, organizations, and educational Web sites.

Over time, I will probably move the site to a new server, and eventually I may develop different content for it. However, I will always want the site to be an overview of the biomedical and health informatics field. 

Saturday, June 22, 2019

Recovering from a Computer Demise, 21st Century Edition

My professional and personal lives have involved computer use for many decades. As time has gone on, the proportion of my life tied to computing devices has increased. Whereas in the early days I mainly saved programs and simple documents, my life is now intertwined with computers, smartphones, and tablets, covering virtually everything I do professionally and a large amount of my personal activities, from pictures to music to documents and more.

Although computers are more reliable than ever, the impact of a failure is more catastrophic in the present when so much of one's life is tied up in them. The recent sudden demise of my MacBook Pro reminded me of this.

Back in the early days, as well as now, the most critical problem with a computer failure is the loss of one's data. Because data loss was more common in those early days, I have always backed up my data frequently. In modern times, the easiest way to do this as a Mac user is through Time Machine, which is built into the operating system. I keep two hard disks for this purpose, one to take with me whenever I travel and another that stays home in case everything is lost or stolen on a trip. Then and now, I have always backed up my data about once a day. This habit fortunately minimized the impact of my recent computer demise.
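For those curious about the mechanics, the core idea behind incremental backup tools like Time Machine can be sketched in a few lines: copy only files that are new or have changed since the last backup. This is a minimal illustration, not how Time Machine is actually implemented (it uses hard links and filesystem snapshots); the function name and structure here are my own.

```python
import os
import shutil

def incremental_backup(src, dest):
    """Copy files from src to dest only if they are new or have a
    newer modification time -- a crude sketch of incremental backup."""
    copied = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            s = os.path.join(root, name)
            rel = os.path.relpath(s, src)
            d = os.path.join(dest, rel)
            os.makedirs(os.path.dirname(d), exist_ok=True)
            # copy2 preserves timestamps, so unchanged files are
            # skipped on the next run
            if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
                shutil.copy2(s, d)
                copied.append(rel)
    return copied
```

Running this twice in a row against an unchanged source copies everything the first time and nothing the second, which is exactly the property that makes a daily backup habit cheap to keep.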

I have actually experienced only a handful of computer failures in my decades of using them. But a few weeks ago, while flying home from Singapore, fortunately on the last leg of the trip from San Francisco to Portland, my MacBook Pro just died. No amount of trying to reset the System Management Controller (SMC) or anything else helped. The battery was far from dead, so plugging in the computer did not help either. The death was verified by the OHSU IT department after I landed and brought it in to them.

Fortunately my department had a recently re-imaged MacBook Pro for me to use immediately. But most fortunately, I had backed up my now-dead machine about 24 hours earlier.

When I get a new Mac, I generally prefer not to restore the entire computer image from Time Machine. Even though it would be faster, I know of those who have had problems with this approach, and I prefer to re-build my machine by re-installing the data and then the individual apps. I also like the opportunity to do some "housecleaning" to get rid of applications I am not actively using and that mostly clutter the machine.

(I also have a systematic method for backing up all data I want to maintain in the long run. I have been doing this as well since the late 1980s and have an archive of essentially my entire career, even though some files from those early days no longer have applications that can open them. Microsoft Office, for example, will not open those Word/Excel/PowerPoint-format files from those times, but the files can be opened by a text editor.)
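As an aside, the trick of opening ancient binary files in a text editor works because such files usually embed their text as runs of plain ASCII amid binary formatting codes. A small sketch of that salvage process, essentially a crude version of the Unix `strings` utility (the function name and length threshold are my own):

```python
import re

def extract_text(path, min_len=4):
    """Pull runs of printable ASCII out of a legacy binary file,
    roughly what opening it in a text editor reveals."""
    with open(path, "rb") as f:
        data = f.read()
    # find runs of at least min_len printable ASCII characters
    runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    return [r.decode("ascii") for r in runs]
```

On an old Word file, the recovered runs are mostly the document's prose, interleaved with a bit of formatting gibberish that is easy to discard by hand.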

With the new MacBook Pro, I was quickly able to restore my data from my Time Machine disk, which gave me all of my data from about 24 hours prior. This meant I would be losing all work I had done in the last 24 hours since that backup, which was not insubstantial, since I had been working during my last hours in Singapore and then my long trans-Pacific flight. I was able to retrieve a few things I had done within those 24 hours, for example documents I composed and sent by email, which were in the outbox of my mail client. (Doesn’t hurt to have had wifi on my long flight!)

It generally takes me a couple weeks to get a new computer fully restored from a prior one, and this case was no different. The MacBook Pro was sent to Apple for repairs and they had to replace the entire innards, so there was no way to recover anything from the old computer. But due to careful backing up and other processes, my computer demise was fortunately not too painful, and I re-learned the lesson of regularly backing up one's work.

I know there are also in modern times some processes that eliminate the need for users to actively back up their work, such as to a cloud-based location. But even this would be imperfect for me, since my Internet access is not yet completely ubiquitous (such as when the airplane wifi does not work or a local connection is too expensive or otherwise not available). So I imagine that my habit of regularly backing up my data will be a good one to keep for some time to come.

Thursday, April 11, 2019

Beyond Images and Waves: How Will Deep Learning Benefit Health or Healthcare?

Scarcely a week goes by without another study published of a deep learning algorithm that achieves accuracy comparable to or sometimes better than human experts. The vast majority of these studies focus on some application to diagnostic imaging or waveform interpretation. There is no question that the practice of medicine will be highly impacted by these systems, especially for practitioners in fields that directly use them, such as radiologists, pathologists, dermatologists, and so forth.

What about deep learning applied to other areas of medicine? One group of studies has applied deep learning to retrospective electronic health record (EHR) data. A number of studies have shown impressive abilities to use EHR data to predict or diagnose:
  • Several dozen diseases [1]
  • Length of stay, mortality, readmission, and diagnosis at two large medical centers [2]
  • Prognosis in palliative care [3]
  • 30-day readmission in heart failure [4]
  • Patient mortality from coronary artery disease more accurately than traditional cardiovascular risk models [5]
  • Early risk of chronic kidney disease in patients with diabetes [6]
  • Many pediatric diagnoses at a major referral center [7]
  • Clinical outcomes in rheumatoid arthritis [8]
One can see obvious use cases for these types of systems, such as being able to intervene when patients might fare poorly when hospitalized or are at risk for readmission or more serious outcomes after they are discharged. But making a faster or more accurate diagnosis from an imaging or waveform study is a different matter than trying to determine the best use of an algorithm that tells a clinical team that a patient may be heading toward a bad outcome. When do we apply the results of the system? Once we intervene, does this change the nature of future recommendations? These are fascinating research questions, but also big unknowns in terms of how to apply such data in the clinical setting.
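To make concrete what such predictive systems do, here is a toy sketch of a risk model of the kind these studies describe: a logistic regression trained on synthetic, made-up "EHR" features that outputs a readmission probability. It is for illustration only; the cited studies use deep learning on far richer data, and the features, labels, and function names here are all invented.

```python
import math

def train_logreg(X, y, lr=0.1, epochs=500):
    """Toy logistic regression fit by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """Return a risk probability between 0 and 1."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical scaled features: [prior admissions, ejection fraction]
X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1], [0.7, 0.3], [0.15, 0.85]]
y = [0, 0, 1, 1, 1, 0]  # 1 = readmitted within 30 days (synthetic labels)
w, b = train_logreg(X, y)
risk = predict(w, b, [0.85, 0.15])  # risk score for a new patient
```

The clinical question raised above is precisely what the code cannot answer: once `risk` crosses some threshold and the team intervenes, the patient's trajectory, and hence the model's future inputs, change.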

I wrote back in 2014, and more recently in 2017, that these systems must go beyond their ability to predict and actually be used in the context of prescriptive systems that result in better outcomes for the patient and/or the healthcare system. I hope to see studies going forward in which these sorts of systems become part of the patient care team and lead to demonstration of Friedman’s Fundamental Theorem of Informatics [9], which is that humans aided by machines do better than humans or machines alone. A nice roadmap for such studies was recently published that notes the need for studies with meaningful endpoints, appropriate benchmarks, transportability to other settings and systems, and legal and ethical monitoring [10].

This sentiment is echoed in the excellent new book by Dr. Eric Topol, Deep Medicine [11]. Dr. Topol raises the notion of deep learning benefitting patient care in additional ways. It could help solve a problem that has vexed healthcare since the widespread adoption of EHRs, which is the introduction of a third entity into the patient-physician encounter, namely the computer. Clinicians now unfortunately spend too much time “feeding the beast,” especially when that beast distracts from the patient and adds compliance and billing burdens, contributing heavily to our epidemic of clinician burnout [12].

Perhaps one area where deep learning might help reduce the clinician burden is data entry. An intriguing new study was recently published by one of the same authors who contributed some of the EHR studies above, automatically charting symptoms from patient-physician conversations [13]. One also wonders whether the level of a patient visit for billing purposes, currently determined by the presence of various elements documented in the medical record, could be replaced by other data easy to collect in the modern medical office, such as time spent with the patient, time spent on other aspects of care, and other requirements of care. While we may never be able to achieve a “computer-free” patient examination room, we will hopefully find ways to reduce the computer's impact and burden.

Dr. Topol suggests a second major area of benefit for deep learning, which is monitoring patients in much more depth than our current approach of providing episodic healthcare. Of course, this must be done in ways that provide actionable information presented so as not to further overburden clinicians. But it is not beyond the pale to envision his view of -omics, personal sensors, physiological measurements, healthcare, and public health data coming together, via such algorithms, to give diagnostic, therapeutic, and prognostic advice. There are still many issues around cost, ethics, practicality, and so forth, so the benefits will need to outweigh the risks.

Thus, while we will likely see machine assistance in diagnosis from images and waveforms in the near future, the impact of what deep learning may do with patient data in the EHR and with the patient will likely take longer. I see exciting opportunities for research and development focused on how to prospectively determine how such systems fit into the workflow of patient interaction within and outside the healthcare system.

1. Miotto, R, Li, L, et al. (2016). Deep Patient: an unsupervised representation to predict the future of patients from the electronic health records. Scientific Reports. 6: 26094.
2. Rajkomar, A, Oren, E, et al. (2018). Scalable and accurate deep learning for electronic health records. npj Digital Medicine. 1: 18.
3. Avati, A, Jung, K, et al. (2018). Improving palliative care with deep learning. BMC Medical Informatics & Decision Making. 18: 122.
4. Golas, SB, Shibahara, T, et al. (2018). A machine learning model to predict the risk of 30-day readmissions in patients with heart failure: a retrospective analysis of electronic medical records data. BMC Medical Informatics & Decision Making. 18: 44.
5. Steele, AJ, Denaxas, SC, et al. (2018). Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease. PLoS ONE. 13(8): e0202344.
6. Ravizza, S, Huschto, T, et al. (2019). Predicting the early risk of chronic kidney disease in patients with diabetes using real-world data. Nature Medicine. 25: 57-59.
7. Liang, H, Tsui, BY, et al. (2019). Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nature Medicine. 25: 433-438.
8. Norgeot, B, Glicksberg, BS, et al. (2018). Assessment of a deep learning model based on electronic health record data to forecast clinical outcomes in patients with rheumatoid arthritis. JAMA Network Open. 2(3): e190606.
9. Friedman, CP (2009). A 'fundamental theorem' of biomedical informatics. Journal of the American Medical Informatics Association. 16: 169-170.
10. Parikh, RB, Obermeyer, Z, et al. (2019). Regulation of predictive analytics in medicine. Science. 363: 810-812.
11. Topol, E (2019). Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again. New York, NY, Basic Books.
12. Gardner, RL, Cooper, E, et al. (2019). Physician stress and burnout: the impact of health information technology. Journal of the American Medical Informatics Association. 26: 106-114.
13. Rajkomar, A, Kannan, A, et al. (2019). Automatically charting symptoms from patient-physician conversations using machine learning. JAMA Internal Medicine. Epub ahead of print.

Friday, March 29, 2019

Data Science, Biomedical Informatics, and the OHSU Department of Medical Informatics & Clinical Epidemiology

(The following is reposted from Health, Data, Information and Action, the blog of the Oregon Health & Science University Department of Medical Informatics & Clinical Epidemiology.)

Data Science is a broad field that intersects many other fields within and outside of biomedicine and health, including biomedical informatics. Data science is certainly an important component of research and educational programs in the OHSU Department of Medical Informatics & Clinical Epidemiology (DMICE).

What exactly is data science? There are many definitions, but one consensus definition is, “the multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured” [1].

The definition of data science is somewhat different from the definition of biomedical informatics, which is “the interdisciplinary field that studies and pursues the effective uses of biomedical data, information, and knowledge for scientific inquiry, problem solving and decision making, motivated by efforts to improve human health” [2].

Clearly there is overlap as well as complementarity. As noted by Payne et al., biomedical informatics deals with a broader spectrum of data and information tasks, focused not only on what is learned from data but also how that is applied in a broader sociotechnical context [2].

Many DMICE research programs focus on aspects of Data Science:
  • Re-use of data from EHR (William Hersh, Aaron Cohen, Steven Bedrick) – leveraging data in EHR to identify patients as candidates for research studies and signals for rare diseases (porphyria) [3]
  • Documenting genomic variation in leukemia (Shannon McWeeney) – allowing for repurposing of drugs [4]
  • Quality of data for clinical care and research (Nicole Weiskopf) – methods for insuring completeness and comprehensiveness of data for use in research, quality measurement, and other tasks [5]
  • Urinary microbiome in health and disease (Lisa Karstens) – identifying role of microbiome and how its genetics can be leveraged for diagnosis and treatment [6]
  • Use of ambient data to detect and manage clinician strain (Dana Womack, Paul Gorman) [7]
DMICE educational programs include Data Science in many of their courses. Our Bioinformatics & Computational Biomedicine (BCB) major includes:
  • Data Harmonization and Standards for Translational Research - BMI 533/633 (Instructors: Melissa Haendel, Ph.D., Ted Laderas, Ph.D., Christina Zheng, Ph.D.)
  • Management and Processing of Large Scale Data - BMI 535/635 (Instructors: Michael Mooney, Ph.D., Christina Zheng, Ph.D.)
  • Computational Genetics - BMI 559/659 (Instructor: Shannon McWeeney, Ph.D.)
  • Bioinformatics Programming and Scripting - BMI 565/656 (Instructor: Michael Mooney, Ph.D.)
  • Network Science and Biology - BMI 567/667 (Instructor: Guanming Wu, Ph.D.)
  • Data Analytics - BMI 569/669 (Instructors: Brian Sikora, Delilah Moore, Ted Laderas, Ph.D.)
Our Health & Clinical Informatics (HCIN) major includes:
  • Introduction to Biomedical and Health Informatics - BMI 510/610 (Instructor: William Hersh, M.D.)
  • Analytics for Healthcare - BMI 524/624 (Instructors: Abhijit Pandit, M.B.A., Tracy Edinger, N.D.)
  • Clinical Research Informatics - BMI 523/623 (Instructors: Nicole Weiskopf, Ph.D., Robert Schuff)
We have also developed ample instructional materials in Data Science for other learners.

1. Donoho, D (2017). 50 years of Data Science. Journal of Computational and Graphical Statistics. 26: 745-766.
2. Payne, PRO, Bernstam, EV, et al. (2018). Biomedical informatics meets data science: current state and future directions for interaction. JAMIA Open. 1: 136-141.
3. Wu, S, Liu, S, et al. (2017). Intra-institutional EHR collections for patient-level information retrieval. Journal of the American Society for Information Science & Technology. 68: 2636-2648.
4. Tyner, JW, Tognon, CE, et al. (2018). Functional genomic landscape of acute myeloid leukaemia. Nature. 562: 526-531.
5. Weiskopf, NG, Bakken, S, et al. (2017). A data quality assessment guideline for electronic health record data reuse. eGEMS. 5(1): 14.
6. Karstens, L, Asquith, M, et al. (2016). Does the urinary microbiome play a role in urgency urinary incontinence and its severity? Frontiers in Cellular and Infection Microbiology. 6:78.
7. Womack, D. (2018). Subtle cues: Qualitative elicitation of signs of strain in the hospital workplace. PhD Dissertation, Oregon Health & Science University.

Friday, March 8, 2019

Have We Passed the High-Water Mark of Seamless Technology?

Have we passed the high point of technology being seamless and easy to use? I recently had to give a presentation to faculty that included showing a few slides, and I assumed I could just bring my MacBook Pro, with its HDMI port, and plug it into the projector that was in the conference room on my campus where the meeting was taking place.

It was not meant to be. Despite having the right cables, my computer would not sync to the projector. That led to the search for other options. Which then led us into dongle hell. Maybe we should try the VGA adapter also available in the room. But of course my newer-model Mac had USB-C and not the older standard Mac connector that is still prevalent, at least among the dongles.

We finally got it all to work, although not without a lot of wasted time. The episode also got me thinking and wondering whether we have passed the high-water mark of technology working seamlessly. Are those golden days over?

I used to feel like an old-timer when I would reminisce about how awful technology used to be in the late 20th century. I am sure I got snickers from the younger crowd when I bemoaned the old days of connecting to my email over telephone modems, trying to figure out the right number of commas in the ATDT dial string to time when the lines would ask for things like long-distance codes to be entered. I also remember bringing overhead transparencies of presentations just in case my computer would not sync with the archaic projectors. I recall the days before USB drives when transferring files between computers was sometimes impossible, especially with the disappearance of floppy disks (and presentation file sizes exceeding their capacity).

I believe that the golden days peaked shortly after the start of the 21st century. I would marvel that computers now always seemed to connect and sync to projectors. First with wired Ethernet and then with the emergence of wifi, we were no longer at the mercy of telephone modems and noisy phone lines (or needing to find Internet cafes). We could connect to our email and institutional servers without making long-distance phone calls. We could even talk on the phone from almost anywhere via Skype.

But now we seem to be regressing. Newer projectors have more resolution, but do not always work with older computers. Dongle hell is worst for the Mac, but the PC world is not immune. We have multiple connector types for projectors, and different types of USB. An additional wrinkle is the need for encryption, which means we can no longer easily move files that pose very little risk (e.g., Powerpoint presentations) from where we create them to where we need to present them.

Technology still is marvelous when it works seamlessly. Hopefully the proliferation of cables, connectors, and security protocols will not make the golden era a distant memory.

Thursday, February 14, 2019

Summary of New ONC and CMS Notices of Proposed Rulemaking for Health IT

This week, on the eve of the Healthcare Information and Management Systems Society (HIMSS) conference, the Office of the National Coordinator for Health IT (ONC) and the Centers for Medicare and Medicaid Services (CMS) each dropped a Notice of Proposed Rulemaking (NPRM) concerning functionality, interoperability, certification, and other aspects of electronic health record (EHR) use.

These rules are required as described in the 21st Century Cures Act, which was major legislation to modernize aspects of biomedical research and health IT that passed with wide bipartisan majorities. I saw the act as a chance to "clean up" some of the adverse and unintended effects of the Health Information Technology for Economic and Clinical Health (HITECH) Act (aka, meaningful use).

The Web page for the ONC NPRM not only has the 724-page proposed rule, but also a set of nine readable and understandable fact sheets about the rule and another seven devoted to aspects of the information blocking rule. A slide presentation from the HIMSS conference has a number of nice figures that diagram aspects of the big picture. The fact sheets and slide presentation provide a nice segue to word-searching the 724-page PDF for specific details.

Here is my summary of the key points of the ONC rule:
  • A new US Core Data for Interoperability (USCDI) that adds provenance information, 8 types of clinical notes, additional demographic information, and pediatric vital signs to the former Core Clinical Data Set
  • APIs to access USCDI using FHIR, with a base set of 13 Resources and two specific data fields within the Patient Resource that must be supported
  • SMART Application Launch Framework, using the OAuth2 and OpenID standards
  • Conditions and maintenance of EHR certification, with additional pediatric-specific criteria
  • Requirement to be able to export data for a patient and for all patients (when a provider changes EHRs)
  • Rules for information blocking and allowable exceptions
  • Allowable fees to support information exchange (but not to the patient to access their data)
  • Standards Version Advancement Process to allow developers to choose among the versions of standards currently approved by the National Coordinator
  • No developer gag clauses that prohibit discussion and demonstration of problems
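The API provision above can be pictured with a short sketch. The server URL and helper functions below are hypothetical, and the pared-down Patient resource only follows the general shape of FHIR's published examples; a real SMART-on-FHIR client would also carry an OAuth2 access token on each request.

```python
import json
from urllib.parse import urlencode

# Hypothetical FHIR endpoint; any certified EHR would expose a similar REST interface.
FHIR_BASE = "https://ehr.example.com/fhir"

def patient_search_url(family: str, birthdate: str) -> str:
    """Build a FHIR Patient search request: GET [base]/Patient?family=...&birthdate=..."""
    return f"{FHIR_BASE}/Patient?" + urlencode({"family": family, "birthdate": birthdate})

# A pared-down Patient resource of the kind such a search returns.
sample_patient = json.loads("""
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Chalmers", "given": ["Peter"]}],
  "birthDate": "1974-12-25"
}
""")

def display_name(patient: dict) -> str:
    """Extract a human-readable name from a Patient resource."""
    name = patient["name"][0]
    return " ".join(name.get("given", []) + [name["family"]])

print(patient_search_url("Chalmers", "1974-12-25"))
print(display_name(sample_patient))
```

In a real exchange the response would be a Bundle of matching resources, but the pattern of resource types, search parameters, and JSON structure is the same.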
The CMS NPRM is a companion rule that goes into more detail on some aspects, per its headings, some of which include:
  • Patient Access Through Application Programming Interfaces (APIs)
  • Health Information Exchange and Care Coordination Across Payers
  • API Access to Published Provider Directory Data
  • Care Coordination Through Trusted Exchange Networks
  • Public Reporting and Prevention of Information Blocking
  • Provider Digital Contact Information
  • Advancing Interoperability in Innovative Models
Clearly these rules are a necessary recalibration of our EHRs and their data, aiming to make them more patient-centric, clinician-friendly, and responsive to the EHR marketplace. While I suspect some smart people will come up with some good ideas why one thing or another should be changed in the final rules, my interpretation of the NPRMs is that they pretty much hit the target. I am confident they will lead to improved systems and our ability to do good with the data.

Monday, December 31, 2018

Annual Reflections at the End of 2018

I always use my last posting of the year to this blog to reflect on the year past. As I have noted each year, this blog started at the time of a major transformation for the informatics field, namely the Health Information Technology for Economic and Clinical Health (HITECH) Act. Now, almost 10 years later (10-year anniversary post coming in March!), the era of “meaningful use” is drawing to a close.

The year 2018 was a year of milestones and gratitude for me. I celebrated my 60th birthday, grateful for my personal health and well-being. My immediate family, my wife and two children, are also doing very well in their personal as well as professional lives. This year also marked the 15th year of the Department of Medical Informatics & Clinical Epidemiology (DMICE) in the School of Medicine at Oregon Health & Science University (OHSU), the department of which I have served as its one and only Chair. With DMICE, I am grateful not only for my own academic success but also for providing an environment in which faculty, students, and staff can achieve their own accomplishments.

Another milestone for 2018 was my 28th year at OHSU. It is somewhat uncommon these days for a high-profile academic to spend a whole career at a single institution. I have certainly been asked to look at other jobs over the years, as most academics always are, but nothing has ever appealed to me enough to consider leaving not only OHSU, but also Portland, Oregon. Since the Biomedical Information Communication Center (BICC) Building opened in 1991, I have had only two offices, and have been in my current one for over 20 years.

I am happy to report that despite my relatively static work location, I have changed and grown in place. In academia, like work in almost every other knowledge field, one must evolve their knowledge and skills with the evolution of their field. I am grateful that my job has afforded me the ability to grow professionally and intellectually. In fact, there are few things more exciting than being immersed in the field as new ideas and technologies emerge. A decade ago it was the emergence of the value of the electronic health record (EHR); today it is the growth of data and how we can put it to good use, such as via machine learning. But just as we learned with EHR adoption during the HITECH Act, implementing technology, especially in healthcare, does not always go according to plan. While the emergence of machine learning is exciting, it will be interesting to see how it will impact day-to-day medical practice.

Life does not last forever, but as long as I continue to enjoy my work and do it competently, I certainly have no plans to stop. It will also be interesting to see what new advances come down the pike in informatics, some of which we might be able to predict but others that will emerge out of nowhere.

Wednesday, December 19, 2018

Preserving What We Hold Dear About the Internet?

Hardly a day goes by without some explosive report in the news about modern Internet platforms and their adverse effects on our personal lives or on our political or economic systems. But with our personal and professional lives so deeply intertwined with them, going “off the grid” is hardly an answer. How do we preserve what is good about our networked lives while trying to identify and eliminate the bad? I do not have answers but hope to raise discussion of the question.

Even though I am way too old to be a “digital native,” computers and the Internet have played a large role in my personal and professional life for several decades. I received my first Internet email address in the late 1980s as a postdoctoral fellow. I often tell the story of my jaw dropping the first time I saw the graphical Web browser, NCSA Mosaic, in 1992. While I had read articles about this new World Wide Web, I was initially skeptical because I could not envision the Internet of the time being able to support interaction (e.g., downloading and rendering Web pages) in real-time. But seeing Mosaic made me instantly realize how transformative the Web would be. Fast-forwarding a few years, with the emergence of Google, I sometimes joke that my life would be very different had I come up with the idea of ranking Web search output by links in my own information retrieval research. At the end of the decade, a seemingly minor decision to put my course online in 1999 led to a major transformation of my career into a passion for educational technology. Now in modern times, my personal life has fused with Facebook, in which I can easily share parts of my life with family, friends, and colleagues. In addition, most of my teaching is online, I enjoy sharing running routes with fellow runners, and the ubiquitous worldwide reach of cellular and wifi makes travel and just about everything else much easier.

But clearly there are downsides to the Internet and the proliferation of connected computational devices, along with all of the data they hold. The biggest current news, of course, is the manipulation of social media and search engines by the Russian government. Right behind that are other concerns about the business practices of Facebook and how it selectively shares our data, especially with certain business partners. There are also concerns about the ease with which hate groups disseminate content to their followers, for example on YouTube and Twitter. Another worry is the growing commerce monopoly of Amazon, despite the fact that many of us find it so convenient for many things we need. There is also the growing concern about what is done with the detailed digital activities of ours that are tracked and used, sometimes for good but other times not.

The solutions to these problems are not easy. Sure, we can try to maintain a balance between our real and virtual lives. We can consider more regulation of these platforms, but I get nervous when we discuss regulating free speech. The question is how to protect freedom of expression while not allowing manipulation of news and elections by “bots” and other approaches. Education is certainly important, making sure the general population understands how these platforms work and how they can be used to manipulate public and political opinion. There is also the question of how to economically regulate those platforms that achieve monopoly status. There is no question that these issues will attract further attention from the news media, lawmakers, and others going forward.

Sunday, December 16, 2018

Kudos for the Informatics Professor - Fall 2018 Update

I had a busy summer and fall of 2018, with a number of talks and publications.

In September, I had the opportunity to be interviewed by The Jefferson Exchange, which is part of Jefferson Public Radio in southern Oregon (direct link to MP3 of interview).

I was also featured for the work I have contributed for over 10 years in partnership with the Emergency Medicine Informatics Section of the American College of Emergency Physicians (ACEP) to deliver a version of the 10x10 course. It was another successful year in general for the 10x10 course, with a total of 2517 people having completed the OHSU offering since 2005.

I was very busy at this year's American Medical Informatics Association (AMIA) Annual Symposium. I was among the 130 people inducted into the inaugural class of Fellows of AMIA (FAMIA). I also had a chance to describe our educational program at the Learning Showcase. In addition, I participated in a panel with three other academic colleagues entitled, Collaborative Science Within Academic Medical Centers: Opportunities and Challenges for Informatics.

I also had the opportunity to present OHSU Department of Medicine Grand Rounds on December 11, 2018, delivering the topic, Update in Clinical Informatics: Machine Learning, Interoperability, and Professional Opportunities (video and slides available).

Earlier in the year, I joined colleague Robert Hoyt, MD as a Co-Editor of the textbook, Health Informatics: Practical Guide (7th Edition), which is available both in print and eBook formats. The book is also available for the Amazon Kindle, as are other books of mine.

Also during this time period, I published a paper in the new AMIA journal, JAMIA Open.

Tuesday, December 11, 2018

Response to NIH RFI: Proposed Provisions for a Draft Data Management and Sharing Policy for NIH Funded or Supported Research

Earlier this year, I submitted a response (and posted it in this blog) to a National Institutes of Health (NIH) Request for Information (RFI) on a draft of their Strategic Plan for Data Science. My main concern was that while there was nothing in the report I did not agree with, I believed there needed to be more attention to the science of data science.

In October, the NIH released another RFI, this one entitled, Proposed Provisions for a Draft Data Management and Sharing Policy for NIH Funded or Supported Research. Similar to the Strategic Plan for Data Science, most of what is in this draft plan is reasonable in my opinion. But what concerns me more is, similar to the earlier RFI, what is left out.

My main concerns have to do with the definition and use of “scientific data.” Early on, the plan defines “scientific data” as “the recorded factual material commonly accepted in the scientific community as necessary to validate and replicate research findings including, but not limited to, data used to support scholarly publications.” The draft further notes that “scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens. For the purposes of a possible Policy, scientific data may include certain individual-level and summary or aggregate data, as well as metadata. NIH expects that reasonable efforts should be made to digitize all scientific data.”

The draft report then runs through the various provisions. Among them are:
  • Data Management and Sharing Plans - new requirements to make sure data is FAIR (findable, accessible, interoperable, and reusable)
  • Related Tools, Software and/or Code - documentation of all the tools used to analyze the data, with a preference toward open-source software (or documentation of reasons why open-source software is not used)
  • Standards - what standards, including data formats, data identifiers, definitions, and other data documentation, are employed
  • Data Preservation and Access - processes and descriptions for how data is preserved and made available for access
  • Timelines - for access, including whether any is held back to allow publication(s) by those who collect it
  • Data Sharing Agreements, Licensing, and Intellectual Property - which of these are used and how so
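To make the FAIR expectation in the first provision concrete, here is a small illustrative check in Python. The field names are invented for this sketch and do not come from any NIH-mandated schema.

```python
# Hypothetical metadata record for a shared dataset; each field maps to a FAIR principle.
dataset_metadata = {
    "identifier": "doi:10.0000/example.dataset",  # findable: persistent identifier
    "title": "De-identified cohort extract",
    "access_url": "https://repository.example.org/datasets/example",  # accessible
    "format": "text/csv",                         # interoperable: open, documented format
    "license": "CC-BY-4.0",                       # reusable: explicit terms of reuse
    "standards": ["LOINC", "SNOMED CT"],          # standards employed, per the provisions
}

REQUIRED_FAIR_FIELDS = {"identifier", "access_url", "format", "license"}

def missing_fair_fields(record: dict) -> set:
    """Return the FAIR-relevant fields a metadata record lacks."""
    return REQUIRED_FAIR_FIELDS - record.keys()

print(missing_fair_fields(dataset_metadata))  # empty set: nothing missing
```

A repository ingest pipeline could run a check like this before accepting a deposit, flagging records that lack, say, a license or a persistent identifier.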
All of the above are reasonable. However, my main concern is what appears to be a relatively narrow scope of what constitutes scientific data. As such, what follows is what I submitted in my comments to the draft policy. (These comments were also incorporated into a larger response by the Clinical Data to Health [CD2H] Project, of which I am part.)

The definition of scientific data implies that such data is only that which is collected in active experimentation or observation. This ignores the increasing amount of scientific research that does not come from experiments, but rather is derived from real-world measurements of health and disease. This includes everything from data routinely collected by mobile or wearable devices to social media to the electronic health record (EHR). A growing amount of research analyzes and makes inferences using such data.

It could be argued that this sort of data derived “from the wild” should adhere to the provisions above. However, this data is also highly personal and usually highly private. Would you or I want our raw EHR in a data repository? Perhaps connected to our genome data? But if such data are not accessible at all, then the chances for reproducibility are slim.

There is also another twist on this, which concerns data used for informatics research. In a good deal of informatics research, such as the patient cohort retrieval work I do in my own research [1], we use raw, identifiable EHR data. We then proceed to evaluate the performance of our systems and algorithms with this data. Obviously we want this research to be reproducible as well.

There are solutions to these problems, such as Evaluation as a Service [2] approaches that protect such data and allow researchers to send their systems to the data in walled-off containers and receive aggregate results. Maybe the approach in this instance would be to maintain encrypted snapshots of the data that could be unencrypted in highly controlled circumstances.
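The evaluation-as-a-service idea can be sketched in a few lines: the submitted model runs inside the protected environment, and only aggregate metrics cross the boundary. All names and data below are invented for illustration.

```python
# Sketch of evaluation-as-a-service: the researcher's model is brought to the
# protected data, and only aggregate results leave the walled-off environment.

def evaluate_in_enclave(model, records):
    """Run a submitted model against protected records; return only aggregates."""
    correct = 0
    for record in records:
        prediction = model(record["features"])
        if prediction == record["label"]:
            correct += 1
    # Row-level data and individual predictions never cross this boundary.
    return {"n": len(records), "accuracy": correct / len(records)}

# Toy protected data and a trivial submitted "model" (a threshold rule).
protected = [
    {"features": {"age": 70}, "label": 1},
    {"features": {"age": 30}, "label": 0},
    {"features": {"age": 65}, "label": 1},
]
submitted_model = lambda features: 1 if features["age"] >= 60 else 0

print(evaluate_in_enclave(submitted_model, protected))
```

The real engineering work lies in the container isolation and audit controls around such a boundary, but the contract, identifiable data in, aggregates out, is what makes reproducible research on private data plausible.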

In any case, the NIH Data Management and Sharing Policy for NIH Funded or Supported Research is a great starting point, but it should take a broader view of scientific data and develop policies to ensure research is reproducible. Research done with data that does not originate as scientific data should be accounted for, including when that data is used for informatics research.


1. Wu, S, Liu, S, et al. (2017). Intra-institutional EHR collections for patient-level information retrieval. Journal of the American Society for Information Science & Technology. 68: 2636-2648.
2. Hanbury, A, Müller, H, et al. (2015). Evaluation-as-a-service: overview and outlook. arXiv:1512.07454.

Tuesday, October 30, 2018

A Great Time to be an Academic Informatician

My recent posting describing my updated study of the health IT workforce shows that this is a great time to work in operational health IT and informatics settings. Many of us, however, work as faculty or in other professional roles in academic health science centers, a smaller but critically important part of the informatics workforce. What are the prospects for those in academic informatics?

I would argue they are excellent. There are great opportunities now both for those who follow the traditional academic researcher/educator pathway as well as for those who focus their involvement in the more operational activities in academic health science centers.

For those following the more conventional faculty pathway, the grant funding situation is currently pretty good. While the main supporter of basic informatics research, the National Library of Medicine (NLM), has a small research budget, it has grown 14% with the increased federal funding to the National Institutes of Health (NIH) in the last couple of years. Fortunately, informatics researchers have more options. Despite attempts in some political quarters to de-fund the Agency for Healthcare Research & Quality (AHRQ), the agency continues to pursue and fund its research objectives, a decent portion of which involves informatics innovation. Likewise, the other institutes of the NIH, including those that are disease-oriented, offer opportunities for research that includes informatics activities. This includes not only the big initiatives, such as the All of Us Research Program, but day-to-day work with others, such as the National Sleep Research Resource. There are also research funding opportunities from foundations, industry, and others.

Of course, one fortunate aspect of being academic informatics faculty is that activities are not limited to those focusing mainly on research. There are other opportunities in teaching (including beyond those studying informatics, such as healthcare professional students) and operational work (supporting and innovating in all of the missions of academic medical centers, which include clinical care, research, and education). Academic informaticians are often involved in the implementation of operational systems, especially those supporting healthcare delivery and research. Given the growth of informatics and data science, there are likely to be teaching opportunities for those of us who enjoy teaching our area of expertise to clinicians and others who work in healthcare.

For all of these reasons, I am pretty bullish on careers in academic informatics. While no career pathway in any field is a guarantee of success these days, there are plenty of opportunities for those seeking academic careers in informatics.

Friday, October 12, 2018

What are the Optimal Data Science and Machine Learning Competencies for Informatics Professionals?

Exactly 20 years ago, I organized a panel at the American Medical Informatics Association (AMIA) Annual Symposium that attracted so large an audience that the crowd spilled out of the room into the hallway. Entitled, What are the Optimal Computer Science Competencies for Medical Informatics Professionals?, the panel asked how much knowledge and skills in computer science were required to work professionally in informatics. In the early days of informatics, most informaticians had some programming skills and often contributed to the development of home-grown systems. Some educational programs, such as the one at Stanford University, had required courses in assembly language. (I took an assembler course myself during my informatics fellowship in the late 1980s.)

But as academic informatics systems grew in scope and complexity, they needed more engineering and hardening as they became mission-critical to organizations. At the same time, there was a recognized need for attention to people and organizational issues, especially in complex adaptive settings such as hospitals. Over time, most professional work in informatics has shifted from system building to implementing commercial systems.

With these changes, my evolving view has been that although few informatics professionals perform major computer programming, there is still value to understanding the concepts and thought process of computer science. While plenty of students enter our graduate program at Oregon Health & Science University with programming skills, our program will not turn those without programming skills into seasoned programmers. But I still believe it is important for all informatics professionals to understand the science of computing, even at the present time. This includes some programming to see computing concepts in action.

A couple decades later, I find myself asking a related question, which is, how much data science and machine learning is required of modern informatics professionals? Clearly data science, machine learning, artificial intelligence, etc. are very prominent now in the evolution of healthcare and biomedical science. But not everyone needs to be a "deep diver" into data science and machine learning. I often point this out by referring to the data analytics workforce reports from a few years ago that note the need for a five- to ten-fold larger ring of people who identify the needs for, put into practice, and communicate the results of the work of the deep divers [1, 2]. I also note the observation of data analytics thought leader Tom Davenport, who has written about the importance of the roles of "light quants" or "analytical translators" in data-driven organizations (such as healthcare) [3].

Thus, to answer the question in the title of this post, competence in data science and machine learning may be analogous to the answer to the computer science question of a couple decades ago. Clearly, every informatician must have basic data science skills. These include knowing how to gather, wrangle, and carry out basic analysis of data. They should understand the different approaches to machine learning, even if they do not necessarily understand all of their deep mathematics. And, critically, they must know how to apply data science and machine learning in their everyday professional practice of informatics.
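As a sketch of what those basic skills look like in practice, here is a minimal gather/wrangle/analyze example using only the Python standard library; the clinical-looking data and the threshold "classifier" are invented for illustration.

```python
# A minimal gather -> wrangle -> analyze loop with invented data.
import csv
import io
import statistics

raw = """age,sbp,outcome
63,145,1
52,128,0
71,160,1
45,118,0
58,ERR,0
"""

# Gather: read CSV rows. Wrangle: drop rows with unparseable values.
rows = []
for row in csv.DictReader(io.StringIO(raw)):
    try:
        rows.append({"age": int(row["age"]), "sbp": int(row["sbp"]),
                     "outcome": int(row["outcome"])})
    except ValueError:
        continue  # discard the malformed measurement ("ERR")

# Basic analysis: a summary statistic and a simple threshold classifier.
mean_sbp = statistics.mean(r["sbp"] for r in rows)
predictions = [1 if r["sbp"] > mean_sbp else 0 for r in rows]
accuracy = sum(p == r["outcome"] for p, r in zip(predictions, rows)) / len(rows)

print(f"{len(rows)} usable rows, mean SBP {mean_sbp:.1f}, accuracy {accuracy:.2f}")
```

Even this toy example exercises the skills named above: data arrives dirty, must be wrangled into usable form, and only then analyzed, which is the daily reality of working with real-world health data.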


1. Manyika, J, Chui, M, et al. (2011). Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute.
2. Anonymous (2014). IDC Reveals Worldwide Big Data and Analytics Predictions for 2015. Framingham, MA, International Data Corporation.
3. Davenport, T (2015). In praise of “light quants” and “analytical translators”. Deloitte Insights.

Tuesday, October 9, 2018

A Meaningful End to “Meaningful Use?”

The era of meaningful use came to a relatively quiet end this summer with the release of the Final Inpatient Prospective Payment Systems rule by the Centers for Medicare and Medicaid Services (CMS) this past August. The rule put into place most of what had been in the proposed rule earlier in the year. Although the rule has much detail on what healthcare organizations must achieve to receive incentive payments and/or avoid penalties, a large symbolic change is the renaming of the Medicare and Medicaid Electronic Health Record (EHR) Incentive Programs as the Promoting Interoperability Programs. The "meaningful use" moniker goes away, although under the new program, eligible professionals and hospitals still must demonstrate they are "meaningful users" of health information technology.

As someone who had a front-row seat for meaningful use and how it impacted the informatics world (in my case more teaching about it than being in the trenches implementing it), it is the end of an era that brought our field to national visibility. There is some success to be celebrated in the fact that 96% of hospitals and 85% of office-based clinicians have adopted some form of EHR. Overall, the new rules seem logical and fair, although some would argue that incentive payments should be based more on outcomes than process measures. In any case, there is still important work ahead as we step up to the challenge of making EHR systems better and leveraging the data in them to truly benefit health and healthcare.

Unlike in the past, when summaries of the updates were released with great fanfare by multiple sources, there are few summaries of the new rule that provide enough content to understand the details without having to read the hundreds of pages in the government publication. Two good sources I have found are:
The meaningful use moniker for the criteria for eligible professionals went away last year with the introduction of the new CMS Quality Payment Program (QPP), which integrated all previous federal quality programs into a single quality-payment program, with 25% of incentive payment or penalty tied to Advancing Care Information (ACI). The ACI changed from a so-called threshold (all-or-none) approach to a performance (score-based) one. The ACI portion of QPP is now also renamed as being part of the Promoting Interoperability Program.

The new CMS rule now applies a similar approach to eligible hospitals. The new rule groups Promoting Interoperability into four overall objectives, each of which has one or more measures and a maximum number of points for achieving them. The new rule also streamlines some of the quality reporting measures required by the program as well as limits the reporting period to one quarter of the year.
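The shift from threshold to performance-based scoring can be illustrated with a toy calculation; the measures, cutoff, and point values below are invented and are not CMS's actual allocations.

```python
# Illustrative contrast between threshold (all-or-none) and performance-based
# scoring; measure names and point values are invented, not CMS's actual table.

measures = {
    # measure: (numerator, denominator, max_points)
    "e-prescribing": (850, 1000, 10),
    "send referral loop": (400, 500, 20),
    "patient access": (600, 1000, 40),
}

def threshold_score(numer, denom, max_points, cutoff=0.8):
    """All-or-none: full credit only if performance meets the cutoff."""
    return max_points if numer / denom >= cutoff else 0

def performance_score(numer, denom, max_points):
    """Score-based: credit proportional to measured performance."""
    return round(numer / denom * max_points, 1)

for name, (n, d, pts) in measures.items():
    print(name, threshold_score(n, d, pts), performance_score(n, d, pts))
```

The contrast shows why the change matters: under the threshold approach a hospital at 60% performance gets zero credit, while under performance scoring it earns partial credit proportional to what it actually achieved.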

A final change in the new rule is the requirement that systems use the 2015 Edition Certified EHR Technology (CEHRT) criteria to be eligible for the program. One key requirement of the 2015 CEHRT edition is the implementation of an application programming interface (API) that can (with appropriate authentication and security) access data directly in the EHR. Most vendors are implementing this capability using the emerging Fast Healthcare Interoperability Resources (FHIR) standard. Probably the best-known (but certainly not the only) application of this is the Apple Health app that allows patients to download the so-called Argonaut data set of 21 data elements.

The new Promoting Interoperability measures include:
1. e-Prescribing (1 required, 2 optional measures in 2019 that will be required in 2020)
  • e-Prescribing
  • Query of Prescription Drug Monitoring Program (PDMP)
  • Verify Opioid Treatment Agreement
2. Health Information Exchange (2 required measures)
  • Support Electronic Referral Loops by Sending Health Information
  • Support Electronic Referral Loops by Receiving and Incorporating Health Information
3. Provider to Patient Exchange (1 required measure)
  • Provide Patients Electronic Access to Their Health Information
4. Public Health and Clinical Data Exchange (2 required measures from following)
  • Syndromic Surveillance Reporting
  • Immunization Registry Reporting
  • Electronic Case Reporting
  • Public Health Registry Reporting
  • Clinical Data Registry Reporting
  • Electronic Reportable Laboratory Result Reporting
The final rule contains a table of the above with the scoring algorithm for the measures. The official details of the new program are at:
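To illustrate the shift from the all-or-none threshold approach to performance-based scoring, here is a minimal sketch. The measure names are abbreviated and the point maximums are placeholders for illustration, not the official values from the final rule's scoring table.

```python
# Illustrative sketch of performance-based (rather than all-or-none)
# scoring. Point maximums are placeholders, not the official values.
MEASURES = {
    "e-Prescribing": 10,
    "Referral Loops - Sending": 20,
    "Referral Loops - Receiving": 20,
    "Patient Electronic Access": 40,
    "Public Health Exchange": 10,
}

def pi_score(performance):
    """Sum each measure's performance rate times its maximum points,
    capping the overall score at 100."""
    total = sum(MEASURES[m] * rate for m, rate in performance.items())
    return min(round(total), 100)

score = pi_score({
    "e-Prescribing": 0.9,
    "Referral Loops - Sending": 0.8,
    "Referral Loops - Receiving": 0.7,
    "Patient Electronic Access": 0.95,
    "Public Health Exchange": 1.0,
})
print(score)  # 87
```

The key design difference from the old threshold approach is that partial performance earns partial credit instead of zero.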

It is hard not to wax somewhat nostalgic about these changes, especially in this blog that started about the time of the introduction of the Health Information Technology for Economic and Clinical Health (HITECH) Act, which now seems like eons ago. Although the goal was never just to put computers into hospitals and clinicians’ offices, that is an accomplishment, and hopefully it lays the foundation for improving healthcare and leveraging data going forward.

Wednesday, September 12, 2018

Artificial Intelligence in Medicine: 21st Century Resurgence

I first entered the informatics field in the late 1980s, at the tail end of the first era of artificial intelligence (AI) in medicine. Initial systems focused on making medical diagnoses using symbolic processing, which was appropriate for a time of relatively little digital data, both for individual patients and healthcare as a whole, and underpowered hardware. Systems like MYCIN [2], INTERNIST-1/QMR [1], and DXplain [3] provided relatively accurate diagnostic performance, but were slow and difficult to use. They also provided a single likely diagnosis, which was not really what clinicians needed. Because of these shortcomings, they never achieved significant real-world adoption, and their "Greek Oracle" style of approach was abandoned [4]. There was also some early enthusiasm for neural networks around that time [5], although in retrospect those systems were hampered by lack of data and computing power.

Into the 1990s, informatics moved on to other areas, such as information retrieval (search) from the newly evolving World Wide Web and more focused (rule-based) decision support. At the start of the new century, I started to wonder whether I should still even cover those early AI systems in my well-known introductory informatics course. I kept them included, mainly out of a sense of historical perspective, since those systems were a major focus of work in the field in its early days. However, the term "AI" almost seemed to disappear from informatics jargon.

In recent years, however, AI in medicine (and beyond) has re-emerged. Driven by much larger quantities of data (through electronic health records, curated data sets - mainly images, and personal tracking devices) and much more powerful hardware (mainly networked clusters of low-cost computers and hard disks as well as mobile devices), there has been a resurgence of AI, although with a somewhat different focus from the original era. There has also been a maturing of machine learning techniques, most prominently neural networks applied in complex formats known as deep learning [6, 7].

The most success for use of deep learning has come in image processing. The well-known researcher and author Dr. Eric Topol keeps an ever-growing list of systems for diagnosis and their comparison with humans (to which I have contributed a few, and to which I add studies so far published only as preprints on arXiv):
  • Radiology - diagnosis comparable to radiologists for pneumonia [8], tuberculosis [9], and intracranial hemorrhage [10]
  • Dermatology - detecting skin cancer from images [11-13]
  • Ophthalmology - detecting diabetic retinopathy from fundal images [14-15], predicting cardiovascular risk factors from retinal fundus photographs [16]; diagnosis of congenital cataract [17], age-related macular degeneration [18], and plus disease in retinopathy of prematurity [19]; and diagnosis of retinal diseases [20] and macular diseases [21]
  • Pathology - classifying various forms of cancer from histopathology images [22-25], detecting lymph node metastases [26]
  • Cardiology - cardiac arrhythmia detection comparable to cardiologists [27] and classification of views in echocardiography [28]
  • Gastroenterology - endocytoscope images for diagnose-and-leave strategy for diminutive, nonneoplastic, rectosigmoid polyps [29]
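The systems in the list above are built largely from convolutional neural networks. As a minimal illustration of the underlying operation, here is a plain-numpy sketch of a single convolutional filter acting as an edge detector; this is a toy example of the technique, not code from any of the published systems.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (strictly, cross-correlation, as in
    most deep learning libraries): slide the kernel over the image and
    take the elementwise product-sum at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector applied to a tiny synthetic "image"
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[1, -1]], dtype=float)  # responds at intensity edges
print(conv2d(image, kernel))  # nonzero only at the 0-to-1 boundary
```

Deep learning systems stack many such learned filters, but the per-filter computation is exactly this product-sum.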
Organized medicine has taken notice of AI. The Journal of the American Medical Association recently published two perspective pieces [30, 31] as well as an editorial [32] on how AI and machine learning will impact medical practice. I have heard anecdotally that some of the most heavily attended sessions at radiology meetings are those devoted to AI. I am sure there is a mixture of intellectual excitement tinged with some fear for future livelihoods.

The success of these systems and the technology underlying them is exciting, but I would also tell any thoughtful radiologist (or pathologist, dermatologist, or ophthalmologist) not to fear for his or her livelihood. Yes, these tools will change practice, maybe sooner than we realize. However, I always think that high-tech medicine of the future will look like how it is practiced by the doctors of Star Trek. Yes, those physicians have immense technology at their disposal, not only for diagnosis but also for treatment. But those tools do not remove the human element of caring for people. Explaining to patients their disease process, describing the prognosis as we know it, and sharing decision-making among the diagnostic and treatment options are all important in applying advanced technology in medicine.

I also recognize we have a ways to go before this technology truly changes medicine. For several years running, I have expressed my intellectual excitement at predictive data science while also noting that prediction is not enough; we must demonstrate that what is predicted can actually be applied to improve the delivery of care and patient health.

This notion is best elaborated by some discussion of another deep learning paper focused on a non-image domain, namely the prediction of in-hospital mortality, 30-day unplanned readmission, prolonged length of stay, and the entirety of a patient’s final diagnoses [33]. The paper demonstrates the value of deep learning, the application of Fast Healthcare Interoperability Resources (FHIR) for data points, and efforts for the neural network to explain itself along its processing path. I do not doubt the veracity of what the authors have accomplished. Clearly, deep learning techniques will play a significant role as described above. These methods scale with large quantities of data and will likely improve over time with even better algorithms and better data.

But taking off my computer science hat and replacing it with my informatics one, I have a couple of concerns. My first and major concern is whether this prediction can be turned into information that can improve patient outcomes. Just because we can predict mortality or prolonged length of stay, does that mean we can do anything about it? Second, while there is value to predicting across the entire population of patients, it would be interesting to focus in on patients we know are more likely to need closer attention. Can we focus in and intervene for those patients who matter?

Dr. Topol recently co-authored an accompanying editorial describing a study that adheres to the kind of methods that are truly needed to evaluate modern AI in clinical settings [34]. The study itself is to be commended; it actually tests an application of an AI system for detection of diabetic retinopathy in primary care settings [35]. The system worked effectively, though it was not flawless, and other issues common to real-world medicine emerged, such as some patients being non-imageable and others having different eye diseases. Nonetheless, I agree with Dr. Topol that this study sets the bar for how AI needs to be evaluated before its widespread adoption in routine clinical practice.

All of this AI in medicine research is impressive. But its advocates will need to continue the perhaps more mundane research of how we make this data actionable and actually act on it in ways that improve patient outcomes. I personally find that kind of research more interesting and exciting anyway.


1. Miller, RA (2010). A history of the INTERNIST-1 and Quick Medical Reference (QMR) computer-assisted diagnosis projects, with lessons learned. Yearbook of Medical Informatics. Stuttgart, Germany: 121-136.
2. Shortliffe, EH, Davis, R, et al. (1975). Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system. Computers and Biomedical Research. 8: 303-320.
3. Barnett, GO, Cimino, JJ, et al. (1987). DXplain: an evolving diagnostic decision-support system. Journal of the American Medical Association. 258: 67-74.
4. Miller, RA and Masarie, FE (1990). The demise of the "Greek Oracle" model for medical diagnostic systems. Methods of Information in Medicine. 29: 1-2.
5. Rumelhart, DE and McClelland, JL (1986). Parallel Distributed Processing: Foundations. Cambridge, MA, MIT Press.
6. Alpaydin, E (2016). Machine Learning: The New AI. Cambridge, MA, MIT Press.
7. Kelleher, JD and Tierney, B (2018). Data Science. Cambridge, MA, MIT Press.
8. Rajpurkar, P, Irvin, J, et al. (2017). CheXNet: radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv:1711.05225.
9. Lakhani, P and Sundaram, B (2017). Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology. 284: 574-582.
10. Arbabshirani, MR, Fornwalt, BK, et al. (2018). Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. npj Digital Medicine. 1: 9.
11. Esteva, A, Kuprel, B, et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature. 542: 115-118.
12. Haenssle, HA, Fink, C, et al. (2018). Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Annals of Oncology. 29: 1836-1842.
13. Han, SS, Kim, MS, et al. (2018). Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. Journal of Investigative Dermatology. 138: 1529-1538.
14. Gulshan, V, Peng, L, et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Journal of the American Medical Association. 316: 2402-2410.
15. Ting, DSW, Cheung, CYL, et al. (2017). Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. Journal of the American Medical Association. 318: 2211-2223.
16. Poplin, R, Varadarajan, AV, et al. (2017). Predicting cardiovascular risk factors from retinal fundus photographs using deep learning. arXiv:1708.09843.
17. Long, E, Lin, H, et al. (2017). An artificial intelligence platform for the multihospital collaborative management of congenital cataracts. Nature Biomedical Engineering. 1: 0024.
18. Burlina, PM, Joshi, N, et al. (2017). Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmology. 135: 1170-1176.
19. Brown, JM, Campbell, JP, et al. (2018). Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmology. 136: 803-810.
20. De Fauw, J, Ledsam, JR, et al. (2018). Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature Medicine: Epub ahead of print.
21. Kermany, DS, Goldbaum, M, et al. (2018). Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 172: 1122-1131.E1129.
22. Bejnordi, BE, Zuidhof, G, et al. (2017). Context-aware stacked convolutional neural networks for classification of breast carcinomas in whole-slide histopathology images. Journal of Medical Imaging. 4(4): 044504.
23. Liu, Y, Gadepalli, K, et al. (2017). Detecting cancer metastases on gigapixel pathology images. arXiv:1703.02442.
24. Yu, KH, Zhang, C, et al. (2017). Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature Communications. 7: 12474.
25. Capper, D, Jones, DTW, et al. (2018). DNA methylation-based classification of central nervous system tumours. Nature. 555: 469–474.
26. Bejnordi, BE, Veta, M, et al. (2017). Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. Journal of the American Medical Association. 318: 2199-2210.
27. Rajpurkar, P, Hannun, AY, et al. (2017). Cardiologist-level arrhythmia detection with convolutional neural networks. arXiv:1707.01836.
28. Madani, A, Arnaout, R, et al. (2018). Fast and accurate view classification of echocardiograms using deep learning. npj Digital Medicine. 1: 6.
29. Mori, Y, Kudo, SE, et al. (2018). Real-time use of artificial intelligence in identification of diminutive polyps during colonoscopy: a prospective study. Annals of Internal Medicine: Epub ahead of print.
30. Hinton, G (2018). Deep learning—a technology with the potential to transform health care. Journal of the American Medical Association: Epub ahead of print.
31. Naylor, CD (2018). On the prospects for a (deep) learning health care system. Journal of the American Medical Association: Epub ahead of print.
32. Stead, WW (2018). Clinical implications and challenges of artificial intelligence and deep learning. Journal of the American Medical Association: Epub ahead of print.
33. Rajkomar, A, Oren, E, et al. (2018). Scalable and accurate deep learning for electronic health records. npj Digital Medicine. 1: 18.
34. Keane, PA and Topol, EJ (2018). With an eye to AI and autonomous diagnosis. npj Digital Medicine. 1: 40.
35. Abràmoff, MD, Lavin, PT, et al. (2018). Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. npj Digital Medicine. 1: 39.

Monday, July 30, 2018

Healthcare Information Technology Workforce: Updated Analysis Shows Continued Growth and Opportunity

A new analysis of the healthcare information technology (IT) workforce indicates that as hospitals and health systems continue to adopt electronic health records (EHRs) and other forms of IT, as many as 19,852 to 153,114 more full-time equivalent (FTE) personnel may be required [1]. The new study has been published by myself and colleagues Keith Boone and Annette Totten in the new journal, JAMIA Open. It updates an original analysis [2] from before the passage of the Health Information Technology for Economic and Clinical Health (HITECH) Act, which has led to substantial growth in the adoption of EHRs [3, 4] and thus the expansion of the healthcare IT workforce.

The data used in the analysis actually focus only on hospitals and health systems, so informatics/IT workforce growth will also likely occur in other health-related areas. The results remind us that there remain and will likely be growing opportunities for those who train and work in biomedical and health informatics.

The new paper represents an update of a research interest of mine that emerged over a decade ago. As my activities in informatics education were growing at that time, I became interested in the characteristics of the healthcare IT workforce and its professional development. This led me to search for studies of that workforce, which essentially came up empty. The single resource I was able to find that provided some data about healthcare IT staffing was the HIMSS Analytics Database, but no one had ever done any analysis of it. The HIMSS Analytics Database mostly focuses on the IT systems that hospitals and health systems implement but also contains some data on IT staffing FTE. The result of the analysis was a paper that garnered a great deal of attention when it was published in 2008 [2], including an invitation to present the results in Washington, DC to the Capitol Hill Steering Committee on Telehealth and Healthcare Informatics.

Based on 2007 data, our initial paper looked at FTE staffing, especially as it related to level of adoption, based on the well-known HIMSS Analytics Electronic Medical Record Adoption Model (EMRAM), a 0-7 scale that measures milestones of EHR adoption. This was, of course, before the HITECH Act, when a much smaller number of hospitals and health systems had adopted EHRs. Also around that time, the first systematic review of evidence supporting the benefit of healthcare IT had been published, showing the value came mainly from use of clinical decision support (CDS) and computerized provider order entry (CPOE) [5]. As such, we looked at the level of healthcare IT staffing by EMRAM stage, with a particular focus on what increase might be required to achieve the level of IT use associated with those evidence-based benefits. We assessed the ratio of IT FTE staff to hospital beds by EMRAM stage.

Because the self-reported data of the database were incomplete for FTE staffing, we had to extrapolate from the data present to the entire country (recognizing a potential bias between those who responded and those who did not). We also noted some other limitations, namely that the data represented only hospitals and health systems, not the entire healthcare system, nor the use of IT outside of healthcare. Our analysis estimated the size of the national health IT workforce in 2007 at 108,390. But the real sound bite from the study was that if EHR adoption were to increase to the level supported by the evidence, namely EMRAM Stage 4 (use of CDS and CPOE), and FTE/Bed ratios remained the same for those hospitals, the workforce would need to grow to 149,174. In other words, there was a need to increase the size of the healthcare IT workforce by 40,784 people.
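The arithmetic behind that extrapolation can be sketched in a few lines. The helper function and its inputs are illustrative placeholders; only the headline totals below come from the paper.

```python
# The core of the extrapolation: hold the FTE-per-bed ratio observed at
# a target EMRAM stage constant and apply it to hospitals below that
# stage. The helper's inputs are illustrative; only the headline totals
# below come from the paper.
def projected_workforce(current_fte, beds_below_target, target_ratio,
                        current_ratio):
    """Additional staffing needed if hospitals below the target stage
    staffed at the target stage's FTE/Bed ratio instead of their own."""
    return current_fte + beds_below_target * (target_ratio - current_ratio)

# Headline figures from the 2008 analysis:
baseline_2007 = 108_390     # estimated national health IT workforce
projected_stage4 = 149_174  # if all hospitals staffed at Stage 4 levels
print(projected_stage4 - baseline_2007)  # 40784
```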

Within a year of the study’s publication, the US economy was entering the Great Recession, and the new Obama Administration had taken office. The recession led to Congress passing the HITECH Act (as part of the American Recovery and Reinvestment Act), which allocated about $30 billion in economic stimulus funding to EHR adoption. Recognizing that a larger and better-trained workforce would be necessary to facilitate this EHR adoption, the HITECH Act included $118 million for workforce development. The rationale for this included the data from our study showing the need for expanding the workforce, especially as the meaningful use of EHRs required of HITECH would necessitate the use of CDS and CPOE.

Since that time, EHR adoption has grown substantially, to 96% of hospitals [3] and 87% of office-based physicians and other clinicians [4]. A few years ago, I started to wonder how the widespread adoption impacted the workforce, especially at the higher stages of EMRAM, which very few hospitals had achieved in 2007. By 2014, one-quarter of US hospitals had reached Stages 6 and 7.

The new study reports some interesting findings. First, the FTE/Bed ratios in 2014 for different levels of EMRAM are remarkably similar to those in 2007 (with the exception of Stage 7, which no hospitals had reached in 2007). However, because hospitals have advanced to EMRAM stages beyond Stage 4, the total workforce ended up larger than we had estimated to be needed from the 2007 data. Probably most important, as more hospitals continue to reach Stages 6 and 7, the workforce will continue to grow. Our new study estimates that if all hospitals were to achieve Stage 6, an additional 19,852 healthcare IT FTE would be needed. Our analysis also shows an almost explosive growth of 153,114 more FTE if all hospitals moved to Stage 7, although we have less confidence in that result due to the relatively small number of hospitals that have achieved this stage at the present time, and it is also unclear whether the leaders reaching Stage 7 early are representative of the rest of hospitals and health systems generally.

Nonetheless, the US healthcare industry is moving toward increased EHR adoption. At the time of the 2014 data snapshot we used in the analysis, 3.7% and 22.2% of hospitals were at Stages 6 and 7 respectively. The latest EMRAM data from the end of 2017 show those to have increased to 6.4% and 33.8% respectively. In other words, the healthcare industry is moving toward higher levels of adoption that, if our findings hold, will lead to increased healthcare IT hiring.

The new paper also reiterates the caveats of the HIMSS Analytics data. It is a valuable database, but not really designed to measure the workforce or its characteristics in great detail. Another limitation is that only about a third of organizations respond to the staffing FTE questions. In addition, while the hospital setting comprises a large proportion of those who work in the healthcare industry, there are other places where IT and informatics personnel work, including for vendors, research institutions, government, and other health-related entities. As healthcare changes, these latter settings may account for an even larger fraction of the healthcare IT workforce.

Because of these limitations of the data and the changing healthcare environment, the paper calls for additional research and other actions. We note that better data, both more complete and more detailed, are critical to learning more about the workforce. We also lament the decision of the US Bureau of Labor Statistics (BLS) not to add a Standard Occupational Classification (SOC) code for health informatics, which would have added informatics to US labor statistics. Fortunately, the American Medical Informatics Association (AMIA) is undertaking a practice analysis of informatics work, so additional information about the workforce will be coming by the end of this year.

It should be noted that some may view the employment growth in healthcare IT as a negative, especially due to its added cost. However, the overall size of this workforce needs to be put in perspective, as it represents just a small fraction of the estimated 12 million Americans who work in the healthcare industry. As the need for data and information to improve operations and innovations in health-related industries grows, a large and well-trained workforce will continue to be critical to achieving the triple aim of improved health, improved care, and reduced cost [6]. In addition, many career opportunities will continue to be available to those who want to join the informatics workforce.


1. Hersh, WR, Boone, KW, et al. (2018). Characteristics of the healthcare information technology workforce in the HITECH era: underestimated in size, still growing, and adapting to advanced uses. JAMIA Open. Epub ahead of print. (The data used in the analysis is also available for access at
2. Hersh, WR and Wright, A (2008). What workforce is needed to implement the health information technology agenda? An analysis from the HIMSS Analytics™ Database. AMIA Annual Symposium Proceedings, Washington, DC. American Medical Informatics Association. 303-307.
3. Henry, J, Pylypchuk, Y, et al. (2016). Adoption of Electronic Health Record Systems among U.S. Non-Federal Acute Care Hospitals: 2008-2015. Washington, DC, Department of Health and Human Services.
4. Office of the National Coordinator for Health Information Technology. 'Office-based Physician Electronic Health Record Adoption,' Health IT Quick-Stat #50.
5. Chaudhry, B, Wang, J, et al. (2006). Systematic review: impact of health information technology on quality, efficiency, and costs of medical care. Annals of Internal Medicine. 144: 742-752.
6. Berwick, DM, Nolan, TW, et al. (2008). The triple aim: care, health, and cost. Health Affairs. 27: 759-769.