Thursday, November 7, 2024

Generative AI in My Teaching - The Final Shoe Drops

I recently published a paper in the journal npj Digital Medicine looking at how well generative artificial intelligence (AI) systems perform in my well-known introductory biomedical and health informatics course. This online course is taught to three audiences: graduate students (required in our graduate program and taken as an elective by students in several others, including public health, basic sciences, and nursing), continuing education students (the well-known 10x10 ["ten by ten"] course), and medical students at Oregon Health & Science University (OHSU). Student assessment varies for the three different course audiences and is carried out by up to three activities: multiple-choice quizzes (MCQs) of ten questions for each of the ten units of the course, a short-answer final exam, and a 10-15 page term paper on an appropriate topic of the student's choice. Graduate students must complete all three forms of assessment, while continuing education and medical students write a shorter 2-3 page paper. The final exam is required for graduate students and optional for continuing education students who want to obtain graduate credit, usually with the purpose of pursuing further study in the field.

As also described in a summary of the paper by OHSU, our research found that each of six well-known large language model (LLM) systems scored better than up to 75% of all students and easily achieved a passing grade on the two-thirds of the course assessment made up of the MCQs and the final exam. The LLMs were assessed and compared with the 139 students who took last year's (2023) version of the course. The results of the study bring into question how I will assess students in the course going forward.

I thought I had some solace in that LLMs still could not complete the final third of the assessment for graduate students, namely the term paper. Alas, this was before someone pointed me to a new generative AI system from researchers at Stanford University, led by Professor Monica Lam, called Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking (STORM). Although STORM is designed to guide users through an iterative pre-writing process, its Web interface also allows a two-part prompt to define the topic of the paper and its format, length, and other attributes. When I ask for what I want for that last part of the course assessment, namely the 10-15 page paper on a focused topic, STORM serves up a paper that, while perhaps superficial in some of its coverage, for the most part satisfies the requirements of my paper-grading rubric. My co-author on the npj Digital Medicine paper, Kate Fultz-Hollis, noted that the first papers I generated did not have many peer-reviewed citations in the reference list, but that was easily fixed by asking for them explicitly in the prompt.

Now I must ask, has the final shoe dropped, i.e., can generative AI now pretty much do everything needed to pass my course? I hope that students will still want to learn informatics, but clearly that will not be a requirement for passing. Those of us who are educators face new challenges from generative AI, stemming from its ability to perform as well as students on a variety of learning assessments. One researcher, business professor Ethan Mollick from the University of Pennsylvania, has called this the homework apocalypse.

Some have argued that new approaches to assessment are required, and Professor Mollick has a wealth of ideas. Many of these ideas are challenging to implement, especially in large online courses and when there is a true knowledge base that we aim for students to learn. I do agree with those who advocate that we should not merely assess students on their ability to regurgitate facts, especially in an era when finding facts online is easier than ever. But I do try in my teaching (maybe not always succeeding) to have students apply the knowledge they are learning. I find MCQs are actually pretty good at that.

Nonetheless, the implication of these results is that generative AI systems challenge our ability to assess student learning. This will require us to modify how we evaluate students. This does not mean we should ban LLMs, but that we need to find ways to ensure enough learning so students can think critically based on a core of fundamental knowledge.

We also need to answer other questions: Is there a core of knowledge about which students should be able to answer questions without digital assistance? Does this core of knowledge facilitate higher-order thinking about a discipline? Does that core enable thoughtful searching, via classic search or LLMs, for information beyond the human's memory store? Should we have explicit policies around the use of generative AI in specific courses (here is mine)? Is it appropriate to try to maintain rigor in academic teaching, and if so, how?

I have talked in a number of forums about these issues, and find many other educators are struggling to address these challenges as I am. Clearly we will need solutions that optimize student learning and critical thinking while making the best use of these tools that can enhance our performance in the tasks we are learning.

Thursday, October 31, 2024

Medical Practice is Rarely a Zero/One-Shot Activity

A new generative artificial intelligence (AI) study presents some interesting findings.(1) Physicians randomized to conventional information resources with or without ChatGPT-4 showed no significant difference in their ability to solve diagnostic vignettes. In other words, physician access to ChatGPT did not improve their diagnostic ability. Another finding, however, was that ChatGPT alone scored far better than either group of physicians. The results can be interpreted in different ways, from generative AI providing no value for physicians to it performing better than them.

This is a good study, and we need more like it, but we cannot lose sight of the bigger picture, which is that, as the title of this post says, real-world medical practice is rarely a zero- or one-shot activity. In other words, there are very few situations in medicine where the definitive diagnosis must be made the instant that a patient presents. Clinical diagnosis and treatment are most often an iterative process. Sometimes the response to treatment helps make the diagnosis.

Additional insight into this issue in the context of generative AI comes from an unrelated study.(2) This analysis assessed proprietary and open-source models for extracting relevant findings from chest radiograph reports. While GPT-4 led the pack for zero-shot prompting, several open-source large language models (LLMs) were able to "catch up" with GPT-4 when the prompting included some example reports. In other words, by iterating with additional content, the other LLMs were able to match GPT-4's performance.
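To make the zero-shot versus few-shot distinction concrete, here is a minimal sketch of the two prompting styles for a report-labeling task. It is purely illustrative: the model name, label set, example reports, and use of the OpenAI Python client are my own assumptions for demonstration and are not taken from the study.

```python
# Illustrative sketch only: zero-shot vs. few-shot prompting for labeling a
# chest radiograph report. Model name, labels, and examples are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ["cardiomegaly", "pleural effusion", "pneumothorax", "no finding"]

def zero_shot_prompt(report: str) -> str:
    # No examples: the model must infer the labeling task from instructions alone.
    return (
        "Label the following chest radiograph report with any of these findings: "
        f"{', '.join(LABELS)}.\n\nReport: {report}\n\nFindings:"
    )

def few_shot_prompt(report: str, examples: list[tuple[str, str]]) -> str:
    # Prepend a few worked examples; this is the "catching up" step described above.
    shots = "\n\n".join(f"Report: {r}\nFindings: {f}" for r, f in examples)
    return (
        "Label each chest radiograph report with any of these findings: "
        f"{', '.join(LABELS)}.\n\n{shots}\n\nReport: {report}\nFindings:"
    )

def label_report(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

# Hypothetical usage with invented example reports
examples = [
    ("Enlarged cardiac silhouette. Lungs clear.", "cardiomegaly"),
    ("Blunting of the left costophrenic angle.", "pleural effusion"),
]
new_report = "No acute cardiopulmonary abnormality."
print(label_report(zero_shot_prompt(new_report)))
print(label_report(few_shot_prompt(new_report, examples)))
```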

We can gain additional context from another recent publication that assessed diagnostic reasoning.(3) In this study, the generative AI performed worse than physicians, following neither diagnostic nor treatment guidelines and being unable to interpret laboratory results. A criticism of this study is that it employed less commonly used LLMs. The stated reason for using non-leading LLMs was a restriction that did not allow the study data to be submitted to commercial LLMs. This paper also did not do any sort of interactive prompting, so we do not know whether performance might have improved, as it did in the radiology study cited above.

We can certainly learn about LLM use in medical diagnosis from these studies, but in the long run, we really need studies that assess whether and how generative AI leads to real benefits for patients and their care. I do not underestimate the challenges of carrying out these studies, but they must be done to determine the value of generative AI in clinical medicine.

References

1. Goh E, et al., 2024. Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial. JAMA Netw Open 7, e2440969. https://doi.org/10.1001/jamanetworkopen.2024.40969
2. Dorfner FJ, et al., 2024. Comparing Commercial and Open-Source Large Language Models for Labeling Chest Radiograph Reports. Radiology 313, e241139. https://doi.org/10.1148/radiol.241139
3. Hager P, et al., 2024. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat Med 30, 2613–2622. https://doi.org/10.1038/s41591-024-03097-1

Sunday, June 2, 2024

Graduation Participation and Faculty Address for 2024 OHSU Biomedical Informatics Graduates

The Oregon Health & Science University (OHSU) Commencement and School Hooding Ceremonies, also known as Graduation, are always a special event for me. It is a pleasure to celebrate graduates of our Biomedical Informatics Graduate Program completing their studies and moving on to the next steps in their careers. Since our program had its first graduates in 1998, I have only missed the ceremony once, due to a conflict with a National Institutes of Health meeting I was required to attend.

This year our program has 22 graduates, distributed among our PhD (3), Master of Science (15), and Graduate Certificate (4) programs and across our two majors, Health & Clinical Informatics (15) and Bioinformatics & Computational Biology (7). With this year's graduates, the total number of degrees and certificates awarded by our program since its inception comes to 984.

This year’s event is also special because I have been invited to give the Faculty Address in the OHSU School of Medicine Graduate Studies Hooding & Completion Ceremony. This is the second time I have been asked to give this address, the previous time in 2011.

As I did the last time, I will share my remarks in this post:

Thank you, Dean Fryer, for inviting me to give the faculty address at this year's Graduate Student Hooding Ceremony. It is a real honor. A good number of you in the room know me. I have been at OHSU for 34 years. Along the way, I started a graduate program, some of whose 22 graduates from this year are here today. I also started a department in the School of Medicine.

I work in the field of biomedical informatics, where we focus on the use of information and technology to advance health, healthcare, and biomedical research. Our graduate program matriculated its first students into our master’s program in 1996. From there, we developed a PhD program, a Graduate Certificate program, and bifurcated into two majors, one called Health and Clinical Informatics and the other called Bioinformatics and Computational Biomedicine. The field of clinical informatics is now a recognized medical subspecialty, and I am among the nearly 3000 physicians in the country who are board-certified. With today’s graduates, our graduate program is nearing the awarding of 1000 degrees and certificates. Our alumni have gone on to jobs in academia, industry, healthcare settings, and more, and some have become leaders in the field in their own right.

Our department in the School of Medicine is called the Department of Medical Informatics & Clinical Epidemiology, or DMICE as many know it, a fitting acronym for a department that does a lot of work with computers. The department has had a sustained record of accomplishment and impact since its inception 21 years ago. This is evidenced by a citation and publishing analysis of the top 2% of the world’s researchers maintained by the Elsevier Data Repository. In this dataset of over 200,000 scientists, there are 295 current and past OHSU scientists, 10 of whom are from DMICE. Our department’s research has been sustained by over $200 million of funding in grants and contracts.

As some of you know, I decided to step down from leadership of the department and graduate program last year. I continue on as a faculty member, doing what I love most, namely my research and teaching.

The success of our graduate program, our department, and everything else would never have been possible without the faculty, students, and staff who participated in the journey. And this gets to the theme of what I would like to tell today's graduates, which is that whatever we do that is meaningful in our careers, it takes a group effort. Some of you are graduating to become scientists, while others of you are focused on professional practice. But no matter what your career path, it is important to remember that you have been and will be part of a team. Many of you who become academic faculty will be told of the importance of achieving independence as a researcher. Those of you who will be practitioners will want to practice with professional independence.

Being independent is important, but for all of us, no matter how successful we are as individuals, we would not achieve our accomplishments without the help of other people. Our supervisors, our mentors, our colleagues, our students, our families, and, even if we do not participate directly in clinical care, our patients. Being in the field of informatics, I am both awed by and concerned about the growth of AI, but even the use of AI will not replace our interactions with people and our organizations.

So let me advise those of you who aspire to be scientists upon your graduation. Your research is important and contributes to the greater knowledge base that allows us to keep people healthy and treat their disease. You hope to succeed as individuals, and we want you to as well, but never forget those who provided the foundation for your success. Your parents, teachers, mentors, fellow students and others.

Consideration of the others who help you does not stop there. There are staff in your departments and labs who assist you with writing grant proposals, managing projects and their finances, and dealing with space, payroll, and more. And please do not forget patients, who are the whole reason we work at an academic health science center. Whether your work is in the lab or involves computational approaches, virtually all of the research that goes on in academic medicine ultimately aims to benefit patients. No matter whether we are working on an animal model of a disease or a data set derived from electronic health records, we must honor and respect patients. And if we are privy to their data, we must respect its privacy and confidentiality.

For those of you who will be pursuing professional paths in informatics, management, dietetics, and other disciplines, an analogous approach applies. Your parents, teachers, mentors, fellow students and others are also key to your career success. You too will likely support the delivery of patient care, even if not directly, so you too must honor and respect patients and their data.

In summary, let me wish the best for our graduates and their families. These are challenging times as we emerge from the pandemic and face new threats to funding and other aspects of healthcare and science. Keep positive, draw from and contribute back to those you work and collaborate with, and do good. Thank you.

Wednesday, February 28, 2024

Kudos for the Informatics Professor - 2023 Update

Periodically in this blog, lately once a year, I review all of my professional accomplishments, including honors, talks given, papers published, and more for the past year. Last year was a transitional year for me professionally, as I stepped down from two leadership positions as Chair of the Oregon Health & Science University (OHSU) Department of Medical Informatics & Clinical Epidemiology (DMICE) and from being Director of the OHSU Biomedical Informatics Graduate Program. As seen in this post, however, my productivity did not miss a beat in 2023, and in fact I am enjoying my work as much as ever by being able to focus on teaching, research, and writing.

I was awarded one honor in 2023, the Lifetime Achievement Award from the HIMSS Oregon Chapter.

I also gave a large number of invited talks, some in person and others virtual. Links to my slides and references, and for some, videos of the talks, are available on my Web site. The talks given include:

  • From the Longwood Medical Area to Oregon: An Informatics Career Journey - Harvard Clinical Informatics Lecture Series (virtual), January 24, 2023
  • Introduction to Informatics:  What You Should Know as a Health Services Researcher - VA Health Services Research & Development Advanced Fellowship Program (virtual), March 7, 2023
  • Translational Artificial Intelligence (AI): The Need to Translate from Basic Science to Clinical Value - University of Iowa Distinguished Biomedical Scholars Lecture Series, Iowa City, IA, March 9, 2023
  • Competencies and Curricula Across the Spectrum of Learners for Biomedical and Health Informatics - University of Texas Southwestern Clinical Informatics Research Colloquium (virtual), May 18, 2023
  • Informatics Innovation: Quarter-Century of OHSU Leadership - HIMSS Oregon Chapter Annual Conference 2023, Portland, OR, June 1, 2023
  • Biomedical and Health Informatics: An Essential Discipline for 21st Century Medicine - Informatics Colloquium, Department of Pathology, Indiana University School of Medicine (virtual), July 19, 2023
  • Biomedical and Health Informatics: An Essential Discipline for 21st Century Medicine - Department of Medicine Meet the Global Expert Webinar, University of Cape Town, South Africa (virtual), August 17, 2023
  • Artificial Intelligence: Implications for Health Professions Education - Keynote Talk, Commission on Accreditation for Health Informatics and Information Management (CAHIIM) Summit on Higher Education (virtual), September 28, 2023
  • Artificial Intelligence in Medicine: Promise and Peril - Grand Rounds, Department of Pathology and Laboratory Medicine, OHSU (virtual), October 4, 2023
  • ChatGPT and Other AI Tools for Medicine and Medical Education - International Association of Medical Science Educators (IAMSE) Fall 2023 Webcast Seminar Series: Brains, Bots, and Beyond: Exploring AI's Impact on Medical Education (virtual), October 5, 2023
  • Artificial Intelligence: Implications for Health Professions Education - Educator’s Collaborative, OHSU (virtual), October 18, 2023
  • Artificial Intelligence in Medicine: Promise and Peril - 33rd Infectious Diseases Society of Oregon Annual Meeting, Salem, OR, October 26, 2023
  • Artificial Intelligence in Medicine: Promise and Peril - Public Health Officers Caucus of Oregon (virtual), November 28, 2023
  • Artificial Intelligence: Implications for Informatics Education - OHSU DMICE Conference, Portland, OR, December 14, 2023

I also participated in a number of panels at meetings:

  • Current Clinical Evidence and Available Technology - Voice AI Symposium, Bridge2AI-Voice Consortium, Washington, DC, April 19, 2023 (Panelist)
  • Program Sustainability - Office of the National Coordinator for Health Information Technology (ONC) Public Health Informatics & Technology (PHIT) Workforce Program (virtual) June 20, 2023 (Panelist)
  • Strategies for Effective and Equitable Partnerships - Third Meeting of the Data Science Initiative for Africa Consortium, Kigali, Rwanda, November 8, 2023 (Panelist)
  • Building Human and Data Capacity in the NIH Data Science for Africa Initiative - AMIA Annual Symposium 2023, New Orleans, LA, November 13, 2023 (Panelist and Moderator)
  • Bridging Training Gaps through Voice: An Ethics-based Approach to Teaching Trustworthy AI - AMIA Annual Symposium 2023, New Orleans, LA, November 14, 2023 (Panelist)
  • Generative Applications of Large Language Models for Medical Education and Knowledge Searching: Shall We Count on ChatGPT and Co.? - AMIA Annual Symposium 2023, New Orleans, LA, November 15, 2023 (Panelist)

I appeared on an episode of the Health and Explainable AI Podcast from the University of Pittsburgh HexAI Research Laboratory.

I was awarded one new grant in 2023 but continued to be busy with my four existing grant projects. The new grant was an administrative supplement to an existing grant, the Bridge2AI Voice as a Biomarker project, to develop a summer school for college undergraduates and students in clinical training. My existing grants include a National Library of Medicine (NLM) R01, NLM training grants for informatics predocs and postdocs and for a college summer internship program, another training grant in the Data Science Initiative for Africa, and being part of the Skills and Workforce Development Module of the Voice as a Biomarker project.

I continued my teaching in 2023. As always, a major part of my teaching is my introductory course in biomedical and health informatics, which is offered as BMI 510/610 at OHSU, as the AMIA-OHSU 10x10 ("ten by ten") course, and as MINF 705A/709B, a medical student elective at OHSU. I also gave my annual lecture to OHSU medical students in their first month of class, Information is Different Now That You're a Doctor, on August 25, 2023.

Wednesday, February 7, 2024

Translational AI: A Necessity and Opportunity for Biomedical Informatics and Data Science

How much of the hype for artificial intelligence (AI) will translate into true impact on health, healthcare, and research is unknown. The potential benefits are unequivocal, from assisting patients in pursuing actions to improve their health, to giving guidance to clinicians in diagnosis and treatment, to helping researchers find information and devise new ideas to advance their research.

I have published an invited post in the National Library of Medicine (NLM) Musings from the Mezzanine Blog, the blog of the Director of the NLM. I chose to update some of my past writings posted in this blog with a new discussion of what I call translational AI.

The tl;dr is:

  • The actual day-to-day use of clinical AI in healthcare is still modest, according to surveys.
  • While thousands of machine learning model papers have been published, along with many systematic reviews of those papers, there is a much smaller number, probably on the order of 100, of randomized controlled trials (RCTs) of AI interventions in healthcare.
  • Of those RCTs, not all have resulted in positive outcomes and a number of them have risk of bias concerns.

Clearly, as in all of healthcare, we cannot do RCTs on every permutation of model, implementation, setting, etc. of AI. However, we must treat AI the same way as any other tool we use in healthcare: show us the evidence. Granted, evaluating the use of AI has plenty of differences from evaluating other interventions used in patient care, such as drugs and devices. It is difficult to conjure a "placebo" for AI, and hard to perform controlled studies when AI, such as ChatGPT, is all around us.

Nonetheless, we can apply evidence-based medicine (EBM) to help inform its clinical use. The best way to do that is through randomized controlled trials (RCTs), or better yet, systematic reviews of RCTs. As I note in the post, this is imperative not only for those of us who promote the use of AI and other biomedical and health informatics interventions, but also for students and trainees looking for projects from which to develop impactful research programs in their careers.

Tuesday, January 30, 2024

Whither Search? A New Perspective on the Impact of Generative AI on Information Retrieval (IR)

When I was putting the finishing touches on the 4th edition of my textbook on information retrieval (IR, also known as search) in the domain of biomedicine and health in 2020, I wondered whether the major problems in the field of IR were mostly solved. Retrieval systems such as Google for general Web searching and PubMed for the biomedical literature were robust and mature. One literally had the world's written knowledge at one's fingertips for general and biomedical topics from these systems respectively (even if paywalls did not always allow immediate access to the content).

There were certainly some areas of IR where additional work was needed and important, e.g., search over specific types of content such as social media or, in the case of my own research, electronic health record (EHR) data and text. There were also some nascent advances in the application of machine learning, although the gains in experimental results were more incremental than transformative.

But any staidness of IR was upended by the emergence of generally available generative artificial intelligence (AI) chatbots, based on large language models (LLMs), initially with ChatGPT and soon others to follow. Shortly thereafter came generative AI capabilities added to the two major Web search engines, Microsoft Bing and Google. All of a sudden, searching the Web was transformed in ways that most of us did not see coming.

I recently took advantage of the call for papers for a special issue devoted to ChatGPT and LLMs in biomedicine and health of the flagship journal for the field of informatics, JAMIA, to write a perspective piece on why search is still important, even in the era of generative AI. At least for me, while the answer to my question is important in a search, it is also critical to know where the information came from. In addition, as I am commonly synthesizing my own knowledge and views on a topic, I do not just want a single generative AI answer to my question but rather the source articles and documents so I can compare and contrast different views and develop my own answer.

At the close of the paper, I do acknowledge that there may well be areas of IR where generative AI may have major impact going forward. I know that there is a lot of buzz around retrieval-augmented generation (RAG), although for many of the questions on which I search, I am much more interested in generation-augmented retrieval (GAR?). That is, how can generative AI methods improve the way we search to steer us to the kinds of authoritative, originally sourced information we seek to carry out our work?
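For readers who want a concrete picture of what generation-augmented retrieval might look like, here is a minimal sketch: an LLM rewrites a natural-language question into a Boolean query, which is then run against PubMed via the NCBI E-utilities. The model name, prompt wording, and overall pipeline are assumptions for illustration only; this is not code from the JAMIA perspective.

```python
# A minimal sketch of the "generation-augmented retrieval" idea: the LLM
# improves the query, but classic retrieval returns the source documents.
import requests
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def generate_query(question: str) -> str:
    # The "generation" step: turn a natural-language question into a
    # PubMed Boolean query with synonyms joined by OR.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Rewrite this question as a single PubMed Boolean query, "
                       "with synonyms joined by OR. Return only the query.\n\n" + question,
        }],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

def search_pubmed(query: str, max_results: int = 10) -> list[str]:
    # The "retrieval" step: a classic search returning PMIDs the reader can
    # inspect and compare, rather than a single generated answer.
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"}
    result = requests.get(ESEARCH_URL, params=params, timeout=30).json()
    return result["esearchresult"]["idlist"]

question = "Does access to LLM chatbots improve physicians' diagnostic accuracy?"
query = generate_query(question)
print("Generated query:", query)
print("PMIDs:", search_pubmed(query))
```

The design point of the sketch is that the generative step serves the retrieval step: the output is a set of authoritative, originally sourced documents to compare and contrast, not a single synthesized answer.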

The day before the article was published, a reporter who came across my preprint wrote a piece on the impact of AI on search, noting some of the issues I raise with regards to accuracy and authority for search in fields like medicine and in academia.

The paper itself has been published in JAMIA as an Advance Article, Hersh W, Search still matters: information retrieval in the era of generative AI, Journal of the American Medical Informatics Association, 2024, ocae014. Unfortunately, the open-access publishing fee for JAMIA is fairly steep ($4125), especially for a short perspective piece like this, but those wanting to read it can access the preprint that I posted.