Monday, November 18, 2024

10x10 (“Ten by Ten”) Course: New Name and Milestones

Although I consider artificial intelligence (AI) to be part of biomedical informatics, there is no question that AI has become very important in its own right, both in and beyond biomedicine and health. This is certainly reflected in the introductory course I teach to graduate, continuing education, and medical students. Not only is one of the ten units of the course devoted to AI, but AI is increasingly infused across other units, such as information retrieval and clinical research informatics.

To this end, I am changing the name of the course, which will now be called Introduction to Biomedical Informatics and Artificial Intelligence. The course content will not change substantially because AI is already part of the course. Thus, topics like electronic health records, data standards and interoperability, health information exchange, privacy and security, telehealth, information retrieval, clinical research, and more will remain, perhaps with their connections to AI made more prominent.

The 10x10 course recently hit another milestone: its 20th year of being offered. Since 2005, over 3300 people have completed the 10x10 course. The graduate student version of the course has been around even longer, dating back to 1994, and has been completed by over 1700 students, not only those in our graduate program but also students in other programs, such as public health, nursing, and the basic biomedical sciences, who take the course as an elective.

The ten units of the course cover the following topics:

  1. Overview of Fields and Problems Motivating Them
  2. Computing Concepts for Biomedical Informatics
  3. Electronic and Personal Health Records (EHR, PHR)
  4. Standards and Interoperability
  5. Artificial Intelligence
  6. Advanced Use of the EHR
  7. EHR Implementation, Security, and Evaluation
  8. Information Retrieval (Search)
  9. Research Informatics
  10. Other Areas of Informatics

Thursday, November 7, 2024

Generative AI in My Teaching - The Final Shoe Drops

I recently published a paper in the journal npj Digital Medicine looking at how well generative artificial intelligence (AI) systems perform in my well-known introductory biomedical and health informatics course. This online course is taught to three audiences: graduate students (required in our graduate program and taken as an elective by students in several others, including public health, basic sciences, and nursing), continuing education students (the well-known 10x10 ["ten by ten"] course), and medical students at Oregon Health & Science University (OHSU). Student assessment varies across the three course audiences and is carried out through up to three activities: multiple-choice quizzes (MCQs) of ten questions for each of the ten units of the course, a short-answer final exam, and a 10-15 page term paper on an appropriate topic of the student's choice. Graduate students must complete all three forms of assessment, while continuing education and medical students write a shorter 2-3 page paper. The final exam is required for graduate students and optional for continuing education students who want to obtain graduate credit, usually with the aim of pursuing further study in the field.

As also described in a summary of the paper by OHSU, our research found that each of six well-known large language model (LLM) systems scored better than up to 75% of all students and easily achieved a passing grade on the two-thirds of the course assessment that comprises the MCQs and the final exam. The LLMs' performance was compared with that of the 139 students who took last year's (2023) version of the course. The results of the study call into question how I will assess students in the course going forward.
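For the curious, the kind of MCQ comparison we ran can be approximated in a few lines of code. The sketch below is purely illustrative and is not the pipeline we actually used in the paper; it assumes the OpenAI Python client (any chat model could stand in for the six LLMs we tested) and a hypothetical quizzes.json file of questions, answer options, and answer keys, and simply tallies how many multiple-choice answers a model gets right.

    # Purely illustrative: score a chat model on multiple-choice quiz questions.
    # Assumes the OpenAI Python client (openai>=1.0) and a hypothetical
    # quizzes.json file of {"question", "options", "answer"} records.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask(question, options):
        """Present one MCQ and return the model's single-letter answer."""
        choices = "\n".join(f"{letter}. {text}" for letter, text in options.items())
        response = client.chat.completions.create(
            model="gpt-4o",  # stand-in; the paper compared six different LLMs
            messages=[
                {"role": "system",
                 "content": "Answer with only the letter of the best option."},
                {"role": "user", "content": f"{question}\n{choices}"},
            ],
        )
        return response.choices[0].message.content.strip()[:1].upper()

    with open("quizzes.json") as f:
        quizzes = json.load(f)

    correct = sum(ask(q["question"], q["options"]) == q["answer"] for q in quizzes)
    print(f"{correct}/{len(quizzes)} correct ({100 * correct / len(quizzes):.0f}%)")

Ranking a model's percentage within the distribution of the 139 student scores then yields percentile comparisons of the kind reported in the paper.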

I took some solace in the belief that LLMs still could not write the final third of the assessment for graduate students, namely the term paper. Alas, that was before someone pointed me to a new generative AI system from researchers at Stanford University, led by Professor Monica Lam, called Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking (STORM). Although STORM is designed to guide users through an iterative pre-writing process, its Web interface also allows a two-part prompt that defines the topic of the paper and its format, length, and other attributes. When I ask for what that last part of the course assessment requires, namely a 10-15 page paper on a focused topic, STORM serves up a paper that, while perhaps superficial in some of its coverage, for the most part satisfies the requirements of my paper-grading rubric. My co-author on the npj Digital Medicine paper, Kate Fultz-Hollis, noted that the first papers I generated did not have many peer-reviewed citations in their reference lists, but that was easily fixed by asking for them explicitly in the prompt.
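To give a concrete, though entirely hypothetical, sense of such a two-part prompt (the topic below is invented for illustration and is not one I actually submitted), a request to STORM's interface might look something like:

    Topic: Telehealth for chronic disease management in rural populations
    Format: A 10-15 page scholarly paper with an introduction, a review of
    the evidence, a discussion of limitations, and a conclusion, citing
    primarily peer-reviewed literature.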

Now I must ask: has the final shoe dropped, i.e., can generative AI now do pretty much everything needed to pass my course? I hope that students will still want to learn informatics, but clearly that will not be a requirement for passing. Those of us who are educators face new challenges from generative AI, stemming from its ability to perform as well as students on a variety of learning assessments. One researcher, business professor Ethan Mollick from the University of Pennsylvania, has called this the homework apocalypse.

Some have argued that new approaches to assessment are required, and Professor Mollick has a wealth of ideas. Many of these ideas are challenging to implement, however, especially in large online courses and when there is a true knowledge base we aim for students to learn. I do agree with those who advocate that we should not merely assess students on their ability to regurgitate facts, especially in an era when finding facts online is easier than ever. But I do try in my teaching (maybe not always succeeding) to have students apply the knowledge they are learning, and I find MCQs are actually pretty good at assessing that.

Nonetheless, the implication of these results is that generative AI systems challenge our ability to assess student learning, and this will require us to modify how we evaluate students. That does not mean we should ban LLMs, but rather that we need to find ways to ensure enough learning that students can think critically from a core of fundamental knowledge.

We also need to answer other questions: Is there a core of knowledge about which students should be able to answer questions without digital assistance? Does this core of knowledge facilitate higher-order thinking about a discipline? Does that core enable thoughtful searching, via classic search or LLMs, for information beyond the human's memory store? Should we have explicit policies around the use of generative AI in specific courses (here is mine)? Is it appropriate to try to maintain rigor in academic teaching, and if so, how?

I have talked about these issues in a number of forums and find that many other educators are struggling to address these challenges, just as I am. Clearly we will need solutions to these problems that optimize student learning and critical thinking while making the best of tools that can enhance our performance on the very tasks we are learning.