Thursday, January 19, 2012

OHSU Biomedical Informatics Education: By the Numbers

This year, 2012, marks the 17th year of the Oregon Health & Science University (OHSU) Biomedical Informatics Graduate Program. What began with a half-dozen Master of Science (MS) students has grown to one of the largest programs in the country, featuring certificates and degrees, three tracks, and a large distance learning component. While informatics education actually started in 1992 with the awarding of our first National Library of Medicine (NLM) Fellowship Training Grant, we did not launch any degree program until 1996.

Now, 17 years later, the program has awarded 401 degrees and certificates to 378 people. Another 19 have completed the NLM Fellowship program without obtaining a degree, bringing the total of alumni to 397 individuals. These alumni are highly successful by any measure, having taken jobs in industry, academia, healthcare organizations, and other settings. Some have gone on to become successful faculty in the field while many more have gone on to take operational informatics roles in companies, healthcare institutions, governments, and other organizations.

The total number of individuals who have enrolled in any of our graduate programs since the program's inception is 1297. The largest enrollment has been in the Graduate Certificate Program, with 974 enrollees. The total enrollment in the other programs has been 27 PhD, 217 MS, and 240 Master of Biomedical Informatics (MBI). The enrollments across all programs add up to more than the total number of individuals because some students have taken advantage of the "building block" structure of the program, which allows students who are enrolled at one level to carry courses forward to a higher level (subject to time limitations).
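For readers who want to check the arithmetic, here is a minimal sketch in Python using only the figures quoted above. The variable names are illustrative. Note that the difference it computes is the number of extra program enrollments, not necessarily the number of people who moved between programs, since one person may have enrolled in three programs.

```python
# Enrollment figures quoted in the paragraph above.
enrollment_by_program = {
    "Graduate Certificate": 974,
    "PhD": 27,
    "MS": 217,
    "MBI": 240,
}
unique_individuals = 1297

# Sum of per-program enrollments counts a person once per program entered.
program_enrollments = sum(enrollment_by_program.values())

# Enrollments beyond one per person, attributable to the "building block" path.
extra_enrollments = program_enrollments - unique_individuals

print(program_enrollments)  # 1458
print(extra_enrollments)    # 161
```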

The most common instance of advancing from one program to a higher level is moving from the Graduate Certificate to one of the master's degree programs. This has been done by 118 students, 41 of whom have completed the master's degree. Three individuals have advanced from the Graduate Certificate to the PhD (two of them being master's students along the way), while 15 of the 27 students who have ever been in the PhD program were in a master's program at some point.

The breakdown of degree and certificate graduates of the program is 11 PhD, 66 MS, 88 MBI (also including a half-dozen who completed the degree when it was called Master of Medical Informatics) and 236 Graduate Certificate. While it may seem that the program has a somewhat low graduation rate, it must be remembered that people enroll in graduate programs for reasons other than obtaining a degree or certificate. A not insignificant number of our students are already highly accomplished academically and choose a more "a la carte" path to furthering their education than obtaining a degree or certificate. Another reason for a seemingly low rate of graduation is that the program has witnessed substantial growth in recent years, meaning that many students in the program are still current students.

Students in the program have always come from a diversity of backgrounds and many bring substantial prior achievement into the program. More complete data about student backgrounds is available for those who enrolled in the program since 2007, when OHSU started doing a better job of capturing such data.

Of the 819 individuals who have enrolled in the program in 2007 or later, a total of 215 (26.3%) have prior master's degrees, 56 (6.8%) have non-medical doctoral degrees, and 211 (25.8%) have medical degrees. The most common types of master's degrees include MS (82), MBA (40), and MPH (27). The program has 26 (3.2%) individuals with explicit nursing degrees, although we know there are some other nurses who do not have "nursing" explicitly in their degree titles.
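The percentages above follow directly from the counts and the denominator of 819. Here is a small sketch in Python that reproduces them. The dictionary keys and the helper `percent` are illustrative names, and the categories are assumed to be able to overlap (one person can hold more than one prior degree), so the shares are not expected to sum to 100%.

```python
# Counts quoted in the paragraph above, for students enrolled in 2007 or later.
enrolled_since_2007 = 819
prior_credentials = {
    "master's degree": 215,
    "non-medical doctoral degree": 56,
    "medical degree": 211,
    "explicit nursing degree": 26,
}

def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

shares = {name: percent(count, enrolled_since_2007)
          for name, count in prior_credentials.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```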

Another group into which to drill down further is physicians, especially in light of the newly designated subspecialty of clinical informatics and the possibility of altered pathways for training in the future (i.e., clinical fellowships). Our physician enrollment since 2007 includes 184 MD, 15 MBBS, and 12 DO over the course of the program, with 78 MD, 6 MBBS, and 1 DO graduating. These numbers represent 25.8% of the enrollment and 30.7% of the graduates in the program. The MBI program has the highest proportion of physicians, with physicians representing 41.3% of enrollees and 38.3% of the graduates. Physicians in the program tend to be established in their careers and taking on informatics as a new career, with many already reporting professional activity in the field. The average age of physicians in the Graduate Certificate program is 47.8 years and in the MBI program is 44.2 years.

The age distribution of students enrolled since 2007 shows that those in the programs of shorter duration (Graduate Certificate and MBI) tend to be older than those in the longer programs (MS and PhD). The average ages of students in the different programs are, respectively, Graduate Certificate 44.0, MBI 43.6, MS 35.6, and PhD 39.2.

The majority of students in the program since 2007 pursue their studies on a part-time basis. In the Graduate Certificate Program, the average duration of enrollment (including current students) is 1.5 years and the average time to graduate is 2.6 years. In the MBI program, the average duration of enrollment is 3.1 years, while the average time to graduate is 3.9 years. These numbers tend to be relatively comparable for groups with different backgrounds, i.e., those with doctoral degrees, master's degrees, or medical degrees.

The numbers reported so far represent the entire program. As noted at the outset, the program actually has three tracks. While the clinical informatics track dominates the numbers, it is important to note the other two tracks in the program.

The second track in the program is the bioinformatics and computational biology (BCB) track. This track is available on-campus only, and has subject matter that is more computational in nature. Since the launching of this track in 2007, a total of 31 students have matriculated, with 24 in the master's degree programs and seven in the PhD program. (One PhD student had a previous MS from the clinical informatics track.) The program has had 11 graduates, eight with a master's degree and three with a PhD. (One master's graduate completed both the clinical informatics and BCB tracks.) This track is likely to see growth for a variety of reasons, including the growing role of genomics and related areas in healthcare and biomedical research as well as OHSU's continued investment in quantitative biosciences.

The third track is the health information management (HIM) track, which was launched in 2008. The motivation for this track was to bring about more integration of the HIM and clinical informatics fields, and the curriculum has been structured accordingly, with students combining classes from the clinical informatics track with those specific to HIM. The program is accredited by the Commission on Accreditation for Health Informatics and Information Management Education (CAHIIM), which allows students to sit for the Registered Health Information Administrator (RHIA) certification exam. Since inception, a total of 56 students have enrolled, 17 of whom have graduated. Of the graduates, five have sat for and passed the RHIA certification exam. Just as the American Health Information Management Association (AHIMA) is calling for the HIM entry level to move to the master's degree level, we are expanding the HIM track to the master's degree programs and seeking CAHIIM accreditation at that level.

One final part of our program is the 10x10 ("ten by ten") program. This program was started in partnership with the American Medical Informatics Association (AMIA) in 2005, when then-AMIA President Charles Safran called for one physician and one nurse to be trained in informatics at each US hospital. OHSU operationalized this goal as 10,000 individuals trained in informatics by 2010 (hence "10x10") and became the first partner with AMIA to offer such a course, which is a standalone version of the introductory course (BMI 510) in the clinical informatics and HIM tracks of the graduate program. At the end of the course, students can optionally take the BMI 510 final exam. If they receive a B grade or better, they can then receive graduate credit for the course upon matriculating in one of the graduate programs without paying additional tuition.

By the end of 2010, a total of 999 individuals had completed the OHSU 10x10 offering. Because of continued interest in the course, we have continued to offer it, and by the end of 2011, 1169 people had completed it. Of those people, 522 (44.7%) have passed the optional final exam, and about 15% have gone on to enroll in one of the graduate programs (usually the Graduate Certificate but sometimes the master's program). One individual has "run the table" of OHSU education, starting in 10x10 and advancing to the Graduate Certificate, MBI, and PhD programs.

Wednesday, January 18, 2012

Secondary Use of Clinical Data, the TREC Medical Records Track, and "Big Data" in Biomedicine

Last week I had the opportunity to present my latest research activity at the OHSU Biomedical Informatics Conference we hold almost every Thursday during the academic year. I chose to present work about the TREC Medical Records Track and its place in the larger context of "secondary use" of clinical data in electronic health record (EHR) systems [1]. The impetus for this work grows with the increasing adoption of EHRs under the HITECH Act, along with the vision of the "learning healthcare system" [2].

I will not recapitulate the talk here, which covers the rationale, data, methods, and early results of the TREC Medical Records Track. (Details can be found on the video and slides from the talk.) I will, however, explore the relationship of what is increasingly called "big data" to biomedicine. One can easily find volumes of information on the Web about big data, but the vision is probably best articulated in the book, The Fourth Paradigm: Data-Intensive Scientific Discovery, published in 2009 by Microsoft Research [3]. This book presents visionary essays on how the growing amount of big data, from EHRs to biomolecular data to patient-entered data, will facilitate discovery of new knowledge that conventional experiments will not. As other non-medical essays in the book show, this approach has led to many discoveries in other disciplines that use this form of eScience. We also know that businesses and others make productive use of the vast troves of data they collect from purchases, Web chatter, and other sources of information.

It is important to remember, however, that the existence of large volumes of electronic data does not guarantee that this data will automatically translate into knowledge. In my talk, I reviewed the unfortunately modest amount of literature on this topic. The bottom line, discussed and referenced in more detail below, is that medical records are not only incomplete, but they are also often much less meticulously kept than research data. As I have said in the past, clinical documentation is often what stands between the clinician's daily work and his or her going home for dinner. Another problem with medical records of course is that the data are observational and not experimental, so confounding factors can influence conclusions that might be drawn.

In preparing for this talk, I came across a somewhat obscure but well-written critique of big data [4]. As often happens, I found this paper almost by accident, being pointed to it by one of the email lists to which I subscribe. The primary author of the paper is Danah Boyd, who is another member of Microsoft Research and is also Research Assistant Professor in Media, Culture, and Communication at New York University as well as Visiting Researcher at Harvard Law School. (The paper was delivered as a keynote address at the Oxford Internet Institute's A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society on September 21, 2011.)

Boyd and her co-author list six "provocations" for big data, which add up to the best critique of big data I have seen. These provocations give us cause for concern and are all relevant to biomedicine. I list them here along with my commentary on their applicability to biomedicine or other general comments:
  1. Automating Research Changes the Definition of Knowledge - In all research, we tend to meld the question to the data we can obtain. This has certainly been true in biomedical research, where some have criticized researchers for answering questions that are either of interest mainly to the researcher or expedient to answer [5, 6, 7]. We need to remember that the data available in electronic systems, big or small, similarly impacts the questions we ask.
  2. Claims to Objectivity and Accuracy are Misleading - Just because data are collected in a disinterested way does not mean that bias does not occur. We certainly know from the clinical documentation setting (see above or [8]) that data entered by clinicians is not necessarily accurate, objective, or complete.
  3. Bigger Data are Not Always Better Data - This has long been known in medicine from the context of those who do "claims" research based on data collected for billing purposes, which usually consists of diagnosis and procedure codes. One argument for this type of research is the sheer volume of such data, but we also know that this data does not give a complete picture of the patient [9, 10].
  4. Not All Data Are Equivalent - We certainly know from the clinical setting that certain types of data (e.g., data collected by motivated researchers) are more likely to be of higher completeness and accuracy than others (e.g., clinical documentation) [11].
  5. Just Because it is Accessible Doesn’t Make it Ethical - I agree with the authors that the use of Institutional Review Boards is important but also has its limitations in keeping research ethical.
  6. Limited Access to Big Data Creates New Digital Divides - I have seen this issue play out in information retrieval research, where the researchers from the big search engine companies have access to proprietary data, which makes peer review as well as reproducibility of the work difficult at best. I know Jimmy Lin personally, and it pains me to read his comment quoted in this paper.
In summary, EHRs provide great potential for improving health and the delivery of healthcare through the learning health system, but we also must remember the caveats of doing so. The consumers of this data need to be cognizant of its limitations but also supportive of the research that explores its more effective use.


1. Safran, C., Bloomrosen, M., et al. (2007). Toward a national framework for the secondary use of health data: an American Medical Informatics Association white paper. Journal of the American Medical Informatics Association, 14: 1-9.
2. Friedman, C., Wong, A., et al. (2010). Achieving a nationwide learning health system. Science Translational Medicine, 2(57): 57cm29.
3. Hey, T., Tansley, S., et al., eds. (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond, WA. Microsoft Research.
4. Boyd, D. and Crawford, K. (2011). Six Provocations for Big Data. Cambridge, MA, Microsoft Research.
5. Harari, E. (2001). Whose evidence? Lessons from the philosophy of science and the epistemology of medicine. Australian and New Zealand Journal of Psychiatry, 35: 724-730.
6. Cohen, A., Stavri, P., Hersh W. (2004). A categorization and analysis of the criticisms of evidence-based medicine. International Journal of Medical Informatics, 73: 35-43.
7. Tunis, S., Stryer, D., et al. (2003). Practical clinical trials - increasing the value of clinical research for decision making in clinical and health policy. Journal of the American Medical Association, 290: 1624-1632.
8. Benin, A., Vitkauskas, G., et al. (2005). Validity of using an electronic medical record for assessing quality of care in an outpatient setting. Medical Care, 43: 691-698.
9. Jollis, J., Ancukiewicz, M., et al. (1993). Discordance of databases designed for claims payment versus clinical information systems: implications for outcomes research. Annals of Internal Medicine, 119: 844-850.
10. O'Malley, K., Cook, K., et al. (2005). Measuring diagnoses: ICD code accuracy. Health Services Research, 40: 1620-1639.
11. Berlin, J. and Stang, P. (2011). Clinical Data Sets That Need to Be Mined, 104-114, in Olsen, L., Grossman, C. and McGinnis, J., eds. Learning What Works: Infrastructure Required for Comparative Effectiveness Research. Washington, DC. National Academies Press.

Thursday, January 5, 2012

Connecting Informatics Research to Practice: Innovations for AMIA 2012

This year, I will be serving as the Scientific Program Committee (SPC) Chair for the AMIA 2012 Annual Symposium. The annual "AMIA meeting" is the most important biomedical and health informatics scientific meeting of the year, attracting the highest-quality submissions and otherwise providing a snapshot of the field through keynote talks, panels, and other sessions. I am honored to have been selected as SPC Chair for the 2012 meeting and, like many SPC Chairs before me, hope to make some innovations to the meeting that prove to be enduring in value. I was interviewed at the AMIA 2011 conference to give my perspective on the conference and my role in 2012.

The AMIA 2012 innovation I am most excited about is a new category of presentation we are calling the State of the Practice. This session type fits in well with my growing activity at the intersection between the science and practice of informatics. We hope to accept sessions led by experts and leaders from operational settings who will describe key problems and challenges whose solutions lie in the scientific research of the field. These sessions will provide what all mature professions must have, which is robust and pertinent science that supports operational practice.

Another key AMIA 2012 innovation is a submission category for podium presentations of abstracts. As many AMIA authors and presenters know, the indexing of AMIA papers in the MEDLINE bibliographic database has been a mixed blessing. While it enables authors to have their work made more visible by indexing in the premier biomedical literature database, it also often precludes later, more substantive publication of the work in a scientific journal, due to rules around "prior publication." This new category of submission will allow authors to present their most innovative and cutting-edge work, with the abstract published in the proceedings but not indexed in MEDLINE, so that the author will retain complete flexibility for future publication of the work.

Two other changes will be the return of the tutorial program to presenter-initiated submissions (instead of being commissioned by an AMIA committee) and a new pre-symposium program for AMIA Working Groups. The conference Call for Participation provides details on submitting for presentation.