Thursday, April 21, 2011

Information Retrieval (Search) in Health and Biomedicine Still "Springs" Eternal

One of my earliest visions of computers in medicine was the ability to type in a question and get an answer. In the 1980s, while everyone in informatics was trying to build expert systems, I followed a different dream, of being able to find clinical information seamlessly. In that decade, however, I never could have imagined being able to pull up something called a Web browser, type in words, and get back "pages." Especially as I can do now, with something that fits in my pocket, also makes phone calls, and is connected to something I had not yet heard of in the 1980s (before I started my informatics training) called the Internet.

This fascination guided my early research interests in the area of information retrieval. I write about it now because every spring I teach my course on this topic in the OHSU graduate program, BMI 514/614. (Hence the title of this posting.) My interest in this area resulted in dozens of scientific papers and a textbook, currently in its third edition [1]. As much as I marvel at today's systems, I always have to ask myself: why didn't I think of ranking the output (Web pages) by how many other pages pointed to them? Had I thought of that before a couple of Stanford graduate students named Brin and Page, my life might be considerably different. Or at least my wealth!

I suppose you are getting up in years when you marvel at how things are now relative to how you remember them. I certainly recall "searching" when I was in medical school in the 1980s, which involved thumbing through the giant Index Medicus books on long shelves in the library. You would "link" to the full text by walking to a different part of the library where the journals were. If your needs were really critical, you could call on a librarian for help, who would take your request to a special computer that accessed a database somewhere (which happened to be MEDLINE, from the National Library of Medicine).

I actually did my first on-line searching in the 1980s. I was able to access PaperChase, and later ELHILL, through dial-up networks, though at a price. For an even heftier price, you could get access to the full text … at least "text" in monospaced font and no figures or images. The world did advance, and by 1998 you could search PubMed for free. (Al Gore, who actually deserves more credit in this area than his critics give him, did the first "free" search.)

Now, of course, searching is ubiquitous. You can't even not do it, since most browsers will throw you into a search engine when you type in an invalid Web address (URL). And the world not only searches, but searches for health information. The two major periodic surveys of health information searching show that 80% of Internet users have searched for health information for themselves, their family, or their friends [2, 3].

Of course, like many areas of informatics, while use of systems is ubiquitous, not all of the problems of systems are solved. Indeed, a few years ago I wrote a short piece on this topic [4]. As wonderful as today's search systems are, we still have many areas for improvement. In that paper, I identified four areas where grand challenges remained:
  • Content - getting diverse users to the right information for the right task
  • Indexing - developing better metadata to get searchers to that proper content
  • Linkage - allowing navigation across multiple resources, even those of different publishing entities
  • Access - making access as open as possible but still being protective of intellectual property
Just as I could not fathom the World Wide Web in the 1980s, I wonder as I write this in 2011 what the world of search and on-line knowledge access will be a decade or two from now.


1. Hersh, W. (2009). Information Retrieval: A Health and Biomedical Perspective (3rd Edition). New York, NY, Springer.
2. Fox, S. (2011). Health topics. Washington, DC, Pew Internet & American Life Project.
3. Taylor, H. (2010). "Cyberchondriacs" on the Rise? Those who go online for healthcare information continues to increase. Rochester, NY, Harris Interactive.
4. Hersh, W. (2008). Ubiquitous but unfinished: grand challenges for information retrieval. Health Information and Libraries Journal, 25(Suppl 1): 90-93.


  1. Sounds like a great segue into your perspective of what a Universal Exchange Language might look like. Can you do some speculation? Are you talking about what the future might look like to your students?

  2. Tom, you are referring to the PCAST report, for which I have provided my thoughts in another post (

    The PCAST universal exchange language (UEL) refers to a granular method for representing clinical data. The field of IR (search) has historically focused on what I call "knowledge-based information," which I distinguish from "patient-based information." The former focuses on the knowledge (science) base of health and biomedicine, whereas the latter represents information about patients.

    Of course, moving forward, the line between the two may blur, especially as the "learning healthcare system" takes hold.

    I do cover some aspects of clinical/patient data in my course, especially the natural language processing (NLP) aspects, which I also covered in a post here (

    I do agree that these issues are important going forward, though the UEL concept needs a lot more work before it can be operationalized, as the ONC HIT Policy Committee agrees.