Tuesday, January 30, 2024

Whither Search? A New Perspective on the Impact of Generative AI on Information Retrieval (IR)

When I was putting the finishing touches on the 4th edition of my textbook on information retrieval (IR, also known as search) in the domain of biomedicine and health in 2020, I wondered whether the major problems in the field of IR were mostly solved. Retrieval systems such as Google for general Web searching and PubMed for the biomedical literature were robust and mature. With these systems, one literally had the world’s written knowledge on general and biomedical topics, respectively, at one's fingertips (even if paywalls did not always allow immediate access to the content).

There were certainly some areas of IR where additional work was needed and important, e.g., search over specific types of content such as social media or, in the case of my own research, electronic health record (EHR) data and text. There were also some nascent advances in the application of machine learning, although the gains in experimental results were more incremental than transformative.

But any staidness of IR was upended by the emergence of generally available generative artificial intelligence (AI) chatbots based on large language models (LLMs), initially ChatGPT, with others soon to follow. Shortly thereafter came generative AI capabilities added to the two major Web search engines, Microsoft Bing and Google. All of a sudden, searching the Web was transformed in ways that most of us did not see coming.

I recently took advantage of the call for papers for a special issue of JAMIA, the flagship journal for the field of informatics, devoted to ChatGPT and LLMs in biomedicine and health, to write a perspective piece on why search is still important, even in the era of generative AI. At least for me, while the answer to my question is important in a search, it is also critical to know where the information came from. In addition, as I am commonly synthesizing my own knowledge and views on a topic, I do not just want a single generative AI answer to my question but rather the source articles and documents so I can compare and contrast different views and develop my own answer.

At the close of the paper, I do acknowledge that there may well be areas of IR where generative AI will have a major impact going forward. I know that there is a lot of buzz around retrieval-augmented generation (RAG), although for many of the questions on which I search, I am much more interested in generation-augmented retrieval (GAR?). That is, how can generative AI methods improve the way we search to steer us to the kinds of authoritative, originally sourced information we seek to carry out our work?

The day before the article was published, a reporter who came across my preprint wrote a piece on the impact of AI on search, noting some of the issues I raise with regard to accuracy and authority for search in fields like medicine and in academia.

The paper itself has been published in JAMIA as an Advance Article: Hersh W, Search still matters: information retrieval in the era of generative AI, Journal of the American Medical Informatics Association, 2024, ocae014. Unfortunately, the open-access publishing fee for JAMIA is fairly steep ($4125), especially for a short perspective piece like this, but those wanting to read it can access the preprint that I posted.