Thursday, August 18, 2011

The Passing of a Giant: Senator Mark O. Hatfield (1922-2011)

Oregonians, and indeed many Americans of all political stripes, mourned the passing last week of a truly great politician whose statesmanship and bipartisanship seem almost anachronistic in contrast to today's hyper-partisan political gridlock, driven by the 24-hour news cycle. Former Oregon Sen. Mark O. Hatfield was a politician who transcended party and ideology and whose work truly benefited large numbers of people, not only those living in Oregon.

Whereas most politicians remain abstractions to most people, affecting their lives only indirectly, Sen. Hatfield was a personal and real presence in my life. Both my presence and success at Oregon Health & Science University (OHSU) and the flourishing of our Department of Medical Informatics & Clinical Epidemiology (DMICE) can be traced back to Sen. Hatfield.

I personally would not be in Oregon were it not for the "earmark" established by Sen. Hatfield for OHSU under the National Library of Medicine (NLM) Integrated Advanced Information Management Systems (IAIMS) program in the 1980s. I know that political earmarks have a bad name now, but the IAIMS earmark to OHSU was an investment that launched nationally prominent programs in biomedical informatics and clinical epidemiology, and it shows that such investments can bring true and lasting value. From a financial standpoint, the returns on this investment have accrued many times over for OHSU and the Oregon economy. Perhaps more importantly, the scientific accomplishments and the training of future generations of professionals and leaders have yielded returns that go well beyond the financial.

Sen. Hatfield may not have been an expert in informatics or clinical epidemiology, as few people were in the 1980s, but by attaching an earmark to the IAIMS initiative, he showed the foresight to see the future potential of these fields in health and biomedicine. Throughout his career, he was also a tireless crusader for all types of funding for biomedical research and higher education.

Sen. Hatfield's political views were distinctive but consistent. He truly matched a label we almost never see anymore, namely a "liberal Republican." Sen. Hatfield embodied the proper meaning of the word "liberal." He was a proponent of free markets and economic liberty when they made practical sense, but he also recognized when they did not, such as in education and health care. He was an advocate for national defense but opposed military adventurism, as exemplified by his being both a World War II veteran and an early opponent of the Vietnam War. Sen. Hatfield recognized the proper role of government in a capitalist society, and it is unfortunate that modern opponents of true liberalism have so successfully redefined the word, along with the political meaning and actions of those who are true liberals.

I did not agree with all of Sen. Hatfield's political positions. For example, his pacifism and reverence for life led him to oppose women's reproductive rights. However, I do laud him for being consistently "pro" life, opposing not only abortion but also capital punishment, corporate misbehavior, and military overreach.

Just as Oregon will miss Sen. Hatfield, it is to our country's detriment that there are not more politicians like him, whether they affiliate with the Republican or the Democratic Party. I am not sure Sen. Hatfield would be revered by most leaders of the current Republican Party, although, unfortunately, most modern Democratic Party leaders would eschew him as well. But rejecting statesmen like Sen. Hatfield will only harm us. The real problems of our debt, unemployment, runaway healthcare costs, and many more will only be solved by leaders who place political pragmatism over ideology and who consider all of the facts rather than selectively interpreting them to score political points. Whether Republican or Democratic, we need more people like Sen. Hatfield back in our political system and dialogue, which is all the more reason why we should truly mourn his passing.

Although I did not know Sen. Hatfield well, he touched my life and enabled my success. For that reason, I will laud him as well as miss him and people like him.

Monday, August 1, 2011

Identifying Patients for Clinical Studies from Electronic Health Records: The TREC Medical Records Track

The substantial federal investment in electronic health record (EHR) adoption under the Health Information Technology for Economic and Clinical Health (HITECH) Act brings many potential benefits to health care. In addition to improving the availability of information about patients during the delivery of care, EHRs offer the ability to better “learn” from what we do in health care so that we can understand what works and what does not [1]. This is one of the benefits of the secondary use (or re-use) of clinical data in EHRs [2].

Another substantial federal health care-related investment is in “comparative effectiveness research” (CER), which focuses medical research (e.g., clinical trials) on critical health care questions through head-to-head comparisons in real-world settings [3]. A total of $1.1 billion in funding from the American Recovery and Reinvestment Act (ARRA) was allocated for CER, and the subsequent health care reform legislation established the Patient-Centered Outcomes Research Institute (PCORI), a public-private entity to prioritize investment in CER. One of the first products of the government’s CER efforts was a list of the top 100 priority clinical conditions, developed by the Institute of Medicine (IOM), to guide CER efforts and funding at the federal level.

In addition, there have been other federal investments in using health IT to facilitate clinical research. One of these is the National Institutes of Health (NIH) Clinical and Translational Science Award (CTSA) program, which funds 60 centers nationwide to facilitate translational research. Another is the Strategic Health IT Advanced Research Projects (SHARP) Program of the HITECH Act, which funds research in four priority areas of health IT, including the secondary use of clinical data (including text).

Against this backdrop of government and other investment in health information technology comes a new track in the Text Retrieval Conference (TREC), an annual challenge evaluation hosted by the US National Institute of Standards and Technology (NIST). TREC is a long-standing event that builds “test collections” allowing different approaches to information retrieval (IR) to be assessed in an open and comparable manner. Each year, TREC holds a number of “tracks” devoted to different aspects of IR, such as Web searching or cross-language IR [4]. While TREC focuses on general IR, some tracks have been devoted to IR in specific domains, one of which in the past was genomics [5].

This year, TREC has launched a Medical Records Track. Given TREC’s focus on IR, the goal of the track is to develop a task that is both pertinent to real-world clinical medicine and within the scope of IR research. The track is fortunate to have received access to a large corpus of de-identified medical text. These documents are organized as visits (or encounters); the de-identification process prevents linking multiple visits to a single patient. The retrieval task in the first year of the TREC Medical Records Track will be to retrieve cohorts of patients who fit the criteria for participation in clinical studies. The retrieval “topics” will be derived from the IOM list of CER priority conditions, modified to make them unambiguous and to yield an appropriate number of retrieved documents. OHSU has received a grant from NIST to organize the topic development and relevance assessment processes for the track.

The documents for the task come from the University of Pittsburgh NLP Repository, a repository of 95,702 de-identified clinical reports available for NLP research purposes. The reports were generated at multiple hospitals during 2007 and are grouped into “visits” consisting of one or more reports from a patient’s hospital stay. Each document is formatted in XML, and a cross-walk table maps one or more documents to each visit. There are a total of 17,199 visits.
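To make the document-to-visit relationship concrete, here is a minimal Python sketch of how a participant might use the cross-walk table to group reports into visits and then roll report-level retrieval scores up to visit-level scores. The identifiers, the tuple format of the cross-walk, and the helper names are hypothetical, since the actual table layout is not described here, and taking the maximum report score is only one of several reasonable aggregation choices.

```python
from collections import defaultdict

# Hypothetical cross-walk rows: (report ID, visit ID). The real table's
# format and identifiers are not described in the post; these are made up.
CROSSWALK = [
    ("report-001", "visit-A"),
    ("report-002", "visit-A"),
    ("report-003", "visit-B"),
]


def group_reports_by_visit(crosswalk):
    """Return a dict mapping each visit ID to the list of its report IDs."""
    visits = defaultdict(list)
    for report_id, visit_id in crosswalk:
        visits[visit_id].append(report_id)
    return dict(visits)


def visit_scores(report_scores, crosswalk):
    """Roll report-level scores up to visit level.

    The visit score is the maximum score of its reports (an assumed,
    simple choice; sum or mean would be alternatives). Reports with no
    score count as 0.0.
    """
    visits = group_reports_by_visit(crosswalk)
    return {
        visit_id: max(report_scores.get(r, 0.0) for r in report_ids)
        for visit_id, report_ids in visits.items()
    }


if __name__ == "__main__":
    print(group_reports_by_visit(CROSSWALK))
    scores = {"report-001": 0.2, "report-002": 0.9, "report-003": 0.4}
    print(visit_scores(scores, CROSSWALK))  # {'visit-A': 0.9, 'visit-B': 0.4}
```

Because a visit can contain several reports of different types, some aggregation step like this is needed whenever a system scores individual reports but is judged on visits.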

Each document contains four sources of information that can be used for the task:
  • Chief complaint
  • Admit diagnosis (as ICD-9 code)
  • Discharge diagnosis(es) (as ICD-9 codes)
  • Report text
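As a rough illustration of how these four fields might be used together, the following sketch defines a hypothetical record type and a crude Boolean filter that flags a report if any of its ICD-9 codes begins with a given prefix, or if a keyword appears in the chief complaint or report text. The field names, the dotted code format (e.g., "428.0"), and the filter itself are assumptions for illustration, not part of the track's specification.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """One de-identified report holding the four fields listed above.

    Field names and types are illustrative assumptions, not the
    collection's actual XML schema.
    """
    report_id: str
    chief_complaint: str
    admit_diagnosis: str  # a single ICD-9 code, assumed dotted, e.g. "428.0"
    discharge_diagnoses: list = field(default_factory=list)  # ICD-9 codes
    text: str = ""


def matches_topic(report, icd9_prefix, keyword):
    """Return True if any ICD-9 code starts with icd9_prefix, or if the
    keyword appears (case-insensitively) in the chief complaint or text."""
    codes = [report.admit_diagnosis, *report.discharge_diagnoses]
    if any(code.startswith(icd9_prefix) for code in codes):
        return True
    needle = keyword.lower()
    return needle in report.chief_complaint.lower() or needle in report.text.lower()


if __name__ == "__main__":
    reports = [
        # ICD-9 428.x is heart failure; 401.9 is unspecified essential hypertension.
        Report("r1", "shortness of breath", "428.0", ["428.0", "401.9"], "..."),
        Report("r2", "ankle pain", "824.8", ["824.8"], "No history of heart failure."),
        Report("r3", "cough", "486", ["486"], "Chest x-ray shows an infiltrate."),
    ]
    for r in reports:
        print(r.report_id, matches_topic(r, "428", "heart failure"))
    # r1 True, r2 True, r3 False
```

The second example report shows a well-known weakness of this kind of matching: the phrase “heart failure” appears only in a negated context, yet the keyword match still fires, which is part of what makes retrieval over clinical text challenging.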
The documents come from a number of different report types:
  • Radiology Reports - 47,555
  • History and Physical Exams - 15,721
  • Emergency Department Reports - 13,424
  • Progress Notes - 8,538
  • Discharge Summaries - 7,931
  • Operative Reports - 5,032
  • Surgical Pathology Reports - 2,877
  • Cardiology Reports - 632
  • Letter - 1
The task will require relevance assessments at the visit level, with retrieval performance measured by recall, precision, and related measures (e.g., mean average precision, or MAP) based on those assessments. As is standard in TREC, retrieved visits will be pooled from the top N visits of each run from each participating group, where N is chosen to yield a pool of about 300-400 visits per topic for assessment. The test collection will contain 35 topics.
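The pooling step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each run is simply a ranked list of visit IDs for one topic; `build_pool` and `choose_depth` are hypothetical helpers, not NIST's actual tooling.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` visits from every run for a single topic.

    `runs` maps a run name to its ranked list of visit IDs (best first).
    """
    pool = set()
    for ranked_visits in runs.values():
        pool.update(ranked_visits[:depth])
    return pool


def choose_depth(runs, target_size, max_depth=1000):
    """Smallest depth N whose pool holds at least `target_size` visits.

    Falls back to `max_depth` if the target is never reached.
    """
    for depth in range(1, max_depth + 1):
        if len(build_pool(runs, depth)) >= target_size:
            return depth
    return max_depth


if __name__ == "__main__":
    runs = {
        "groupA-run1": ["v1", "v2", "v3", "v4"],
        "groupB-run1": ["v2", "v5", "v1", "v6"],
    }
    n = choose_depth(runs, target_size=4)
    print(n, sorted(build_pool(runs, n)))  # 3 ['v1', 'v2', 'v3', 'v5']
```

Because runs from different groups tend to overlap near the top of their rankings, the pool grows more slowly than the number of runs times N, which is why N is tuned to reach a target pool size rather than fixed in advance.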

The relevance assessment process will proceed much like the typical TREC approach. Retrieved visits will be assessed by relevance judges with clinical backgrounds. For each topic, they will assess whether a visit is definitely relevant (the patient would meet the criteria to be a subject in a clinical study), possibly relevant (the patient might meet the criteria), or not relevant (the patient would not meet the criteria). Ideally, a single person will perform all of the relevance assessments for a given topic.
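Once judgments are in, measures such as MAP are computed from them. The sketch below shows non-interpolated average precision and MAP over visit-level judgments in Python. How the three-level judgments are collapsed to binary relevance is an assumption here: by default both definitely and possibly relevant visits count as relevant, and raising `min_grade` scores a run against definitely relevant visits only. Unjudged visits are treated as not relevant, the usual convention for pooled collections. The grade constants and function names are hypothetical.

```python
DEFINITELY, POSSIBLY, NOT_RELEVANT = 2, 1, 0  # hypothetical grade encoding


def average_precision(ranked_visits, judgments, min_grade=POSSIBLY):
    """Non-interpolated average precision for one topic.

    `judgments` maps visit ID -> grade. Visits graded >= `min_grade` are
    relevant; unjudged visits count as not relevant. Assumes each visit
    appears at most once in `ranked_visits`.
    """
    relevant = {v for v, grade in judgments.items() if grade >= min_grade}
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, visit in enumerate(ranked_visits, start=1):
        if visit in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant visit
    # Relevant visits that were never retrieved contribute zero to the sum.
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels, min_grade=POSSIBLY):
    """MAP: mean of per-topic AP over every topic that has judgments.

    `run` maps topic ID -> ranked visit IDs; `qrels` maps topic ID ->
    {visit ID: grade}. Topics missing from the run score zero.
    """
    scores = [
        average_precision(run.get(topic, []), judgments, min_grade)
        for topic, judgments in qrels.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    qrels = {
        "topic-1": {"v1": DEFINITELY, "v2": NOT_RELEVANT, "v3": POSSIBLY},
        "topic-2": {"v4": DEFINITELY},
    }
    run = {"topic-1": ["v1", "v2", "v3"], "topic-2": ["v5", "v4"]}
    print(round(mean_average_precision(run, qrels), 4))  # 0.6667
    print(mean_average_precision(run, qrels, min_grade=DEFINITELY))  # 0.75
```

The same run scores differently depending on whether "possibly relevant" visits count, so any reported MAP figure should state which judgment levels were treated as relevant.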

Over the years, I have had the opportunity to help lead a number of IR challenge evaluations, devoted not only to genomics but also to interactive IR [6] and the retrieval of medical images [7]. The TREC Medical Records Track is very timely given the growing interest in leveraging the large ongoing investment in EHRs and working toward a learning health system.


1. Friedman, C., Wong, A., et al. (2010). Achieving a nationwide learning health system. Science Translational Medicine, 2(57): 57cm29.
2. Safran, C., Bloomrosen, M., et al. (2007). Toward a national framework for the secondary use of health data: an American Medical Informatics Association white paper. Journal of the American Medical Informatics Association, 14: 1-9.
3. Murray, R. and McElwee, N. (2010). Comparative effectiveness research: critically intertwined with health care reform and the future of biomedical innovation. Archives of Internal Medicine, 170: 596-599.
4. Voorhees, E. and Harman, D., eds. (2005). TREC: Experiment and Evaluation in Information Retrieval. Cambridge, MA: MIT Press.
5. Hersh, W. and Voorhees, E. (2009). TREC genomics special issue overview. Information Retrieval, 12: 1-15.
6. Hersh, W. (2001). Interactivity at the Text Retrieval Conference (TREC). Information Processing and Management, 37: 365-366.
7. Hersh, W., Müller, H., et al. (2009). The ImageCLEFmed medical image retrieval task test collection. Journal of Digital Imaging, 22: 648-655.