Monday, December 31, 2018

Annual Reflections at the End of 2018

I always use my last posting of the year to this blog to reflect on the year past. As I have noted each year, this blog started at the time of a major transformation for the informatics field, namely the Health Information Technology for Economic and Clinical Health (HITECH) Act. Now, almost 10 years later (10-year anniversary post coming in March!), the era of “meaningful use” is drawing to a close.

The year 2018 was a year of milestones and gratitude for me. I celebrated my 60th birthday, grateful for my personal health and well-being. My immediate family, my wife and two children, are also doing very well in their personal as well as professional lives. This year also marked the 15th year of the Department of Medical Informatics & Clinical Epidemiology (DMICE) in the School of Medicine at Oregon Health & Science University (OHSU), the department for which I have served as the one and only Chair. With DMICE, I am grateful not only for my own academic success but also for the chance to provide an environment in which faculty, students, and staff can achieve their own accomplishments.

Another milestone for 2018 was my 28th year at OHSU. It is somewhat uncommon these days for a high-profile academic to spend a whole career at a single institution. I have certainly been asked to look at other jobs over the years, as most academics always are, but nothing has ever appealed to me enough to consider leaving not only OHSU, but also Portland, Oregon. Since the Biomedical Information Communication Center (BICC) Building opened in 1991, I have had only two offices, and have been in my current one for over 20 years.

I am happy to report that despite my relatively static work location, I have changed and grown in place. In academia, like work in almost every other knowledge field, one must evolve their knowledge and skills with the evolution of their field. I am grateful that my job has afforded me the ability to grow professionally and intellectually. In fact, there are few things more exciting than being immersed in the field as new ideas and technologies emerge. A decade ago it was the emergence of the value of the electronic health record (EHR); today it is the growth of data and how we can put it to good use, such as via machine learning. But just as we learned with EHR adoption during the HITECH Act, implementing technology, especially in healthcare, does not always go according to plan. While the emergence of machine learning is exciting, it will be interesting to see how it will impact day-to-day medical practice.

Life does not last forever, but as long as I continue to enjoy my work and do it competently, I certainly have no plans to stop. It will also be interesting to see what new advances come down the pike in informatics, some of which we might be able to predict but others that will emerge out of nowhere.

Wednesday, December 19, 2018

Preserving What We Hold Dear About the Internet?

Hardly a day goes by without some explosive report in the news about modern Internet platforms and their adverse effects on our personal lives or on our political or economic systems. But with our personal and professional lives so deeply intertwined with them, going “off the grid” is hardly an answer. How do we preserve what is good about our networked lives while trying to identify and eliminate the bad? I do not have answers but hope to raise discussion on the question.

Even though I am way too old to be a “digital native,” computers and the Internet have played a large role in my personal and professional life for several decades. I received my first Internet email address in the late 1980s as a postdoctoral fellow. I often tell the story of my jaw dropping the first time I saw the graphical Web browser, NCSA Mosaic, in 1992. While I had read articles about this new World Wide Web, I was initially skeptical because I could not envision the Internet of the time being able to support interaction (e.g., downloading and rendering Web pages) in real time. But seeing Mosaic made me instantly realize how transformative the Web would be. Fast-forwarding a few years, with the emergence of Google, I sometimes joke that my life would be very different had I come up with the idea of ranking Web search output by links in my own information retrieval research. At the end of the decade, a seemingly minor decision to put my course online in 1999 led to a major transformation of my career into a passion for educational technology. Now in modern times, my personal life has fused with Facebook, in which I can easily share parts of my life with family, friends, and colleagues. In addition, most of my teaching is online, I enjoy sharing running routes with fellow runners, and the ubiquitous worldwide reach of cellular and wifi makes travel and just about everything else much easier.

But clearly there are downsides to the Internet and to the proliferation of the computational devices connected to it, including all of the data about us that they hold. The biggest current news, of course, is the manipulation of social media and search engines by the Russian government. Right behind that are other concerns about the business practices of Facebook and how it selectively shares our data, especially with certain business partners. There are also concerns about the ease with which hate groups disseminate content to their followers, for example on YouTube and Twitter. Another worry is the growing commerce monopoly of Amazon, despite the fact that many of us find it so convenient for many things we need. There is also the growing concern about what is done with the detailed record of our digital activities that is tracked and used, sometimes for good but other times not.

The solutions to these problems are not easy. Sure, we can try to maintain a balance between our real and virtual lives. We can consider more regulation of these platforms, but I get nervous when we discuss regulating free speech. The question is how to protect freedom of expression while not allowing manipulation of news and elections by “bots” and other approaches. Education is certainly important, making sure the general population understands how these platforms work and how they can be used to manipulate public and political opinion. There is also the question of how to economically regulate these platforms that achieve monopoly status. There is no question that these issues will attract further attention from the news media, lawmakers, and others going forward.

Sunday, December 16, 2018

Kudos for the Informatics Professor - Fall 2018 Update

I had a busy summer and fall of 2018, with a number of talks and publications.

In September, I had the opportunity to be interviewed by The Jefferson Exchange, which is part of Jefferson Public Radio in southern Oregon (direct link to MP3 of interview).

I was also featured for the work I have contributed for over 10 years in partnership with the Emergency Medicine Informatics Section of the American College of Emergency Physicians (ACEP) to deliver a version of the 10x10 course. It was another successful year in general for the 10x10 course, with a total of 2517 people having completed the OHSU offering since 2005.

I was very busy at this year's American Medical Informatics Association (AMIA) Annual Symposium. I was among the 130 people inducted in the inaugural class of Fellows of AMIA (FAMIA). I also had a chance to describe our educational program at the Learning Showcase. In addition, I participated in a panel with three other academic colleagues entitled, Collaborative Science Within Academic Medical Centers: Opportunities and Challenges for Informatics.

I also had the opportunity to present OHSU Department of Medicine Grand Rounds on December 11, 2018, delivering the topic, Update in Clinical Informatics: Machine Learning, Interoperability, and Professional Opportunities (video and slides available).

Earlier in the year, I joined colleague Robert Hoyt, MD as a Co-Editor of the textbook, Health Informatics: Practical Guide (7th Edition), which is available both in print and eBook formats. The book is also available for the Amazon Kindle, as are other books of mine.

Also during this time period, I published a paper in the new AMIA journal, JAMIA Open.

Tuesday, December 11, 2018

Response to NIH RFI: Proposed Provisions for a Draft Data Management and Sharing Policy for NIH Funded or Supported Research

Earlier this year, I submitted a response (and posted it in this blog) to a National Institutes of Health (NIH) Request for Information (RFI) on a draft of their Strategic Plan for Data Science. My main concern was that while there was nothing in the report I did not agree with, I believed there needed to be more attention to the science of data science.

In October, the NIH released another RFI, this one entitled, Proposed Provisions for a Draft Data Management and Sharing Policy for NIH Funded or Supported Research. Similar to the Strategic Plan for Data Science, most of what is in this draft plan is reasonable in my opinion. But what concerns me more is, similar to the earlier RFI, what is left out.

My main concerns have to do with the definition and use of “scientific data.” Early on, the plan defines “scientific data” as “the recorded factual material commonly accepted in the scientific community as necessary to validate and replicate research findings including, but not limited to, data used to support scholarly publications.” The draft further notes that “scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens. For the purposes of a possible Policy, scientific data may include certain individual-level and summary or aggregate data, as well as metadata. NIH expects that reasonable efforts should be made to digitize all scientific data.”

The draft report then runs through the various provisions. Among them are:
  • Data Management and Sharing Plans - new requirements to make sure data is FAIR (findable, accessible, interoperable, and reusable)
  • Related Tools, Software and/or Code - documentation of all the tools used to analyze the data, with a preference toward open-source software (or documentation of reasons why open-source software is not used)
  • Standards - what standards, including data formats, data identifiers, definitions, and other data documentation, are employed
  • Data Preservation and Access - processes and descriptions for how data is preserved and made available for access
  • Timelines - for access, including whether any is held back to allow publication(s) by those who collect it
  • Data Sharing Agreements, Licensing, and Intellectual Property - which of these are used and how so
All of the above are reasonable. However, my main concern is what appears to be a relatively narrow scope of what constitutes scientific data. As such, what follows is what I submitted in my comments to the draft policy. (These comments were also incorporated into a larger response by the Clinical Data to Health [CD2H] Project, of which I am part.)

The definition of scientific data implies that such data is only that which is collected in active experimentation or observation. This ignores the increasing amount of scientific research that does not come from experiments, but rather is derived from real-world measurements of health and disease. This includes everything from data routinely collected by mobile or wearable devices to social media to the electronic health record (EHR). A growing amount of research analyzes and makes inferences using such data.

It could be argued that this sort of data derived “from the wild” should adhere to the provisions above. However, this data is also highly personal and usually highly private. Would you or I want our raw EHR in a data repository? Perhaps connected to our genome data? But if such data are not accessible at all, then the chances for reproducibility are slim.

There is also another twist on this, which concerns data used for informatics research. In a good deal of informatics research, such as the patient cohort retrieval work I do in my own research [1], we use raw, identifiable EHR data. We then proceed to evaluate the performance of our systems and algorithms with this data. Obviously we want this research to be reproducible as well.

There are solutions to these problems, such as Evaluation as a Service [2] approaches that protect such data and allow researchers to send their systems to the data in walled-off containers and receive aggregate results. Maybe the approach in this instance would be to maintain encrypted snapshots of the data that could be unencrypted in highly controlled circumstances.
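To make the idea concrete, here is a minimal sketch in Python of what an Evaluation as a Service exchange might look like. It is an illustration only, not the design described in [2] nor anything we run in production: the toy records, the relevance judgments, and the names `evaluate_behind_wall` and `keyword_system` are all hypothetical. The point it tries to show is that the researcher's system travels to the data, and only aggregate scores travel back.

```python
from statistics import mean
from typing import Callable, Dict, List, Set

# Hypothetical private collection held by the data custodian.
# In this sketch it stands in for raw EHR data that never leaves the wall.
_PRIVATE_RECORDS: Dict[str, str] = {
    "p1": "type 2 diabetes on metformin",
    "p2": "asthma, no diabetes history",
    "p3": "diabetes with chronic kidney disease",
}

# Hypothetical relevance judgments: topic -> IDs of patients who truly match.
_PRIVATE_JUDGMENTS: Dict[str, Set[str]] = {"diabetes": {"p1", "p3"}}

# A submitted system takes a topic and the records, and returns patient IDs.
RetrievalSystem = Callable[[str, Dict[str, str]], List[str]]


def evaluate_behind_wall(system: RetrievalSystem) -> Dict[str, float]:
    """Run a submitted system on the private data; return aggregate metrics only."""
    precisions: List[float] = []
    recalls: List[float] = []
    for topic, relevant in _PRIVATE_JUDGMENTS.items():
        retrieved = set(system(topic, _PRIVATE_RECORDS))
        hits = len(retrieved & relevant)
        precisions.append(hits / len(retrieved) if retrieved else 0.0)
        recalls.append(hits / len(relevant) if relevant else 0.0)
    # No record text or patient IDs are included in what goes back out.
    return {"mean_precision": mean(precisions), "mean_recall": mean(recalls)}


def keyword_system(topic: str, records: Dict[str, str]) -> List[str]:
    """A deliberately naive submitted system: match the topic as a substring."""
    return [pid for pid, text in records.items() if topic in text.lower()]


results = evaluate_behind_wall(keyword_system)
print(results)
```

Note that the sketch does not enforce the wall itself. In a real deployment the submitted system would have to run in an isolated container without network access, since code that can read the records could otherwise copy them out.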

In any case, the NIH Data Management and Sharing Policy for NIH Funded or Supported Research is a great starting point but should take a broader view of scientific data and develop policies to ensure research is reproducible. Research done with data that does not originate as scientific data should be accounted for, including when that data is used for informatics research.


1. Wu, S, Liu, S, et al. (2017). Intra-institutional EHR collections for patient-level information retrieval. Journal of the American Society for Information Science & Technology. 68: 2636-2648.
2. Hanbury, A, Müller, H, et al. (2015). Evaluation-as-a-service: overview and outlook. arXiv:1512.07454.