Monday, January 25, 2016

Biomedical Data Science Needs Measures of Information Density and Value

I wrote recently that one of my concerns for data science is the Big Data over-emphasis on one of its four Vs, namely volume. Since then, I have been emailing with Dr. Shaun Grannis and other colleagues from the Indiana Health Information Exchange (IHIE). I asked them about the size of their data, which comprise nearly 6 billion clinical observations from the 17 million patients in their system. I was somewhat surprised to hear that the structured data takes up only 26 terabytes. I joked that I almost have that much disk storage lying around my office and home. That is a huge amount of data, but some in data science seem to imply that data sizes that do not start with at least “peta-” are somehow not real data science.

Of course, imaging and other binary data add much more to the size of the IHIE data, as will the intermediate products of the processing carried out during analysis. But the information “density” or “value” contained in those 26 terabytes is probably much higher than that of a comparable amount of binary (e.g., imaging, genome) data. This leads me to wonder whether we should be thinking about how to measure the density or value of different types of biomedical and health information, especially if we are talking about the Vs of Big Data.

The measurement of information is decades old. Its origin is usually traced to a seminal 1949 publication by Shannon and Weaver [1]. They based their measure of information on the number of forms a message could take, expressed in bits as the base-2 logarithm of that number. As such, a coin flip, with 2 forms (heads or tails), carries 1 bit of information; a single die, with 6 forms, carries about 2.6 bits; and a letter of the English alphabet, with 26 forms, carries about 4.7 bits. This measure is of course simplistic in that it assumes the value of each form in the message is equal. For this reason, others such as Bar-Hillel and Carnap began adding semantics (meaning) that, among other things, allowed differing values for each form [2].
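In Shannon's formulation, a message that can take N equally likely forms carries log2(N) bits. A minimal Python sketch of that uniform case follows; it deliberately ignores unequal probabilities and, like the measure itself, says nothing about the value of each form. The function name `shannon_bits` is just an illustrative choice.

```python
import math

def shannon_bits(num_forms: int) -> float:
    """Bits carried by one message drawn from num_forms equally likely forms."""
    if num_forms < 1:
        raise ValueError("num_forms must be at least 1")
    return math.log2(num_forms)

# The three examples from the text
for label, forms in [("coin flip", 2), ("die roll", 6), ("English letter", 26)]:
    print(f"{label}: {forms} forms -> {shannon_bits(forms):.2f} bits")
```

Each doubling of the number of forms adds exactly one bit, which is why the measure grows slowly even as the number of possible messages grows quickly.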

We can certainly think of plenty of biomedical examples where the number of different forms that data can take yields widely divergent information value. For example, the human genome contains 3 billion nucleotide pairs, each of which can take 4 forms, or 2 bits apiece. Uncompressed, and not accounting for the fact that a large proportion is identical across all humans [3], this genome by Shannon and Weaver’s measure would contain 6 billion bits of information. The real picture of human genomic variation is more complex (for example, copy number variations), but the point is that there is less information density in the huge amount of data in a genome than in, say, a short clinical fact, such as a physical exam finding or a diagnosis.
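The genome arithmetic can be checked in a few lines of Python. This is a sketch under the same simplifying assumptions as the paragraph: 3 billion base pairs, 4 equally likely bases, no compression, no credit for sequence shared across humans, and decimal megabytes.

```python
import math

BASE_PAIRS = 3_000_000_000  # approximate size of the human genome, from the text
FORMS_PER_BASE = 4          # A, C, G, T

bits_per_base = math.log2(FORMS_PER_BASE)      # 2 bits per base
total_bits = BASE_PAIRS * bits_per_base        # uncompressed Shannon capacity
total_megabytes = total_bits / 8 / 1_000_000   # 8 bits per byte, decimal MB

print(f"{bits_per_base:.0f} bits per base, {total_bits:.2e} bits, ~{total_megabytes:,.0f} MB")
```

So by this crude measure the entire genome sequence fits in roughly 750 megabytes, a tiny fraction of the IHIE's 26 terabytes of structured data.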

By the same token, images also differ in information density from clinical facts. This is especially so as the resolution of digital images continues to increase. There is certainly value in higher-resolution images, but there are also diminishing returns in terms of information value. Doubling, quintupling, or otherwise increasing the number of pixels or their bit depth creates more information as measured by Shannon and Weaver’s formula but does not necessarily add more value.
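Under the same equally-likely-forms assumption, the Shannon capacity of an uncompressed image is simply pixels × channels × bits per channel. The sketch below uses hypothetical image dimensions (not from any particular modality) to show how quickly that capacity grows with resolution and depth, regardless of whether any clinical value is added.

```python
def raw_image_bits(width: int, height: int, bits_per_channel: int, channels: int = 3) -> int:
    """Uncompressed capacity in bits: each pixel stores `channels` values of `bits_per_channel` bits."""
    return width * height * channels * bits_per_channel

# Hypothetical 2048 x 2048, 8-bit RGB image as a baseline
base = raw_image_bits(2048, 2048, 8)
doubled_resolution = raw_image_bits(4096, 4096, 8)  # double width and height
doubled_depth = raw_image_bits(2048, 2048, 16)      # double bit depth instead

print(doubled_resolution / base, doubled_depth / base)
```

Doubling each linear dimension quadruples the bit count, and doubling bit depth doubles it, yet neither guarantees a proportional gain in diagnostic value.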

Even clinical data may have diminishing returns based on its size. Some interesting work from OHSU faculty Nicole Weiskopf and colleagues demonstrates an obvious finding but one that has numerous implications for secondary use of clinical data, which is that sicker patients have more data in the electronic health record (EHR) [4-5]. The importance of this is that sicker patients may be “oversampled” in clinical data sets and thus skew secondary analysis by over-representing patients who have received more healthcare.
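To see how “sicker patients have more data” can skew analyses, consider a purely hypothetical cohort; the counts below are invented for illustration and are not taken from the Weiskopf studies. Any analysis that samples at the level of observations rather than patients effectively weights each patient by how much data they have:

```python
from dataclasses import dataclass

@dataclass
class Group:
    name: str
    patients: int
    observations_per_patient: float  # hypothetical average number of EHR observations

def shares(groups: list[Group]) -> dict[str, tuple[float, float]]:
    """Return each group's (share of patients, share of observations)."""
    total_patients = sum(g.patients for g in groups)
    total_obs = sum(g.patients * g.observations_per_patient for g in groups)
    return {
        g.name: (g.patients / total_patients,
                 g.patients * g.observations_per_patient / total_obs)
        for g in groups
    }

# Hypothetical cohort: few sicker patients, each with much more data
cohort = [Group("healthier", 900, 5), Group("sicker", 100, 40)]
for name, (patient_share, obs_share) in shares(cohort).items():
    print(f"{name}: {patient_share:.0%} of patients, {obs_share:.0%} of observations")
```

In this made-up example, sicker patients are 10% of the population but nearly half of the observations, which is the kind of over-representation the paragraph warns about.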

Increasing volumes of data have a number of implications that we must take into consideration, especially when using such data for purposes other than those for which they were collected. This is probably true for any Big Data endeavor, where the data may be biased by the frequency and depth of measurement. The EHR in particular is not a continuous sampling of a patient’s course, but rather captures that course in intermittent periods. With the EHR there is also the challenge that individual clinicians collect and enter data differently.

Another implication of data volumes is their impact on statistical significance testing. This is related to what many criticize in science as “p-hacking,” where researchers modify the analysis or presentation of their data in order to achieve a desired value of the p statistic, the probability of observing a difference at least as large as the one found if there were truly no difference [6]. Most researchers are well aware that their samples must be large enough to have the statistical power to detect a significant difference. On the flip side, however, with very large quantities of data it is very easy to obtain p values showing that small, perhaps meaningless, differences are statistically significant.
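A small sketch makes the large-sample point concrete. It uses a standard pooled two-proportion z-test (one common choice, not the method of any study cited here) with hypothetical event rates of 10.0% versus 10.5%, and holds that half-point difference fixed while the sample size grows:

```python
import math

def two_proportion_p_value(p1: float, p2: float, n_per_group: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test with equal group sizes."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * (2 / n_per_group))
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z)) for a standard normal

# Hypothetical rates: the same small difference at increasing sample sizes
sizes = [1_000, 10_000, 100_000, 1_000_000]
p_values = [two_proportion_p_value(0.100, 0.105, n) for n in sizes]
for n, p in zip(sizes, p_values):
    print(f"n = {n:>9,} per group: p = {p:.3g}")
```

With these made-up rates, the difference is nowhere near significant at 1,000 patients per group but falls well below 0.001 at 100,000, even though its practical meaning has not changed at all.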

The bottom line is that as we think about using data science, certainly in biomedicine and health, and developing information systems to store and analyze data, we must consider the value of information. Just because data is big does not mean it is more important than small data. Data science needs to focus on all types and sizes of data.


1. Shannon, CE and Weaver, W (1949). The Mathematical Theory of Communication. Urbana, IL, University of Illinois Press.
2. Bar-Hillel, Y and Carnap, R (1953). Semantic information. British Journal for the Philosophy of Science. 4: 147-157.
3. Abecasis, GR, Auton, A, et al. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature. 491: 56-65.
4. Weiskopf, NG, Rusanov, A, et al. (2013). Sick patients have more data: the non-random completeness of electronic health records. AMIA Annual Symposium Proceedings 2013, Washington, DC. 1472-1477.
5. Rusanov, A, Weiskopf, NG, et al. (2014). Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Medical Informatics & Decision Making. 14: 51.
6. Head, ML, Holman, L, et al. (2015). The extent and consequences of p-hacking in science. PLoS Biology. 13: e1002106.

Monday, January 18, 2016

Meaningful Use Ending? Yes and No

The health information technology (HIT) world was shaken to its core last week by Andy Slavitt, Acting Administrator of the Centers for Medicare & Medicaid Services (CMS), who announced that the CMS Meaningful Use (MU) program was over. More precisely, he stated, “The Meaningful Use program as it has existed, will now be effectively over and replaced with something better.” He tweeted this as well, and it was retweeted by Dr. Karen DeSalvo, Director of the Office of the National Coordinator for Health IT (ONC). A transcript of his comments was posted on The CMS Blog. The health IT media were quick to pick up on his remarks.

Does this mean that eligible professionals and hospitals will no longer need to meet MU criteria to get CMS incentive dollars or avoid penalties? Sort of. Months ago, in one of the few bipartisan moves of the current Congress, lawmakers passed the Medicare Access and CHIP Reauthorization Act (MACRA) [1]. This legislation is best known as the “doc fix” because it fixed the problem of Medicare’s old sustainable growth rate (SGR) formula, which threatened to substantially decrease physician reimbursement under Medicare.

Another part of MACRA addressed some criticisms of the various Medicare incentive programs, such as their sheer number: not only MU, but also the Physician Quality Reporting System (PQRS) and the Value-Based Payment Modifier (VM). These will all be rolled into a single Merit-based Incentive Payment System (MIPS), which will start to assess penalties in a graduated way, from 4% in 2019 up to 9% in 2022. After 2019, CMS will also provide another pathway to incentives via Alternative Payment Models (APMs), such as accountable care organizations (ACOs).

The MIPS program will consist of four categories of measures, weighted as follows: quality (30%), resource use (30%), clinical practice improvement activities (15%), and MU of certified EHR technology (25%). CMS is still developing the details, but it is clear that within MIPS, MU will be part of what eligible hospitals and eligible professionals must achieve to qualify for Medicare incentives and avoid penalties. As of now, that includes MU Stage 3 [2]. What any new approach will look like going forward is not known. Stay tuned!
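Only the category weights are given above; how CMS will score each category is still under development. As a purely illustrative sketch, assume each category is scored from 0 to 100 and the composite is a weighted average. The category scores below are hypothetical, and `composite_score` is not a CMS formula.

```python
import math

# Category weights as listed for MIPS; the scoring method itself is an assumption.
MIPS_WEIGHTS = {
    "quality": 0.30,
    "resource_use": 0.30,
    "clinical_practice_improvement": 0.15,
    "meaningful_use": 0.25,
}

def composite_score(category_scores: dict[str, float]) -> float:
    """Hypothetical weighted average of category scores, each assumed to be on a 0-100 scale."""
    missing = MIPS_WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(weight * category_scores[name] for name, weight in MIPS_WEIGHTS.items())

assert math.isclose(sum(MIPS_WEIGHTS.values()), 1.0)  # the four weights cover 100%

# Hypothetical practice: strong on improvement activities, weaker on MU
example = {
    "quality": 80,
    "resource_use": 70,
    "clinical_practice_improvement": 100,
    "meaningful_use": 60,
}
print(f"{composite_score(example):.1f}")  # 24 + 21 + 15 + 15 = 75.0
```

Under this assumption, MU at 25% of the total would still matter, but a weak MU score could be partly offset by the other categories, unlike the all-or-nothing thresholds of the current program.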

Some suggestions for improving MU going forward have come from John Halamka, speaking on behalf of about 30 healthcare delivery organizations, and from Peter Basch and Thomson Kuhn, speaking on their own behalf but as leaders of the American College of Physicians.

Basch and Kuhn, as well as Halamka, deem MU Stage 1 a success in terms of achieving widespread adoption, but note it is time to move beyond the functional-use measures of MU. They call for abandoning Stage 3 as currently planned and note how the program’s highly prescriptive approach stifles innovation by clinicians and boxes in the work of EHR developers. Basch and Kuhn go further in their recommendations. They call for a reconfiguration of MU within MIPS, with the elimination of functional-use measure thresholds (e.g., 80% of patients with problem lists or use of five clinical decision support rules), judicious use of non-threshold functional-use measures, practical interoperability that supports the delivery of high-quality care, more flexible engagement with patients, and more innovative approaches to participating in and measuring quality initiatives. A final call they make is for continuing medical education in health IT so that physicians (and others) can learn how to deliver the best care using IT.

When writing about the present situation of the HITECH Act, I often harken back to what I wrote when it was first unveiled: "This is a defining moment for the informatics field. Never before has such money and attention been lavished on it. HITECH provides a clear challenge for the field to 'get it right.' It will be interesting to look back on this time in the years ahead and see what worked and did not work. Whatever does happen, it is clear that informatics lives in a HITECH world now." It is now time to move on from HITECH and MU to a more sustainable approach to health IT that meets the needs of the healthcare system going forward.


1. Doherty, RB (2015). Goodbye, sustainable growth rate—hello, merit-based incentive payment system. Annals of Internal Medicine. 163: 138-139.
2. O'Neill, T (2015). Primer: EHR Stage 3 Meaningful Use Requirements. Washington, DC, American Action Forum.

Wednesday, January 13, 2016

A Tale of Childbirth in Two Countries and Some Teachable Moments

As the father of two healthy adult children who were not born in hospitals, as well as being married to a certified nurse midwife (CNM), I take great interest in childbirth, especially its evidence-based aspects. Both of my children were born in out-of-hospital birth centers, where I was the only physician present, and was in no way there in a physician capacity. Both were delivered by CNMs. I always marvel at how pleasant the experience was relative to the eight deliveries I performed as a medical student in an academic hospital in the 1980s.

Two recent studies provide a number of teachable moments concerning evidence-based care. Both studies asked roughly the same question, looking at the risk of perinatal mortality for planned in-hospital vs. out-of-hospital births. One study from Ontario, Canada found no difference in neonatal mortality for planned out-of-hospital births by midwives compared to planned in-hospital births [1]. But another study from Oregon found a higher risk of neonatal mortality in planned out-of-hospital births (3.9 deaths compared to 1.8 deaths per 1000 deliveries) [2]. This study also found a much higher rate of cesarean section (C-section) for in-hospital births (28.1% versus 6.2%). Cesarean sections are associated with a variety of short- and longer-term complications.

What are the teachable moments? Certainly one is that out-of-hospital birth is very safe, especially in a healthcare system where care among midwives and physicians is highly coordinated, as occurs in Canada and most other developed countries. That is not the case in the United States, and it is likely that some complications in Oregon were a result of that lack of coordination. But even in the United States, at least in Oregon, out-of-hospital birth is relatively safe.

Another teachable moment concerns relative versus absolute risk. Stated as a relative risk ratio, mortality in Oregon was 2.2-fold higher for planned out-of-hospital births. But neonatal mortality is an extremely rare event. As such, the absolute risk difference between out-of-hospital and in-hospital birth was only 2.1 deaths per 1000 deliveries. By contrast, the relative risk of C-section was 4.5-fold higher for in-hospital births, and this translated into a substantially larger absolute risk difference of 21.9 percentage points.
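The relative and absolute figures for Oregon can be reproduced from the rates quoted above with a short Python sketch. The helper `risk_summary` is an illustrative name, and the reciprocal of the absolute difference is shown only as arithmetic, not as a causal estimate from an observational study.

```python
def risk_summary(rate_a: float, rate_b: float) -> tuple[float, float]:
    """Return (relative risk a/b, absolute risk difference a - b) for two event rates."""
    return rate_a / rate_b, rate_a - rate_b

# Oregon rates from the text
mort_rr, mort_rd = risk_summary(3.9 / 1000, 1.8 / 1000)  # deaths: out-of-hospital vs in-hospital
cs_rr, cs_rd = risk_summary(0.281, 0.062)                # C-sections: in-hospital vs out-of-hospital

print(f"Mortality: RR {mort_rr:.1f}, difference {mort_rd * 1000:.1f} per 1000 deliveries")
print(f"C-section: RR {cs_rr:.1f}, difference {cs_rd * 100:.1f} percentage points")
# Reciprocal of the absolute mortality difference: deliveries per one additional death
print(f"About 1 additional death per {1 / mort_rd:.0f} deliveries")
```

The two relative risks look similar in magnitude, but the absolute differences are two orders of magnitude apart, which is exactly why both framings need to be reported.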

An additional teachable moment concerns some important issues around data. This study was made possible because several years ago, the state of Oregon added a question to birth certificates that asked all women who had an in-hospital delivery, “Did you go into labor planning to deliver at home or at a freestanding birthing center?” This enabled the researchers to determine planned vs. unplanned out-of-hospital births and thus made this study possible. Data does not magically just appear; we have to determine what we want to collect, make plans to collect it, and determine its completeness and validity.

These studies also raise the question of whether the difference in neonatal mortality in Oregon could be reduced by better care coordination. The Oregon study further raises the question of whether moving deliveries from hospitals to other settings would reduce the C-section rate. Hopefully these and other questions will be answered in the future. I certainly hope the answers lead to more families being able to experience the pleasant deliveries I was able to have with my children.


1. Hutton, EK, Cappelletti, A, et al. (2015). Outcomes associated with planned place of birth among women with low-risk pregnancies. Canadian Medical Association Journal. Epub ahead of print.
2. Snowden, JM, Tilden, EL, et al. (2015). Planned out-of-hospital birth and birth outcomes. New England Journal of Medicine. 373: 2642-2653.

Saturday, January 9, 2016

My Changing Relationship With Photography

I imagine that, like many people’s, my relationship with photography has changed over the years. These days I like to capture many aspects of my life in pictures.

I was not always a big-time photographer. While I certainly have plenty of pictures of my children as well as major events in my life, I did very little to capture much more beyond that in my earlier adulthood.

That all changed with the advent of digital photography. Even with the early digital cameras that took pictures that were (compared to today) of poorer quality and had extremely limited memory (so you could not store many pictures on the camera), I took to electronic photos. No doubt the convenience and instant gratification of seeing the results right away and no longer having to send film for developing played a huge role.

Despite my proclivity for digital photography, I have never invested in any high-end cameras. I have definitely preferred the convenience of point-and-shoot cameras that turned on quickly, were easy to shoot with, and made it simple to transfer pictures to a computer.

Along the way I also for the most part stopped printing pictures. While I do occasionally have a reason to print a photo, for the most part I view my pictures on my computer or mobile devices.

In 2014 I took the plunge into a somewhat higher-end camera, a Sony Alpha 6000. These “mirrorless” cameras supposedly approach digital single-lens reflex (DSLR) cameras in quality while being more compact (though not pocket-sized). This particular camera also gave me, for the first time ever, an interchangeable (zoom) lens. It is definitely a nice camera and takes great pictures, especially when a zoom lens is preferable or lighting conditions are suboptimal.

About the same time, however, I upgraded my smartphone to an iPhone 6. The cameras on these modern smartphones also take excellent pictures, especially in decent light and when no zoom is required. Of course, a major convenience is that they can be carried in a pocket (and also serve as phone, Internet access device, music player, and more). Another critical benefit of smartphone cameras is the ease of posting photos on Facebook, which is something I do frequently. I can also carry my phone when I go running in case I want a camera.

I now take the overwhelming majority of my pictures with my smartphone. Most of them are of excellent quality, and combined with the convenience of a device that easily fits in my pocket, that means my Sony camera is reserved mostly for special occasions. Those occasions are still important, so I will likely never part with a camera better than my smartphone’s, although who knows what future smartphone camera technology may bring to change even that.