The so-called big data revolution, which has perhaps eclipsed the term cloud in ubiquity and vagueness, arguably started with Yahoo!'s open sourcing of Hadoop in 2009. That is also the year Cerner unknowingly stumbled into the big data space.
At the time, a small group of us were building a semantic search system that had significant computational requirements, a large percentage of which was attributed to a natural language processing engine schooled in the vocabulary of medicine. Though early in its maturity, Hadoop appeared to be a cost-effective and scalable processing platform, especially since most of our workloads were batch-oriented and inherently parallelizable.
Cerner entered the big data stage through nontraditional engineering channels. However, it turns out that those engineering decisions paid substantial dividends, as early successes in the search project led to increased confidence in the technology and a natural expansion of data-driven use cases. Today, we have an emerging data hub with several thousand servers running the alphabet soup of Hadoop technologies.
One of our data-driven use cases centers on preventing the readmission of patients within 30 days of discharge from a hospital. The Centers for Medicare and Medicaid Services will deny reimbursement to a health system if a patient reenters the hospital inside that 30-day window. This means that care providers must operate proactively, taking preventive action for those patients at highest risk of readmission.
So, how can we use data to identify those patients at risk? In this case, our data scientists used supervised machine learning to train a predictive model composed of more than 700 features derived from clinical, financial, and operational data. Developed in collaboration with Advocate Health Care, this model has been shown to outperform existing industry models by 15 to 20 percent.
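The article does not disclose the model's algorithm or feature set, so here is a minimal sketch of the supervised-learning workflow it describes, using logistic regression trained by gradient descent on invented, synthetic features (the real model uses more than 700 proprietary features; the names and data below are illustrative only):

```python
import math
import random

random.seed(42)

# Hypothetical feature names; the production model derives 700+ features
# from clinical, financial, and operational data.
FEATURES = ["prior_admissions", "length_of_stay", "num_medications"]

def synthesize(n=500):
    """Generate synthetic labeled examples: a latent risk score drives
    the readmitted/not-readmitted label, plus some noise."""
    data = []
    for _ in range(n):
        x = [random.gauss(0, 1) for _ in FEATURES]
        score = 1.5 * x[0] + 0.8 * x[1] + 0.3 * x[2]
        y = 1 if score + random.gauss(0, 0.5) > 0 else 0
        data.append((x, y))
    return data

def train_logistic(data, lr=0.1, epochs=200):
    """Fit weights by plain stochastic gradient descent on the log-loss."""
    w = [0.0] * len(FEATURES)
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted readmission risk
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_risk(w, b, x):
    """Return P(readmission within 30 days) for one patient's features."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

data = synthesize()
w, b = train_logistic(data)
accuracy = sum((predict_risk(w, b, x) > 0.5) == bool(y)
               for x, y in data) / len(data)
```

In practice the predicted probabilities, not just the hard labels, are what matter: they let a care team rank patients and intervene with those at highest risk first.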
Another data-driven use case that resonates well is our work in predicting the onset of sepsis. Sepsis is an elusive bloodstream infection that can become fatal if not detected early enough. In fact, the chance of death is nearly 50 percent once the infection reaches a severe stage. To make matters worse, sepsis can be difficult to diagnose in its early stages because its symptoms often resemble those of common medical conditions.
To detect the early onset of sepsis, Cerner developed a predictive model based on a highly tuned decision tree classifier. This model is deployed in a cloud-hosted production system that actively monitors more than 1 million lives on a daily basis. And, essential to overall effectiveness, alerts generated by this system are woven into the daily workflow of care providers, enabling them to take immediate action upon detection. The results have been nothing less than remarkable: one client alone stated that nearly 2,700 lives have been saved since activating sepsis monitoring.
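The production model's features and thresholds are not public, but the core mechanics of a decision tree classifier can be sketched in a few dozen lines: recursively pick the (feature, threshold) split that most reduces Gini impurity, and stop at a maximum depth. The vitals, thresholds, and synthetic labels below are invented for illustration and carry no clinical meaning:

```python
import random

random.seed(7)

def synthesize(n=400):
    """Synthetic examples: ([heart_rate, temp_c, wbc_count], septic?)."""
    data = []
    for _ in range(n):
        hr = random.gauss(80, 15)
        temp = random.gauss(37.0, 0.8)
        wbc = random.gauss(8.0, 3.0)
        septic = 1 if (hr > 95 and temp > 38.0) or wbc > 13.0 else 0
        data.append(([hr, temp, wbc], septic))
    return data

def gini(rows):
    """Gini impurity of a set of (features, label) rows."""
    if not rows:
        return 0.0
    p = sum(y for _, y in rows) / len(rows)
    return 2 * p * (1 - p)

def best_split(rows):
    """Exhaustively search (feature, threshold) pairs for the split
    that most reduces weighted Gini impurity."""
    best = (0.0, None, None)
    base = gini(rows)
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * gini(left)
                           + len(right) * gini(right)) / len(rows)
            if gain > best[0]:
                best = (gain, f, t)
    return best

def build(rows, depth=0, max_depth=3):
    """Grow the tree; a leaf stores the empirical P(sepsis) of its rows."""
    gain, f, t = best_split(rows)
    if gain < 1e-6 or depth >= max_depth:
        return sum(y for _, y in rows) / len(rows)
    left = build([r for r in rows if r[0][f] <= t], depth + 1, max_depth)
    right = build([r for r in rows if r[0][f] > t], depth + 1, max_depth)
    return (f, t, left, right)

def classify(node, x):
    """Walk from the root to a leaf and return the leaf's probability."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

data = synthesize()
tree = build(data)
acc = sum((classify(tree, x) > 0.5) == bool(y) for x, y in data) / len(data)
```

A production system like the one described would score each monitored patient's latest vitals against such a tree continuously, raising an alert only when the leaf probability crosses a tuned threshold, to balance early detection against alarm fatigue.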
An important and necessary journey of trial and error unfolds when an organization decides to become a data-driven culture. Fortunately, the ecosystem surrounding Hadoop has matured appreciably since 2009 and is rapidly becoming more enterprise-ready. However, it turns out that the data itself is usually the bigger challenge, especially when you start measuring things on the order of petabytes. Data is almost always messier than one might think, so cleansing and normalization become essential steps before applying any sort of reasoning at scale.
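To make the cleansing-and-normalization point concrete, here is a small sketch of the kind of record-level work involved: stripping whitespace, reconciling units, and representing missing values explicitly. The records and field names are invented; real clinical feeds vary far more in format:

```python
# Hypothetical raw records as they might arrive from different source
# systems: inconsistent whitespace, mixed units, missing values.
raw = [
    {"patient_id": " 001 ", "temp": "98.6 F", "weight": "70 kg"},
    {"patient_id": "002",   "temp": "37.2C",  "weight": "154 lb"},
    {"patient_id": "003",   "temp": "",       "weight": "68kg"},
]

def parse_temp_c(s):
    """Normalize a temperature string to Celsius; None if missing."""
    s = s.strip().upper().replace(" ", "")
    if not s:
        return None
    if s.endswith("F"):
        return round((float(s[:-1]) - 32) * 5 / 9, 1)
    if s.endswith("C"):
        return float(s[:-1])
    return float(s)  # assume Celsius when no unit is given

def parse_weight_kg(s):
    """Normalize a weight string to kilograms."""
    s = s.strip().lower().replace(" ", "")
    if s.endswith("lb"):
        return round(float(s[:-2]) * 0.453592, 1)
    if s.endswith("kg"):
        return float(s[:-2])
    return float(s)  # assume kilograms when no unit is given

def cleanse(record):
    """Map one raw record to a normalized record with canonical units."""
    return {
        "patient_id": record["patient_id"].strip(),
        "temp_c": parse_temp_c(record["temp"]),
        "weight_kg": parse_weight_kg(record["weight"]),
    }

clean = [cleanse(r) for r in raw]
```

At petabyte scale this per-record logic would run as a parallel batch job (a natural fit for the Hadoop-style platforms described above) rather than a simple list comprehension, but the normalization decisions themselves are the same.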
As big data systems continue to mature, the health care industry will see a pronounced uptick in adoption combined with a shift toward more intelligent, data-driven solutions. Arguably, such an inflection point on the proverbial hockey stick is already playing out. Even so, the pace of innovation has never been as fierce as the open source community rapidly churns out more sophisticated tools. Looking back, Cerner was fortunate to have dipped its toe in these waters at such an early stage, leading to the accrual of deep engineering and operations experience. Very soon, the pipeline of ideas to which big data technologies can be applied, including those encompassing predictive analytics and data-driven workflows, will be limited only by our capacity and imagination.