Abstract
Background
Machine learning is increasingly used to predict healthcare outcomes, including cost, utilization, and quality.
Objective
We provide a high-level overview of machine learning for healthcare outcomes researchers and decision makers.
Methods
We introduce key concepts for understanding the application of machine learning methods to healthcare outcomes research. We first describe current standards to rigorously learn an estimator, which is an algorithm developed through machine learning to predict a particular outcome. We include steps for data preparation, estimator family selection, parameter learning, regularization, and evaluation. We then compare 3 of the most common machine learning methods: (1) decision tree methods that can be useful for identifying how different subpopulations experience different risks for an outcome; (2) deep learning methods that can identify complex nonlinear patterns or interactions between variables predictive of an outcome; and (3) ensemble methods that can improve predictive performance by combining multiple machine learning methods.
Results
We demonstrate the application of common machine learning methods to a simulated insurance claims dataset. We specifically include statistical code in R and Python for the development and evaluation of estimators for predicting which patients are at heightened risk for hospitalization from ambulatory care-sensitive conditions.
Conclusions
Outcomes researchers should be aware of key standards for rigorously evaluating an estimator developed through machine learning approaches. Although multiple methods use machine learning concepts, different approaches are best suited for different research problems.
Introduction
Machine learning is a rapidly growing field that attempts to extract general concepts from large datasets, commonly in the form of an algorithm that predicts an outcome (commonly referred to as a predictive model or estimator). This task has become increasingly difficult for humans to accomplish because data volume and complexity have increased beyond what was feasible with traditional statistics and desktop computers. Recently, machine learning has been used to predict healthcare outcomes including cost, utilization, and quality; for example, machine learning methods have been used to predict “cost bloomers,” or patients who move from a lower to the highest decile of per capita healthcare expenditures.1 Machine learning has also been used to predict which patients are most likely to experience a hospital readmission for congestive heart failure and related conditions.2 Whereas causal research identifies which factors cause healthcare outcomes, machine learning can, among other things, use these factors to identify which patients will experience these outcomes.
Because machine learning remains an emerging field and its application to healthcare outcomes research is also nascent, we provide a high-level overview of key concepts and best practices in machine learning for practitioners and readers of healthcare outcomes research. We describe the steps of data preparation, estimator family selection, parameter learning, regularization, and evaluation. We then compare 3 of the most common machine learning methods: (1) decision tree methods that can be useful for identifying how different subpopulations experience different risks for an outcome; (2) deep learning methods that can identify complex nonlinear patterns or interactions between variables predictive of an outcome; and (3) ensemble methods that can improve predictive performance by combining multiple machine learning methods.
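To make the five steps concrete, the following is a minimal Python sketch and not the code accompanying this article. It rests on several assumptions: the data are synthetic, the two features (age and prior-year visits) and the risk model that generates the labels are hypothetical, and the estimator family is an L2-penalized logistic regression chosen only for brevity (it is not one of the 3 methods compared here). Evaluation uses the area under the ROC curve on held-out data.

```python
import math
import random


def simulate_patients(n, seed=0):
    """Simulate (features, label) pairs; entirely synthetic, for illustration only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 90)       # hypothetical feature: age in years
        visits = rng.randint(0, 10)     # hypothetical feature: prior-year visits
        # Assumed "true" risk model, used only to generate labels.
        logit = -6.0 + 0.05 * age + 0.3 * visits
        p = 1.0 / (1.0 + math.exp(-logit))
        rows.append(([age, visits], 1 if rng.random() < p else 0))
    return rows


def standardize(train, test):
    """Data preparation: scale features using training-set statistics only."""
    k = len(train[0][0])
    n = len(train)
    means = [sum(x[j] for x, _ in train) / n for j in range(k)]
    sds = [math.sqrt(sum((x[j] - means[j]) ** 2 for x, _ in train) / n)
           for j in range(k)]

    def scale(rows):
        return [([(x[j] - means[j]) / sds[j] for j in range(k)], y)
                for x, y in rows]

    return scale(train), scale(test)


def predict(weights, bias, x):
    """Estimator family: logistic regression (an assumption of this sketch)."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(train, l2=0.01, lr=0.1, epochs=200):
    """Parameter learning by gradient descent, with an L2 penalty as regularization."""
    k = len(train[0][0])
    n = len(train)
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in train:
            err = predict(weights, bias, x) - y
            grad_b += err / n
            for j in range(k):
                grad_w[j] += err * x[j] / n
        # The penalty term l2 * w shrinks weights toward zero.
        weights = [w - lr * (g + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def auc(scores, labels):
    """Evaluation: chance that a random positive case outranks a random negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


data = simulate_patients(1000)
train, test = standardize(data[:700], data[700:])   # hold out 30% for evaluation
weights, bias = fit_logistic(train)
test_scores = [predict(weights, bias, x) for x, _ in test]
test_auc = auc(test_scores, [y for _, y in test])
print(f"held-out AUC: {test_auc:.3f}")
```

The scaling statistics are computed from the training rows alone and then applied to the test rows, so the held-out evaluation is not influenced by information from the test set.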