Abstract
1. Introduction
2. Methods
3. Automated feature engineering
4. Hyperparameter optimization
5. Pipeline optimizers
6. Neural architecture search
7. Automated machine learning in healthcare
8. Conclusion
Declaration of Competing Interest
Acknowledgement
Appendix A. Supplementary data
References
Abstract
Objective: This work reviews the existing literature in the field of automated machine learning (AutoML) to help healthcare professionals better utilize machine learning models “off-the-shelf” with limited data science expertise. We also identify the potential opportunities and barriers to using AutoML in healthcare, as well as existing applications of AutoML in healthcare.
Methods: We reviewed published papers, accompanied by code, that describe work in the field of AutoML from either a computer science perspective or a biomedical informatics perspective. We also provide a short summary of a series of AutoML challenges hosted by ChaLearn.
Results: A review of 101 papers in the field of AutoML revealed that these automated techniques can match or improve upon expert human performance in certain machine learning tasks, often in a shorter amount of time. The main limitation of AutoML at this point is the difficulty of getting these systems to work efficiently at large scale, i.e., beyond small- and medium-size retrospective datasets.
Discussion: Machine learning techniques have demonstrated potential to improve health outcomes, cut healthcare costs, and advance clinical research. However, most hospitals are not currently deploying machine learning solutions. One reason is that healthcare professionals often lack the machine learning expertise needed to build a successful model, deploy it in production, and integrate it with the clinical workflow. To make machine learning techniques easier to apply and to reduce the demand for human experts, AutoML has emerged as a growing field that seeks to automatically select, compose, and parametrize machine learning models so as to achieve optimal performance on a given task and/or dataset.
Conclusion: While there have already been some use cases of AutoML in the healthcare field, more work needs to be done in order for there to be widespread adoption of AutoML in healthcare.
Introduction
The extensive collection of health data through electronic health records (EHRs), genomic sequencing, and digital health wearables has led to an exponentially growing amount of biomedical “big data” [1–3]. The amount of digital information available to clinicians is becoming simply too much to process: within the timespan of 20–40 min that are generally assigned per visit, it is virtually impossible to review 80+ megabytes (equivalent to 20,000+ pages of free text) worth of patient data captured in the average individual EHR [4]. Machine learning, and more recently deep learning, are key techniques that have demonstrated the ability to translate these large health datasets into actionable knowledge. In general, the use of machine learning models could improve patient safety [5–7], improve quality of care [8–10], and reduce healthcare costs [11–13]. Specifically, machine learning has the capability to augment the work of clinicians by processing the billions of patient data points that are stored in EHRs, and it has been successfully applied in many clinical applications already, such as identifying patients at high risk of being transferred to the ICU [14], diagnosing respiratory conditions from chest X-rays [15], detecting early signals of lung cancer [16], and detecting fraudulent and abusive health insurance claims [17].