Cardea: An Open Automated Machine Learning Framework for Electronic Health Records

An estimated 180 papers focusing on deep learning and EHR were published between 2010 and 2018. Despite the common workflow structure that appears across these publications, no trusted and verified software framework exists, forcing researchers to arduously repeat previous work. In this paper, we propose Cardea, an extensible open-source automated machine learning framework that encapsulates common prediction problems in the health domain and allows users to build predictive models with their own data. The system relies on two components: Fast Healthcare Interoperability Resources (FHIR), a standardized data structure for electronic health systems, and several AutoML frameworks for automated feature engineering, model selection, and tuning. We augment these components with an adaptive data assembler and comprehensive data- and model-auditing capabilities. We demonstrate our framework via five prediction tasks on the MIMIC-III and Kaggle datasets, which highlight Cardea’s human competitiveness, flexibility in problem definition, extensive feature generation capability, adaptable automatic data assembler, and usability.
