CometCloudCare (C3): Distributed Machine Learning Platform-as-a-Service with Privacy Preservation

The growth of data sharing initiatives in neuroscience and genomics [14, 16, 19, 25] represents an exciting opportunity to confront the “small N” problem plaguing contemporary studies [20]. When possible, open data sharing provides the greatest benefit. However, some data cannot be shared at all because of privacy concerns or the risk of re-identification. Sharing of other data sets is hampered by the proliferation of complex data use agreements (DUAs), which preclude truly automated data mining. These DUAs arise from concerns about the privacy and confidentiality of subjects; although many do permit direct access to data, they often require a cumbersome approval process that can take months. Additionally, some researchers have expressed doubts about the efficiency and scalability of centralized data storage and analysis for large-volume datasets [18]. In response, distributed cloud solutions have been suggested [23]; however, transferring large volumes of imaging data (processed or unprocessed) to and from the cloud is far from trivial. More worrisome than the challenges of data transfer and storage is the tendency of labs to collect, label, and maintain neuroimaging data in idiosyncratic ways. Standardized data collection and storage is a recent trend [26], and achieving such a standard may take years or may never happen at all.

[1] Tim Kraska, et al. MLI: An API for Distributed Machine Learning, 2013, 2013 IEEE 13th International Conference on Data Mining.

[2] Oluwasanmi Koyejo, et al. Toward open sharing of task-based fMRI data: the OpenfMRI project, 2013, Front. Neuroinform.

[3] Bruce Fischl, et al. A technique for the de-identification of structural brain MR images, 2004.

[4] Sanjeev Arora, et al. Computing a nonnegative matrix factorization -- provably, 2011, STOC '12.

[5] Elaine Shi, et al. GUPT: privacy preserving data analysis made easy, 2012, SIGMOD Conference.

[6] Carlos Guestrin, et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, 2012.

[7] Zhenqi Huang, et al. Differentially Private Distributed Optimization, 2014, ICDCN.

[8] Anand D. Sarwate, et al. Sharing privacy-sensitive access to neuroimaging and genetics data: a review and preliminary validation, 2018.

[9] Vitaly Shmatikov, et al. Airavat: Security and Privacy for MapReduce, 2010, NSDI.

[10] Owen Carmichael, et al. Standardization of analysis sets for reporting results from ADNI MRI data, 2013, Alzheimer's & Dementia.

[11] Andreas Haeberlen, et al. Differential Privacy Under Fire, 2011, USENIX Security Symposium.

[12] Satrajit S. Ghosh, et al. Data sharing in neuroimaging research, 2012, Front. Neuroinform.

[13] Cynthia Dwork, et al. Calibrating Noise to Sensitivity in Private Data Analysis, 2006, TCC.

[14] Tristan Glatard, et al. CBRAIN: a web-based, distributed computing platform for collaborative neuroimaging research, 2014, Front. Neuroinform.

[15] Seungjin Choi, et al. Algorithms for orthogonal nonnegative matrix factorization, 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[16] Bharat B. Biswal, et al. Making data sharing work: The FCP/INDI experience, 2013, NeuroImage.

[17] H. Sebastian Seung, et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[18] M. Tobin, et al. DataSHIELD: resolving a conflict in contemporary bioscience—performing a pooled analysis of individual-level data without sharing the data, 2010, International journal of epidemiology.

[19] Moni Naor, et al. Our Data, Ourselves: Privacy Via Distributed Noise Generation, 2006, EUROCRYPT.

[20] Xiaoqian Jiang, et al. Differentially private distributed logistic regression using private and public data, 2014, BMC Medical Genomics.

[21] Melissa A. Basford, et al. Ethical and practical challenges of sharing data from genome-wide association studies: the eMERGE Consortium experience, 2011, Genome research.

[22] Rahul Singh, et al. Data-Driven Workflows in Multi-cloud Marketplaces, 2014, 2014 IEEE 7th International Conference on Cloud Computing.

[23] G. Pearlson. Multisite collaborations and large databases in psychiatric neuroimaging: advantages, problems, and challenges, 2009, Schizophrenia bulletin.

[24] Zhen Li, et al. Comet: a scalable coordination space for decentralized distributed environments, 2005, Second International Workshop on Hot Topics in Peer-to-Peer Systems.

[25] Yu Xie, et al. Federated Computing for the Masses--Aggregating Resources to Tackle Large-Scale Engineering Problems, 2014, Computing in Science & Engineering.