An Architecture for Automatic Deployment of Brown Dog Services at Scale into Diverse Computing Infrastructures

Brown Dog is an extensible data cyberinfrastructure that provides a set of distributed data conversion and metadata extraction services. These services enable access to and search within unstructured, uncurated, and otherwise inaccessible research data across the sciences and social sciences, which ultimately helps support the reproducibility of results. We envision Brown Dog as an essential service within a comprehensive cyberinfrastructure that includes data services, high performance computing services, and more, and that would enable scholarly research in a variety of disciplines that is not yet possible today. Brown Dog focuses on four initial use cases, addressing the conversion and extraction needs of ecology, civil and environmental engineering, library and information science, and the general public. In this paper, we describe an architecture that supports the contribution of data transformation tools by users and the automatic deployment of those tools as Brown Dog services in diverse infrastructures, such as cloud or high performance computing (HPC) environments, based on user demand and system load. We also present results validating the performance of the initial implementation of Brown Dog.
