Computational Intelligence Challenges and Applications on Large-Scale Astronomical Time Series Databases

Time-domain astronomy (TDA) is facing a paradigm shift caused by the exponential growth in the sample size, data complexity and data generation rates of new astronomical sky surveys. For example, the Large Synoptic Survey Telescope (LSST), which will begin operations in northern Chile in 2022, will generate a nearly 150 Petabyte imaging dataset of the southern hemisphere sky. The LSST will stream data at rates of 2 Terabytes per hour, effectively capturing an unprecedented movie of the sky. The LSST is expected not only to improve our understanding of time-varying astrophysical objects, but also to reveal a plethora of as yet unknown faint and fast-varying phenomena. To cope with this shift to data-driven astronomy, the fields of astroinformatics and astrostatistics have recently been created. These new data-oriented paradigms for astronomy combine statistics, data mining, knowledge discovery, machine learning and computational intelligence to provide the automated and robust methods needed for the rapid detection and classification of known astrophysical objects, as well as the unsupervised characterization of novel phenomena. In this article we present an overview of machine learning and computational intelligence applications to TDA. We identify and discuss future big data challenges and new lines of research in TDA, focusing on the LSST, from the viewpoint of computational intelligence and machine learning. Interdisciplinary collaboration will be required to cope with the challenges posed by the deluge of astronomical data coming from the LSST.
