Fifty years of pulsar candidate selection: from simple filters to a new principled real-time classification approach

Improving survey specifications are causing an exponential rise in pulsar candidate numbers and data volumes. We study the candidate filters used to mitigate these problems during the past fifty years. We find that some existing methods, such as applying constraints on the total number of candidates collected per observation, may have detrimental effects on the success of pulsar searches. Methods that are immune to such effects are found to be ill-equipped to deal with the problems associated with increasing data volumes and candidate numbers, motivating the development of new approaches. We therefore present a new method designed for on-line operation. It selects promising candidates using a purpose-built tree-based machine learning classifier, the Gaussian Hellinger Very Fast Decision Tree (GH-VFDT), and a new set of features for describing candidates. The features have been chosen so as to i) maximise the separation between candidates arising from noise and those of probable astrophysical origin, and ii) be as survey-independent as possible. Using these features, our new approach can process millions of candidates in seconds (~1 million every 15 seconds), with high levels of pulsar recall (90%+). This technique is therefore applicable to the large volumes of data expected to be produced by the Square Kilometre Array (SKA). Use of this approach has assisted in the discovery of 20 new pulsars in data obtained during the LOFAR Tied-Array All-Sky Survey (LOTAAS).
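The abstract does not say how the GH-VFDT evaluates splits, so the following is only a minimal sketch of the two standard ingredients its name points to: the closed-form Hellinger distance between two univariate Gaussians, and the Hoeffding bound used by Very Fast Decision Trees to decide when enough examples have been seen. The function names, the idea of fitting one Gaussian per class to a feature, and the example values are all assumptions made for illustration, not the authors' implementation.

```python
import math


def hellinger_gaussian(mu1, sigma1, mu2, sigma2):
    """Hellinger distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Returns a value in [0, 1]: 0 for identical distributions,
    approaching 1 as the two distributions stop overlapping.
    """
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Bhattacharyya coefficient for two univariate Gaussians.
    bc = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    # Clamp to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the observed mean of n
    samples of a variable with the given range lies within this epsilon
    of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


# Hypothetical per-class Gaussian summaries of a single candidate feature:
# (mean, standard deviation) for noise candidates and for pulsar candidates.
noise = (0.0, 1.0)
pulsar = (3.0, 1.5)

separation = hellinger_gaussian(*noise, *pulsar)
epsilon = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
print(f"Hellinger distance: {separation:.3f}, Hoeffding epsilon: {epsilon:.3f}")
```

In a streaming tree of this kind, a feature whose per-class distributions are far apart in Hellinger distance would be a strong split candidate, and a split would be committed once the gap between the best and second-best features exceeds the Hoeffding epsilon. The Hellinger distance depends on the shapes of the class-conditional distributions rather than on class counts, which makes it a natural fit for data in which noise candidates vastly outnumber pulsars.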
