Abstract
1- Introduction
2- Methodology
3- Data and feature descriptions
4- Deploying and testing
5- Result and discussion
6- Conclusion
References
Abstract
In this paper, we present a deep learning-based recognition algorithm that identifies pulsars in observational data containing millions of candidates, including radio frequency interference and noise sources. The dataset is obtained from the High Time Resolution Universe survey carried out with the Parkes telescope. We investigate several effective single and combined features via simple logistic regression. To deal with the imbalanced dataset, we oversample the original dataset at different sampling rates, and the sampling rate is itself treated as one of the learning parameters. After training a convolutional neural network on the pre-processed dataset, we provide a cross-validated evaluation of all candidates. Results show that the deep learning-based recognition algorithm can identify pulsar and radio frequency interference signals with high accuracy. The precision and recall for radio frequency interference are both 100%, and those for pulsars are 91% and 94%, respectively.
Introduction
Astrophysicists typically need large amounts of observational data to find the statistically significant relationships that reveal pulsars. The pulsar candidate selection problem matters because it is an important step toward finding new pulsars. Recently, machine learning methods have been widely used for pulsar candidate selection [1–5]. However, with the advent of the Square Kilometre Array (SKA) radio telescope, the data volume has become extremely high. On the one hand, large-volume data provides a great opportunity to find more pulsars; on the other hand, processing such large data sets quickly becomes a daunting task, because traditional machine learning methods cannot meet the SKA data challenge. Traditional machine learning methods find patterns in features extracted from the data [6,7], and this pattern recognition step does not work effectively for pulsar data.

Unlike traditional machine learning methods, deep learning methods learn directly from the data. The development of accelerator hardware, e.g., graphics processing units (GPUs), has significantly expanded the capacity of deep learning methods to deal with big data. Hinton applied deep neural networks (DNNs) to classification problems and obtained highly accurate results [8]. Besides accuracy, processing speed is also an important factor. To increase the training speed, we adopt convolutional neural networks (CNNs) for pulsar identification; they have fewer parameters than DNNs and are thus faster to train. In this work, we use the data architecture to apply learning methods directly to the raw data, which reduces the system error and yields highly accurate results.
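The claim that a CNN has fewer parameters than a fully connected DNN can be illustrated with a short parameter count. The sketch below is only an illustration: the input size (64×64, single channel), the 128 hidden units, and the 32 filters of size 3×3 are hypothetical values chosen for the arithmetic, not the architecture used in this work.

```python
def dense_params(n_in, n_out):
    """Parameter count of a fully connected layer: one weight per
    input-output pair plus one bias per output unit."""
    return n_in * n_out + n_out


def conv2d_params(k_h, k_w, c_in, c_out):
    """Parameter count of a 2-D convolutional layer: each of the c_out
    filters has k_h * k_w * c_in shared weights plus one bias."""
    return k_h * k_w * c_in * c_out + c_out


# Hypothetical sizes (assumptions, not taken from the paper).
H, W = 64, 64                       # input height and width, one channel
dense = dense_params(H * W, 128)    # flattened input -> 128 hidden units
conv = conv2d_params(3, 3, 1, 32)   # 32 filters of size 3x3 on one channel

print(f"fully connected layer: {dense} parameters")
print(f"convolutional layer:   {conv} parameters")
```

Because the convolutional filters are shared across all positions of the input, the count for the convolutional layer does not depend on the input size, whereas the fully connected count grows with the number of input values.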