Gramophone Noise Detection and Reconstruction Using Time Delay Artificial Neural Networks

Gramophone records were the main recording medium for more than seven decades and have regained widespread popularity over the past several years. Being an analog storage medium, gramophone records are subject to distortions caused by scratches, dust particles, degradation, and improper handling. The resulting noise often leads to an unpleasant listening experience and requires a filtering process to remove the unwanted disruptions and improve the audio quality. This paper proposes a novel approach that employs various feedforward time delay artificial neural networks to detect and reconstruct noise in musical sound waves. A set of 800 songs from eight different genres was used to validate the performance of the neural networks. The performance was analyzed in terms of outlier detection and interpolation accuracy, computational time, and the tradeoff between accuracy and time. The empirical results of both the detection and the reconstruction neural networks were compared to a number of other algorithms, including various statistical measurements, duplication approaches, trigonometric processes, polynomials, and time series models. The neural networks' outlier detection accuracy was found to be slightly lower than that of some of the other noise identification algorithms, but the networks achieved a more efficient tradeoff by detecting most of the noise in real time. For reconstruction, the neural networks achieved higher interpolation accuracy than other widely used time series models. Certain genres, such as classical, country, and jazz music, were interpolated more accurately. Volatile signals, such as electronic, metal, and pop music, were more challenging to reconstruct and were interpolated substantially better by the neural networks than by the other examined algorithms.
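
The abstract describes the approach only at a high level. The following is a minimal sketch, not the authors' implementation, of how a feedforward time delay network can serve both purposes as a one-step predictor: the network sees a sliding window of the previous `delay` samples, a sample is flagged as noise when its prediction residual is unusually large, and flagged samples are replaced by the prediction. Everything concrete here is an assumption made for illustration: the names `TimeDelayNet`, `detect`, and `reconstruct`, the window length, the tanh hidden layer, plain per-sample gradient descent, the median-based threshold `k`, and training on a clean signal. The paper may differ on any of these.

```python
# Hypothetical sketch: names, hyperparameters and the training rule are
# assumptions for illustration, not the method described in the paper.
import math
import random
import statistics


class TimeDelayNet:
    """Feedforward net whose inputs are the `delay` most recent samples (tapped delay line)."""

    def __init__(self, delay, hidden, seed=0):
        rng = random.Random(seed)
        self.delay = delay
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(delay)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def _forward(self, window):
        # tanh hidden layer followed by a single linear output neuron
        h = [math.tanh(sum(w * x for w, x in zip(row, window)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * v for w, v in zip(self.w2, h)) + self.b2

    def predict(self, window):
        return self._forward(window)[1]

    def train(self, signal, epochs=150, lr=0.05):
        """Plain per-sample gradient descent on the squared one-step prediction error."""
        d = self.delay
        for _ in range(epochs):
            for t in range(d, len(signal)):
                window = signal[t - d:t]
                h, y = self._forward(window)
                err = y - signal[t]
                for j, hj in enumerate(h):
                    grad_h = err * self.w2[j] * (1.0 - hj * hj)  # uses w2[j] before its update
                    self.w2[j] -= lr * err * hj
                    self.b1[j] -= lr * grad_h
                    for i, x in enumerate(window):
                        self.w1[j][i] -= lr * grad_h * x
                self.b2 -= lr * err


def detect(net, signal, k=4.0):
    """Flag samples whose prediction residual exceeds k robust standard deviations."""
    d = net.delay
    idx = range(d, len(signal))
    residuals = [signal[t] - net.predict(signal[t - d:t]) for t in idx]
    # median absolute residual, scaled to approximate a Gaussian standard deviation
    sigma = 1.4826 * statistics.median([abs(r) for r in residuals])
    return [t for t, r in zip(idx, residuals) if abs(r) > k * sigma]


def reconstruct(net, signal, flagged):
    """Replace flagged samples, left to right, by the net's one-step prediction."""
    out = list(signal)
    d = net.delay
    for t in sorted(flagged):
        # earlier repairs are already in `out`, so they feed later predictions
        out[t] = net.predict(out[t - d:t])
    return out


if __name__ == "__main__":
    clean = [0.5 * math.sin(2 * math.pi * t / 20) for t in range(160)]
    noisy = list(clean)
    noisy[100] += 3.0  # synthetic click
    net = TimeDelayNet(delay=6, hidden=5)
    net.train(clean)
    flagged = detect(net, noisy)
    repaired = reconstruct(net, noisy, flagged)
    print(100 in flagged, abs(repaired[100] - clean[100]))
```

Using a single predictor for both steps keeps the sketch short; the abstract speaks of separate detection and reconstruction networks, so a closer reading of the paper would likely train two models. The median-based threshold is chosen so that the click itself does not inflate the noise estimate, and samples just after a click (whose input windows contain it) tend to be flagged and repaired as well.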

Andries Petrus Engelbrecht | Christoph F. Stallmann