Harmonic coding - state of the art and future trends

Abstract

The recent research trend towards harmonic/sinusoid-based methods, which exploit the fine spectral structure of voiced speech, is undeniable. This paper discusses the state of the art in this area, both in terms of analysis-synthesis methods and of their application to coding. The key points are:

• Harmonic modelling is a very efficient tool for voiced regions, producing synthetic speech of very high quality, but it is also prone to pitch and voicing errors. The main disadvantage of harmonic coding is the need for an alternative method for unvoiced regions, for which adaptive transform coding (ATC) is a natural choice. In this paper, an 8 kbit/s simulation is presented, using hard switching between harmonic coding and ATC.

• Sinusoid-based modelling extends the basic analysis-synthesis framework to unvoiced and transition regions by removing the constraint that the sinusoids be harmonically related. When it comes to coding, however, many problems remain unsolved.

In conclusion, some guidelines for future research are discussed.
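The difference between the two models in the abstract can be illustrated with a minimal synthesis sketch. It is not the paper's coder: the function names, the 8 kHz sampling rate, the 100 Hz pitch, and the 1/k amplitude roll-off are all assumptions chosen for illustration, and the frame is treated as stationary. The harmonic model restricts the sinusoid frequencies to multiples of the pitch, while the sinusoidal model accepts arbitrary frequencies.

```python
import math


def synthesize_sinusoids(freqs_hz, amps, phases, sample_rate, num_samples):
    """Sum of constant-frequency sinusoids over one frame (hypothetical helper)."""
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        frame.append(sum(a * math.cos(2 * math.pi * f * t + p)
                         for f, a, p in zip(freqs_hz, amps, phases)))
    return frame


def harmonic_frequencies(f0_hz, sample_rate):
    """Multiples of the pitch f0 that lie strictly below the Nyquist frequency."""
    count = int((sample_rate / 2 - 1e-9) // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


SAMPLE_RATE = 8000   # assumed telephone-band sampling rate
FRAME_LEN = 160      # 20 ms frame at 8 kHz

# Harmonic model: frequencies are tied to a single pitch value (assumed 100 Hz).
harm_freqs = harmonic_frequencies(100.0, SAMPLE_RATE)
harm_amps = [1.0 / k for k in range(1, len(harm_freqs) + 1)]  # assumed roll-off
harm_frame = synthesize_sinusoids(harm_freqs, harm_amps,
                                  [0.0] * len(harm_freqs),
                                  SAMPLE_RATE, FRAME_LEN)

# Sinusoidal model: the harmonic constraint is dropped, so any frequencies may appear.
free_frame = synthesize_sinusoids([100.0, 230.0], [1.0, 1.0], [0.0, 0.0],
                                  SAMPLE_RATE, FRAME_LEN)
```

In this sketch the harmonic frame repeats every pitch period (80 samples here), which is why a single pitch value can stand in for all the frequencies when coding voiced speech. The unconstrained frame has no such period, so each frequency would have to be represented separately.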
