Zwicker, E., Terhardt, E., Paulus, E. (1979). Automatic speech recognition using psychoacoustic models. J. Acoust. Soc. Am. 65, 487–498.
An approach to automatic speech recognition is described that follows, in a straightforward way, the concept of (1) preprocessing in terms of auditory parameters and (2) subsequent classification and recognition. The preprocessing system has been realized in analog hardware, while recognition is carried out on a digital computer. In the preprocessing system, the essential psychoacoustic principles of the perception of loudness, pitch, roughness, and subjective duration are implemented in approximate form. The system essentially consists of 24 bandpass filters, a nonlinear transformation of each filter output into specific loudness and specific roughness, and a final transformation of these parameters into total loudness, total roughness, and three spectral moments. To reduce the information flow further, continuous selection of dominant parameters on the basis of psychoacoustic data is also considered. The subsequent recognition process is characterized mainly by (1) discrimination between speech and silent periods, (2) detection of syllable peaks and classification of syllable nuclei, and (3) hypothesizing syllable boundaries and classifying consonant clusters. Although the system as a whole is as yet far from complete and perfect, the present results indicate that the concept provides a systematic and promising route toward automatic recognition of continuous speech.
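The loudness path of the preprocessing and the first two recognition steps can be illustrated with a small digital sketch. It is not the authors' analog implementation, and several pieces are assumptions: the simplified power law N'ᵢ = max(0, (Iᵢ/I_thr)^0.23 − 1) (0.23 is the exponent commonly associated with Zwicker's loudness model, but this reduced form is hypothetical), the `hearing_threshold` and `silence_threshold` parameters, the reading of the "three spectral moments" as centroid, variance, and third central moment of the specific-loudness pattern over band number in Bark, and the naive local-maximum peak picker. Roughness, parameter selection, and consonant-cluster classification are omitted.

```python
from typing import List, Sequence, Tuple

NUM_BANDS = 24            # one channel per bandpass filter (~1 Bark each, assumed)
LOUDNESS_EXPONENT = 0.23  # assumed Zwicker-style power-law exponent


def specific_loudness(band_intensities: Sequence[float],
                      hearing_threshold: float = 1.0) -> List[float]:
    """Simplified (hypothetical) power law: N'_i = max(0, (I_i/I_thr)**0.23 - 1).

    Zero at or below the assumed hearing threshold, compressive above it.
    """
    if len(band_intensities) != NUM_BANDS:
        raise ValueError(f"expected {NUM_BANDS} band intensities")
    return [max(0.0, (i / hearing_threshold) ** LOUDNESS_EXPONENT - 1.0)
            if i > 0 else 0.0  # guard: negative bases would give complex powers
            for i in band_intensities]


def total_loudness(specific: Sequence[float]) -> float:
    """Integrate specific loudness over band number, assuming 1 Bark per band."""
    return sum(specific)


def spectral_moments(specific: Sequence[float]) -> Tuple[float, float, float]:
    """Centroid, variance, and third central moment of the loudness pattern.

    Band k (1-based) is placed at k Bark; this choice of moments is assumed.
    """
    total = sum(specific)
    if total == 0.0:
        return (0.0, 0.0, 0.0)  # silent frame: moments undefined, report zeros
    z = range(1, len(specific) + 1)
    centroid = sum(zi * n for zi, n in zip(z, specific)) / total
    variance = sum((zi - centroid) ** 2 * n for zi, n in zip(z, specific)) / total
    third = sum((zi - centroid) ** 3 * n for zi, n in zip(z, specific)) / total
    return (centroid, variance, third)


def is_speech(loudness: float, silence_threshold: float) -> bool:
    """Speech/silence decision for one frame by a fixed loudness threshold."""
    return loudness > silence_threshold


def syllable_peaks(contour: Sequence[float], silence_threshold: float) -> List[int]:
    """Indices of local maxima of the total-loudness contour inside speech.

    '>=' on the left and '>' on the right picks the last frame of a plateau.
    """
    peaks = []
    for t in range(1, len(contour) - 1):
        if (is_speech(contour[t], silence_threshold)
                and contour[t] >= contour[t - 1]
                and contour[t] > contour[t + 1]):
            peaks.append(t)
    return peaks


if __name__ == "__main__":
    frame = [0.0] * NUM_BANDS
    frame[4] = 1e4  # all energy in band 5
    n_spec = specific_loudness(frame)
    print(total_loudness(n_spec), spectral_moments(n_spec))
    contour = [0.0, 0.5, 2.0, 1.0, 0.2, 1.5, 3.0, 0.1]
    print(syllable_peaks(contour, silence_threshold=0.3))  # → [2, 6]
```

A one-frame neighbourhood is the simplest possible peak criterion; a practical system would smooth the contour or require a minimum dip between peaks so that small loudness fluctuations within a vowel are not counted as separate syllables.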