A Neural Network Based, Speaker-Independent, Large Vocabulary, Continuous Speech Recognition System

Description

Project Title:
A Neural Network Based, Speaker-Independent, Large Vocabulary, Continuous Speech Recognition System
Acronym:
WERNICKE
Number:
6487
Work Area:
Speech & Natural Language
Coordinator:
Lernout & Hauspie Speechproducts
K. Albert I laan 64
B - 1780 WEMMEL
Coordinator Country:
B
Partners
INESC P
University of Cambridge UK
International Computer Science Institute USA
Contact Point:
Dr. H. Bourlard
Telephone:
+32/2 460 33 97
Fax:
+32/2 460 01 72
E-Mail:
bourlard@brussels.lhs.be
Keywords:
speech recognition, hidden Markov models, artificial neural networks, hybrid models, speaker adaptation, array processing
Start Date:
1 October 92
Duration:
36 months
Status:
running
Abstract:
WERNICKE is exploiting hybrid structures consisting of combinations of hidden Markov models (HMMs) and artificial neural networks (ANNs) to improve the state of the art in large vocabulary, continuous speech recognisers. Building on existing prototypes that were available to most of the partners, this project includes state-of-the-art HMMs and ANNs and explores aspects such as theory, implementation, improved training and speaker adaptation in hybrid HMM/ANN systems. At the end of the first year of this project, comparison results of different hybrid structures based on a common recogniser and common hardware are available.

AIMS

The main objective of this project is to learn how artificial neural networks (ANNs) can be used for continuous speech recognition to significantly improve state-of-the-art systems and, using dedicated hardware, to develop fast implementations of the resulting algorithms, i.e., real-time recognition and fast turnaround of training. More specifically, this project addresses the problem of improving state-of-the-art, hidden Markov model (HMM)-based, large vocabulary, speaker-dependent and speaker-independent, continuous speech recognition systems by means of hybrid HMM/ANN structures.
In this framework, different ANN architectures will be compared and speaker adaptation methods will be developed. This project contains two parts with very strong inter-dependencies:
- development and evaluation of theories and methods to improve hybrid HMM/ANN systems, and
- development of hardware and software tools to help the research and to implement resulting algorithms.

APPROACH AND METHODS

The consortium brings together partners with existing skills and baseline systems in the area: LHS and ICSI (International Computer Science Institute, Berkeley, CA, subcontractor) in hybrid hidden Markov model (HMM)/multilayer perceptron (MLP) structures, and CUED in recurrent neural network (RNN) structures, both of which perform competitively with state-of-the-art HMM technology; INESC in artificial neural networks (ANNs) and speaker adaptation; and ICSI in its development of the Ring Array Processor (RAP), which provides over 500 Mflops and is now used for computation by each partner.
The main research themes include:
- further development and improvement of the baseline HMM/MLP hybrid, and development of an HMM/RNN hybrid;
- definition of common recognition software to be used as a basis for comparison and assessment of research results;
- comparison of the MLP and RNN hybrid systems;
- development of better acoustic features with enhanced robustness to speaker and communication channel variation;
- incorporation of improvements in hybrids analogous to those used in state-of-the-art HMM recognisers;
- development of better training procedures;
- investigation of fast speaker adaptation in hybrids;
- demonstration of real-time recognisers and their evaluation against state-of-the-art HMMs on international reference databases such as DARPA Resource Management (1000 words, speaker independent, continuous speech) and Wall Street Journal (5000 and 20000 words, speaker independent, continuous speech).
The training of hybrid structures is highly computation-intensive. The inclusion of ICSI as a subcontractor gives the consortium access to the very high performance hardware (RAP and a VLSI processor called SPERT) and software tools which ICSI has developed and will further adapt as the project progresses. These hardware and software tools will be used as the common platform for this project.
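
This synopsis does not spell out how the ANN and the HMM are combined. As a rough illustration only, the sketch below assumes one common hybrid formulation: the network outputs per-frame state posteriors P(q|x), these are divided by the state priors P(q) to give scaled likelihoods, and the scaled likelihoods serve as HMM emission scores in a Viterbi search. The function names, the two-state model and all numbers are hypothetical and are not taken from the project.

```python
import math

_EPS = 1e-12  # floor to avoid log(0); an assumption of this sketch


def safe_log(p):
    """Natural log with a small floor so zero probabilities stay finite."""
    return math.log(max(p, _EPS))


def scaled_log_likelihoods(posteriors, priors):
    """Turn ANN state posteriors P(q|x) into log(P(q|x) / P(q)).

    By Bayes' rule this equals log p(x|q) up to a term that does not
    depend on the state, so it can stand in for an HMM emission score.
    """
    return [[safe_log(p) - safe_log(pr) for p, pr in zip(frame, priors)]
            for frame in posteriors]


def viterbi(log_emis, log_trans, log_init):
    """Return the best state path given log emission/transition scores."""
    n_states = len(log_init)
    score = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_emis[1:]:
        prev, score, ptr = score, [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s at this frame.
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + frame[s])
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state, left-to-right example with made-up ANN outputs.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
priors = [0.5, 0.5]
log_trans = [[safe_log(0.6), safe_log(0.4)], [safe_log(0.0), safe_log(1.0)]]
log_init = [safe_log(1.0), safe_log(0.0)]
path = viterbi(scaled_log_likelihoods(posteriors, priors), log_trans, log_init)
print(path)
```

In this toy setting the decoded path stays in the first state while its posterior dominates and then moves to the second. A real recogniser would add word and language models on top of the state-level search.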

PROGRESS AND RESULTS

At the end of the first year of this project, comparison results of different hybrid structures based on a common recogniser and common hardware are available. These results show that the hybrid approach can achieve recognition performance comparable to that of much more sophisticated state-of-the-art HMMs, i.e., an error rate of around 5% on the DARPA Resource Management database (1000 words, speaker independent, continuous speech recognition task).
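
The synopsis does not say how the error rate is computed. A minimal sketch, assuming the usual word error rate (substitutions, deletions and insertions from a word-level edit distance, divided by the number of reference words), could look as follows. The function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("ships" -> "ship") and one deletion ("the"): 2 errors / 6 words.
wer = word_error_rate("show all ships in the gulf", "show all ship in gulf")
print(f"{wer:.1%}")
```

Under this definition, an error rate of around 5% would correspond to roughly one word error per twenty reference words.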

POTENTIAL

This project is expected to make a significant technical and scientific contribution to the use and understanding of HMM/ANN hybrids, and of HMMs and ANNs separately, in speech recognition and pattern recognition, and to the neural computing involved. It will also provide a testbed for a new generation of commercial speech recognition systems exploiting hybrid HMM/ANN technology.

LATEST PUBLICATIONS

INFORMATION DISSEMINATION ACTIVITIES

A workshop on this topic with invited external participants will be organised during 1994.



Sven Müßig, last update 07-nov-1995.