Module Code: EEEM034 |
Module Title: SPEAKER AND SPEECH RECOGNITION |
|
Module Provider: Electronic Engineering
|
Short Name: EEM.SSR
|
Level: M
|
Module Co-ordinator: JACKSON PJ Dr (Elec Eng)
|
Number of credits: 15
|
Number of ECTS credits: 7.5
|
|
|
|
Module Availability |
Spring Semester |
|
|
Assessment Pattern |
Unit(s) of Assessment | Weighting Towards Module Mark (%)
Written Examination (2-hr unseen paper) | 60
Computer-based coursework (laboratory assignment, max. 20 pages, with marks awarded for lab preparation, performance, results and reporting) |
|
|
|
|
Module Overview |
Human-computer interaction and biometrics are two of the most important facets of machine intelligence. The topics provide advanced statistical and probabilistic techniques that are applied to speech patterns, together with powerful methods of time-series modelling and likelihood testing. You will be taught how to extract useful features from speech signals, to design a language model, and to perform decoding and training. You will also be given an insight into current research on authentication and spontaneous speech recognition, such as speaker adaptation and robustness to noise. |
|
|
Prerequisites/Co-requisites |
EE3.dspa (Digital Signal Processing) or equivalent undergraduate signal processing course |
|
|
Module Aims |
To teach the principles of automatic speech and speaker recognition. |
|
|
Learning Outcomes |
On completion a successful student should be able to:
• demonstrate an understanding and appreciation of the principles of pattern recognition in relation to speech and speaker recognition, including feature extraction, dynamic time warping, hidden Markov modelling, Gaussian mixture models, expectation maximisation, language models and their application to large-vocabulary continuous speech recognition;
• formulate and analyse solutions to HMM problems, such as simple likelihood calculation, optimal state-sequence identification and parameter re-estimation;
• evaluate a speaker verification system based on objective measures of its operating characteristics.
|
|
|
Module Content |
AUTOMATIC SPEECH RECOGNITION (24h) - PJBJ
Introduction: Human speech communication. The role of ASR in human-computer interaction. Fundamentals of phonetics and speech perception.
Feature extraction: Vocal tract acoustics and linear prediction. Mel-frequency cepstrum. Difference features.
Template matching: Dynamic time warping. Isolated-word and connected-word recognition. Search pruning.
Hidden Markov models: Markov models and state topologies. HMM formulation. Discrete and continuous output pdfs.
Recognition and Viterbi decoding: Trellis diagrams. Forward and backward probabilities. Cumulative likelihoods and trace back.
Machine learning by expectation maximisation: Baum-Welch training: derivation and implementation.
Large-vocabulary continuous speech recognition: Language modelling and discounting. Context-sensitivity and parameter tying.
Adaptation and robustness: Speaker adaptation: MLLR and MAP methods. Noise robustness: spectral subtraction and parallel model combination.
ASR as machine intelligence: Definitions of intelligence. Neural network approaches to ASR, and HMMs in pattern recognition and other applications.
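As an illustration of the HMM problems named in the learning outcomes (simple likelihood calculation and optimal state-sequence identification), the following is a minimal sketch for a discrete-output HMM. It is not taken from the module materials: the function names, the argument layout (`pi` for initial state probabilities, `A` for transitions, `B` for output probabilities) and the use of plain probabilities rather than log-probabilities are all assumptions made for brevity.

```python
def forward_likelihood(pi, A, B, obs):
    """Total likelihood P(obs | model), computed with the forward recursion.

    pi[i]   : probability of starting in state i (assumed layout)
    A[i][j] : probability of moving from state i to state j
    B[j][o] : probability of emitting symbol o in state j
    obs     : list of observed symbol indices
    """
    n = len(pi)
    # Initialisation: forward probabilities for the first observation.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: sum over all predecessor states at each time step.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over the final states.
    return sum(alpha)


def viterbi(pi, A, B, obs):
    """Most likely state sequence and its joint probability with obs."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = delta
        # For each state j, remember the best predecessor state.
        best_prev = [max(range(n), key=lambda i: prev[i] * A[i][j])
                     for j in range(n)]
        delta = [prev[best_prev[j]] * A[best_prev[j]][j] * B[j][o]
                 for j in range(n)]
        backpointers.append(best_prev)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    path.reverse()
    return path, best_prob
```

The two functions differ only in replacing the sum over predecessor states with a maximum and keeping back-pointers for the trace back. For long observation sequences, a practical implementation would normally work with log-probabilities or scaling to avoid numerical underflow.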
SPEAKER RECOGNITION (6h) - JK
Introduction: Biometrics. Speaker recognition/verification problem. Basic terminology: open/closed set formulation; text-dependent, text-prompted, and text-independent systems. Speech characterisation. Silence removal.
Approaches to speaker identification: Gaussian mixture models. Client population modelling by adaptation of the world model. Maximum likelihood hypothesis testing. Sphericity test. Score normalisation. Decision thresholds.
System evaluation: Authentic users, impostors; training, evaluation and test sets. False acceptances and false rejections. Receiver operating characteristics (ROC) and DET curves. Generalisation. Total error rate.
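The system evaluation measures listed above can be sketched as follows. This is an illustrative example only, not the module's own code: the function names are hypothetical, scores at or above the threshold are assumed to be accepted, and the total error rate is taken here to be the sum of the false acceptance and false rejection rates, which is one common convention.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for one decision threshold.

    Assumes a claim is accepted when its score is >= threshold.
    """
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


def operating_points(genuine_scores, impostor_scores):
    """(threshold, FAR, FRR) at every observed score, lowest threshold first.

    Plotting FAR against FRR over these points gives the raw material
    for an ROC or DET curve.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    return [(t, *error_rates(genuine_scores, impostor_scores, t))
            for t in thresholds]


def total_error_rate(genuine_scores, impostor_scores, threshold):
    """Total error rate, taken here as FAR + FRR (an assumed definition)."""
    far, frr = error_rates(genuine_scores, impostor_scores, threshold)
    return far + frr
```

In keeping with the training, evaluation and test sets mentioned above, the threshold would typically be chosen on the evaluation set and the error rates then reported on the separate test set, so that the reported figures reflect generalisation rather than tuning.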
|
|
|
Methods of Teaching/Learning |
Lectures (27 hours total, 2-3 hours/week), laboratory (3 hours total, 0-1 hours/week), assignment, web-based exercises and tutorial sheets. |
|
|
Selected Texts/Journals |
Young, S.J. et al., The HTK Book [http://htk.eng.cam.ac.uk/], Entropic CRL, Online [A]
Jelinek, F., Statistical Methods for Speech Recognition [0-262-10066-5], MIT Press, £30 [B]
Holmes, J.N. & Holmes, W.J., Speech Synthesis and Recognition [0-748-40857-6], Taylor & Francis, £22.50 [B]
Young, S.J., Large vocabulary continuous speech recognition, IEEE Sig. Proc. Mag. 13(5): 45-57, 1996, IEEE, Online [C]
|
|
|
Last Updated |
15th June 2010 |
|
|
|