University of Surrey - Guildford
Registry
  
 

  
 
2010/1 Module Catalogue
Module Code: EEEM034
Module Title: SPEAKER AND SPEECH RECOGNITION
Module Provider: Electronic Engineering
Short Name: EEM.SSR
Level: M
Module Co-ordinator: JACKSON PJ Dr (Elec Eng)
Number of credits: 15
Number of ECTS credits: 7.5
 
Module Availability

Spring Semester

Assessment Pattern

Unit(s) of Assessment | Weighting Towards Module Mark (%)
Written Examination (2-hr unseen paper) | 60
Computer-based coursework (laboratory assignment, max. 20 pages, with marks awarded for lab. preparation, performance, results and reporting) |

Module Overview
Human-computer interaction and biometrics are two of the most important facets of machine intelligence. This module covers advanced statistical and probabilistic techniques that are applied to speech patterns, together with powerful methods of time-series modelling and likelihood testing. You will be taught how to extract useful features from speech signals, design a language model, and perform decoding and training. You will also be given an insight into current research on authentication and spontaneous speech recognition, such as speaker adaptation and robustness to noise.
Prerequisites/Co-requisites

EE3.dspa (Digital Signal Processing) or equivalent undergraduate signal processing course

Module Aims

To teach the principles of automatic speech and speaker recognition.

Learning Outcomes

On completion a successful student should be able to: 

• demonstrate an understanding and appreciation of the principles of pattern recognition in relation to speech and speaker recognition, including feature extraction, dynamic time warping, hidden Markov modelling, Gaussian mixture models, expectation maximisation, language models and their application to large-vocabulary continuous speech recognition;
• formulate and analyse solutions to HMM problems, such as simple likelihood calculation, optimal state-sequence identification and parameter re-estimation;
• evaluate a speaker verification system based on objective measures of its operating characteristics.

Module Content

AUTOMATIC SPEECH RECOGNITION (24h) - PJBJ 

Introduction
Human speech communication. The role of ASR in human-computer interaction. Fundamentals of phonetics and speech perception.
Feature extraction
Vocal tract acoustics and Linear prediction. Mel-frequency cepstrum. Difference features.
Template matching
Dynamic time warping. Isolated-word and connected-word recognition. Search pruning.
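
As an illustration of the template-matching topic above, the following is a minimal sketch of dynamic time warping between two sequences. It assumes scalar features and an absolute-difference local cost purely for brevity; a recogniser would typically compare feature vectors (for example cepstral frames) with a vector distance. The function name `dtw_distance` is hypothetical and is not taken from the module materials.

```python
def dtw_distance(a, b):
    """Return the minimum cumulative alignment cost between sequences a and b.

    Assumption: a and b are sequences of scalars and the local cost is the
    absolute difference. Allowed steps are horizontal, vertical and diagonal.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three permitted predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

In isolated-word recognition by template matching, an unknown utterance would be compared against each stored template in this way and assigned the label of the template with the lowest cost.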

Hidden Markov models
Markov models and state topologies. HMM formulation. Discrete and continuous output pdfs.
Recognition and Viterbi decoding
Trellis diagrams. Forward and backward probabilities. Cumulative likelihoods and trace back.
Machine learning by Expectation maximisation
Baum-Welch training: derivation and implementation.
Large-vocabulary continuous speech recognition
Language modelling and discounting. Context-sensitivity and parameter tying.
Adaptation and robustness
Speaker adaptation: MLLR and MAP methods.
Noise robustness: spectral subtraction and parallel model combination.
ASR as machine intelligence
Definitions of intelligence. Neural network approaches to ASR, and HMMs in pattern recognition and other applications.
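
To illustrate the Viterbi decoding topic listed above (cumulative likelihoods and trace back), here is a minimal sketch for a discrete-output HMM, computed in the log domain to avoid underflow. The function name `viterbi`, the argument layout and the toy two-state model in the usage note are all assumptions made for illustration; they are not taken from the module materials or from HTK.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best_state_path, log_likelihood_of_that_path).

    Assumed layout: start_p[i] is the initial probability of state i,
    trans_p[i][j] the transition probability from i to j, and
    emit_p[j][o] the probability that state j emits discrete symbol o.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(start_p)
    # Cumulative log likelihood of the best path ending in each state.
    delta = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    backptr = []  # one list of best predecessors per time step after the first
    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log(trans_p[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log(trans_p[best_i][j])
                             + log(emit_p[j][o]))
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the most likely final state.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]
```

For a toy model with `start_p = [0.5, 0.5]`, `trans_p = [[0.9, 0.1], [0.1, 0.9]]` and `emit_p = [[0.9, 0.1], [0.1, 0.9]]`, decoding the observation sequence `[0, 0, 1, 1]` yields the state path `[0, 0, 1, 1]`. Replacing the maximisation over predecessors with a summation would give the forward probability rather than the best-path likelihood.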

SPEAKER RECOGNITION (6h) - JK 

Introduction
Biometrics. Speaker recognition/verification problem. Basic terminology: Open/closed set formulation; text-dependent, text-prompted, and text-independent systems. Speech characterisation. Silence removal.
Approaches to speaker identification
Gaussian mixture models. Client population modelling by adaptation of the world model. Maximum likelihood hypothesis testing. Sphericity test. Score normalisation. Decision thresholds.
System evaluation
Authentic users, impostors; training, evaluation and test sets. False acceptances and false rejections. Receiver operating characteristics (ROC) and DET curves. Generalisation. Total error rate.
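
As an illustration of the system evaluation topic above, the sketch below computes false acceptance and false rejection rates from verification scores and picks a decision threshold. It assumes that higher scores indicate the claimed client, that a claim is accepted when its score is at or above the threshold, and that total error is taken as FAR + FRR; the function names and the example scores are hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when claims scoring >= threshold are accepted."""
    # False acceptance: an impostor score passes the threshold.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    # False rejection: an authentic user's score falls below it.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def error_curve(genuine, impostor):
    """Return (threshold, FAR, FRR) at every observed score value."""
    thresholds = sorted(set(genuine) | set(impostor))
    return [(t, *error_rates(genuine, impostor, t)) for t in thresholds]


def min_total_error_threshold(genuine, impostor):
    """Return the observed threshold that minimises FAR + FRR (assumed TER)."""
    best = min(error_curve(genuine, impostor), key=lambda row: row[1] + row[2])
    return best[0]
```

Plotting the FAR and FRR pairs from `error_curve` against each other gives the points of an ROC-style trade-off curve. In practice the threshold would be chosen on an evaluation set and the error rates then reported on a separate test set, so that the reported figures reflect generalisation.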

Methods of Teaching/Learning

Lectures (27 hours total, 2-3 hours/week), laboratory (3 hours total, 0-1 hours/week), assignment, web-based exercises and tutorial sheets.

Selected Texts/Journals

Young, S.J. et al., The HTK Book [http://htk.eng.cam.ac.uk/], Entropic CRL Online [A] 

Jelinek, F., Statistical Methods for Speech Recognition [0-262-10066-5], MIT Press, £30 [B]

Holmes, J.N. & Holmes, W.J., Speech Synthesis and Recognition [0-748-40857-6], Taylor & Francis, £22.50 [B] 

Young, S.J., Large vocabulary continuous speech recognition. IEEE Sig. Proc. Mag. 13(5): 45-57, 1996, IEEE Online [C]

Last Updated

15th June 2010