University of Surrey - Guildford
2010/1 Module Catalogue
Module Code: EEEM030
Module Title: SPEECH AND AUDIO PROCESSING AND CODING
Module Provider: Electronic Engineering
Short Name: EEM.SAP
Level: M
Module Co-ordinator: KONDOZ A Prof (Elec Eng)
Number of credits: 15
Number of ECTS credits: 7.5
 
Module Availability

Autumn semester

Assessment Pattern

Components of Assessment      Method(s)   Percentage Weighting
Speech Processing experiment  Written     15%
Closed-book Examination       Written     85%

Module Overview
Prerequisites/Co-requisites

Module Aims

The aim of this module is to educate students in particular aspects of Information Technology, so that they can either proceed to a PhD or take up posts in industrial R&D departments, i.e. jobs at a higher level than operating software packages.

Learning Outcomes

On successful completion of this module, the student will have developed an understanding of the fundamental underlying principles of speech and audio processing and coding.

Module Content

Lecture Component: Speech Processing
Lecturer: Dr. W. Wang
Hours: 10 lecture hours with interspersed problem classes

1 Introduction - Speech and language. Digital speech processing. Speech processing applications. Characteristics of speech signals.
2 Speech Production - Vocal tract description. Source-filter model. Origin of periodicity, formants and anti-resonance in terms of physical model. All-pole digital model of vocal tract. Relationship between physical model and phonemes.
3 Speech Perception - The structure of the ear. Frequency and amplitude response of ear. Perception units.
4-5 Signal Processing Techniques - Autocorrelation of speech signals. Pitch estimation from speech signals. Fourier analysis of speech signal. Spectrogram and power spectrum density. Spectral analysis of voiced and unvoiced speech. Spectral analysis of formants and antiresonances. Harmonic structure of speech.
6-7 Linear Prediction - Z-transform. Vocal tract transfer function. Stability of transfer function. Concept and model of linear prediction. All-pole source filter. Order selection and its relation to prediction error. LPC coefficient estimation. Speech synthesis from the LPC coefficients.
8 Inverse Filtering of Speech Signal - Separating the source (excitation) from the vocal tract filter. Vocal tract response - formant estimation. Pitch estimation from the residual. Robust linear prediction.
9-10 Cepstral Deconvolution - Definition of real cepstrum. Transforming convolution to sum by non-linear operation. The complex logarithm. The complex cepstrum. The quefrency unit. Pitch estimation via the cepstrum. Comparison of spectral envelope with that derived from linear prediction. 

Lecture Component: Speech Coding
Lecturer: Prof A. Kondoz
Hours: 10 lecture hours

1 Introduction to Speech Coding - Speech quality, typical low bit rates and the corresponding speech quality, coder complexity, and some standards.
2-3 Quantisation of speech for transmission - Scalar (one-dimensional) quantisation. Vector (multi-dimensional) quantisation.
4-5 Linear Prediction - LPC and Pitch modelling used in low bit rate speech coding/transmission.
6-8 Time and Frequency Domain speech coders - Adaptive predictive coding (APC). Code Excited Linear Prediction (CELP).
9 Noise Shaping in Speech Coding - Noise shaping in speech coders to mask the coding noise and improve perceptual quality.
10 Real-time Implementation Considerations and demos – How to implement speech coders and other elements such as echo and noise cancellation with some sound demos 

Lecture Component: Audio Processing
Lecturer: Dr. B. Gunel
Hours: 10 lecture hours

1-2 Introduction to audio recording and acoustics – Microphone types (including MEMS) and directivity patterns, digital audio acquisition, wave propagation and acoustics, effects of reflections and reverberation, image-source modelling.
3-4 Beamforming and sound source localisation using microphone arrays – Time-delay-of-arrival, delay-and-sum beamformer, performance and implementation considerations.
5-6 Psychoacoustics and perceptual audio coding – Absolute threshold of hearing, auditory masking, spatial hearing, audio coding standards.
7-8 Spatial audio recording, post production and rendering – Binaural, stereo, multichannel surround and wave field synthesis (WFS) systems.
9-10 Subjective listening tests – General considerations, test design (full factorial, fractional), statistical analysis, standards for testing audio codecs.

Methods of Teaching/Learning

Lectures: 3 hours per week for 10 weeks
Laboratory: Speech Processing Experiment

Selected Texts/Journals

Rabiner, L.R., and Schafer, R.W, Digital Processing of Speech Signals, Prentice-Hall. 0-13-213603-1 [A] 

Saito, S., and Nakata, K, Fundamentals of Speech Signal Processing, Academic Press. 0-12-614880-5 (out of print) [B] 

Kondoz, A, Digital Speech Coding for Low Bit-rate Communication Systems, Wiley, 1995. 0-471-95064-5 [A] 

Rabiner, L.R., and Schafer, R.W, Digital Processing of Speech Signals, Prentice-Hall. 0-13-213603-1 [B] 

Barnwell, T, Speech Coding, John Wiley & Sons. 0-471-51692-9 [B] 

Howard, D., and Angus, J, Acoustics and Psychoacoustics, Focal Press, 1996. [A] 

Moore, B.C.J, An Introduction to the Psychology of Hearing, Academic Press, 1998. [B] 

Blauert, J, Spatial Hearing, MIT Press, 1997 [B] 

Begault, D, 3D Sound for Virtual Reality and Multimedia, 1995. [B] 

Rumsey, F, Microphones in Stereophonic Application. In Microphone Engineering Handbook (Ed. Gayford), Focal Press 1994. [B] 

Rumsey, F, The Audio Workstation Handbook, Focal Press, 1996. [B] 

Rumsey, F., and Watkinson, J, The Digital Interface Handbook, Focal Press, 1996. [B]

Last Updated

15th June 2010