Module Code: EEEM030 |
Module Title: SPEECH AND AUDIO PROCESSING AND CODING |
|
Module Provider: Electronic Engineering
|
Short Name: EEM.SAP
|
Level: M
|
Module Co-ordinator: KONDOZ A Prof (Elec Eng)
|
Number of credits: 15
|
Number of ECTS credits: 7.5
|
|
|
|
Module Availability |
Autumn semester |
|
|
Assessment Pattern |
Components of Assessment | Method(s) | Percentage Weighting
Speech Processing experiment | Written | 15%
Closed-book Examination | Written | 85%
|
|
|
Module Overview |
|
|
|
Prerequisites/Co-requisites |
|
|
|
Module Aims |
The aim of this module is to educate students in particular aspects of Information Technology, so that they can either proceed to PhD study or take up posts in industrial R&D departments, i.e. roles at a higher level than operating software packages. |
|
|
Learning Outcomes |
On successful completion of this module, the student will have developed an understanding of the fundamental underlying principles of speech and audio processing and coding. |
|
|
Module Content |
Lecture Component: Speech Processing
Lecturer: Dr. W. Wang
Hours: 10 lecture hours with interspersed problem classes
1 Introduction - Speech and language. Digital speech processing. Speech processing applications. Characteristics of speech signals.
2 Speech Production - Vocal tract description. Source-filter model. Origin of periodicity, formants and anti-resonances in terms of the physical model. All-pole digital model of the vocal tract. Relationship between the physical model and phonemes.
3 Speech Perception - The structure of the ear. Frequency and amplitude response of the ear. Perception units.
4-5 Signal Processing Techniques - Autocorrelation of speech signals. Pitch estimation from speech signals. Fourier analysis of speech signals. Spectrogram and power spectral density. Spectral analysis of voiced and unvoiced speech. Spectral analysis of formants and anti-resonances. Harmonic structure of speech.
6-7 Linear Prediction - Z-transform. Vocal tract transfer function. Stability of the transfer function. Concept and model of linear prediction. All-pole source filter. Order selection and its relation to prediction error. Estimation of LPC coefficients. Speech synthesis from the LPC coefficients.
8 Inverse Filtering of Speech Signals - Separating the source (excitation) from the vocal tract filter. Vocal tract response: formant estimation. Pitch estimation from the residual. Robust linear prediction.
9-10 Cepstral Deconvolution - Definition of the real cepstrum. Transforming convolution into a sum by a non-linear operation. The complex logarithm. The complex cepstrum. The quefrency unit. Pitch estimation via the cepstrum. Comparison of the spectral envelope with that derived from linear prediction.
Lecture Component: Speech Coding
Lecturer: Prof A. Kondoz
Hours: 10 lecture hours
1 Introduction to Speech Coding - Speech quality, typical low bit rates and the corresponding speech quality, coder complexity, including some standards.
2-3 Quantisation of Speech for Transmission - Scalar: one-dimensional quantisation. Vector: two-dimensional quantisation.
4-5 Linear Prediction - LPC and pitch modelling used in low bit rate speech coding/transmission.
6-8 Time and Frequency Domain Speech Coders - Adaptive predictive coding (APC). Code Excited LPC (CELP).
9 Noise Shaping in Speech Coding - Noise shaping in speech coders to mask the coding noise and improve perceptual quality.
10 Real-time Implementation Considerations and Demos - How to implement speech coders and other elements such as echo and noise cancellation, with some sound demos.
Lecture Component: Audio Processing
Lecturer: Dr. B. Gunel
Hours: 10 lecture hours
1-2 Introduction to Audio Recording and Acoustics - Microphone types (including MEMS) and directivity patterns, digital audio acquisition, wave propagation and acoustics, effects of reflections and reverberation, image-source modelling.
3-4 Beamforming and Sound Source Localisation Using Microphone Arrays - Time-delay-of-arrival, delay-and-sum beamformer, performance and implementation considerations.
5-6 Psychoacoustics and Perceptual Audio Coding - Absolute threshold of hearing, auditory masking, spatial hearing, audio coding standards.
7-8 Spatial Audio Recording, Post-production and Rendering - Binaural, stereo, multichannel surround and wave field synthesis (WFS) systems.
9-10 Subjective Listening Tests - General considerations, test design (full factorial, fractional), statistical analysis, standards for testing audio codecs.
|
|
|
Methods of Teaching/Learning |
Lectures: 3 hours per week for 10 weeks
Laboratory: Speech Processing Experiment
|
|
|
Selected Texts/Journals |
Rabiner, L.R., and Schafer, R.W., Digital Processing of Speech Signals, Prentice-Hall. 0-13-213603-1 [A]
Saito, S., and Nakata, K., Fundamentals of Speech Signal Processing, Academic Press. 0-12-614880-5 (out of print) [B]
Kondoz, A., Digital Speech Coding for Low Bit-rate Communication Systems, Wiley, 1995. 0471 950645 [A]
Rabiner, L.R., and Schafer, R.W., Digital Processing of Speech Signals, Prentice-Hall. 0-13-213603-1 [B]
Saito, S., and Nakata, K., Fundamentals of Speech Signal Processing, Academic Press. 0-12-614880-5 (out of print) [B]
Barnwell, T., Speech Coding, John Wiley & Sons. 0-471-51692-9 [B]
Howard, D., and Angus, J., Acoustics and Psychoacoustics, Focal Press, 1996. [A]
Moore, B.C.J., An Introduction to the Psychology of Hearing, Academic Press, 1998. [B]
Blauert, J., Spatial Hearing, MIT Press, 1997. [B]
Begault, D., 3D Sound for Virtual Reality and Multimedia, 1995. [B]
Rumsey, F., Microphones in Stereophonic Application. In Microphone Engineering Handbook (Ed. Gayford), Focal Press, 1994. [B]
Rumsey, F., The Audio Workstation Handbook, Focal Press, 1996. [B]
Rumsey, F., and Watkinson, J., The Digital Interface Handbook, Focal Press, 1996. [B]
|
|
|
Last Updated |
15th June 2010 |
|
|
|