University of Surrey - Guildford
Registry
  
 

  
 
2011/2 Provisional Module Catalogue - UNDER CONSTRUCTION & SUBJECT TO CHANGE
 Module Code: COM3012 Module Title: PARALLEL ARCHITECTURES
Module Provider: Computing Short Name: COM3012
Level: HE3 Module Co-ordinator: BRIFFA JA Dr (Computing)
Number of credits: 15 Number of ECTS credits: 7.5
 
Module Availability
Spring Semester 2009
Assessment Pattern

Unit(s) of Assessment and Weighting Towards Module Mark (%):

  • Exam: 2-hour practical examination (50%)

  • Coursework (individual): 1 piece of assigned coursework, due in week 5 (25%)

  • Coursework (individual): 1 piece of assigned coursework, due in week 9 (25%)

Qualifying Condition(s): A weighted aggregate of 40% is required to pass the module.


Module Overview

The module introduces the concepts of parallel computing by examining the architectures that support it and working through several categories of examples. Implementing parallel solutions and then analysing them gives students practical experience and an understanding of the difficulties involved.

 

 

 

Prerequisites/Co-requisites

Good programming skills; prior knowledge of C/C++ is helpful, as these are the languages used throughout the module.

Module Aims

The module aims to develop the student’s ability to think clearly about the relationship between a problem abstraction and the details of its architectural implementation. It focuses on techniques for developing solutions to scientific computing problems on parallel architectures, using a number of case studies to illustrate facets of the subject. Students will gain experience in building parallel solutions for such problems. Most of the module uses C and C++ to access the parallel computing libraries of the various architectures.

Learning Outcomes

By the end of the course the students will be able to:

 

1.       explain the major benefits and limitations of parallel computing;

 

2.       identify and explain the differences between available parallel architectures;

 

3.       develop parallel solutions for scientific computing problems on various architectures;

 

4.       analyse the performance of a parallel solution.

Module Content
Lesson 1: Introduction

 

·         Motivation (why parallelize?)

 

·         Types of parallelism:

 

-          Implicit vs Explicit

 

-          Flynn's Taxonomy

 

·         Example architectures:

 

-          SMP/multicore; Cell BE; GPGPU;

 

-          clusters; grids; clouds;

 

·         Problem bottlenecks/classification:

 

-          compute-bound

 

-          I/O-bound (or memory-bound)

 

·         Programming models:

 

-          shared memory (e.g. threads)

 

-          message-passing (e.g. MPI)

 

-          vectorization

 

·         Introduce systems used in class:

 

-          Intel Core 2 Duo E8400 @3.00GHz (duck/swan labs)

 

-          Intel Xeon E3113 @3.00GHz (ps3 cluster head node)

 

-          nVidia 9400GT (duck/swan labs)

 

-          Cell BE (ps3 cluster)

 

Lesson 2: Threading

 

·         Thread basics

 

-          What are threads?

 

-          Threads and processes

 

-          Advantages of thread programming

 

-          Common models for programming with threads

 

·         POSIX threads

 

-          Creation and termination

 

-          Synchronization primitives

 

-          Joining and detaching

 

-          Mutexes
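
As an illustration of the POSIX threads topics listed above (creation, joining and mutexes), the following is a minimal sketch rather than material taken from the module itself. The names worker, run_counter_demo, NUM_THREADS and INCREMENTS are hypothetical and chosen only for this example. Several threads increment a shared counter, and a mutex serializes each update.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4       /* hypothetical thread count for this sketch */
#define INCREMENTS  100000  /* hypothetical number of updates per thread */

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: add INCREMENTS to the shared counter, one locked step at a time. */
static void *worker(void *arg)
{
    (void)arg;  /* unused */
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;  /* critical section: only one thread at a time */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Create the threads, wait for all of them, and return the final count. */
long run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];
    counter = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_join(threads[i], NULL);  /* blocks until thread i ends */
        assert(rc == 0);
        (void)rc;
    }
    return counter;
}
```

Calling run_counter_demo() should return NUM_THREADS * INCREMENTS. Without the mutex, the unsynchronized increments would race and the total would typically come out lower. On many systems the program is built with the compiler's -pthread option.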

 

Lesson 3: SIMD Extensions

 

·         Introducing SIMD Extensions

 

-          Vector vs Scalar processing

 

-          Vector data types

 

-          Byte order

 

-          Programming interfaces

 

·         SIMD Programming with GCC

 

-          Rudimentary example

 

-          Vector reorganization

 

-          Data type conversion

 

-          Elimination of conditional branches
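
To make the "vector data types" and "rudimentary example" topics above more concrete, here is a small sketch using the vector_size attribute, a GCC extension that Clang also accepts. It is an assumed illustration, not the example used in the lectures. The type name v4sf and the function vec_madd are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC vector extension: four packed 32-bit floats (16 bytes in total). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i], processed four
   elements at a time. n must be a multiple of 4. */
void vec_madd(float *out, const float *a, const float *b,
              const float *c, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4sf va, vb, vc, vr;
        /* memcpy loads avoid any assumption about 16-byte alignment */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);
        vr = va * vb + vc;  /* element-wise on all four lanes */
        memcpy(out + i, &vr, sizeof vr);
    }
}
```

The arithmetic operators apply lane by lane, so a single expression replaces four scalar multiply-add steps. Whether the compiler maps this to actual SIMD instructions depends on the target and the compiler options.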

 

Lesson 4: Cell BE Programming (part 1)

 

·         The Cell Broadband Engine

 

-          Cell BE Architecture

 

-          Cell BE Programming Model

 

·         SPE Programming

 

-          A simple SPE program

 

-          Using DMA transfer

 

Lesson 5: Cell BE Programming (part 2)

 

·         Parallel SPE Programming

 

-          SIMD programming on the SPE

 

-          Using Multiple SPEs

 

·         Advanced Cell Programming

 

-          Communication between PPE and SPE (mailboxes, signal notification registers)

 

Lesson 6: Cell BE Programming (part 3)

 

·         Advanced Cell Programming

 

-          Effective Utilization of DMA Transfer

 

-          Differences in PPE and SPE Data Representation

 

-          Vector Data Alignment

 

-          Scalar Operations on SPE

 

-          SPE Program Embedding

 

Lesson 7: GPU Programming (part 1)

 

·         Introduction to GPU Programming

 

-          GPUs and GPU Architecture

 

-          Introduction to CUDA

 

·         GPU Programming with CUDA

 

-          A simple CUDA example

 

-          Blocks and thread hierarchy

 

-          Memory hierarchy

 

-          Device and host separation

 

-          Putting everything together

 

Lesson 8: GPU Programming (part 2)

 

·         Optimizing Performance

 

·         Advanced CUDA Facilities

 

-          Shared Memory

 

-          Texture Memory

 

-          Page-Locked Host Memory

 

-          Concurrent Execution

 

Lesson 9: Cluster Programming with MPI

 

·         Introduction to Clusters

 

·         Message-Passing Interface API

 

-          MPI Environment Management

 

-          Point-to-Point Communications

 

-          Collective Communications

 

Lesson 10: Exploiting Parallelism

 

·         Introduction to Grids & Public Computing

 

·         Parallelism at different levels

 

-          Granularity and level

 

-          Which way(s) to use?

 

-          System-level concerns

 

·         Closure

 

Methods of Teaching/Learning

The delivery pattern will consist of:

 

  • 10 one-hour lectures in weeks 1-10 (1 per week)

     

  • Labs scheduled after class in weeks 1-10, initially based on exercises and subsequently supporting the coursework

     

Coursework and labs will address learning outcome 3; each assignment and lab exercise will deal with a specific architecture. The labs require specialized equipment, and the students will have 24-hour access to the facilities.

Selected Texts/Journals

There is no single core text that covers the whole course. The following are recommendations for reading.

 

Online Texts:

 

  • Arevalo et al., “Programming the Cell Broadband Engine™ Architecture: Examples and Best Practices”, IBM Redbooks, Aug 2008. ISBN 0738485942. Available online at http://ibm.com/redbooks

     

  • “Cell Broadband Engine Programming Handbook”, Version 1.12, April 2009. Available online.

     

  • “Cell Programming Primer”, Version 1.6, Feb 2008. Available online.

     

  • “nVidia CUDA™ Programming Guide”, Version 2.3, July 2009. Available online at http://nvidia.com/cuda

     

  • “nVidia CUDA™ C Programming Best Practices Guide”, Version 2.3, July 2009. Available online at http://nvidia.com/cuda

     

Recommended Texts:

 

Background / Reference:

 

  • Stroustrup, “The C++ Programming Language”, Special Edition, Addison-Wesley, 2000.

     

  • Eckel, “Thinking in C++”, Vol. 1-2, Second edition, Prentice-Hall, 2000.

     

Last Updated
updated 28/01/2010 jg