University of Surrey - Guildford

2010/1 Module Catalogue
 Module Code: COM3012 Module Title: PARALLEL ARCHITECTURES
Module Provider: Computing Short Name: COM3012
Level: HE3 Module Co-ordinator: BRIFFA JA Dr (Computing)
Number of credits: 15 Number of ECTS credits: 7.5
Module Availability


Assessment Pattern


Unit(s) of Assessment | Weighting Towards Module Mark (%)
4-hour practical examination |
Coursework (individual): 1 piece of assigned coursework, due in week 6 |
Coursework (individual): 1 piece of assigned coursework, due in week 10 |

Qualifying Condition(s)

A weighted aggregate of 40% is required to pass the module.


Module Overview


The course introduces the concepts of parallel computing by considering the different architectures that support it, and by working through different categories of examples. Implementing such solutions and analysing them gives practical experience and an understanding of the difficulties involved.



Good programming skills are assumed; prior knowledge of C/C++ is helpful, as these are the languages used throughout the module. Students who do not have a C/C++ background are strongly encouraged to take a prior course on C++.

Module Aims

The module aims to develop the student’s ability to think clearly about the relationship between a problem abstraction and architectural implementation details. It focuses on techniques for developing solutions to scientific computing problems on parallel architectures, with a number of case studies illustrating facets of the subject. Students will gain experience in building parallel solutions for scientific computing problems, mostly using C and C++ to access the parallel computing libraries of the various architectures.

Learning Outcomes


By the end of the course the students will be able to:


1. explain the major benefits and limitations of parallel computing;

2. identify and explain the differences between available parallel architectures;

3. develop parallel solutions for scientific computing problems on various architectures;

4. analyse the performance of a parallel solution.


Module Content


The lesson schedule and contents below are subject to change, following revisions based on earlier delivery.


Lesson 1: Introduction


·          Motivation (why parallelize?)


·          Types of parallelism:


-          Implicit vs Explicit


-          Flynn's Taxonomy


·          Example architectures:


-          SMP/multicore; Cell BE; GPGPU;


-          clusters; grids; clouds;


·          Problem bottlenecks/classification:


-          compute-bound


-          I/O-bound (or memory-bound)


·          Programming models:


-          shared memory (e.g. threads)


-          message-passing (e.g. MPI)


-          vectorization


·          Introduce systems used in class:


-          Intel Core 2 Duo E8400 @3.00GHz (duck/swan labs)


-          Intel Xeon E3113 @3.00GHz (ps3 cluster head node)


-          nVidia 9400GT (duck/swan labs)


-          Cell BE (ps3 cluster)


Lesson 2: Threading


·          Thread basics


-          What are threads?


-          Threads and processes


-          Advantages of thread programming


-          Common models for programming with threads


·          POSIX threads


-          Creation and termination


-          Synchronization primitives


-          Joining and detaching


-          Mutexes


Lesson 3: SIMD Extensions


·          Introducing SIMD Extensions


-          Vector vs Scalar processing


-          Vector data types


-          Byte order


-          Programming interfaces


·          SIMD Programming with GCC


-          Rudimentary example


-          Vector reorganization


-          Data type conversion


-          Elimination of conditional branches


Lesson 4: Cell BE Programming (part 1)


·          The Cell Broadband Engine


-          Cell BE Architecture


-          Cell BE Programming Model


·          SPE Programming


-          A simple SPE program


-          Using DMA transfer


Lesson 5: Cell BE Programming (part 2)


·          Parallel SPE Programming


-          SIMD programming on the SPE


-          Using Multiple SPEs


·          Advanced Cell Programming


-          Communication between PPE and SPE (mailboxes, signal notification registers)


Lesson 6: Cell BE Programming (part 3)


·          Advanced Cell Programming


-          Effective Utilization of DMA Transfer


-          Differences in PPE and SPE Data Representation


-          Vector Data Alignment


-          Scalar Operations on SPE


-          SPE Program Embedding


Lesson 7: GPU Programming (part 1)


·          Introduction to GPU Programming


-          GPUs and GPU Architecture


-          Introduction to CUDA


·          GPU Programming with CUDA


-          A simple CUDA example


-          Blocks and thread hierarchy


-          Memory hierarchy


-          Device and host separation


-          Putting everything together


Lesson 8: GPU Programming (part 2)


·          Optimizing Performance


·          Advanced CUDA Facilities


-          Shared Memory


-          Texture Memory


-          Page-Locked Host Memory


-          Concurrent Execution


Lesson 9: Cluster Programming with MPI


·          Introduction to Clusters


·          Message-Passing Interface API


-          MPI Environment Management


-          Point-to-Point Communications


-          Collective Communications


Lesson 10: Exploiting Parallelism


·          Introduction to Grids & Public Computing


·          Parallelism at different levels


-          Granularity and level


-          Which way(s) to use?


-          System-level concerns


·          Closure



Methods of Teaching/Learning


The delivery pattern will consist of:


·          10 one-hour lectures in weeks 1-10 (1 per week)


·          Labs will be scheduled after class in weeks 1-10, initially based on exercises and subsequently to support coursework


Coursework and labs will address learning outcome 3; each assignment and lab exercise will deal with a specific architecture. The labs require specialized equipment, and the students will have 24-hour access to the facilities.


Selected Texts/Journals


There is no single core text that covers the whole course. The following are recommendations for reading.


Online Texts:


·          Arevalo et al., “Programming the Cell Broadband Engine™ Architecture: Examples and Best Practices”, IBM Redbooks, Aug 2008. ISBN 0738485942. Available online.


·          “Cell Broadband Engine Programming Handbook”, Version 1.12, April 2009. Available online.


·          “Cell Programming Primer”, Version 1.6, Feb 2008. Available online.


·          “nVidia CUDA™ Programming Guide”, Version 2.3, July 2009. Available online.


·          “nVidia CUDA™ C Programming Best Practices Guide”, Version 2.3, July 2009. Available online.


Recommended Texts:


·          Grama et al., “Introduction to Parallel Computing”, Second edition, Addison-Wesley, 2003.


·          Mattson et al., “Patterns for Parallel Programming”, Addison-Wesley, 2005.


·          Scarpino, "Programming the Cell Processor", Prentice Hall, 2009.


·          David B. Kirk and Wen-mei W. Hwu, “Programming Massively Parallel Processors: A Hands-on Approach”, Morgan Kaufmann, 2010. ISBN 0123814723


Background / Reference:


·          Stroustrup, “The C++ Programming Language”, Special Edition, Addison-Wesley, 2000.


·          Eckel, “Thinking in C++”, Vol. 1-2, Second edition, Prentice-Hall, 2000.



Last Updated

April 2011