PROFESSIONAL ELECTIVES (PE)
SEMESTER VI
ELECTIVE – I
CS17E61 HIGH PERFORMANCE COMPUTING L T P C
3 0 0 3
OBJECTIVES:
● To learn the concepts of parallel processing as they apply to high performance computing, and the
fundamentals of multi-core processors.
● To learn to design optimization techniques for high performance computing.
● To discuss Scheduling and Parallelization Techniques.
● To learn different open source tools.
● To learn the concepts of message passing paradigm using open source APIs.
UNIT I MULTI-CORE PROCESSOR 9
Modern processors - Stored program computer architecture, General-purpose cache-based microprocessor
architecture, Performance metrics and benchmarks, Transistors galore: Moore’s Law, Pipelining,
Superscalarity, SIMD, Memory hierarchies, Cache, Cache mapping, Prefetch, Multicore processors,
Multithreaded processors, Vector processors, Design principles, Maximum performance estimates,
Programming for vector architectures, Basic optimization techniques for serial code - Scalar profiling,
Function- and line-based runtime profiling, Hardware performance counters, Manual instrumentation,
Common sense optimizations.
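A minimal C sketch of manual instrumentation, one of the serial profiling techniques listed above: wall-clock timing of a single kernel with the POSIX clock_gettime() timer. The kernel, array size, and timer choice are illustrative assumptions, not prescribed by the syllabus.

/* Manual runtime instrumentation of a serial kernel (illustrative). */
#include <stdio.h>
#include <time.h>

#define N 1000000

static double a[N], b[N];

/* wall-clock time in seconds from the POSIX monotonic clock */
static double wall_time(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1.0e-9 * ts.tv_nsec;
}

int main(void) {
    for (long i = 0; i < N; ++i) { a[i] = 1.0; b[i] = 2.0; }

    double t0 = wall_time();
    double s = 0.0;
    for (long i = 0; i < N; ++i)      /* kernel being measured */
        s += a[i] * b[i];
    double t1 = wall_time();

    printf("result %.1f, time %.6f s\n", s, t1 - t0);
    return 0;
}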
UNIT II OPTIMIZATION TECHNIQUES 9
Data access optimization - Balance analysis and light-speed estimate, Bandwidth-based performance
modelling, The STREAM benchmarks, Storage order, Case study: The Jacobi algorithm, Dense matrix
transpose, Algorithm classification and access optimizations - O(N)/O(N), O(N²)/O(N²), O(N³)/O(N²), Case
study: Sparse matrix-vector multiply, Sparse matrix storage schemes, Optimizing JDS sparse MVM, Parallel
computers - Taxonomy of parallel computing paradigms, Shared-memory computers, Cache coherence, UMA,
ccNUMA, Distributed-memory computers, Hierarchical (hybrid) systems, Networks, Basic performance
characteristics of networks, Buses, Switched and fat-tree networks, Mesh networks, Hybrids.
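A minimal C sketch of sparse matrix-vector multiply in CRS (compressed row storage), illustrating the sparse storage schemes and the sparse MVM case study above; the CRS layout and array names are chosen here for illustration (the unit itself also treats the JDS variant).

/* y = A*x with A stored in CRS format (illustrative sketch) */
#include <stddef.h>

void spmvm_crs(size_t nrows,
               const double *val,      /* nonzero values, row by row        */
               const size_t *col_idx,  /* column index of each nonzero      */
               const size_t *row_ptr,  /* start of each row in val/col_idx  */
               const double *x,
               double *y)
{
    for (size_t r = 0; r < nrows; ++r) {
        double sum = 0.0;
        for (size_t j = row_ptr[r]; j < row_ptr[r + 1]; ++j)
            sum += val[j] * x[col_idx[j]];   /* indirect access to x */
        y[r] = sum;
    }
}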
UNIT III SCHEDULING AND PARALLELIZATION TECHNIQUES 9
Basics of parallelization- Why parallelize?, Parallelism, Data parallelism, Functional parallelism, Parallel
scalability, Factors that limit parallel execution, Scalability metrics, Simple scalability laws, Parallel
efficiency, Serial performance versus strong scalability, Refined performance models, Choosing the right
scaling baseline, Shared-memory parallel programming with OpenMP- Short introduction to OpenMP,
Parallel execution, Data scoping, OpenMP worksharing for loops, Synchronization, Reductions, Loop
scheduling, Tasking, Case study: OpenMP-parallel Jacobi algorithm, Advanced OpenMP: Wavefront
parallelization.
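A minimal C sketch of OpenMP worksharing with a reduction, in the spirit of the OpenMP-parallel Jacobi case study: one sweep of a 2-D Jacobi update with the outer loop distributed across threads (compile with -fopenmp). The grid size, array names, and the max-reduction on the residual are assumptions for illustration.

/* One OpenMP-parallel Jacobi sweep; returns the maximum update (residual). */
#include <math.h>

double jacobi_sweep(int n, double phi_new[n][n], double phi[n][n])
{
    double maxdiff = 0.0;
    /* worksharing loop with a max-reduction (requires OpenMP 3.1 or later) */
    #pragma omp parallel for reduction(max:maxdiff) schedule(static)
    for (int i = 1; i < n - 1; ++i) {
        for (int k = 1; k < n - 1; ++k) {
            phi_new[i][k] = 0.25 * (phi[i-1][k] + phi[i+1][k]
                                  + phi[i][k-1] + phi[i][k+1]);
            double d = fabs(phi_new[i][k] - phi[i][k]);
            if (d > maxdiff) maxdiff = d;
        }
    }
    return maxdiff;
}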
UNIT IV ARCHITECTURE AND TOOLS 9
Efficient OpenMP programming- Profiling OpenMP programs, Ameliorating the impact of OpenMP
worksharing constructs, Determining OpenMP overhead for short loops, Serialization, False sharing, Case
study: Parallel sparse matrix-vector multiply, Locality optimizations on ccNUMA architectures-Locality of
access on ccNUMA, Page placement by first touch, Access locality by other means, Case study: ccNUMA
optimization of sparse MVM, Placement pitfalls, NUMA-unfriendly OpenMP scheduling, File system cache,
ccNUMA issues with C++, Arrays of objects, Standard Template Library.
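A minimal C sketch of first-touch page placement on a ccNUMA system, one of the locality techniques listed above: the arrays are initialized in parallel with the same static schedule later used by the compute loop, so the pages each thread works on are mapped into its local NUMA domain. The vector-triad kernel and names are illustrative assumptions.

/* First-touch initialization matching the compute loop's schedule. */
void triad_first_touch(long n, double *a, double *b, double *c, double s)
{
    /* initialization: the thread that first touches a page owns it */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        a[i] = 0.0;
        b[i] = 1.0;
        c[i] = 2.0;
    }

    /* compute loop with the identical schedule: accesses stay NUMA-local */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        a[i] = b[i] + s * c[i];
}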
UNIT V COMMUNICATION TECHNIQUES 9
Distributed-memory parallel programming with MPI - Message passing, A short introduction to MPI, A simple
example, Messages and point-to-point communication, Collective communication, Nonblocking point-to-
point communication, Virtual topologies, Example: MPI parallelization of a Jacobi solver, MPI
implementation, Performance properties, Efficient MPI programming, MPI performance tools,
Communication parameters, Synchronization, serialization, contention, Implicit serialization and
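A minimal C sketch of MPI point-to-point communication as introduced in this unit: rank 0 sends one double to rank 1 with MPI_Send/MPI_Recv (compile with mpicc, run with two processes). The message content and tag are illustrative assumptions.

/* Minimal MPI point-to-point exchange between two ranks. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double msg = 0.0;
    if (rank == 0) {
        msg = 3.14;
        MPI_Send(&msg, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);   /* tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&msg, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %f\n", msg);
    }

    MPI_Finalize();
    return 0;
}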