                                         PROFESSIONAL ELECTIVES (PE)
                                                    SEMESTER VI
                                                     ELECTIVE – I
             CS17E61                 HIGH PERFORMANCE COMPUTING                 L T P C
                                                                                3 0 0 3
             OBJECTIVES:
               ●  To learn the concepts of parallel processing as they pertain to high performance computing,
                   and the fundamentals of multi-core processors.
               ●  To learn to design and apply optimization techniques for high performance computing.
               ●  To discuss scheduling and parallelization techniques.
               ●  To learn different open-source tools.
               ●  To learn the message passing paradigm using open-source APIs.

            UNIT I        MULTI-CORE PROCESSOR                                                                9
            Modern processors - Stored program computer architecture, General-purpose cache-based microprocessor
            architecture, Performance metrics and benchmarks, Transistors galore: Moore's Law, Pipelining,
            Superscalarity, SIMD, Memory hierarchies, Cache, Cache mapping, Prefetch, Multicore processors,
            Multithreaded processors, Vector processors, Design principles, Maximum performance estimates,
            Programming for vector architectures, Basic optimization techniques for serial code - Scalar profiling,
            Function- and line-based runtime profiling, Hardware performance counters, Manual instrumentation,
            Common sense optimizations.
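
            As an illustration of the memory-hierarchy and manual-instrumentation topics in this unit, the
            following C sketch times a stride-1 versus a stride-8 traversal of the same large array; the
            array size, the stride, and the timer helper are illustrative assumptions, not prescribed by
            the syllabus.

                /* Minimal sketch: manual instrumentation and memory-hierarchy effects.
                 * Compile with e.g. "cc -O2 stride.c". Sizes are illustrative. */
                #include <stdio.h>
                #include <stdlib.h>
                #include <time.h>

                #define N (1L << 24)            /* 16M doubles, larger than any cache */

                static double wall_time(void)   /* simple wall-clock timer */
                {
                    struct timespec ts;
                    clock_gettime(CLOCK_MONOTONIC, &ts);
                    return ts.tv_sec + 1e-9 * ts.tv_nsec;
                }

                int main(void)
                {
                    double *a = malloc(N * sizeof *a);
                    double sum = 0.0, t;
                    for (long i = 0; i < N; i++) a[i] = 1.0;

                    /* Stride-1 traversal: consecutive cache lines are streamed once. */
                    t = wall_time();
                    for (long i = 0; i < N; i++) sum += a[i];
                    printf("stride 1: %.3f s (sum=%g)\n", wall_time() - t, sum);

                    /* Stride-8 traversal (8 doubles = one 64-byte cache line per step):
                     * the same flop count, but every cache line is loaded in each of
                     * the 8 passes, so memory traffic and runtime grow several-fold. */
                    sum = 0.0;
                    t = wall_time();
                    for (long s = 0; s < 8; s++)
                        for (long i = s; i < N; i += 8) sum += a[i];
                    printf("stride 8: %.3f s (sum=%g)\n", wall_time() - t, sum);

                    free(a);
                    return 0;
                }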

            UNIT II       OPTIMIZATION TECHNIQUES                                                                              9
            Data  access  optimization  -  Balance  analysis  and  light-speed  estimate,  Bandwidth-based  performance
            modelling,  The  STREAM  benchmarks,  Storage  order,  Case  study:  The  Jacobi  algorithm,  Dense  matrix
            transpose, Algorithm classification and access optimizations- O(N)/O(N), O(N2)/O(N2), O(N3)/O(N2), Case
            study: Sparse matrix-vector multiply, Sparse matrix storage schemes, Optimizing JDS sparse MVM, Parallel
            computers-Taxonomy of parallel computing paradigms, Shared-memory computers, Cache coherence, UMA,
            ccNUMA  .  Distributed-memory  computers,  Hierarchical  (hybrid)  systems,  Networks,  Basic  performance
            characteristics of networks, Buses, Switched and fat-tree networks, Mesh networks, Hybrids.
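
            As one concrete instance of the sparse matrix storage schemes listed in this unit, the
            following C sketch performs sparse matrix-vector multiply (y = A*x) in the CRS (compressed
            row storage) format; the struct layout and names are illustrative assumptions (the unit's
            own case study optimizes the JDS format).

                /* Minimal sketch of sparse MVM in CRS format; names are illustrative. */
                typedef struct {
                    int     nrows;     /* number of matrix rows */
                    int    *row_ptr;   /* row i's nonzeros: entries row_ptr[i]..row_ptr[i+1]-1 */
                    int    *col_idx;   /* column index of each stored nonzero */
                    double *val;       /* value of each stored nonzero */
                } crs_matrix;

                void crs_spmv(const crs_matrix *A, const double *x, double *y)
                {
                    for (int i = 0; i < A->nrows; i++) {
                        double tmp = 0.0;
                        /* The indirect access x[col_idx[j]] is the data-access
                         * bottleneck: x is loaded with an irregular pattern. */
                        for (int j = A->row_ptr[i]; j < A->row_ptr[i + 1]; j++)
                            tmp += A->val[j] * x[A->col_idx[j]];
                        y[i] = tmp;
                    }
                }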

            UNIT III      SCHEDULING AND PARALLELIZATION TECHNIQUES                                           9
            Basics of parallelization - Why parallelize?, Parallelism, Data parallelism, Functional parallelism,
            Parallel scalability, Factors that limit parallel execution, Scalability metrics, Simple scalability
            laws, Parallel efficiency, Serial performance versus strong scalability, Refined performance models,
            Choosing the right scaling baseline, Shared-memory parallel programming with OpenMP - Short
            introduction to OpenMP, Parallel execution, Data scoping, OpenMP worksharing for loops,
            Synchronization, Reductions, Loop scheduling, Tasking, Case study: OpenMP-parallel Jacobi algorithm,
            Advanced OpenMP: Wavefront parallelization.
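
            The unit's OpenMP case study can be pictured with a short sketch: one sweep of a 2-D Jacobi
            update using an OpenMP worksharing loop and a max-norm reduction. The grid dimensions and
            names below are illustrative assumptions; reduction(max:...) requires OpenMP 3.1 or later.

                /* Minimal sketch of one OpenMP-parallel Jacobi sweep; compile with
                 * e.g. "cc -O2 -fopenmp jacobi.c". Sizes and names are illustrative. */
                #include <math.h>

                #define NX 1024
                #define NY 1024

                double jacobi_sweep(double u[NY][NX], double unew[NY][NX])
                {
                    double maxdiff = 0.0;
                    /* Worksharing over rows; each thread gets a contiguous block. */
                    #pragma omp parallel for reduction(max:maxdiff) schedule(static)
                    for (int j = 1; j < NY - 1; j++) {
                        for (int i = 1; i < NX - 1; i++) {
                            unew[j][i] = 0.25 * (u[j][i-1] + u[j][i+1]
                                               + u[j-1][i] + u[j+1][i]);
                            double d = fabs(unew[j][i] - u[j][i]);
                            if (d > maxdiff) maxdiff = d;
                        }
                    }
                    return maxdiff;   /* caller swaps u/unew and tests convergence */
                }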

            UNIT IV       ARCHITECTURE AND TOOLS                                                              9
            Efficient OpenMP programming - Profiling OpenMP programs, Ameliorating the impact of OpenMP
            worksharing constructs, Determining OpenMP overhead for short loops, Serialization, False sharing,
            Case study: Parallel sparse matrix-vector multiply, Locality optimizations on ccNUMA architectures -
            Locality of access on ccNUMA, Page placement by first touch, Access locality by other means, Case
            study: ccNUMA optimization of sparse MVM, Placement pitfalls, NUMA-unfriendly OpenMP scheduling,
            File system cache, ccNUMA issues with C++, Arrays of objects, Standard Template Library.
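
            The "page placement by first touch" topic in this unit admits a compact illustration: on
            ccNUMA systems a memory page is mapped into the NUMA domain of the thread that first writes
            it, so arrays should be initialized with the same parallel schedule that the computation
            later uses. The C/OpenMP sketch below assumes a simple triad kernel; all names are
            illustrative.

                /* Minimal sketch of ccNUMA-aware first-touch initialization. */
                #include <stdlib.h>

                void triad_firsttouch(long n)
                {
                    double *a = malloc(n * sizeof *a);
                    double *b = malloc(n * sizeof *b);
                    double *c = malloc(n * sizeof *c);

                    /* First touch in parallel - NOT serially - so each thread's
                     * pages land in its local memory domain. */
                    #pragma omp parallel for schedule(static)
                    for (long i = 0; i < n; i++) {
                        a[i] = 0.0; b[i] = 1.0; c[i] = 2.0;
                    }

                    /* Compute with the identical static schedule: every thread now
                     * works on locally resident data, avoiding remote accesses. */
                    #pragma omp parallel for schedule(static)
                    for (long i = 0; i < n; i++)
                        a[i] = b[i] + 1.5 * c[i];

                    free(a); free(b); free(c);
                }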

            UNIT V        COMMUNICATION TECHNIQUES                                                            9
            Distributed-memory parallel programming with MPI - Message passing, A short introduction to MPI, A
            simple example, Messages and point-to-point communication, Collective communication, Nonblocking
            point-to-point communication, Virtual topologies, Example: MPI parallelization of a Jacobi solver,
            MPI implementation, Performance properties, Efficient MPI programming, MPI performance tools,
            Communication parameters, Synchronization, serialization, contention, Implicit serialization and
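
            A minimal sketch of the nonblocking point-to-point communication named in this unit: each
            MPI rank exchanges one value with its ring neighbours via MPI_Isend/MPI_Irecv and completes
            with MPI_Waitall. The ring topology, tags, and buffer names are illustrative assumptions.

                /* Minimal sketch: nonblocking ring exchange; run with e.g.
                 * "mpicc ring.c && mpirun -np 4 ./a.out". Names are illustrative. */
                #include <mpi.h>
                #include <stdio.h>

                int main(int argc, char **argv)
                {
                    int rank, size;
                    MPI_Init(&argc, &argv);
                    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
                    MPI_Comm_size(MPI_COMM_WORLD, &size);

                    int left  = (rank - 1 + size) % size;   /* ring neighbours */
                    int right = (rank + 1) % size;
                    double send = (double)rank, from_left, from_right;
                    MPI_Request req[4];

                    /* Post receives first, then sends; none of these calls block. */
                    MPI_Irecv(&from_left,  1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
                    MPI_Irecv(&from_right, 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
                    MPI_Isend(&send, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[2]);
                    MPI_Isend(&send, 1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[3]);

                    /* ... independent computation could overlap communication here ... */

                    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
                    printf("rank %d received %g from left, %g from right\n",
                           rank, from_left, from_right);

                    MPI_Finalize();
                    return 0;
                }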