Page 25 - REC :: M.E. CSE Curriculum and Syllabus - R2019

CP19P06                        BIG DATA ANALYTICS                         Category   L  T  P  C
                                                                          PE         3  0  0  3


               Objectives:
                ⚫   Gaining factual knowledge of data acquisition, data cleansing, and various aspects of data analytics and
                    visualization
                ⚫   Learning the principles of data analytics and its underlying methods and algorithms
                ⚫   Learning to apply the methods of distributed data storage and processing using Hadoop-related tools and
                    MapReduce concepts
                ⚫   Understanding the necessity of streaming data analysis and its applications
                ⚫   Developing the skills necessary to use related software tools to perform data collection, cleansing, and
                    analytics


               UNIT-I     BIG DATA ANALYTICS                                                               9
               Introduction to Big Data Analytics, Data Structures, BI vs Analytics, Analytic Architecture, Data Analytics Life
               Cycle, R Language for Data Analytics, Basic Features, Data Import and Export, Descriptive Statistics, Predictive
               Analytics

               UNIT-II    ANALYTICAL THEORY                                                                9
               Overview of Clustering, Classification and Correlation, K-means, Supervised and Unsupervised Learning, Linear,
               Logistic and Lasso Regression, Bayesian Modelling, Time Series Analysis, Association Analysis and Cluster
               Analysis.
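
As an illustration of the K-means topic listed in this unit, the following is a minimal sketch of Lloyd's algorithm in plain Python. It is not taken from the syllabus or its reference books; the function name `k_means`, the choice of the first k points as initial centroids, and the sample points are assumptions made only for the example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, iterations=20):
    """Cluster points with Lloyd's algorithm.

    Assumption for this sketch: the first k points serve as the initial
    centroids, which keeps the example deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                dims = len(cluster[0])
                mean = tuple(sum(p[d] for p in cluster) / len(cluster)
                             for d in range(dims))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids


# Hypothetical sample data: two visibly separated groups.
sample = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.0), (8.0, 9.0)]
centers = k_means(sample, k=2)
```

With well-separated groups such as the sample above, the centroids settle on the group means after a few iterations; production work would normally rely on a library implementation with better initialisation.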

               UNIT-III   HADOOP ECOSYSTEM                                                                 9
               Hadoop Stack for Big Data, Processing Data with Hadoop, HDFS, Hadoop MapReduce 2.0, Job Scheduling, Shuffle
               and sort, Hadoop Related Technologies: Hive, Mahout, Zookeeper, HBase, and Cassandra.
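
To make the map, shuffle-and-sort, and reduce stages named in this unit concrete, here is a small single-process sketch in Python. It only imitates the data flow of a MapReduce word count; it does not use Hadoop, and the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical names chosen for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence over an in-memory input."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values)
            for key, values in shuffle_and_sort(mapped)]


counts = word_count(["big data", "Big analytics data"])
# counts == [("analytics", 1), ("big", 2), ("data", 2)]
```

In a real cluster the mapped pairs would be partitioned across nodes and the grouping would happen during the framework's shuffle; here everything is kept in one list for readability.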

               UNIT-IV    STREAMING DATA ANALYTICS                                                         9
               Introduction to Streams Concepts – Stream data model and architecture – Stream Computing, Sampling data in a
               stream – Filtering streams – Counting distinct elements in a stream – Estimating moments – Counting ones in a
               window – Decaying window – Real-time Analytics Platform (RTAP) applications – Case studies – Real-time sentiment
               analysis, stock market predictions.
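
One common way to approach the "Filtering streams" topic above is a Bloom filter, which tests set membership in fixed memory at the cost of occasional false positives. The sketch below is an illustrative assumption rather than material from the syllabus: the class name `BloomFilter`, the helper `filter_stream`, the bit-array size, and the use of salted SHA-256 digests as hash functions are all choices made for the example.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, rare false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit position, for clarity rather than compactness.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True if every position is set; this may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


def filter_stream(stream, allowed):
    """Keep only the stream items that the filter reports as possibly allowed."""
    return [item for item in stream if allowed.might_contain(item)]


# Hypothetical usage: pass through only addresses on an allow-list.
allow_list = BloomFilter()
for address in ["alice@example.com", "bob@example.com"]:
    allow_list.add(address)

passed = filter_stream(
    ["alice@example.com", "mallory@example.com", "bob@example.com"],
    allow_list,
)
```

Every item that was added is guaranteed to pass the filter; an item that was never added is rejected unless all of its bit positions happen to collide, which becomes more likely as the bit array fills up.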

               UNIT-V     ADVANCED TOOLS FOR ANALYTICS                                                     9
               Stream Analytics using Apache Spark and Flink, Graph Database using Neo4j, Applications of Spark ML library,
               In-Memory Databases: VoltDB, SciDB, Data Analytics in Cloud: Tableau, AWS Kinesis, and AWS EMR.

                                                                                   Total Contact Hours   :  45


               Course Outcomes:
               Upon completion of the course, students will be able to:
                ⚫   Analyze the importance of analytics and identify its features.
                ⚫   Understand different types of supervised and unsupervised learning algorithms.
                ⚫   Examine the implementation techniques for big data analysis.
                ⚫   Process streaming data sets using stream processors.
                ⚫   Use various tools to process datasets in real time.


               Reference Book(s):
                1    EMC Education Services, “Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and
                     Presenting Data”, Wiley, 2015.
                2    Jen Stirrup and Ruben Oliva Ramos, “Advanced Analytics with R and Tableau”, Packt Publishing Limited,
                     2017.
                3    Anand Rajaraman and Jeffrey David Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012.