Page 78 - B.E CSE Curriculum and Syllabus R2017

Department of CSE, REC

2. Kai Hwang, Geoffrey C. Fox and Jack J. Dongarra, Distributed and Cloud Computing: Clusters,
Grids, Clouds and the Future of Internet, First Edition, Morgan Kaufmann Publishers, an Imprint of
Elsevier, 2012.
REFERENCES:
1. Michael J. Kavis, Architecting the Cloud: Design Decisions for Cloud Computing Service Models
(SaaS, PaaS, and IaaS), First Edition, Wiley.
2. Tom White, Hadoop: The Definitive Guide, Yahoo Press, 2014.
3. Rajkumar Buyya, Christian Vecchiola, and Thamarai Selvi, Mastering Cloud Computing, Tata
McGraw Hill, 2013.
4. John W. Rittinghouse and James F. Ransome, Cloud Computing: Implementation, Management, and
Security, CRC Press, 2010.

IT17701 DATA ANALYTICS L T P C
(Common to B.E. CSE and B.Tech. IT) 3 0 0 3

OBJECTIVES:
• To introduce the concepts of Big Data and Hadoop
• To help understand HDFS and MapReduce concepts
• To explore the Hadoop ecosystem and NoSQL databases
• To describe the data stream analytics methodologies
• To present various data analysis techniques
UNIT I INTRODUCTION TO BIG DATA AND HADOOP 6
Introduction to Big Data, Types of Digital Data, Challenges of conventional systems - Web data, Evolution of
analytic processes and tools, Analysis vs. Reporting - Big Data Analytics, Introduction to Hadoop - Distributed
Computing Challenges - History of Hadoop, Hadoop Ecosystem.

UNIT II HDFS (HADOOP DISTRIBUTED FILE SYSTEM) AND MAP REDUCE 6
Hadoop Overview – Use case of Hadoop – Hadoop Distributors – HDFS – Processing Data with Hadoop –
MapReduce – Managing Resources and Applications with Hadoop YARN – Interacting with Hadoop
Ecosystem.
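As an illustration of the MapReduce model named in this unit, the following is a minimal single-process sketch of the map, shuffle and reduce phases for word counting. It does not use Hadoop; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

In a real Hadoop job the mapper and reducer run on different nodes and the shuffle is performed by the framework over HDFS blocks; here all three steps run in one process purely to show the data flow.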

UNIT III NOSQL DATABASES 12
NoSQL - Pig - Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig
Latin, User Defined Functions, Data Processing operators - Hive - Hive Shell, Hive Services, Hive Metastore,
Comparison with Traditional Databases, HiveQL, Tables, Querying – MongoDB - Needs, Terms, Data Types,
Query Language – Cassandra - Introduction, Features, Querying Commands.

UNIT IV MINING DATA STREAMS 9
Introduction to Stream Concepts – Stream data model and architecture - Stream Computing, Sampling data in
a stream – Filtering streams – Counting distinct elements in a stream – Estimating moments – Counting
ones in a window – Decaying window – Real Time Analytics Platform (RTAP) applications - Case studies –
real time sentiment analysis, stock market predictions.
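As an illustration of "counting distinct elements in a stream", the following is a minimal sketch in the spirit of the Flajolet-Martin approach: hash each item, track the longest run of trailing zero bits seen, and estimate the distinct count as 2 raised to that length. The use of MD5 truncated to 32 bits and the names trailing_zeros and fm_estimate are assumptions made for this example; practical versions combine many hash functions to reduce the error of a single estimate.

```python
import hashlib


def trailing_zeros(n, width=32):
    """Number of trailing zero bits in n (returns width when n is 0)."""
    if n == 0:
        return width
    count = 0
    while n & 1 == 0:
        n >>= 1
        count += 1
    return count


def fm_estimate(stream):
    """Single-hash estimate of the number of distinct items in a stream."""
    max_zeros = 0
    for item in stream:
        digest = hashlib.md5(str(item).encode("utf-8")).digest()
        hashed = int.from_bytes(digest[:4], "big")  # keep 32 bits of the hash
        max_zeros = max(max_zeros, trailing_zeros(hashed))
    return 2 ** max_zeros


if __name__ == "__main__":
    users = [f"user{i}" for i in range(1000)]
    # Repeating the stream does not change the estimate, since duplicates
    # hash to the same value.
    print(fm_estimate(users), fm_estimate(users * 3))
```

The estimate is always a power of two and uses constant memory regardless of stream length, which is the point of the technique; its accuracy from one hash function is coarse.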
UNIT V DATA ANALYSIS AND VISUALIZATION 12
Regression modelling, Multivariate analysis, Decision Trees, Support vector and kernel methods, Neural
networks: learning and generalization, competitive learning, principal component analysis and neural
networks; Clustering Techniques – Hierarchical – K-Means – Clustering high dimensional data –
Frequent pattern based clustering methods – Clustering in Non-Euclidean space – Clustering for streams
and Parallelism – Visualization – Time series analysis.
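As an illustration of the K-Means technique listed in this unit, the following is a minimal pure-Python sketch that alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its cluster. The function names and the choice to pass the initial centroids explicitly (rather than pick them at random) are assumptions made to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Place every point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        clusters[distances.index(min(distances))].append(p)
    return clusters


def kmeans(points, initial_centroids, max_iterations=100):
    """Iterate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                # New centroid is the coordinate-wise mean of the cluster.
                updated.append(tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

The result depends on the initial centroids; library implementations typically run several random initialisations and keep the best one.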
TOTAL: 45 PERIODS
OUTCOMES:
At the end of the course, the student will be able to:
• Understand the usage scenarios of Big Data Analysis and the Hadoop framework
• Apply MapReduce over HDFS