Page 78 - B.E CSE Curriculum and Syllabus R2017 - REC
Department of CSE, REC
2. Kai Hwang, Geoffrey C. Fox and Jack J. Dongarra, Distributed and Cloud Computing: Clusters,
Grids, Clouds and the Future of Internet, First Edition, Morgan Kaufmann Publishers, an Imprint of
Elsevier, 2012.
REFERENCES:
1. Michael J. Kavis, Architecting the Cloud: Design Decisions for Cloud Computing Service Models
(SaaS, PaaS, and IaaS), First Edition, Wiley.
2. Tom White, Hadoop: The Definitive Guide, Yahoo Press, 2014.
3. Rajkumar Buyya, Christian Vecchiola, and Thamarai Selvi, Mastering Cloud Computing, Tata
McGraw Hill, 2013.
4. John W. Rittinghouse and James F. Ransome, Cloud Computing: Implementation, Management, and
Security, CRC Press, 2010.
IT17701 DATA ANALYTICS L T P C
(Common to B.E. CSE and B.Tech. IT) 3 0 0 3
OBJECTIVES:
To introduce the concepts of Big Data and Hadoop
To help understand HDFS and MapReduce concepts
To become familiar with the Hadoop ecosystem and NoSQL databases
To describe the data stream analytics methodologies
To describe various data analysis techniques
UNIT I INTRODUCTION TO BIG DATA AND HADOOP 6
Introduction to Big Data, Types of Digital Data, Challenges of conventional systems - Web data, Evolution of
analytic processes and tools, Analysis vs. Reporting - Big Data Analytics, Introduction to Hadoop - Distributed
Computing Challenges - History of Hadoop, Hadoop Ecosystem.
UNIT II HDFS (HADOOP DISTRIBUTED FILE SYSTEM) AND MAPREDUCE 6
Hadoop Overview – Use case of Hadoop – Hadoop Distributors – HDFS – Processing Data with Hadoop –
MapReduce – Managing Resources and Applications with Hadoop YARN – Interacting with Hadoop
Ecosystem.
UNIT III NOSQL DATABASES 12
NoSQL - Pig - Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig
Latin, User Defined Functions, Data Processing operators - Hive - Hive Shell, Hive Services, Hive Metastore,
Comparison with Traditional Databases, HiveQL, Tables, Querying – MongoDB – Need, Terms, Data Types,
Query Language – Cassandra – Introduction, Features, Querying Commands.
UNIT IV MINING DATA STREAMS 9
Introduction to Streams Concepts – Stream data model and architecture - Stream Computing, Sampling data in
a stream – Filtering streams – Counting distinct elements in a stream – Estimating moments – Counting
ones in a window – Decaying window – Real-time Analytics Platform (RTAP) applications – Case studies –
real-time sentiment analysis, stock market predictions.
UNIT V DATA ANALYSIS AND VISUALIZATION 12
Regression modelling, Multivariate analysis, Decision Trees, Support vector and kernel methods, Neural
networks: learning and generalization, competitive learning, principal component analysis and neural
networks; Clustering Techniques – Hierarchical – K-Means – Clustering high dimensional data –
Frequent pattern based clustering methods – Clustering in Non-Euclidean space – Clustering for streams
and Parallelism – Visualization – Time series analysis.
TOTAL: 45 PERIODS
OUTCOMES:
At the end of the course, the student will be able to:
Understand the usage scenarios of Big Data Analysis and the Hadoop framework
Apply MapReduce over HDFS

