Hadoop Development
Big Data (Hadoop) Developer Course outline
Introduction to Big Data and Hadoop
- Understanding Big Data
- Challenges in processing Big Data
- 3V Characteristics (Volume, Variety and Velocity)
- Brief history of Hadoop
- How Hadoop addresses Big Data
- HDFS and Map Reduce (MR)
- Hadoop ecosystem
HDFS (Hadoop Distributed File System)
- HDFS Overview and Architecture
- HDFS terminology: NameNode, DataNode, Heartbeat, etc.
- Configuring HDFS
- Data Flows (Read and Write)
- HDFS Permissions and Security
- HDFS commands
- Rack Awareness
- The five daemon processes
Map Reduce
- Map Reduce Basics
- Map Reduce Data Flow
- Solving the word count example
- Algorithms for simple and complex problems
- Hadoop Streaming
Developing a Map Reduce Application
- Setting up working environment
- Custom Data types (Writable and Custom Key types)
- Input and Output file formats
- Driver, Mapper and Reducer code walkthrough
- Configuring the Eclipse IDE
- Writing unit tests and running them locally
- Map Reduce Web UI
- Hands-on
How Map Reduce Works
- Classic Map Reduce (Map Reduce I)
- YARN (Map Reduce II)
- Job Scheduling
- Shuffle and Sort
- Failures
- Oozie Workflows
- Hands-on exercises
Map Reduce Types and Formats
- Map Reduce Types
- Input formats: input splits and records, text input, binary input, multiple inputs and database input
- Output formats: text output, binary output, multiple outputs, lazy output and database output
- Hands-on
Hadoop Ecosystem
PIG
- Overview of PIG
- Installation and running PIG
- PIG Latin
- Loading and storing data
- Hands-on
HIVE
- Overview of HIVE
- Installation and running HIVE
- HiveQL
- Tables
- Hands-on
HBASE
- Overview of HBASE
- Installation
- Clients (Avro, REST, Thrift)
- Hands-on
SQOOP
- Overview of SQOOP
- Solving case studies