Introduction to Big Data
• Characteristics of Big Data
• Why parallel computing is important
• Discuss the Big Data products offered by various vendors
Introducing Hadoop
• Components of Hadoop
• Starting Hadoop
• Identify the various Hadoop processes (daemons)
Working with HDFS
• Basic file commands
• Web-based user interface
• Reading from and writing to files
• Run a word count program
• View jobs in the Web UI
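The word count program is the usual first MapReduce job. As a rough single-machine sketch (plain Python, not Hadoop code and not the Java example that ships with Hadoop), the map, shuffle and reduce phases such a job runs can be modelled as three functions. The function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the counts for each word, returned in key order."""
    return {word: sum(counts) for word, counts in sorted(groups.items())}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases over an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog"]
    print(word_count(sample))
    # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

On a real cluster the shuffle is performed by the framework across nodes, and map and reduce tasks run in parallel over HDFS blocks; this sketch only shows the data flow.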
Installation & Configuration of Hadoop
• Types of installation (RPMs and tar files)
• Set up SSH for the Hadoop cluster
• Directory tree structure of the installation
• XML configuration files, and the masters and slaves files
• Checking system health
• Discuss block size and replication factor
• Benchmarking the cluster
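Block size and replication factor together determine how much storage a file consumes. As a back-of-the-envelope sketch, assuming the Hadoop 2.x/3.x defaults of a 128 MB block size (`dfs.blocksize`; 64 MB in Hadoop 1.x) and a replication factor of 3 (`dfs.replication`), the helper below counts blocks and estimates raw disk usage. `hdfs_footprint` is a hypothetical function, not part of Hadoop, and it ignores checksum and metadata overhead.

```python
import math
from typing import Tuple


def hdfs_footprint(file_size_mb: float, block_size_mb: int = 128,
                   replication: int = 3) -> Tuple[int, int, float]:
    """Return (blocks, block replicas, raw MB on disk) for one file.

    Defaults assume Hadoop 2.x/3.x settings: 128 MB blocks, replication 3.
    """
    # A file is split into ceil(size / block_size) blocks; an empty file has none.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different DataNodes.
    replicas = blocks * replication
    # The last block is not padded to full size, so raw usage is size * replication.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


if __name__ == "__main__":
    print(hdfs_footprint(1000))                    # (8, 24, 3000)
    print(hdfs_footprint(1000, block_size_mb=64))  # (16, 48, 3000)
```

For example, a 1000 MB file with 128 MB blocks occupies 8 blocks (the last one only partly filled), 24 block replicas, and about 3000 MB of raw disk. Halving the block size doubles the block count, and with it the NameNode's metadata load, but leaves the raw storage unchanged.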
Advanced Administration Activities
• Adding and decommissioning nodes
• Purpose of the Secondary NameNode
• Recovering from a failed NameNode
• Managing quotas
• Enabling trash
Monitoring the Hadoop Cluster
• Hadoop infrastructure monitoring
• Hadoop specific monitoring
Other Components of the Hadoop Ecosystem
• Discuss Hive, Sqoop, Pig, HBase, Flume
• Use cases of each
• Use Hadoop Streaming to write MapReduce code in Perl or Python
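Hadoop Streaming runs any executable as mapper or reducer: each reads lines on standard input and writes tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. A minimal word-count sketch in Python under those assumptions follows; the file name `wordcount.py` and the `map`/`reduce` command-line arguments are hypothetical choices for illustration.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, the key/value format Streaming expects."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word.

    Streaming delivers reducer input sorted by key, so lines for the same
    word are adjacent and can be grouped without holding everything in memory.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Invoke as "wordcount.py map" or "wordcount.py reduce"; both stages read stdin.
    stages = {"map": mapper, "reduce": reducer}
    if len(sys.argv) == 2 and sys.argv[1] in stages:
        for out in stages[sys.argv[1]](sys.stdin):
            print(out)
    else:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
```

The script can be tested locally with a shell pipeline such as `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where `sort` stands in for Hadoop's shuffle. On a cluster, the same commands are passed to the Hadoop Streaming JAR through its `-mapper` and `-reducer` options; the JAR's location depends on the Hadoop version and distribution.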