Hadoop DEVELOPER


ABOUT THE COURSE


Hadoop DEVELOPER & ADMIN with Cassandra & Impala

 

 

1. Understanding Big Data and Hadoop (4 hrs)

Learning Objectives - In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves it, the common Hadoop ecosystem components, the Hadoop architecture, HDFS, the anatomy of a file write and read, and how the MapReduce framework works.

 

Topics - Big Data, Limitations and Solutions of Existing Data Analytics Architectures, Hadoop, Hadoop Features, Hadoop Ecosystem, Hadoop 2.x Core Components, Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework, Different Hadoop Distributions.

2. Hadoop Architecture and HDFS (6 hrs, Hands-on Cluster Setup)

Learning Objectives - In this module, you will learn the Hadoop cluster architecture, the important configuration files in a Hadoop cluster, data loading techniques, and how to set up single-node and multi-node Hadoop clusters.

 

Topics - Hadoop 2.x Cluster Architecture - Federation and High Availability, A Typical Production Hadoop Cluster, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Single-Node Cluster Setup, Hadoop Administration.
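To make the data loading topics concrete, here is a minimal sketch (not part of the original syllabus) of programmatic HDFS access with the Hadoop FileSystem API in Java. The directory and file names are placeholders, and the code mirrors the hdfs dfs -mkdir, -put, and -ls shell commands covered in this module.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDataLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // reads core-site.xml / hdfs-site.xml on the classpath
        FileSystem fs = FileSystem.get(conf);           // client for the configured cluster (fs.defaultFS)

        Path dir = new Path("/user/training/input");    // hypothetical target directory
        fs.mkdirs(dir);                                 // hdfs dfs -mkdir -p /user/training/input

        // hdfs dfs -put localfile.txt /user/training/input  (local file name is a placeholder)
        fs.copyFromLocalFile(new Path("localfile.txt"), dir);

        // hdfs dfs -ls /user/training/input
        for (FileStatus status : fs.listStatus(dir)) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }
        fs.close();
    }
}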

 

3. Hadoop MapReduce Framework (6 hrs, Lab)

Learning Objectives - In this module, you will understand the Hadoop MapReduce framework and how MapReduce works on data stored in HDFS. You will learn concepts such as input splits, the combiner, and the partitioner, and see demos of MapReduce on different data sets.

 

Topics - MapReduce Use Cases, Traditional Way vs. MapReduce Way, Why MapReduce, Hadoop 2.x MapReduce Architecture, Hadoop 2.x MapReduce Components, YARN MR Application Execution Flow, YARN Workflow, Anatomy of a MapReduce Program, Demo on MapReduce, Input Splits, Relation between Input Splits and HDFS Blocks, MapReduce: Combiner & Partitioner.
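As an illustration of the mapper, reducer, and combiner roles, here is a minimal word-count sketch against the Hadoop 2.x MapReduce API. The input and output paths come from the command line; reusing the reducer as the combiner is a common but optional choice.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: each input split is fed to one map task, record by record.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: also reused as the combiner to pre-aggregate on the map side before the shuffle.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);     // combiner: local reduce on map output
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}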

 

4. Pig: 8 hrs (6 hrs Lab)

Learning Objectives - In this module, you will learn Pig, the types of use cases where Pig fits, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig streaming, and testing Pig scripts, with a demo on a healthcare dataset.

 

Topics - About Pig, MapReduce vs. Pig, Pig Use Cases, Programming Structure in Pig, Pig Running Modes, Pig Components, Pig Execution, Pig Latin Programs, Data Models in Pig, Pig Data Types, Shell and Utility Commands, Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Specialized Joins in Pig, Built-in Functions (Eval, Load and Store, Math, String, and Date Functions), Pig UDF, Piggybank.
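As one way to run Pig Latin from Java, here is a minimal sketch using the PigServer API in local mode; the healthcare.csv input file and its field layout are hypothetical placeholders, not part of the course material.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigLocalDemo {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);   // use MAPREDUCE for a cluster run

        // Pig Latin statements are registered one by one; nothing executes until a STORE or DUMP.
        pig.registerQuery("patients = LOAD 'healthcare.csv' USING PigStorage(',') "
                + "AS (id:int, state:chararray, age:int);");
        pig.registerQuery("by_state = GROUP patients BY state;");
        pig.registerQuery("counts = FOREACH by_state GENERATE group, COUNT(patients);");

        pig.store("counts", "patient_counts");           // triggers execution and writes the output
        pig.shutdown();
    }
}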

 

5. Hive (8 hrs, Lab)

Learning Objectives - This module will help you understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts, and Hive UDFs.

 

Topics - Hive Background, Hive Use Case, About Hive, Hive vs. Pig, Hive Architecture and Components, Metastore in Hive, Limitations of Hive, Comparison with Traditional Databases, Hive Data Types and Data Models, Partitions and Buckets, Hive Tables (Managed Tables and External Tables), Importing Data, Querying Data, Managing Outputs, Hive Scripts, Hive UDF, Retail Use Case in Hive, Hive Demo on a Healthcare Data Set, Advanced Hive Concepts such as UDFs and Dynamic Partitioning.
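One common way to query Hive programmatically is through the HiveServer2 JDBC driver. The sketch below assumes a HiveServer2 instance on localhost:10000 and uses a hypothetical retail_sales external table over files already in HDFS; none of these names come from the course material.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver (hive-jdbc must be on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        String url = "jdbc:hive2://localhost:10000/default";   // placeholder host, port, database
        try (Connection con = DriverManager.getConnection(url, "hiveuser", "");
             Statement stmt = con.createStatement()) {

            // External table over delimited files at a hypothetical HDFS location.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS retail_sales "
                    + "(txn_id INT, store STRING, amount DOUBLE) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                    + "LOCATION '/user/training/retail'");

            // Query the data and read the result set like any other JDBC source.
            ResultSet rs = stmt.executeQuery(
                    "SELECT store, SUM(amount) FROM retail_sales GROUP BY store");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}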

   

 

Apache Sqoop (2 hrs)

- Introduction to Sqoop
- MySQL client and server installation; Sqoop installation
- Connecting to a relational database using Sqoop
- Sqoop commands, with examples of import and export (see the sketch after this list)
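The sketch below shows one way to drive a Sqoop import from Java by invoking the sqoop CLI; the MySQL connection string, credentials file, table name, and target directory are placeholders.

import java.util.Arrays;
import java.util.List;

public class SqoopImportDemo {
    public static void main(String[] args) throws Exception {
        List<String> cmd = Arrays.asList(
                "sqoop", "import",
                "--connect", "jdbc:mysql://dbhost:3306/retail",          // placeholder database
                "--username", "retail_user",
                "--password-file", "/user/training/.db.pass",            // placeholder HDFS credentials file
                "--table", "customers",
                "--target-dir", "/user/training/customers",
                "--num-mappers", "4");                                   // parallel map tasks for the import

        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());                                        // non-zero exit means the import failed
    }
}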

Apache Flume (2 hrs)

- Introduction to Flume; Flume installation
- Flume agent usage and Flume example execution
- Real-time example with Twitter streaming

 

Apache Oozie (1 hr)

- Introduction to Oozie; Oozie installation
- Executing Oozie workflow jobs (see the sketch after this list)
- Monitoring Oozie workflow jobs
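The sketch below submits and monitors a workflow with the Oozie Java client; the Oozie URL, NameNode and ResourceManager addresses, and the HDFS application path are placeholders.

import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmitDemo {
    public static void main(String[] args) throws Exception {
        OozieClient oozie = new OozieClient("http://localhost:11000/oozie");   // placeholder Oozie URL

        Properties conf = oozie.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/training/my-wf");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");

        String jobId = oozie.run(conf);                  // submit and start the workflow
        while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10 * 1000);                     // poll until the workflow finishes
        }
        System.out.println("Workflow " + jobId + " ended as " + oozie.getJobInfo(jobId).getStatus());
    }
}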

Apache ZooKeeper (1 hr)

- Introduction to ZooKeeper
- Configuring ZooKeeper
- The role of ZooKeeper
- A use case with ZooKeeper (see the sketch after this list)
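The sketch below illustrates a simple shared-configuration use case with the ZooKeeper Java client: one process writes a znode that any other connected client can read. The ensemble address, znode path, and stored value are placeholders.

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.Watcher.Event.KeeperState;

public class ZkConfigDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {
            if (event.getState() == KeeperState.SyncConnected) connected.countDown();
        });
        connected.await();                               // wait for the session to be established

        String path = "/demo-config";                    // placeholder znode
        if (zk.exists(path, false) == null) {
            zk.create(path, "batch.size=500".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        byte[] data = zk.getData(path, false, null);     // any client sees the same value
        System.out.println(new String(data));
        zk.close();
    }
}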

 

 

 

 

NoSQL: HBase (10 hrs)

- Introduction
- Quick Start - Standalone HBase

Apache HBase Configuration
- Configuration Files
- Basic Prerequisites
- HBase Run Modes: Standalone and Distributed
- Running and Confirming Your Installation
- Default Configuration
- Example Configurations
- The Important Configurations
- Dynamic Configuration

Data Model
- Conceptual View
- Physical View
- Namespace
- Table
- Row
- Column Family
- Cells
- Data Model Operations (see the sketch after this list)
- Versions
- Sort Order
- Column Metadata
- Joins
- ACID
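The sketch below shows the data model in code, using the HBase Java client to write and read one cell addressed by row key, column family, and column qualifier. The patients table and info family are hypothetical and assumed to already exist.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDataModelDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();    // reads hbase-site.xml on the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("patients"))) {

            // Write one cell: row "patient-001", family "info", qualifier "state".
            Put put = new Put(Bytes.toBytes("patient-001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("state"), Bytes.toBytes("KA"));
            table.put(put);

            // Read it back; Result returns the latest version of each cell by default.
            Result result = table.get(new Get(Bytes.toBytes("patient-001")));
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("state"));
            System.out.println(Bytes.toString(value));
        }
    }
}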

 

HBase and Schema Design
- Schema Creation
- Table Schema Rules of Thumb
- RegionServer Sizing Rules of Thumb
- On the Number of Column Families
- Rowkey Design
- Number of Versions
- Supported Datatypes
- Joins
- Time To Live (TTL) (see the sketch after this list)
- Keeping Deleted Cells
- Secondary Indexes and Alternate Query Paths
- Constraints
- Schema Design Case Studies
- Operational and Performance Configuration Options
- Special Cases
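The sketch below creates a table whose single column family keeps three versions per cell and expires data after one day (TTL in seconds), using the HBase 2.x descriptor builders; table and family names are placeholders.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseSchemaDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {

            admin.createTable(TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("sensor_readings"))      // placeholder table name
                    .setColumnFamily(ColumnFamilyDescriptorBuilder
                            .newBuilder(Bytes.toBytes("d"))   // short family name, per the rules of thumb
                            .setMaxVersions(3)                // keep three versions of each cell
                            .setTimeToLive(24 * 60 * 60)      // expire cells after one day
                            .build())
                    .build());
        }
    }
}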

 

 

 

 

HBase and MapReduce
- HBase, MapReduce, and the CLASSPATH
- MapReduce Scan Caching
- Bundled HBase MapReduce Jobs
- HBase as a MapReduce Job Data Source and Data Sink
- Writing HFiles Directly During Bulk Import
- RowCounter Example (see the sketch after this list)
- Map-Task Splitting
- HBase MapReduce Examples
- Accessing Other HBase Tables in a MapReduce Job
- Speculative Execution
- Cascading
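The sketch below uses HBase as a MapReduce data source: a map-only job that scans a hypothetical patients table and counts rows with a counter, in the spirit of the bundled RowCounter job. Scan caching is set explicitly, as discussed in the list above.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HBaseRowCountJob {

    public static class RowCountMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result columns, Context context) {
            context.getCounter("demo", "rows").increment(1);   // one map call per HBase row
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(HBaseConfiguration.create(), "hbase row count");
        job.setJarByClass(HBaseRowCountJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);            // MapReduce scan caching: rows fetched per RPC

        TableMapReduceUtil.initTableMapperJob(
                "patients", scan, RowCountMapper.class,        // placeholder table name
                NullWritable.class, NullWritable.class, job);

        job.setNumReduceTasks(0);                              // map-only job
        job.setOutputFormatClass(NullOutputFormat.class);      // no HDFS output needed
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}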

Securing Apache HBase
- Using Secure HTTP (HTTPS) for the Web UI
- Using SPNEGO for Kerberos Authentication with Web UIs
- Secure Client Access to Apache HBase
- Simple User Access to Apache HBase
- Securing Access to HDFS and ZooKeeper
- Securing Access to Your Data
- Security Configuration Example

 

Architecture
- Overview
- Catalog Tables
- Client
- Client Request Filters
- Master
- RegionServer
- Regions
- Bulk Loading
- HDFS
- Timeline-consistent High Available Reads
- Storing Medium-sized Objects (MOB)

Apache HBase APIs
- Examples
- Bulk Load

ENROLL IN THIS COURSE