Big Data with Hadoop Ecosystem
Online Live Classes
Learn Big Data and Hadoop Ecosystem tools such as HDFS, YARN, MapReduce, Hive, Pig, HBase, Spark, Kafka, Oozie, Flume and Sqoop
Students, Working Professionals

Course Overview

Hadoop is an Apache project (i.e., open-source software) to store & process Big Data. Hadoop stores Big Data in a distributed & fault-tolerant manner over commodity hardware. Hadoop ecosystem tools are then used to perform parallel data processing over HDFS (Hadoop Distributed File System).
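As a quick taste of the "store distributed, process in parallel" idea, here is a minimal, illustrative sketch in plain Python (no Hadoop installation needed; the function names are our own, not Hadoop APIs). Each string below stands in for an HDFS block that a mapper would process in parallel:

```python
from collections import defaultdict

def map_phase(block):
    # Like a Mapper: emit (word, 1) pairs for each word in the block
    return [(word, 1) for word in block.split()]

def reduce_phase(pairs):
    # Like a Reducer after the shuffle: sum the counts per key
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

blocks = ["big data big", "data hadoop"]   # stand-ins for HDFS blocks
pairs = [p for b in blocks for p in map_phase(b)]
print(reduce_phase(pairs))  # {'big': 2, 'data': 2, 'hadoop': 1}
```

In real Hadoop, the map calls run on the nodes that already hold the blocks (data locality), and the framework handles the shuffle between the two phases.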

As organisations have realized the benefits of Big Data analytics, there is a huge demand for Big Data & Hadoop professionals. Companies are looking for Big Data & Hadoop experts with knowledge of the Hadoop Ecosystem and best practices for HDFS, MapReduce, Spark, HBase, Hive, Pig, Oozie, Sqoop & Flume.

What are the objectives of our Big Data Hadoop Live Course?

This course is designed by industry experts to make you an expert Big Data Practitioner. This course offers:
• In-depth knowledge of Big Data and Hadoop, including HDFS (Hadoop Distributed File System), YARN (Yet Another Resource Negotiator) & MapReduce
• Comprehensive knowledge of the various tools in the Hadoop Ecosystem, such as Pig, Hive, Kafka, Sqoop, Flume, Oozie, and HBase
• The capability to ingest data into HDFS using Sqoop & Flume, and to analyze the large datasets stored in HDFS

Why should you go for this course?
Big Data is one of the fastest-growing and most promising fields in the IT market today. To take advantage of these opportunities, you need structured training with a curriculum that follows current industry requirements and best practices. Besides a strong theoretical understanding, you need to work on various real-world Big Data projects using different Big Data and Hadoop tools as part of the solution strategy. Additionally, you need the guidance of a Hadoop expert who is currently working in the industry on real-world Big Data projects and troubleshooting the day-to-day challenges of implementing them.

It will be an online live (Live Stream) class, so you can attend from any geographical location. It will be an interactive live session, where you can raise your doubts with the instructor (similar to our offline classroom program).

It is a weekend Live Classes batch, scheduled every

  • Saturday - 8:00 PM - 11:00 PM (IST)
  • Sunday - 8:00 PM - 11:00 PM (IST)

What You Will Learn

This course will help you become a Big Data expert. It will hone your skills by offering you comprehensive knowledge of the Hadoop framework, along with the hands-on experience required for solving real-time, industry-based Big Data projects. In this course, you will be trained by our expert instructors to:

  • Master the concepts of HDFS (Hadoop Distributed File System) and YARN (Yet Another Resource Negotiator), & understand how to work with Hadoop storage & resource management
  • Understand the MapReduce framework
  • Implement complex business solutions using MapReduce
  • Learn data ingestion techniques using Sqoop and Flume
  • Perform ETL operations & data analytics using Pig and Hive
  • Implement Partitioning, Bucketing and Indexing in Hive
  • Understand HBase, i.e., a NoSQL database in Hadoop, HBase Architecture & Mechanisms
  • Schedule jobs using Oozie
  • Implement best practices for Hadoop development
  • Understand Apache Spark and its Ecosystem
  • Learn how to work with RDD in Apache Spark
  • Work on real world Big Data Analytics Project
  • Work on a real-time Hadoop cluster
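To give a flavour of the RDD work covered near the end of the course, here is an illustrative sketch of the transformation/action pattern using plain Python functional operations; real Spark code would use `pyspark` (e.g. `SparkContext.parallelize`), which this sketch deliberately avoids so it runs anywhere:

```python
from functools import reduce

data = [1, 2, 3, 4, 5]

# Transformations (lazy in Spark; Python's map/filter are also lazy iterators)
squared = map(lambda x: x * x, data)           # like rdd.map(...)
evens = filter(lambda x: x % 2 == 0, squared)  # like rdd.filter(...)

# Action (in Spark, this is what actually triggers the computation)
total = reduce(lambda a, b: a + b, evens)      # like rdd.reduce(...)
print(total)  # 4 + 16 = 20
```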

Course Content

Learning Objectives: In this module, you will understand what Big Data is, the limitations of the
traditional solutions for Big Data problems, how Hadoop solves those Big Data problems, Hadoop
Ecosystem, Hadoop Architecture, HDFS, Anatomy of File Read and Write & how MapReduce works.

You will also learn the Hadoop Cluster Architecture, important configuration files of a Hadoop Cluster, data loading techniques using Sqoop & Flume, and how to set up Single-Node and Multi-Node Hadoop Clusters.


Topics:

  • Introduction to Big Data & Big Data Challenges
  • Limitations & Solutions of Big Data Architecture
  • Hadoop & its Features
  • Hadoop Ecosystem
  • Hadoop 2.x Core Components
  • Hadoop Storage: HDFS (Hadoop Distributed File System)
  • Hadoop Processing: MapReduce Framework
  • Different Hadoop Distributions
  • Hadoop 2.x Cluster Architecture
  • Federation and High Availability Architecture
  • Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Single Node Cluster & Multi-Node Cluster set up
  • Basic Hadoop Administration

Learning Objectives: In this module, you will understand Hadoop MapReduce framework
comprehensively, the working of MapReduce on data stored in HDFS. You will also learn the advanced
MapReduce concepts like Input Splits, Combiner & Partitioner.

You will also learn advanced MapReduce concepts such as Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, and Sequence Input Format.


Topics:

  • Traditional way vs MapReduce way
  • Why MapReduce
  • YARN Components
  • YARN Architecture
  • YARN MapReduce Application Execution Flow
  • YARN Workflow
  • Anatomy of MapReduce Program
  • Input Splits, Relation between Input Splits and HDFS Blocks
  • MapReduce: Combiner & Partitioner
  • Counters
  • Distributed Cache
  • Reduce Join
  • Custom Input Format
  • Sequence Input Format

Learning Objectives: In this module, you will learn Apache Pig, the types of use cases where Pig can be used, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig Streaming & testing Pig scripts. You will also work on a healthcare dataset. We will start with Apache Hive as well.


Topics:

  • Introduction to Apache Pig
  • MapReduce vs Pig
  • Pig Components & Pig Execution
  • Pig Data Types & Data Models in Pig
  • Introduction to Apache Hive
  • Hive vs Pig
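The core Pig Latin pattern taught here, GROUP records BY a key and then FOREACH group GENERATE an aggregate, can be previewed with an illustrative plain-Python equivalent (the rows below are hypothetical healthcare records, not the course dataset):

```python
from itertools import groupby

# Hypothetical (diagnosis, case_count) rows, like tuples in a Pig relation
records = [("flu", 2), ("flu", 3), ("cold", 1)]

# Pig Latin sketch:  grouped = GROUP records BY diagnosis;
#                    result  = FOREACH grouped GENERATE group, SUM(records.case_count);
records.sort(key=lambda r: r[0])  # groupby needs sorted input
result = {key: sum(count for _, count in grp)
          for key, grp in groupby(records, key=lambda r: r[0])}
print(result)  # {'cold': 1, 'flu': 5}
```

Pig compiles this same declarative pattern into MapReduce jobs for you, which is the "tight coupling" the module refers to.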

Learning Objectives: This module will help you understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts and Hive UDFs.


Topics:

  • Hive Architecture and Components
  • Hive Metastore
  • Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models
  • Hive Partition
  • Hive Bucketing
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data
  • Querying Data & Managing Outputs
  • Hive Script & Hive UDF
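Partitioning and bucketing, two topics from this module, are at heart a storage-layout rule: a partition column value becomes a directory, and a bucket is chosen by hashing the bucketed column. Here is an illustrative sketch of that rule in Python (path layout and names are simplified, not Hive's exact file naming):

```python
def storage_path(table, partition_col, partition_val, key, num_buckets):
    # Partition pruning works because the partition value is in the path:
    # a query filtering on partition_col only reads matching directories.
    bucket = hash(key) % num_buckets  # bucketing: hash(column) mod bucket count
    return f"/warehouse/{table}/{partition_col}={partition_val}/bucket_{bucket:05d}"

print(storage_path("sales", "dt", "2020-09-19", "cust42", 8))
```

This is why partitioning speeds up filters on the partition column, while bucketing helps with sampling and with joins on the bucketed column.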

Learning Objectives: In this module, you will understand advanced Apache Hive concepts such as UDF, Dynamic Partitioning, Hive indexes and views, and optimizations in Hive. You will also acquire in-depth knowledge of Apache HBase, HBase Architecture, HBase running modes and its components.

Topics:

  • Hive QL: Joining Tables, Dynamic Partitioning
  • Hive Indexes and views
  • Hive Query Optimizers
  • Hive UDF
  • Apache HBase: Introduction to NoSQL Databases and HBase
  • HBase v/s RDBMS
  • HBase Components
  • HBase Architecture
  • HBase Run Modes

Learning Objectives: This module will cover advanced Apache HBase concepts. We will see demos of HBase Bulk Loading & HBase Filters. You will also learn what ZooKeeper is all about, how it helps in monitoring a cluster & why HBase uses ZooKeeper.


Topics:

  • HBase Data Model
  • HBase Shell
  • HBase Client API
  • HBase Data Loading Techniques
  • Apache ZooKeeper Introduction
  • ZooKeeper Data Model
  • ZooKeeper Service
  • HBase Bulk Loading
  • Getting and Inserting Data
  • HBase Filters
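The filters demoed in this module lean on the fact that HBase keeps rows sorted by row key, so a prefix-style filter is a cheap range read rather than a full-table pass. An illustrative sketch (a toy dict standing in for a table; not the real PrefixFilter API):

```python
# Toy store: row keys designed with a type prefix, a common HBase pattern
rows = {"user#001": "a", "user#002": "b", "order#001": "c"}

def scan_with_prefix(store, prefix):
    # Mimics a scan with a prefix filter over key-sorted rows
    return {k: v for k, v in sorted(store.items()) if k.startswith(prefix)}

print(scan_with_prefix(rows, "user#"))  # {'user#001': 'a', 'user#002': 'b'}
```

This is also why row-key design matters so much in HBase: the key order determines which scans are fast.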

Course Mentors


Sagar Jain (Instructor)

Big Data Engineer | Ex-American Express, Siemens | 6+ years of experience


Sagar has 6+ years of technical experience and has served at American Express and Siemens. He has sound knowledge of the Big Data tech stack, Java, Scala, Python, scalable data pipelines, and dynamic scheduling of jobs for better resource utilization. He is very passionate about data and about handling it with the new technologies coming to the market.


FAQs

  1. How will these classes be conducted?
    It will be an online live (Live Stream) class, so you can attend from any geographical location. It will be an interactive live session, where you can raise your doubts with the instructor (similar to our offline classroom program). You just need a working internet connection and a PC/laptop.
     
  2. Is there any number to contact for any query?
    You may call us on our toll-free number: 1800 123 8622, or drop us an email at geeks.classes@geeksforgeeks.org.
     
  3. How will we work on the projects?
    You will begin the project with the help of a course mentor. Each student will be guided by the mentor in the class itself.
     
  4. Is this a certification course?
    Yes, it's a GeeksforGeeks certified program that includes projects along with learning. All students will receive a completion certificate.
     
  5. What is the size of a batch?
    The planned batch size is 40.
     
  6. How can I register for the course?
    Click on the "Signup for free" button & pay the fees online.
     
  7. What are the course duration and class timing?
    The course includes 18 lectures which will be completed in 9 weeks.
    It is a weekend Live classes batch scheduled on every Saturday & Sunday at 8:00 PM - 11:00 PM (IST).
  8. When can I access the recorded session of the class (if someone misses the live class)?
    The recorded session of the class will be uploaded within 2 working days.

Course Registration

Batch    Date                                  Type          Register
BDHL-1   19 September '20 to 15 November '20   Live Classes  Registration Closed