Course Detail

Big Data-Hadoop - Technogeeks


Course Description

Big Data Hadoop Certification Training Course in Pune

MODULE 1 - INTRODUCTION TO HADOOP

      • Hadoop- Demo
      • What is Big Data?
      • When data becomes Big Data
      • The 3 Vs of Big Data
      • Introduction to the Hadoop Ecosystem
      • Why Hadoop, when existing tools and technologies have been on the market for decades?
      • The two categories of Hadoop projects: new projects on Hadoop, and client-requested POCs and migrations of existing tools and technologies to Hadoop
      • How an open-source tool (Hadoop) can finish jobs in less time than other tools on the market
      • Hadoop Processing Framework (Map Reduce) / YARN
      • Alternatives to Map Reduce
      • Why NoSQL is in high demand today
      • Distributed warehouse for DFS
      • Most in-demand tools that run on top of the Hadoop Ecosystem for specific requirements and scenarios
      • Data import/export tools

MODULE 2 - HADOOP SETUP, INSTALLATION AND PIG BASICS

      • Hadoop installation
      • Introduction to the UIs of the Hadoop file system and processing environment
      • How to read and write files
      • Basic Unix commands for Hadoop
      • Hadoop’s FS shell
      • Hadoop’s releases
      • Hadoop’s daemons

MODULE 3 - HIVE BASICS, HIVE ADVANCED

      • Hive Introduction
      • Hive Advanced
      • Partitioning
      • Bucketing
      • External Tables
      • Complex use cases in Hive
      • Hive Advanced Assignment
      • Real-world Hive scenarios
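
The difference between the Partitioning and Bucketing topics above can be sketched in plain Python. This is an illustration only, not Hive code: the sample rows, the `country` partition column, and the `user_id` bucket column are hypothetical, and the bucket is taken as `user_id % num_buckets`, which assumes a non-negative integer key that hashes to its own value.

```python
from collections import defaultdict

# Hypothetical sample rows: (user_id, country, amount)
ROWS = [
    (101, "IN", 250),
    (102, "US", 400),
    (103, "IN", 125),
    (104, "US", 300),
    (105, "IN", 75),
]

def partition_rows(rows):
    """Group rows by the value of the partition column (country)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[1]].append(row)
    return dict(partitions)

def bucket_rows(rows, num_buckets):
    """Spread rows over a fixed number of buckets by key modulo num_buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        # Assumption: integer key, so the "hash" is the key itself.
        buckets[row[0] % num_buckets].append(row)
    return buckets

partitions = partition_rows(ROWS)
buckets = bucket_rows(ROWS, 2)
```

Partitioning produces one group per distinct column value, so the number of groups depends on the data. Bucketing always produces exactly `num_buckets` groups, whatever the data looks like.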

MODULE 4 - MAP REDUCE BASICS, POC (PROOF OF CONCEPT)

      • How Map Reduce works as a processing framework
      • End-to-end execution flow of a Map Reduce job
      • Different tasks in a Map Reduce job
      • Why the Reducer is optional while the Mapper is mandatory
      • Introduction to Combiner
      • Introduction to Partitioner
      • Programming languages for Map Reduce
      • Why Java is preferred for Map Reduce programming
      • POC based on Pig, Hive, HDFS, MR
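
The end-to-end flow listed above (map, combine, partition, shuffle, reduce) can be sketched as a single-process word count in plain Python. This is a teaching sketch, not Hadoop's Java API: the function names are hypothetical, each input line stands in for one input split, and the partitioner uses a toy byte-sum rule instead of Hadoop's default hash partitioning.

```python
from collections import defaultdict

def mapper(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def combiner(pairs):
    """Combine: pre-aggregate one mapper's output locally before the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def partitioner(word, num_reducers):
    """Partition: choose the reducer for a key (toy rule, not Hadoop's hash)."""
    return sum(word.encode("utf-8")) % num_reducers

def reducer(word, counts):
    """Reduce: sum all counts that arrived for one key."""
    return word, sum(counts)

def run_job(lines, num_reducers=2):
    # Shuffle: route each combined pair to its reducer and group by key.
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line stands in for one input split
        for word, count in combiner(mapper(line)):
            shuffled[partitioner(word, num_reducers)][word].append(count)
    result = {}
    for groups in shuffled:  # each dict stands in for one reduce task
        for word in sorted(groups):
            key, total = reducer(word, groups[word])
            result[key] = total
    return result

counts = run_job(["big data big cluster", "data node data"])
```

Removing the `combiner` call would not change the final counts; it only reduces how many pairs cross the shuffle. That is why a combiner is treated as an optimization rather than a required stage.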

MODULE 5 - MAP REDUCE ADVANCED, HBASE BASICS

      • How to work with Map Reduce in real-world projects
      • Map Reduce complex scenarios
      • Drawbacks of Hadoop
      • Why Hadoop can’t be used for real-time processing

MODULE 6 - ZOOKEEPER, SQOOP, QUICK REVISION OF PREVIOUS CLASSES

      • Introduction to ZooKeeper
      • How ZooKeeper helps in the Hadoop Ecosystem
      • How to load data from relational storage into Hadoop
      • Sqoop basics
      • Sqoop practical implementation
      • Quick revision of previous classes to fill gaps and correct misunderstandings

MODULE 7 - FLUME, OOZIE, HADOOP RELEASES, INTRODUCTION TO YARN

      • How to load data into Hadoop that comes from a web server or other storage without a fixed schema
      • How to load unstructured and semi-structured data into Hadoop
      • Introduction to Flume
      • Hands-on with Flume
      • How to load Twitter data into HDFS using Hadoop
      • Introduction to Oozie
      • What kinds of jobs can be scheduled using Oozie
      • How to schedule time-based jobs
      • Hadoop releases
      • Where to get Hadoop and other components to install
      • Introduction to YARN
      • Significance of YARN

MODULE 8 - INTRODUCTION TO HUE, DIFFERENT VENDORS IN THE MARKET, MAJOR PROJECT DISCUSSION

      • Introduction to Hue
      • How Hue is used in real-world projects
      • Real-world Hadoop usage
      • Introduction to a real-world cluster
      • Hadoop Release 1 vs Hadoop Release 2 in real-world use
      • Real-world Hadoop project
      • Major POC based on a combination of several tools from the Hadoop Ecosystem
      • Datasets for practice

MODULE 9 - SPARK AND PYTHON

      • Introduction to Spark
      • Introduction to Python
      • PySpark concepts
      • Advantages of Spark over Hadoop
      • Is Spark a replacement for Hadoop?
      • How Spark is faster than Hadoop
      • Spark RDD
      • Spark Transformations and Actions
      • Spark SQL
      • Datasets and DataFrames
      • Example real-world scenarios where Spark is preferred over Hadoop
      • How Spark can process complex data sets in less time
      • In-Memory Processing Framework for Analytics

MODULE 10 - HADOOP IN CLOUD COMPUTING: AWS

      • Introduction to Cloud Computing
      • On-premises vs cloud setup
      • Major cloud providers for Big Data
      • What is EMR?
      • HDFS vs S3
      • Overview and working of AWS Glue jobs
      • AWS Glue
      • AWS Redshift
      • AWS Athena

Institute Overview

Pune, Maharashtra, India

Our Story: Technogeeks is a group of working IT professionals located in Pune. Technogeeks trainers work on real-time projects across multiple technologies and believe in sharing knowledge and best practices to help the candidates to bui...
