Hadoop Training Courses

Hadoop Training

Apache Hadoop is an open-source implementation of two core Google BigData solutions: GFS (Google File System) and MapReduce programming paradigm. It is a complete framework destined for storing and processing large data sets. Hadoop is used by most of the global cloud service providers including such leaders like Yahoo, Facebook or LinkedIn.


Hadoop Course Outlines

ID Name Duration Overview
238322 Przygotowanie do egzaminu CCAH (Certified Administrator for Apache Hadoop) 35 hours
833734 Data Analysis with Hive/HiveQL 7 hours This course covers how to use Hive SQL language (AKA: Hive HQL, SQL on Hive, HiveQL) for people who extract data from Hive Hive Overview Architecture and design Aata types SQL support in Hive Creating Hive tables and querying Partitions Joins Text processing labs : various labs on processing data with Hive DQL (Data Query Language) in Detail SELECT clause Column aliases Table aliases Date types and Date functions Group function Table joins JOIN clause UNION operator Nested queries Correlated subqueries
118127 Model MapReduce and Apache Hadoop 14 hours The course is intended for IT specialist that works with the distributed processing of large data sets across clusters of computers. Data Mining and Business Intelligence Introduction Area of application Capabilities Basics of data exploration Big data What does Big data stand for? Big data and Data mining MapReduce Model basics Example application Stats Cluster model Hadoop What is Hadoop Installation Configuration Cluster settings Architecture and configuration of Hadoop Distributed File System Console tools DistCp tool MapReduce and Hadoop Streaming Administration and configuration of Hadoop On Demand Alternatives
238323 Administrator Training for Apache Hadoop 35 hours Audience: The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment Goal: Deep knowledge on Hadoop cluster administration. 1: HDFS (17%) Describe the function of HDFS Daemons Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing. Identify current features of computing systems that motivate a system like Apache Hadoop. Classify major goals of HDFS Design Given a scenario, identify appropriate use case for HDFS Federation Identify components and daemon of an HDFS HA-Quorum cluster Analyze the role of HDFS security (Kerberos) Determine the best data serialization choice for a given scenario Describe file read and write paths Identify the commands to manipulate files in the Hadoop File System Shell 2: YARN and MapReduce version 2 (MRv2) (17%) Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons Understand basic design strategy for MapReduce v2 (MRv2) Determine how YARN handles resource allocations Identify the workflow of MapReduce job running on YARN Determine which files you must change and how in order to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running on YARN. 3: Hadoop Cluster Planning (16%) Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster. Analyze the choices in selecting an OS Understand kernel tuning and disk swapping Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, disk I/O Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario 4: Hadoop Cluster Installation and Administration (25%) Given a scenario, identify how the cluster will handle disk and machine failures Analyze a logging configuration and logging configuration file format Understand the basics of Hadoop metrics and cluster health monitoring Identify the function and purpose of available tools for cluster monitoring Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Manager, Sqoop, Hive, and Pig Identify the function and purpose of available tools for managing the Apache Hadoop file system 5: Resource Management (10%) Understand the overall design goals of each of Hadoop schedulers Given a scenario, determine how the FIFO Scheduler allocates cluster resources Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN Given a scenario, determine how the Capacity Scheduler allocates cluster resources 6: Monitoring and Logging (15%) Understand the functions and features of Hadoop’s metric collection abilities Analyze the NameNode and JobTracker Web UIs Understand how to monitor cluster Daemons Identify and monitor CPU usage on master nodes Describe how to monitor swap and memory allocation on all nodes Identify how to view and manage Hadoop’s log files Interpret a log file
417006 Hadoop Administration on MapR 28 hours Audience: This course is intended to demystify big data/hadoop technology and to show it is not difficult to understand. Big Data Overview: What is Big Data Why Big Data is gaining popularity Big Data Case Studies Big Data Characteristics Solutions to work on Big Data. Hadoop & Its components: What is Hadoop and what are its components. Hadoop Architecture and its characteristics of Data it can handle /Process. Brief on Hadoop History, companies using it and why they have started using it. Hadoop Frame work & its components- explained in detail. What is HDFS and Reads -Writes to Hadoop Distributed File System. How to Setup Hadoop Cluster in different modes- Stand- alone/Pseudo/Multi Node cluster. (This includes setting up a Hadoop cluster in VM BOX/VMware, Network configurations that need to be carefully looked into, running Hadoop Daemons and testing the cluster). What is Map Reduce frame work and how it works. Running Map Reduce jobs on Hadoop cluster. Understanding Replication , Mirroring and Rack awareness in context of Hadoop clusters. Hadoop Cluster Planning: How to plan your hadoop cluster. Understanding hardware-software to plan your hadoop cluster. Understanding workloads and planning cluster to avoid failures and perform optimum. What is MapR and why MapR : Overview of MapR and its architecture. Understanding & working of MapR Control System, MapR Volumes , snapshots & Mirrors. Planning a cluster in context of MapR. Comparison of MapR with other distributions and Apache Hadoop. MapR installation and cluster deployment. Cluster Setup & Administration: Managing services, nodes ,snapshots, mirror volumes and remote clusters. Understanding and managing Nodes. Understanding of Hadoop components, Installing Hadoop components alongside MapR Services. Accessing Data on cluster including via NFS Managing services & nodes. Managing data by using volumes, managing users and groups, managing & assigning roles to nodes, commissioning decommissioning of nodes, cluster administration and performance monitoring, configuring/ analyzing and monitoring metrics to monitor performance, configuring and administering MapR security. Understanding and working with M7- Native storage for MapR tables. Cluster configuration and tuning for optimum performance. Cluster upgrade and integration with other setups: Upgrading software version of MapR and types of upgrade. Configuring Mapr cluster to access HDFS cluster. Setting up MapR cluster on Amazon Elastic Mapreduce. All the above topics include Demonstrations and practice sessions for learners to have hands on experience of the technology.
806636 Hadoop for Developers (4 days) 28 hours Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. This course will introduce a developer to various components (HDFS, MapReduce, Pig, Hive and HBase) Hadoop ecosystem.   Section 1: Introduction to Hadoop hadoop history, concepts eco system distributions high level architecture hadoop myths hadoop challenges hardware / software Lab : first look at Hadoop Section 2: HDFS Design and architecture concepts (horizontal scaling, replication, data locality, rack awareness) Daemons : Namenode, Secondary namenode, Data node communications / heart-beats data integrity read / write path Namenode High Availability (HA), Federation labs : Interacting with HDFS Section 3 : Map Reduce concepts and architecture daemons (MRV1) : jobtracker / tasktracker phases : driver, mapper, shuffle/sort, reducer Map Reduce Version 1 and Version 2 (YARN) Internals of Map Reduce Introduction to Java Map Reduce program labs : Running a sample MapReduce program Section 4 : Pig pig vs java map reduce pig job flow pig latin language ETL with Pig Transformations & Joins User defined functions (UDF) labs : writing Pig scripts to analyze data Section 5: Hive architecture and design data types SQL support in Hive Creating Hive tables and querying partitions joins text processing labs : various labs on processing data with Hive Section 6: HBase concepts and architecture hbase vs RDBMS vs cassandra HBase Java API Time series data on HBase schema design labs : Interacting with HBase using shell;   programming in HBase Java API ; Schema design exercise
806635 Advanced Hadoop for Developers 21 hours Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course delves into data management in HDFS, advanced Pig, Hive, and HBase.  These advanced programming techniques will be beneficial to experienced Hadoop developers. Audience: developers Duration: three days Format: lectures (50%) and hands-on labs (50%).   Section 1: Data Management in HDFS   Various Data Formats (JSON / Avro / Parquet) Compression Schemes Data Masking Labs : Analyzing different data formats;  enabling compression Section 2: Advanced Pig   User-defined Functions Introduction to Pig Libraries (ElephantBird / Data-Fu) Loading Complex Structured Data using Pig Pig Tuning Labs : advanced pig scripting, parsing complex data types Section 3 : Advanced Hive   User-defined Functions Compressed Tables Hive Performance Tuning Labs : creating compressed tables, evaluating table formats and configuration Section 4 : Advanced HBase   Advanced Schema Modelling Compression Bulk Data Ingest Wide-table / Tall-table comparison HBase and Pig HBase and Hive HBase Performance Tuning Labs : tuning HBase; accessing HBase data from Pig & Hive; Using Phoenix for data modeling
806641 Hadoop For Administrators 21 hours Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three (optionally, four) days course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan cluster deployment and growth, how to install, maintain, monitor, troubleshoot and optimize Hadoop. They will also practice cluster bulk data load, get familiar with various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course finishes off with discussion of securing cluster with Kerberos. “…The materials were very well prepared and covered thoroughly. The Lab was very helpful and well organized” — Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising Audience Hadoop administrators Format Lectures and hands-on labs, approximate balance 60% lectures, 40% labs. Prerequisites Introduction Hadoop history, concepts Ecosystem Distributions High level architecture Hadoop myths Hadoop challenges (hardware / software) Labs: discuss your Big Data projects and problems Planning and installation Selecting software, Hadoop distributions Sizing the cluster, planning for growth Selecting hardware and network Rack topology Installation Multi-tenancy Directory structure, logs Benchmarking Labs: cluster install, run performance benchmarks HDFS operations Concepts (horizontal scaling, replication, data locality, rack awareness) Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode) Health monitoring Command-line and browser-based administration Adding storage, replacing defective drives Labs: getting familiar with HDFS command lines Data ingestion Flume for logs and other data ingestion into HDFS Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL Hadoop data warehousing with Hive Copying data between clusters (distcp) Using S3 as complementary to HDFS Data ingestion best practices and architectures Labs: setting up and using Flume, the same for Sqoop MapReduce operations and administration Parallel computing before mapreduce: compare HPC vs Hadoop administration MapReduce cluster loads Nodes and Daemons (JobTracker, TaskTracker) MapReduce UI walk through Mapreduce configuration Job config Optimizing MapReduce Fool-proofing MR: what to tell your programmers Labs: running MapReduce examples YARN: new architecture and new capabilities YARN design goals and implementation architecture New actors: ResourceManager, NodeManager, Application Master Installing YARN Job scheduling under YARN Labs: investigate job scheduling Advanced topics Hardware monitoring Cluster monitoring Adding and removing servers, upgrading Hadoop Backup, recovery and business continuity planning Oozie job workflows Hadoop high availability (HA) Hadoop Federation Securing your cluster with Kerberos Labs: set up monitoring Optional tracks Cloudera Manager for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5) Ambari for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)
806639 Hadoop for Business Analysts 21 hours Apache Hadoop is the most popular framework for processing Big Data. Hadoop provides rich and deep analytics capability, and it is making in-roads in to tradional BI analytics world. This course will introduce an analyst to the core components of Hadoop eco system and its analytics Audience Business Analysts Duration three days Format Lectures and hands on labs. Section 1: Introduction to Hadoop hadoop history, concepts eco system distributions high level architecture hadoop myths hadoop challenges hardware / software Labs : first look at Hadoop Section 2: HDFS Overview concepts (horizontal scaling, replication, data locality, rack awareness) architecture (Namenode, Secondary namenode, Data node) data integrity future of HDFS : Namenode HA, Federation labs : Interacting with HDFS Section 3 : Map Reduce Overview mapreduce concepts daemons : jobtracker / tasktracker phases : driver, mapper, shuffle/sort, reducer Thinking in map reduce Future of mapreduce (yarn) labs : Running a Map Reduce program Section 4 : Pig pig vs java map reduce pig latin language user defined functions understanding pig job flow basic data analysis with Pig complex data analysis with Pig multi datasets with Pig advanced concepts lab : writing pig scripts to analyze / transform data Section 5: Hive hive concepts architecture SQL support in Hive data types table creation and queries Hive data management partitions & joins text analytics labs (multiple) : creating Hive tables and running queries, joins , using partitions, using text analytics functions Section 6: BI Tools for Hadoop BI tools and Hadoop Overview of current BI tools landscape Choosing the best tool for the job
806644 HBase for Developers 21 hours This course introduces HBase – a NoSQL store on top of Hadoop.  The course is intended for developers who will be using HBase to develop applications,  and administrators who will manage HBase clusters. We will walk a developer through HBase architecture and data modelling and application development on HBase. It will also discuss using MapReduce with HBase, and some administration topics, related to performance optimization. The course  is very  hands-on with lots of lab exercises. Duration : 3 days Audience : Developers  & Administrators Section 1: Introduction to Big Data & NoSQL Big Data ecosystem NoSQL overview CAP theorem When is NoSQL appropriate Columnar storage HBase and NoSQL Section 2 : HBase Intro Concepts and Design Architecture (HMaster and Region Server) Data integrity HBase ecosystem Lab : Exploring HBase Section 3 : HBase Data model Namespaces, Tables and Regions Rows, columns, column families, versions HBase Shell and Admin commands Lab : HBase Shell Section 3 : Accessing HBase using Java API Introduction to Java API Read / Write path Time Series data Scans Map Reduce Filters Counters Co-processors Labs (multiple) : Using HBase Java API to implement  time series , Map Reduce, Filters and counters. Section 4 : HBase schema Design : Group session students are presented with real world use cases students work in groups to come up with design solutions discuss / critique and learn from multiple designs Labs : implement a scenario in HBase Section 5 : HBase Internals Understanding HBase under the hood Memfile / HFile / WAL HDFS storage Compactions Splits Bloom Filters Caches Diagnostics Section 6 : HBase installation and configuration hardware selection install methods common configurations Lab : installing HBase Section 7 : HBase eco-system developing applications using HBase interacting with other Hadoop stack (MapReduce, Pig, Hive) frameworks around HBase advanced concepts (co-processors) Labs : writing HBase applications Section 8 : Monitoring And Best Practices monitoring tools and practices optimizing HBase HBase in the cloud real world use cases of HBase Labs : checking HBase vitals
238325 Hadoop Administration 21 hours The course is dedicated to IT specialists that are looking for a solution to store and process large data sets in distributed system environment Course goal: Getting knowledge regarding Hadoop cluster administration Introduction to Cloud Computing and Big Data solutions Apache Hadoop evolution: HDFS, MapReduce, YARN Installation and configuration of Hadoop in Pseudo-distributed mode Running MapReduce jobs on Hadoop cluster Hadoop cluster planning, installation and configuration Hadoop ecosystem: Pig, Hive, Sqoop, HBase Big Data future: Impala, Cassandra
1022881 Hadoop for Developers (2 days) 14 hours Introduction What is Hadoop? What does it do? How does it do it? The Motivation for Hadoop Problems with Traditional Large-Scale Systems Introducing Hadoop Hadoopable Problems Hadoop: Basic Concepts and HDFS The Hadoop Project and Hadoop Components The Hadoop Distributed File System Introduction to MapReduce MapReduce Overview Example: WordCount Mappers Reducers Hadoop Clusters and the Hadoop Ecosystem Hadoop Cluster Overview Hadoop Jobs and Tasks Other Hadoop Ecosystem Components Writing a MapReduce Program in Java Basic MapReduce API Concepts Writing MapReduce Drivers, Mappers, and Reducers in Java Speeding Up Hadoop Development by Using Eclipse Differences Between the Old and New MapReduce APIs Writing a MapReduce Program Using Streaming Writing Mappers and Reducers with the Streaming API Unit Testing MapReduce Programs Unit Testing The JUnit and MRUnit Testing Frameworks Writing Unit Tests with MRUnit Running Unit Tests Delving Deeper into the Hadoop API Using the ToolRunner Class Setting Up and Tearing Down Mappers and Reducers Decreasing the Amount of Intermediate Data with Combiners Accessing HDFS Programmatically Using The Distributed Cache Using the Hadoop API’s Library of Mappers, Reducers, and Partitioners Practical Development Tips and Techniques Strategies for Debugging MapReduce Code Testing MapReduce Code Locally by Using LocalJobRunner Writing and Viewing Log Files Retrieving Job Information with Counters Reusing Objects Creating Map-Only MapReduce Jobs Partitioners and Reducers How Partitioners and Reducers Work Together Determining the Optimal Number of Reducers for a Job Writing Customer Partitioners Data Input and Output Creating Custom Writable and Writable-Comparable Implementations Saving Binary Data Using SequenceFile and Avro Data Files Issues to Consider When Using File Compression Implementing Custom InputFormats and OutputFormats Common MapReduce Algorithms Sorting and Searching Large Data Sets Indexing Data Computing Term Frequency — Inverse Document Frequency Calculating Word Co-Occurrence Performing Secondary Sort Joining Data Sets in MapReduce Jobs Writing a Map-Side Join Writing a Reduce-Side Join Integrating Hadoop into the Enterprise Workflow Integrating Hadoop into an Existing Enterprise Loading Data from an RDBMS into HDFS by Using Sqoop Managing Real-Time Data Using Flume Accessing HDFS from Legacy Systems with FuseDFS and HttpFS An Introduction to Hive, Imapala, and Pig The Motivation for Hive, Impala, and Pig Hive Overview Impala Overview Pig Overview Choosing Between Hive, Impala, and Pig An Introduction to Oozie Introduction to Oozie Creating Oozie Workflows
809300 Big Data Hadoop Analyst Training 28 hours

Course Discounts

Course Venue Course Date Course Price [Remote/Classroom]
Building Web Apps using the MEAN stack Kraków, ul. Rzemieślnicza 1 Mon, 2016-10-24 09:00 14652PLN / 6440PLN
ITIL® Foundation Certificate in IT Service Management Zielona Góra, ul. Reja 6 Mon, 2016-10-24 09:00 4554PLN / 2320PLN
Apache Tomcat and Java EE Administration Warszawa, ul. Złota 3/11 Mon, 2016-10-24 09:00 5178PLN / 1894PLN
Market Forecasting Kraków, ul. Rzemieślnicza 1 Tue, 2016-10-25 09:00 5742PLN / 2714PLN
A Practical Guide to Successful Pricing Strategies Poznan, Garbary 100/63 Wed, 2016-10-26 09:00 3366PLN / 1883PLN
Design Patterns in C# Bielsko-Biała, Al. Armii Krajowej 220 Wed, 2016-10-26 09:00 3861PLN / 2130PLN
Creating and delivering presentations in Power Point (social skills workshop) Poznan, Garbary 100/63 Thu, 2016-10-27 09:00 3346PLN / 1414PLN
Agile Project Management with Scrum Kraków, ul. Rzemieślnicza 1 Wed, 2016-11-02 09:00 3386PLN / 1929PLN
Understanding Windows Communication Foundation (WCF) Szczecin, ul. Małopolska 23 Wed, 2016-11-02 09:00 10266PLN / 3611PLN
Python Programming Szczecin, ul. Małopolska 23 Mon, 2016-11-07 09:00 N/A / 5434PLN
Agile Project Management with Scrum Poznan, Garbary 100/63 Mon, 2016-11-07 09:00 3386PLN / 1529PLN
Business Analysis, BABOK V3.0 and IIBA Certification Preparation Katowice ul. Opolska 22 Mon, 2016-11-07 09:00 9405PLN / 5753PLN
Adobe InDesign Lublin, ul. Spadochroniarzy 9 Mon, 2016-11-07 09:00 1881PLN / 1327PLN
Administration of Linux System Olsztyn, ul. Kajki 3/1 Tue, 2016-11-08 09:00 3861PLN / 1920PLN
SQL Fundamentals Warszawa, ul. Złota 3/11 Wed, 2016-11-09 09:00 3663PLN / 1510PLN
Adobe Illustrator Katowice ul. Opolska 22 Wed, 2016-11-09 09:00 2871PLN / 1848PLN
Excel For Statistical Data Analysis Warszawa, ul. Złota 3/11 Mon, 2016-11-14 09:00 2673PLN / 1291PLN
Visual Basic for Applications (VBA) in Excel - Advanced Białystok, ul. Malmeda 1 Mon, 2016-11-14 09:00 3455PLN / 2202PLN
Social Media - facebook, twitter, blog, youtube, google+ Gdynia, ul. Ejsmonda 2 Mon, 2016-11-14 09:00 1881PLN / 1102PLN
Apache Web Server Administration Kraków, ul. Rzemieślnicza 1 Tue, 2016-11-15 09:00 4732PLN / 3177PLN
SQL in MySQL Gdynia, ul. Ejsmonda 2 Thu, 2016-11-17 09:00 2851PLN / 1413PLN
OCUP2 UML 2.5 Certification - Intermediate Exam Preparation Wroclaw, ul.Ludwika Rydygiera 2a/22 Thu, 2016-11-17 09:00 5148PLN / 2175PLN
Creating and managing Web sites Kraków, ul. Rzemieślnicza 1 Mon, 2016-11-21 09:00 5841PLN / 3298PLN
Angular JavaScript Katowice ul. Opolska 22 Mon, 2016-11-21 09:00 5940PLN / 3380PLN
Agile w projektach zdalnych Katowice ul. Opolska 22 Tue, 2016-11-22 09:00 5049PLN / 1962PLN
Effective working with spreadsheet in Excel Rzeszów, Plac Wolności 13 Thu, 2016-11-24 09:00 PLN / 767PLN
Test Automation with Selenium Gdańsk, ul. Powstańców Warszawskich 45 Tue, 2016-12-06 09:00 7722PLN / 3624PLN
Graphic techniques (Adobe Photoshop, Adobe Illustrator) Wroclaw, ul.Ludwika Rydygiera 2a/22 Tue, 2016-12-06 09:00 4851PLN / 2740PLN
Docker and Kubernetes Gdańsk, ul. Powstańców Warszawskich 45 Wed, 2016-12-07 09:00 N/A / 4989PLN

Upcoming Courses

Weekend Hadoop courses, Evening Hadoop training, Hadoop boot camp, Hadoop instructor-led , Hadoop training courses, Hadoop one on one training , Hadoop on-site,Weekend Hadoop training, Hadoop trainer , Hadoop classes, Hadoop private courses, Evening Hadoop courses, Hadoop coaching

Some of our clients