Big Data Hadoop Analyst Training Course
Big Data Analyst Training is a practical course recommended for anyone who wants to become a Data Science expert. The course focuses on the skills needed to work as a modern analyst in Big Data technology. It presents tools for accessing, changing, transforming, and analyzing complex data structures stored in a Hadoop cluster, and covers topics from the Hadoop ecosystem (Pig, Hive, Impala, ELK, and others).
- The functionality of Pig, Hive, Impala, and ELK for data collection, recording of results, and analysis.
- How Pig, Hive, and Impala can improve the performance of common, everyday analytical tasks.
- Performing real-time interactive analyses of huge data sets to extract insights valuable to the business, and how to interpret the conclusions.
- Performing complex queries on very large volumes of data.
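To give a flavour of the Pig-style transformations the course covers, here is a minimal sketch in plain Python of a group-and-aggregate over records, mirroring what a Pig `GROUP ... / FOREACH ... GENERATE SUM(...)` pipeline does across a Hadoop cluster (the user/bytes schema here is purely hypothetical):

```python
from collections import defaultdict

# Sample records of the kind an analyst might load from HDFS
# (the user/bytes schema is hypothetical, for illustration only)
records = [
    ("alice", 120), ("bob", 300), ("alice", 80), ("carol", 50),
]

# Group by user and sum the byte counts -- the local equivalent of a
# Pig GROUP ... / FOREACH ... GENERATE SUM(...) pipeline
totals = defaultdict(int)
for user, nbytes in records:
    totals[user] += nbytes

print(dict(totals))  # {'alice': 200, 'bob': 300, 'carol': 50}
```

On a real cluster Pig distributes the grouping and summing across nodes; the logic an analyst writes stays this simple.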
Course Outline
Hadoop basics.
Introduction to Pig.
Basic data analysis using Pig.
Processing complex data with Pig.
Operations on multiple data sets using Pig.
Pig troubleshooting and optimization.
Introduction to Hive, Impala, ELK.
Executing queries in Hive, Impala, ELK.
Data management in Hive.
Data storage and performance.
Analyses using Hive and Impala.
Working with Impala and ELK.
Analyzing text and complex data types.
Optimizing Hive, Pig, Impala, and ELK.
Interoperability and workflow.
Questions, tasks, certification.
Requirements
This course is suggested for all data analysts, business analysts, developers and administrators who have experience with SQL and/or scripting languages. No knowledge of Apache Hadoop is required before this training.
Open Training Courses require 5+ participants.
Testimonials (1)
The practical part.
Arkadiusz Iwaszko
Course - Big Data Hadoop Analyst Training
Related Courses
Hortonworks Data Platform (HDP) for Administrators
21 Hours
This instructor-led, live training in Poland (online or onsite) introduces the Hortonworks Data Platform (HDP) and walks participants through the deployment of a Spark + Hadoop solution.
By the end of this training, participants will be able to:
- Use Hortonworks to reliably run Hadoop at a large scale.
- Unify Hadoop's security, governance, and operations capabilities with Spark's agile analytic workflows.
- Use Hortonworks to investigate, validate, certify and support each of the components in a Spark project.
- Process different types of data, including structured, unstructured, in-motion, and at-rest.
Apache Ambari: Efficiently Manage Hadoop Clusters
21 Hours
Apache Ambari is an open-source management platform for provisioning, managing, monitoring, and securing Apache Hadoop clusters.
In this instructor-led, live training, participants will learn the management tools and practices provided by Ambari to successfully manage Hadoop clusters.
By the end of this training, participants will be able to:
- Set up a live Big Data cluster using Ambari
- Apply Ambari's advanced features and functionalities to various use cases
- Seamlessly add and remove nodes as needed
- Improve a Hadoop cluster's performance through tuning and tweaking
Audience
- DevOps
- System Administrators
- DBAs
- Hadoop testing professionals
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Impala for Business Intelligence
21 Hours
Cloudera Impala is an open source massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters.
Impala enables users to issue low-latency SQL queries against data stored in the Hadoop Distributed File System and Apache HBase without requiring data movement or transformation.
Audience
This course is aimed at analysts and data scientists performing analysis on data stored in Hadoop via Business Intelligence or SQL tools.
After this course, delegates will be able to:
- Extract meaningful information from Hadoop clusters with Impala.
- Write programs in the Impala SQL dialect to facilitate Business Intelligence.
- Troubleshoot Impala.
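The low-latency queries Impala serves are plain SQL. As a rough local illustration, the sketch below uses Python's built-in `sqlite3` as a stand-in for an Impala connection (with `impyla`, the cursor follows the same DB-API shape); the `sales` table and its data are hypothetical:

```python
import sqlite3

# In-memory SQLite database as a local stand-in for an Impala
# connection; the hypothetical sales table is for illustration only
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 100.0), ("south", 250.0), ("north", 50.0)])

# A typical BI-style aggregate query, also valid Impala SQL
cur.execute("SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region")
print(cur.fetchall())  # [('north', 150.0), ('south', 250.0)]
```

Against a real cluster, the same query would run through an Impala connection directly on data in HDFS or HBase, with no prior extraction step.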
Data Analysis with Hive/HiveQL
7 Hours
This course covers how to use the Hive SQL language (also known as Hive HQL, SQL on Hive, or HiveQL) for people who extract data from Hive.
Administrator Training for Apache Hadoop
35 Hours
Audience:
The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment
Goal:
To gain deep knowledge of Hadoop cluster administration.
Big Data Analytics in Health
21 Hours
Big data analytics involves the process of examining large amounts of varied data sets in order to uncover correlations, hidden patterns, and other useful insights.
The health industry has massive amounts of complex heterogeneous medical and clinical data. Applying big data analytics on health data presents huge potential in deriving insights for improving delivery of healthcare. However, the enormity of these datasets poses great challenges in analyses and practical applications to a clinical environment.
In this instructor-led, live training (remote), participants will learn how to perform big data analytics in health as they step through a series of hands-on live-lab exercises.
By the end of this training, participants will be able to:
- Install and configure big data analytics tools such as Hadoop MapReduce and Spark
- Understand the characteristics of medical data
- Apply big data techniques to deal with medical data
- Study big data systems and algorithms in the context of health applications
Audience
- Developers
- Data Scientists
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice.
Note
- To request a customized training for this course, please contact us to arrange.
Datameer for Data Analysts
14 Hours
Datameer is a business intelligence and analytics platform built on Hadoop. It allows end-users to access, explore and correlate large-scale, structured, semi-structured and unstructured data in an easy-to-use fashion.
In this instructor-led, live training, participants will learn how to use Datameer to overcome Hadoop's steep learning curve as they step through the setup and analysis of a series of big data sources.
By the end of this training, participants will be able to:
- Create, curate, and interactively explore an enterprise data lake
- Access business intelligence data warehouses, transactional databases and other analytic stores
- Use a spreadsheet user-interface to design end-to-end data processing pipelines
- Access pre-built functions to explore complex data relationships
- Use drag-and-drop wizards to visualize data and create dashboards
- Use tables, charts, graphs, and maps to analyze query results
Audience
- Data analysts
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Hadoop Administration
21 Hours
The course is dedicated to IT specialists looking for a solution to store and process large data sets in a distributed system environment.
Course goal:
To gain knowledge of Hadoop cluster administration.
Hadoop For Administrators
21 Hours
Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan cluster deployment and growth, and how to install, maintain, monitor, troubleshoot, and optimize Hadoop. They will also practice bulk data loads into the cluster, get familiar with various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course finishes with a discussion of securing the cluster with Kerberos.
“…The materials were very well prepared and covered thoroughly. The Lab was very helpful and well organized”
— Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising
Audience
Hadoop administrators
Format
Lectures and hands-on labs, balanced at approximately 60% lectures and 40% labs.
Hadoop for Developers (4 days)
28 Hours
Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. This course introduces a developer to the various components (HDFS, MapReduce, Pig, Hive, and HBase) of the Hadoop ecosystem.
Advanced Hadoop for Developers
21 Hours
Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course delves into data management in HDFS, advanced Pig, Hive, and HBase. These advanced programming techniques will be beneficial to experienced Hadoop developers.
Audience: developers
Duration: three days
Format: lectures (50%) and hands-on labs (50%).
Hadoop for Developers and Administrators
21 Hours
Hadoop is the most popular Big Data processing framework.
Hadoop for Project Managers
14 Hours
In this instructor-led training in Poland, participants will learn the core components of the Hadoop ecosystem and how these technologies can be used to solve large-scale problems. By learning these foundations, participants will improve their ability to communicate with the developers and implementers of these systems as well as the data scientists and analysts that many IT projects involve.
Audience
- Project Managers wishing to implement Hadoop into their existing development or IT infrastructure
- Project Managers needing to communicate with cross-functional teams that include big data engineers, data scientists and business analysts
Hadoop Administration on MapR
28 Hours
Audience:
This course is intended to demystify Big Data/Hadoop technology and to show that it is not difficult to understand.
Hadoop with Python
28 Hours
Hadoop is a popular Big Data processing framework. Python is a high-level programming language famous for its clear syntax and code readability.
In this instructor-led, live training, participants will learn how to work with Hadoop, MapReduce, Pig, and Spark using Python as they step through multiple examples and use cases.
By the end of this training, participants will be able to:
- Understand the basic concepts behind Hadoop, MapReduce, Pig, and Spark
- Use Python with Hadoop Distributed File System (HDFS), MapReduce, Pig, and Spark
- Use Snakebite to programmatically access HDFS within Python
- Use mrjob to write MapReduce jobs in Python
- Write Spark programs with Python
- Extend the functionality of Pig using Python UDFs
- Manage MapReduce jobs and Pig scripts using Luigi
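As a taste of the Hadoop Streaming style of Python MapReduce covered in this course, a minimal word-count mapper and reducer can be run locally. This is a sketch only: a real streaming job would read lines from stdin on the cluster, while here the two phases are simply chained in-process with an explicit sort standing in for Hadoop's shuffle:

```python
from itertools import groupby

def mapper(line):
    # Emit (word, 1) for every word, as a streaming mapper would print
    for word in line.split():
        yield (word.lower(), 1)

def reducer(pairs):
    # Pairs arrive sorted by key, as Hadoop's shuffle phase guarantees
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

lines = ["big data big cluster", "data pipeline"]
pairs = sorted(kv for line in lines for kv in mapper(line))
print(dict(reducer(pairs)))
# {'big': 2, 'cluster': 1, 'data': 2, 'pipeline': 1}
```

Tools like mrjob wrap exactly this mapper/reducer pattern so the same Python logic can run unchanged on a Hadoop cluster.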
Audience
- Developers
- IT Professionals
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice