Thank you for sending your enquiry! One of our team members will contact you shortly.
Thank you for sending your booking! One of our team members will contact you shortly.
Plan Szkolenia
Introduction:
- Apache Spark in Hadoop Ecosystem
- Short intro for python, scala
Basics (theory):
- Architecture
- RDD
- Transformation and Actions
- Stage, Task, Dependencies
Using Databricks environment understand the basics (hands-on workshop):
- Exercises using RDD API
- Basic action and transformation functions
- PairRDD
- Join
- Caching strategies
- Exercises using DataFrame API
- SparkSQL
- DataFrame: select, filter, group, sort
- UDF (User Defined Function)
- Looking into DataSet API
- Streaming
Using AWS environment understand the deployment (hands-on workshop):
- Basics of AWS Glue
- Understand differencies between AWS EMR and AWS Glue
- Example jobs on both environment
- Understand pros and cons
Extra:
- Introduction to Apache Airflow orchestration
Wymagania
Programing skills (preferably python, scala)
SQL basics
21 godzin
Opinie uczestników (3)
Having hands on session / assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
Szkolenie - Apache Spark in the Cloud
1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
Szkolenie - Apache Spark in the Cloud
Get to learn spark streaming , databricks and aws redshift