Course Outline
Day 1: Theory and Introduction to Distributed Systems
- Introduction
- Introduction to the training structure and agenda, discussion of the training environment.
- Fundamental Concepts of Distributed Systems
- Definition of distributed systems and their importance in modern applications.
- Key challenges: scalability, availability, consistency, fault tolerance.
- Data Consistency Models
- Discussion of Strong Consistency and Eventual Consistency.
- Managing consistency in distributed systems: quorums, read and write quorums, read-your-own-writes.
- Distributed Log Systems and Communication
- The pub/sub pattern and the stream-table duality.
- Data compaction and real-time data processing.
- Case Study 1: Example of a High-Performance Application
- Analysis of the architecture of a communication system (e.g., WhatsApp, Signal).
- Challenges related to consistency and data recovery.
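
The read and write quorums listed under Data Consistency Models can be illustrated with a short sketch. It assumes the common textbook formulation: with N replicas, a write quorum of W and a read quorum of R, every read overlaps the latest acknowledged write when R + W > N. The function names and the brute-force check are illustrative assumptions, not part of the course material.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum shares a replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # Pigeonhole argument: two subsets of n replicas must intersect if r + w > n.
    return r + w > n


def quorums_overlap_brute_force(n: int, r: int, w: int) -> bool:
    """Check the same property by enumerating all replica subsets (small n only)."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )
```

With three replicas, `quorums_overlap(3, 2, 2)` holds while `quorums_overlap(3, 1, 1)` does not, which is why single-replica reads and writes can return stale data.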
Day 2: Practical Aspects of Designing Distributed Systems
- Designing Fault-Tolerant Applications
- Discussion of patterns such as CQRS, Inbox/Outbox, Two-Phase Commit (2PC), Saga, Change Data Capture (CDC), Circuit Breaker, Read Repair.
- Practical examples of applying these patterns.
- Examples of Non-Relational Databases
- Document-oriented databases (e.g., MongoDB, CouchDB)
- Key-value databases (e.g., Redis)
- Graph databases (e.g., Neo4j, OrientDB)
- Columnar databases (e.g., HBase)
- Object databases (e.g., GridGain)
- Time-series databases (e.g., TimescaleDB, InfluxDB)
- Search engines (e.g., Apache Solr)
- In-memory data grids (e.g., Hazelcast, GridGain)
- Modern Databases: Partitioning, Sharding, and Replication
- Partitioning and sharding: Discussion of techniques for dividing data into smaller fragments to improve performance and scalability.
- Data replication: Different types of replication (synchronous, asynchronous), benefits and challenges associated with data replication in distributed environments.
- Secondary indexes: Creating secondary indexes and using them to improve query performance.
- Case Study 2: Designing a Graph-Based System
- Designing and modeling graphs in distributed systems using Neo4j or OrientDB.
- Practical exercise: graph modeling.
- Real-Time Data Management vs. Traditional Data Warehouses
- Introduction to real-time data processing and batch processing.
- Example of using TimescaleDB to monitor time-series data.
- NewSQL: A Modern Approach to Relational Databases
- Discussion of the NewSQL concept as a combination of the advantages of relational databases with the flexibility and scalability of NoSQL solutions.
- NewSQL Task: Participants will become familiar with popular NewSQL databases, and in a practical task, they will work with CockroachDB. The goal of the task is to implement transactions while maintaining ACID guarantees in a distributed environment.
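
As an illustration of the partitioning and sharding topic in Day 2, the sketch below shows one simple technique, hash-based shard selection. It is a minimal example under stated assumptions: the function names, the use of SHA-256, and the modulo scheme are choices made here for illustration and do not describe how any specific database in this course assigns data to shards.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # hashlib gives the same digest in every process, unlike Python's built-in hash().
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(keys: list, num_shards: int) -> dict:
    """Group keys by the shard they would be stored on."""
    shards = {index: [] for index in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

A plain modulo scheme like this remaps most keys whenever the number of shards changes; consistent hashing is a common alternative when shards are added or removed frequently.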
Day 3: Practical Exercises and Database Optimization
- Practical Tasks Using Non-Relational Databases:
- MongoDB Task: Creating complex queries with data aggregation.
- Participants will work on creating queries using the MongoDB aggregation pipeline, grouping and filtering data in real time.
- Redis Task: Implementing caching mechanisms using Redis.
- Participants will build a system for storing query results in Redis to optimize read performance.
- CouchDB Task: Data synchronization in CouchDB using replication functions.
- The task includes configuring replication between two CouchDB instances and analyzing data conflicts.
- Neo4j Task: Optimizing queries with Cypher in a graph database.
- Participants will analyze a large graph and optimize the queries used to find relationships between nodes.
- InfluxDB Task: Processing time-series data and optimizing data retention.
- Exercises using InfluxQL for analyzing data flow and setting retention strategies.
- GridGain Task: Data processing using GridGain, building and optimizing queries in an object environment.
- Participants will optimize the storage and read performance of large objects.
- Apache Solr Task: Implementing full-text search using Solr.
- Creating indexes and optimizing search queries on large datasets.
- Task: Distributed Transactions
- Participants will become familiar with the concept of distributed transactions, including mechanisms ensuring consistency in distributed systems.
- Practical exercise: Implementing distributed transactions using patterns such as Two-Phase Commit (2PC) or Saga.
- Data Recovery and Backups
- Conflict-resolution patterns such as Last-Writer-Wins, vector clocks, and CRDTs.
- Strategies for data recovery after failures.
- Summary and Discussion
- Q&A session and experience sharing.
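
To give a flavor of the CRDT topic listed under Data Recovery and Backups, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class and method names are illustrative assumptions rather than course material. Each node increments only its own slot, and merging takes the per-node maximum, so replicas converge regardless of the order in which merges happen.

```python
class GCounter:
    """Grow-only counter CRDT: one non-decreasing count per node."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts = {}  # node id -> count contributed by that node

    def increment(self, amount: int = 1) -> None:
        """Increase this node's own slot; negative amounts are not allowed."""
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        """Total across all nodes seen so far."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Take the element-wise maximum; commutative, associative, idempotent."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

Two replicas that increment independently and then merge in either direction report the same total, and merging the same state twice changes nothing.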
The training does not cover relational databases, Elasticsearch, Apache Kafka, Prometheus, or Cassandra.
21 Hours
Testimonials (1)
Combining theory with exercises
Artur - Asseco Poland S.A
Course - Databases in Building High-Performance Distributed Systems (Bazy danych w budowaniu wysokowydajnych systemów rozproszonych)