Course Outline
Day 1: Theory and Introduction to Distributed Systems
- Introduction
- Overview of the training structure and agenda, discussion of the training environment.
- Basic Concepts of Distributed Systems
- Definition of distributed systems and their significance in modern applications.
- Key challenges: scalability, availability, consistency, fault tolerance.
- Data Consistency Models
- Discussion on Strong Consistency and Eventual Consistency.
- Managing consistency in distributed systems: read and write quorums, and the read-your-own-writes guarantee.
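The quorum idea listed above can be illustrated with a minimal sketch. It assumes N replicas, a read quorum of R and a write quorum of W; the function names `quorums_overlap` and `read_latest` are hypothetical and not taken from any particular database. When R + W > N, every read quorum shares at least one replica with every write quorum, so a read can see the latest acknowledged write.

```python
from typing import List, Tuple


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum intersects every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def read_latest(replies: List[Tuple[int, str]]) -> str:
    """Pick the value with the highest version among the read-quorum replies.

    Each reply is a (version, value) pair; versioning by integer is an
    assumption made for this illustration.
    """
    _version, value = max(replies, key=lambda reply: reply[0])
    return value
```

For example, with N = 3 the setting R = 2, W = 2 overlaps, while R = 1, W = 1 does not and may return stale data.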
- Distributed Log Systems and Communication
- The pub/sub pattern and the stream-table duality.
- Data compaction and real-time data processing.
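As a rough illustration of the stream-table duality and log compaction mentioned above, the sketch below treats a log as a list of (key, value) records, where a value of `None` is assumed to act as a delete marker. The helper names `materialize` and `compact` are hypothetical. Replaying the stream yields a table, and compaction keeps only the last record per key without changing that table.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]


def materialize(log: List[Record]) -> Dict[str, str]:
    """Replay a stream of records into a table (latest value per key)."""
    table: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            table.pop(key, None)  # None is treated as a delete marker
        else:
            table[key] = value
    return table


def compact(log: List[Record]) -> List[Record]:
    """Keep only the most recent record for each key, preserving log order."""
    last_offset: Dict[str, int] = {}
    for offset, (key, _value) in enumerate(log):
        last_offset[key] = offset
    return [rec for offset, rec in enumerate(log) if last_offset[rec[0]] == offset]
```

The property worth noticing is that `materialize(compact(log))` equals `materialize(log)`: compaction shrinks the log but preserves the table it represents.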
- Case Study 1: Example of a High-Performance Application
- Analysis of the architecture of a communication system (e.g., WhatsApp, Signal).
- Challenges related to consistency and data recovery.
Day 2: Practical Aspects of Designing Distributed Systems
- Designing Fault-Tolerant Applications
- Discussion on patterns such as CQRS, Inbox/Outbox, Two-Phase Commit (2PC), Saga, Change Data Capture (CDC), Circuit Breaker, Read Repair.
- Practical examples of applying these patterns.
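One of the patterns listed above, the Circuit Breaker, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a production implementation: the class name, the `max_failures` and `reset_after` parameters, and the injectable `clock` are all choices made for this example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable for testing
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; call skipped")
            # Timeout elapsed: half-open state, one trial call is allowed.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping a remote call as `breaker.call(fetch, url)` (where `fetch` is whatever client function is in use) lets callers fail fast while a dependency is down instead of piling up timeouts.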
- Examples of Non-Relational Databases
- Key-value databases (e.g., Redis)
- Graph databases (e.g., Neo4j, OrientDB)
- Columnar databases (e.g., HBase)
- Object databases (e.g., GridGain)
- Time-series databases (e.g., TimescaleDB, InfluxDB)
- Search engines (e.g., Apache Solr)
- In-memory data grids (e.g., Hazelcast, GridGain)
- Modern Databases: Partitioning, Sharding, and Replication
- Partitioning and sharding: Discussion on techniques for dividing data into smaller fragments to improve performance and scalability of systems.
- Data replication: Different types of replication (synchronous, asynchronous), benefits and challenges associated with data replication in distributed environments.
- Secondary Indexes: Creating and optimizing queries using secondary indexes to improve performance.
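To make the partitioning and replication topics above more concrete, here is a minimal sketch of hash-based shard routing. The function names `shard_for` and `replicas_for`, and the choice of placing replicas on consecutive shards, are assumptions made for illustration; real databases use their own placement strategies.

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash (same result on every run)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key: str, num_shards: int, replication_factor: int) -> List[int]:
    """Return the primary shard followed by the next shards as replicas."""
    if not 1 <= replication_factor <= num_shards:
        raise ValueError("replication_factor must be between 1 and num_shards")
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(replication_factor)]
```

A known limitation of plain modulo routing is that changing `num_shards` remaps most keys; consistent hashing is a common way to reduce that movement.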
- Case Study 2: Designing a Graph-Based System
- Designing and modeling graphs in distributed systems using Neo4j or OrientDB.
- Practical exercise: graph modeling.
- Real-Time Data Management vs Traditional Data Warehouses
- Introduction to real-time data processing and batch processing.
- Example of using TimescaleDB for monitoring time-series data.
- NewSQL – A Modern Approach to Relational Databases
- Discussion on the concept of NewSQL as a combination of the advantages of relational databases with the flexibility and scalability of NoSQL solutions.
- NewSQL Task: Participants will familiarize themselves with popular NewSQL databases, and in a practical task, they will work with CockroachDB. The goal of the task is to implement transactions while maintaining ACID guarantees in a distributed environment.
Day 3: Practical Exercises and Database Optimization
- Practical Tasks with Non-Relational Databases:
- MongoDB Task: Creating complex queries with data aggregation.
- Participants will create queries using the MongoDB aggregation pipeline, grouping and filtering data in real time.
- Redis Task: Implementing caching mechanisms using Redis.
- Participants will build a system for storing query results in Redis to optimize read performance.
- CouchDB Task: Data synchronization in CouchDB using replication functions.
- The task involves configuring replication between two CouchDB instances and analyzing data conflicts.
- Neo4j Task: Optimizing Cypher queries in a graph database.
- Participants will analyze a large graph and build query optimizations for searching relationships between nodes.
- InfluxDB Task: Processing time-series data and optimizing data retention.
- Exercises using InfluxQL to analyze data flow and set retention strategies.
- GridGain Task: Data processing using GridGain, building and optimizing queries in an object environment.
- Participants will optimize the storage and retrieval of large objects.
- Apache Solr Task: Implementing full-text search using Solr.
- Creating indexes and optimizing search queries on large datasets.
- Distributed Transactions Task
- Participants will learn about the concept of distributed transactions, including mechanisms ensuring consistency in distributed systems.
- Practical exercise: Implementing distributed transactions using patterns such as Two-Phase Commit (2PC) or Saga.
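The Saga pattern named in the exercise above can be sketched as a sequence of steps, each paired with a compensating action. This is a simplified, in-process illustration; the `run_saga` function and the step representation are hypothetical, and a real saga would coordinate separate services and persist its progress.

```python
from typing import Callable, List, Tuple

# Each step is (action, compensation); both are plain callables here.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

Unlike Two-Phase Commit, nothing is locked across steps: each step commits locally, and consistency is restored afterwards by compensation, so intermediate states can be visible to other readers.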
- Data Recovery After Failures and Backups
- Conflict-resolution patterns: Last-Writer-Wins, vector clocks, CRDTs.
- Strategies for data recovery after failures.
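Two of the techniques named above, vector clocks and CRDTs, can be illustrated with a short sketch. It assumes clocks and counters are plain dictionaries mapping a node name to an integer; the function names are hypothetical. `compare` detects whether two versions are ordered or concurrent, and `merge_gcounter` merges two grow-only counters (a simple CRDT) by taking the per-node maximum.

```python
from typing import Dict

Clock = Dict[str, int]


def compare(a: Clock, b: Clock) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' for two vector clocks."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: a conflict to resolve


def merge_gcounter(a: Clock, b: Clock) -> Clock:
    """Merge two grow-only counters by taking the maximum per node."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def gcounter_value(counter: Clock) -> int:
    """The counter's value is the sum of all per-node counts."""
    return sum(counter.values())
```

Because the merge is commutative, associative, and idempotent, replicas can exchange state in any order and still converge, which is what makes this approach useful when reconciling data after a failure.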
- Summary and Discussion
- Q&A session and experience sharing.
The training does not cover relational databases, Elasticsearch, Apache Kafka, Prometheus, or Cassandra.
21 Hours
Testimonials (1)
Combining theory with exercises
Artur - Asseco Poland S.A
Course - Databases in Building High-Performance Distributed Systems (Bazy danych w budowaniu wysokowydajnych systemów rozproszonych)