Course Outline

Day 1: Theory and Introduction to Distributed Systems

  1. Introduction
    • Introduction to the training structure and agenda, discussion of the training environment.
  2. Basic Concepts of Distributed Systems
    • Definition of distributed systems and their significance in modern applications.
    • Key challenges: scalability, availability, consistency, fault tolerance.
  3. Data Consistency Models
    • Discussion of Strong Consistency and Eventual Consistency.
    • Managing consistency in distributed systems: quorums, read-write quorums, read-your-own-writes guarantees.
  4. Distributed Logging Systems and Communication
    • The pub/sub pattern and stream-table duality.
    • Data compaction and real-time data processing.
  5. Case Study 1: Example of High-Performance Applications
    • Analysis of the architecture of communication systems (e.g., WhatsApp, Signal).
    • Challenges related to consistency and data recovery.
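
As an illustration of the read-write quorum topic in item 3, here is a minimal sketch of the usual overlap condition. It assumes N replicas, a read quorum of R, and a write quorum of W. The function name and parameters are hypothetical and not part of the course materials.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum shares a replica with every write quorum.

    Assumption: n replicas in total, r replicas contacted per read,
    w replicas acknowledging each write.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # If r + w > n, any r replicas and any w replicas must intersect,
    # so a read contacts at least one replica holding the latest write.
    return r + w > n
```

With three replicas, R = 2 and W = 2 overlap, while R = 1 and W = 1 do not, so a read in the second configuration may miss the latest write.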

Day 2: Practical Aspects of Designing Distributed Systems

  1. Designing Fault-Tolerant Applications
    • Discussion of patterns: CQRS, Inbox/Outbox, Two-Phase Commit (2PC), Saga, Change Data Capture (CDC), Circuit Breaker, Read Repair.
    • Examples of practical applications.
  2. Examples of Non-Relational Databases
    • Document Databases (e.g., MongoDB, CouchDB)
    • Key-Value Databases (e.g., Redis)
    • Graph Databases (e.g., Neo4j, OrientDB)
    • Columnar Databases (e.g., HBase)
    • Object Databases (e.g., GridGain)
    • Time-Series Databases (e.g., TimescaleDB, InfluxDB)
    • Search Engines (e.g., Apache Solr)
    • In-Memory Grids (e.g., Hazelcast, GridGain)
  3. Modern Databases: Partitioning, Sharding, and Replication
    • Partitioning and Sharding: Discussion of techniques for dividing data into smaller fragments to improve system performance and scalability.
    • Data Replication: Different types of replication (synchronous, asynchronous), benefits, and challenges related to data replication in distributed environments.
    • Secondary Indexes: Creating and optimizing queries using secondary indexes to improve performance.
  4. Case Study 2: Designing a Graph-Based System
    • Graph-based design and modeling in distributed systems using Neo4j or OrientDB.
    • Practical exercise: graph modeling.
  5. Managing Real-Time Data vs. Traditional Data Warehouses
    • Introduction to real-time data processing and batch processing.
    • Example of using TimescaleDB for monitoring time-series data.
  6. NewSQL – Modern Approach to Relational Databases
    • Discussion of the NewSQL concept as a combination of the advantages of relational databases with the flexibility and scalability of NoSQL solutions.
    • Task for NewSQL: Participants will familiarize themselves with the most popular NewSQL databases and will work with CockroachDB in a practical task. The goal of the task will be to implement transactions with ACID guarantees in a distributed environment.
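
As an illustration of the partitioning and sharding topic in item 3, here is a minimal sketch of hash-based shard assignment. It assumes a fixed number of shards and string keys. The function name is hypothetical, and real databases typically use more elaborate schemes such as consistent hashing or range partitioning.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in the range [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # A cryptographic digest gives the same result on every machine and run,
    # unlike Python's built-in hash(), which is salted per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

The same key always lands on the same shard, but changing num_shards remaps most keys, which is one reason resharding is costly with plain modulo hashing.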

Day 3: Practical Exercises and Database Optimization

  1. Practical Tasks Using Non-Relational Databases:
    • Task for MongoDB: Creating complex queries with data aggregation.
      • Participants will build queries with the MongoDB aggregation pipeline, grouping and filtering data in real time.
    • Task for Redis: Implementing caching mechanisms using Redis.
      • Participants will build a system for storing query results in Redis to optimize read performance.
    • Task for CouchDB: Data synchronization in CouchDB using replication functions.
      • The task includes configuring replication between two CouchDB instances and analyzing data conflicts.
    • Task for Neo4j: Optimizing Cypher queries in a graph database.
      • Participants will analyze a large graph and build query optimizations to find dependencies between nodes.
    • Task for InfluxDB: Processing time-series data and optimizing data retention.
      • Exercises using InfluxQL for data flow analysis and setting retention strategies.
    • Task for GridGain: Processing data using GridGain, building and optimizing queries in an object-oriented environment.
      • Participants will optimize the storage and retrieval of large objects.
    • Task for Apache Solr: Implementing full-text search using Solr.
      • Creating indexes and optimizing search queries on large datasets.
  2. Task: Distributed Transactions
    • Participants will learn about the concept of distributed transactions, including mechanisms that ensure consistency in distributed systems.
    • Practical exercise: Implementing distributed transactions using patterns such as Two-Phase Commit (2PC) or Saga.
  3. Data Recovery After Failures and Backups
    • Patterns: Last-Writer-Wins, Vector Clocks, CRDT.
    • Strategies for data recovery after failures.
  4. Summary and Discussion
    • Q&A session and exchange of experiences.
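
As an illustration of the CRDT pattern listed in item 3, here is a minimal sketch of a grow-only counter (G-Counter). It assumes each replica has a unique node identifier. The class and method names are hypothetical and not taken from any particular library.

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        # The counter value is the sum of all per-node slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Taking the per-node maximum makes merge commutative, associative,
        # and idempotent, so replicas converge regardless of merge order.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

Two replicas that increment independently and then merge in either direction arrive at the same total, and merging the same state twice changes nothing.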

The training does not cover relational databases, Elasticsearch, Apache Kafka, Prometheus, or Cassandra.

 21 Hours
