Database Replication - Achieving Availability and Scalability

15 min read | 2025.12.15

What is Replication

Replication is a mechanism that copies and synchronizes database data across multiple servers. It’s used for improving availability, enhancing read performance, and disaster recovery.

Why it’s needed: A single database server becomes a Single Point of Failure (SPOF). Replication allows service continuity during failures and distributes read queries.

Basic Replication Configurations

Master-Slave (Primary-Replica)

flowchart TB
    Writes["Writes"] --> Master["Master<br/>(Primary)"]
    Master -->|Replication| Slave1["Slave 1<br/>(Replica)"]
    Master -->|Replication| Slave2["Slave 2<br/>(Replica)"]
    Master -->|Replication| Slave3["Slave 3<br/>(Replica)"]
    Slave1 --> Reads["Reads"]
    Slave2 --> Reads
    Slave3 --> Reads
  • Master (Primary): Main server accepting writes
  • Slave (Replica): Copies master data, handles reads

Multi-Master

flowchart TB
    Master1["Master 1"] <-->|Sync| Master2["Master 2"]
    Master1 --> RW1["Read/Write"]
    Master2 --> RW2["Read/Write"]

Multiple masters accept writes. Conflict resolution becomes a challenge.

Synchronous vs Asynchronous Replication

Synchronous Replication

flowchart LR
    Client --> Master --> Slave["Slave<br/>(Wait for ACK)"] --> Response["Response to Client"]
AdvantagesDisadvantages
No data lossIncreased latency
Strong consistencyWrites stop on slave failure

Asynchronous Replication

flowchart LR
    Client --> Master --> Response["Response to Client"]
    Master -.->|Applied later| Slave
AdvantagesDisadvantages
Low latencyReplication lag
No impact from slave failurePotential data loss

Semi-synchronous Replication

Commit is confirmed once at least one slave receives the data.

flowchart LR
    Client --> Master --> Slave1["Slave 1<br/>(ACK)"] --> Response["Response to Client"]
    Master -.->|Applied later| Slave2["Slave 2"]

Replication Methods

Statement-based

Replicates SQL statements themselves.

-- Executed on master
INSERT INTO users (name, created_at) VALUES ('Alice', NOW());

-- Same SQL executed on slave
-- Problem: NOW() might return different results

Row-based

Replicates the actual row data that changed.

BeforeAfter
{id: 1, name: 'Alice'}{id: 1, name: 'Bob'}

This diff is applied to slave.

Mixed Mode

Uses statement-based normally, row-based for non-deterministic functions.

Replication Lag

The delay until a slave catches up with the master.

ServerTransactionsStatus
Master1, 2, 3, 4, 5✓ Complete
Slave1, 2, 3✓ Lag: 2 transactions

Cases Where Lag Causes Problems

// Read immediately after write
await db.master.query('UPDATE users SET name = ? WHERE id = ?', ['Bob', 1]);

// Read from slave → May return old data
const user = await db.slave.query('SELECT * FROM users WHERE id = ?', [1]);
// user.name might still be 'Alice'!

Solutions

  • Read Your Writes: Read from master immediately after writing
  • Causal Consistency: Track write timestamps
  • Synchronous Replication: For critical data only

Failover

When the master fails, a slave is promoted to become the new master.

Automatic Failover

flowchart TB
    Step1["1. Detect master failure<br/>Monitoring system detects no response from master"]
    Step2["2. Select slave<br/>Choose the slave with most recent replication progress"]
    Step3["3. Promotion<br/>Promote selected slave to master"]
    Step4["4. Switch connections<br/>Update application connections to new master"]
    Step1 --> Step2 --> Step3 --> Step4

Failover Considerations

ChallengeSolution
Split brainFencing (force stop old master)
Data lossUnapplied transactions in async replication
Connection switchingVirtual IP, DNS update, proxy

Common Implementations

MySQL

-- Master configuration
CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;

START SLAVE;

PostgreSQL

# postgresql.conf (master)
wal_level = replica
max_wal_senders = 3

# recovery.conf (slave)
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432'

Managed Services

ServiceReplica Features
Amazon RDSRead Replicas, Multi-AZ
Cloud SQLRead Replicas, High Availability
Azure SQLgeo-replication

Read Scaling Pattern

flowchart LR
    App --> Master["Master<br/>(Writes)"]
    App --> LB["Load Balancer"]
    LB --> R1["Replica 1"]
    LB --> R2["Replica 2"]
    LB --> R3["Replica 3"]
    R1 --> Reads["Reads"]
    R2 --> Reads
    R3 --> Reads

Load balancer distributes read queries across replicas.

Summary

Database replication is a foundational technology for achieving high availability and read scalability. By properly designing synchronous/asynchronous choices, addressing replication lag, and planning failover strategies, you can build robust database systems.

← Back to list