← Back to Databases

Database Replication Explained: Master-Slave and Multi-Master

Database replication copies data across multiple servers to improve availability, distribute read load, and enable disaster recovery. Master-slave replication uses one primary server for writes, while multi-master allows writes on multiple nodes—each with different tradeoffs in consistency, complexity, and failure resilience.

What is Database Replication?

Database replication is the process of maintaining synchronized copies of data across separate database instances. It's the backbone of modern distributed systems, ensuring your application can survive server failures and handle traffic spikes without going dark.

At its core, replication works by capturing changes from a source database and applying them to target databases. The source is called the primary (or master), and the targets are replicas (or slaves). The magic happens in the replication lag—the time between a write on the primary and when it reaches replicas.

Without replication, a single database failure means total data loss or extended downtime. With it, you're buying redundancy and read scalability for the cost of eventual consistency and operational complexity.

Master-Slave Replication Architecture

Master-slave (now often called primary-replica) is the traditional model. One primary server accepts all writes, and one or more replica servers receive those writes asynchronously or synchronously.

How Master-Slave Works

The flow is straightforward:

Client sends a write query to the master
Master executes the write and records it to its binary log (or similar transaction log)
Slave connects to master, reads new log entries, and applies them locally
Read queries can be directed to slaves to distribute load

Most relational databases like MySQL, PostgreSQL, and MariaDB implement master-slave natively. It's simple, battle-tested, and works well for read-heavy workloads.

Setting Up MySQL Master-Slave

Here's a practical example with MySQL:

On the Master:

# my.cnf configuration
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW

# Create replication user
CREATE USER 'repl_user'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';

# Get master status
SHOW MASTER STATUS;
# Output: File: mysql-bin.000001, Position: 154

On the Slave:

# my.cnf configuration
[mysqld]
server-id = 2
relay-log = mysql-relay-bin

# Configure replication
CHANGE MASTER TO
  MASTER_HOST='master-ip',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=154;

# Start replication
START SLAVE;

# Check status
SHOW SLAVE STATUS\G

Strengths of Master-Slave

Simple to implement: Most databases support it out of the box
Clear consistency model: Master is always the source of truth
Proven at scale: Used by major platforms for decades
Low write latency: Writes don't wait for replica confirmation
Easy failover: Promote a replica to master when needed

Weaknesses and Challenges

Single point of failure: Master going down stops all writes (until promotion)
Replication lag: Reads from slaves might see stale data
No automatic failover: You must detect failure and promote manually (or use tools)
Limited write scaling: All writes go to one server
Split-brain risk: If master and replica lose connectivity, both might accept writes

Multi-Master Replication Architecture

Multi-master (also called active-active) lets two or more nodes accept writes simultaneously. Changes from any master are replicated to all others. It's attractive for distributed systems but introduces complexity.

How Multi-Master Works

Instead of one primary, every node is a primary:

Client writes to any master node
That master logs the write and replicates it to peer masters
Peer masters apply the same write locally
All nodes eventually hold identical data

MySQL Group Replication, PostgreSQL's Patroni with etcd, and Galera Cluster are popular implementations. They use consensus algorithms to prevent conflicts.

Conflict Resolution Strategies

Multi-master risks write conflicts. Two masters might execute conflicting changes. Here's how they're handled:

Last-Write-Wins (LWW)

Whichever write has the newest timestamp wins. Loser's change is silently discarded.

# Simulating LWW conflict
Master A: UPDATE users SET status='active' WHERE id=1 at timestamp=1000
Master B: UPDATE users SET status='inactive' WHERE id=1 at timestamp=1005

# Result: inactive (B's write wins)

Custom Application Logic

Your app detects conflicts and decides which write to keep. More complex but gives you control.

Operational Transformation (OT)

Advanced systems like CRDTs (Conflict-free Replicated Data Types) ensure conflicts resolve deterministically without losing data. Databases like DynamoDB and Cassandra use variants of this.

    Pro tip: Most production multi-master setups avoid conflicts by partitioning writes. One master handles user accounts, another handles orders. This gives you the benefits without the pain.
  

Strengths of Multi-Master

No single point of failure: Any node can go down; others keep accepting writes
Write scaling: Distribute writes across multiple nodes
Geographic distribution: Local writes in multiple regions with eventual sync
Higher availability: System survives multiple node failures
Low-latency local writes: Write to nearest node, replicate globally

Weaknesses and Challenges

Conflict complexity: Must handle write conflicts intelligently
Eventual consistency: Reads might see stale data; no strong consistency guarantees
Replication overhead: Every node talks to every other node (O(n²) messaging)
Harder to debug: State can diverge; root causes are harder to trace
Split-brain scenarios: Network partitions can create data divergence
Operational complexity: Requires careful monitoring and understanding

Master-Slave vs Multi-Master: Side-by-Side Comparison

Aspect	Master-Slave	Multi-Master
Write location	Master only	Any node
Consistency	Strong (master truth)	Eventual
Failover	Manual or automated	Automatic
Conflict handling	N/A (single writer)	Required (LWW, OT, etc.)
Write latency	Low	Low (local writes)
Operational complexity	Low-moderate	High
Best for	Read-heavy, single region	Multi-region, high availability

Replication Lag: The Hidden Challenge

Replication lag is the delay between a write on the primary and when it appears on replicas. It's measured in milliseconds or seconds, but it matters.

If your application reads from a replica right after writing, you might get stale data. This is called a read-after-write inconsistency.

Solutions include:

Read from master after critical writes: Trade consistency for latency
Sticky reads: Route reads from the same session to the master
Causal consistency: Use version numbers to ensure ordered reads
Accept eventual consistency: Design your app to tolerate stale reads

When to Use Each Model

Choose Master-Slave If:

Your workload is read-heavy (e.g., 95% reads, 5% writes)
You're okay with brief downtime during master failover
Data consistency is critical (financial systems)
You want simplicity and operational ease
You're in a single geographic region

Database Replication Explained: Master-Slave and Multi-Master

What is Database Replication?

Master-Slave Replication Architecture

How Master-Slave Works

Setting Up MySQL Master-Slave

Strengths of Master-Slave

Weaknesses and Challenges

Multi-Master Replication Architecture

How Multi-Master Works

Conflict Resolution Strategies

Last-Write-Wins (LWW)

Custom Application Logic

Operational Transformation (OT)

Strengths of Multi-Master

Weaknesses and Challenges

Master-Slave vs Multi-Master: Side-by-Side Comparison

Replication Lag: The Hidden Challenge

When to Use Each Model

Choose Master-Slave If:

Choose Multi-Master If: