Database replication copies data across multiple servers to improve availability, distribute read load, and enable disaster recovery. Master-slave replication uses one primary server for writes, while multi-master allows writes on multiple nodes—each with different tradeoffs in consistency, complexity, and failure resilience.
Database replication is the process of maintaining synchronized copies of data across separate database instances. It's the backbone of modern distributed systems, ensuring your application can survive server failures and handle traffic spikes without going dark.
At its core, replication works by capturing changes from a source database and applying them to target databases. The source is called the primary (or master), and the targets are replicas (or slaves). The magic happens in the replication lag—the time between a write on the primary and when it reaches replicas.
Without replication, a single database failure means total data loss or extended downtime. With it, you're buying redundancy and read scalability for the cost of eventual consistency and operational complexity.
Master-slave (now often called primary-replica) is the traditional model. One primary server accepts all writes, and one or more replica servers receive those writes asynchronously or synchronously.
The flow is straightforward:
Most relational databases like MySQL, PostgreSQL, and MariaDB implement master-slave natively. It's simple, battle-tested, and works well for read-heavy workloads.
Here's a practical example with MySQL:
On the Master:
# my.cnf configuration
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
# Create replication user
CREATE USER 'repl_user'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';
# Get master status
SHOW MASTER STATUS;
# Output: File: mysql-bin.000001, Position: 154
On the Slave:
# my.cnf configuration
[mysqld]
server-id = 2
relay-log = mysql-relay-bin
# Configure replication
CHANGE MASTER TO
MASTER_HOST='master-ip',
MASTER_USER='repl_user',
MASTER_PASSWORD='password',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=154;
# Start replication
START SLAVE;
# Check status
SHOW SLAVE STATUS\G
Multi-master (also called active-active) lets two or more nodes accept writes simultaneously. Changes from any master are replicated to all others. It's attractive for distributed systems but introduces complexity.
Instead of one primary, every node is a primary:
MySQL Group Replication, PostgreSQL's Patroni with etcd, and Galera Cluster are popular implementations. They use consensus algorithms to prevent conflicts.
Multi-master risks write conflicts. Two masters might execute conflicting changes. Here's how they're handled:
Whichever write has the newest timestamp wins. Loser's change is silently discarded.
# Simulating LWW conflict
Master A: UPDATE users SET status='active' WHERE id=1 at timestamp=1000
Master B: UPDATE users SET status='inactive' WHERE id=1 at timestamp=1005
# Result: inactive (B's write wins)
Your app detects conflicts and decides which write to keep. More complex but gives you control.
Advanced systems like CRDTs (Conflict-free Replicated Data Types) ensure conflicts resolve deterministically without losing data. Databases like DynamoDB and Cassandra use variants of this.
| Aspect | Master-Slave | Multi-Master |
|---|---|---|
| Write location | Master only | Any node |
| Consistency | Strong (master truth) | Eventual |
| Failover | Manual or automated | Automatic |
| Conflict handling | N/A (single writer) | Required (LWW, OT, etc.) |
| Write latency | Low | Low (local writes) |
| Operational complexity | Low-moderate | High |
| Best for | Read-heavy, single region | Multi-region, high availability |
Replication lag is the delay between a write on the primary and when it appears on replicas. It's measured in milliseconds or seconds, but it matters.
If your application reads from a replica right after writing, you might get stale data. This is called a read-after-write inconsistency.
Solutions include: