Have you ever encountered a scenario where a long-running transaction on a PostgreSQL standby node caused performance degradation or table bloat on the primary node?
It seems strange: a secondary node is supposed to be read-only, so querying it shouldn't impact the primary at all, right? It is a common misconception that asynchronous replication shields the primary from everything happening on the secondary. While async mode protects your primary's write latency, it does not protect your primary's disk space or vacuum processes.
In PostgreSQL's architecture, a long-running transaction (even a plain SELECT) on a secondary node can severely impact or even bring down the primary. This happens primarily through three mechanisms designed to keep the databases synchronized and preserve integrity.
Here is exactly why this happens:
The hot_standby_feedback Trap (Bloat on the Primary):
If the primary updates a row (internally, PostgreSQL writes a new row version and marks the old one dead) or deletes it, VACUUM will eventually clean up the old "dead" tuple. But what if a query on the secondary is currently reading that exact row version? If the primary removes it, the query on the secondary is canceled with a "conflict with recovery" error. To prevent such cancellations, PostgreSQL provides the hot_standby_feedback setting.
- What happens if we enable hot_standby_feedback: The secondary now periodically tells the primary, "I have a transaction open that still needs transaction ID XXX, so don't vacuum away any row versions it can still see."
- The Impact on the Primary: If a long report is running on the secondary (say, 1-2 hours), the primary is forbidden from cleaning up any dead tuples that accumulate during those 1-2 hours, across the entire database. Tables and indexes on the primary bloat, disk I/O spikes, and overall performance degrades heavily.
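A quick way to see this in action (a sketch, assuming psql access to both nodes; the columns come from the standard system views):

```sql
-- On the secondary: check whether feedback is enabled.
SHOW hot_standby_feedback;

-- On the primary: backend_xmin shows the oldest transaction ID each
-- standby is holding open via feedback; while it stays pinned at an old
-- value, VACUUM cannot remove dead tuples newer than that horizon.
SELECT application_name, state, backend_xmin
FROM pg_stat_replication;
```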
Replication Slots & Disk Space Exhaustion:
If the secondary node is running a massive transaction, it consumes heavy CPU, memory, and disk I/O.
- What happens: The secondary becomes so starved for resources that the startup process (which replays the Write-Ahead Log from the primary) slows to a crawl or stops entirely.
- The Impact on the Primary: If you are using physical replication slots, the primary is forced to hold on to every WAL file until the secondary confirms it has replayed them. If the secondary is bogged down by a large query and stops replaying, WAL files pile up in the primary's pg_wal directory until it runs out of disk space. When the primary hits 100% disk usage, the database crashes.
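You can watch this building up on the primary (a monitoring sketch using the standard pg_replication_slots view):

```sql
-- On the primary: how much WAL each slot is forcing us to retain.
-- A large, growing value for an inactive or lagging slot is the danger sign.
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
         AS retained_wal
FROM pg_replication_slots;
```

(Since PostgreSQL 13, the max_slot_wal_keep_size parameter can cap how much WAL a slot may retain, at the cost of invalidating a slot that falls too far behind.)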
Synchronous Replication Blocking (synchronous_commit):
This is even more problematic. If we have configured the cluster for high availability using synchronous replication (synchronous_standby_names is set, with synchronous_commit at on, remote_write, or remote_apply), the primary must wait for the secondary to acknowledge transactions.
- What happens on the Secondary: A huge query on the secondary hogs disk I/O or locks resources.
- The Impact on the Primary: Write transactions (INSERT, UPDATE, DELETE) on the primary will hang. The primary may execute the write locally, but the COMMIT sits in a waiting state until the sluggish secondary processes the WAL and sends its acknowledgment back. Application connections pile up on the primary, eventually exhausting max_connections and making the database effectively unavailable.
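On the primary, the stuck commits are visible in pg_stat_activity (a sketch; SyncRep is the wait event PostgreSQL reports for sessions waiting on synchronous replication):

```sql
-- On the primary: sessions whose COMMIT is blocked waiting for the
-- synchronous standby's acknowledgment show the SyncRep wait event.
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event = 'SyncRep';
```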
How to Mitigate This?
In an OLTP production environment, the ultimate goal is to keep the primary database up, running, and available. To protect the primary from secondary-node abuse, we have to establish the following guardrails:
- Tune max_standby_streaming_delay: Set this to a reasonable limit (e.g., 30s or 1min). If a query on the replica blocks WAL replay for longer than this, PostgreSQL forcefully cancels the conflicting query on the secondary. The user's query fails, but the primary stays alive.
- Use statement_timeout on the Replica: Enforce a strict time limit for read-only queries on the secondary so that long-running queries and reports are killed automatically.
- Turn off hot_standby_feedback: If we can tolerate occasional query cancellations on the replica (the "conflict with recovery" error), turning hot_standby_feedback off ensures the primary's VACUUM is never blocked by standby queries.
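Put together, the guardrails above might look like this on the standby (illustrative values; tune the limits to your workload):

```sql
-- On the secondary: cancel conflicting queries after 30s so WAL replay
-- never falls far behind.
ALTER SYSTEM SET max_standby_streaming_delay = '30s';
-- Kill any statement running longer than 5 minutes.
ALTER SYSTEM SET statement_timeout = '5min';
-- Stop pinning the primary's VACUUM horizon.
ALTER SYSTEM SET hot_standby_feedback = off;
-- Apply the changes without a restart.
SELECT pg_reload_conf();
```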