Have you ever encountered a scenario where a long-running transaction on a PostgreSQL standby node caused performance degradation or table bloat on the primary node?
It seems strange: a secondary node is supposed to be read-only, so querying it shouldn't impact the primary at all, right? It is a common misconception that asynchronous replication shields the primary from everything happening on the secondary. While async mode protects your primary's write latency, it does not protect your primary's disk space or vacuum processes.
In PostgreSQL's architecture, a long-running transaction (even a plain SELECT) on a secondary node can severely impact or even bring down the primary. This happens primarily through three mechanisms designed to keep the databases synchronized and preserve integrity.
Here is exactly why this happens:
The hot_standby_feedback Trap (Bloat on the Primary):
If the primary updates a row (internally, PostgreSQL writes a new row version and marks the old one dead) or deletes it, VACUUM will eventually clean up the old "dead" tuple. But what if a query on the secondary is currently reading that exact row version? If the primary removes it, the query on the secondary is canceled with a "conflict with recovery" error. To prevent such cancellations, PostgreSQL provides the hot_standby_feedback setting.
- What happens if we enable hot_standby_feedback: The secondary now periodically tells the primary, "I have a transaction open that still needs transaction ID XXX, so don't vacuum away any row versions it can still see."
- The Impact on the Primary: If a long report is running on the secondary (say, 1-2 hours), the primary is forbidden from cleaning up any dead tuples that accumulate during those 1-2 hours, across the entire database. Tables and indexes on the primary bloat, disk I/O spikes, and overall performance degrades heavily.
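A quick way to see this in action (a sketch, assuming psql access to both nodes; the columns come from the standard system views):

```sql
-- On the secondary: check whether feedback is enabled.
SHOW hot_standby_feedback;

-- On the primary: backend_xmin shows the oldest transaction ID each
-- standby is holding open via feedback; while it stays pinned at an old
-- value, VACUUM cannot remove dead tuples newer than that horizon.
SELECT application_name, state, backend_xmin
FROM pg_stat_replication;
```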
Replication Slots & Disk Space Exhaustion:
If the secondary node is running a massive transaction, it consumes heavy CPU, memory, and disk I/O.
- What happens: The secondary becomes so starved for resources that the startup process (which replays the Write-Ahead Log from the primary) slows to a crawl or stops entirely.
- The Impact on the Primary: If you are using physical replication slots, the primary is forced to hold on to every WAL file until the secondary confirms it has replayed them. If the secondary is bogged down by a large query and stops replaying, WAL files pile up in the primary's pg_wal directory until it runs out of disk space. When the primary hits 100% disk usage, the database crashes.
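You can watch this building up on the primary (a monitoring sketch using the standard pg_replication_slots view):

```sql
-- On the primary: how much WAL each slot is forcing us to retain.
-- A large, growing value for an inactive or lagging slot is the danger sign.
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
         AS retained_wal
FROM pg_replication_slots;
```

(Since PostgreSQL 13, the max_slot_wal_keep_size parameter can cap how much WAL a slot may retain, at the cost of invalidating a slot that falls too far behind.)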
Synchronous Replication Blocking (synchronous_commit):
This is even more problematic. If we have configured the cluster for high availability using synchronous replication (synchronous_standby_names is set, with synchronous_commit at on, remote_write, or remote_apply), the primary must wait for the secondary to acknowledge transactions.
- What happens on the Secondary: A huge query on the secondary hogs disk I/O or locks resources.
- The Impact on the Primary: Write transactions (INSERT, UPDATE, DELETE) on the primary will hang. The primary may execute the write locally, but the COMMIT sits in a waiting state until the sluggish secondary processes the WAL and sends its acknowledgment back. Application connections pile up on the primary, eventually exhausting max_connections and making the database effectively unavailable.
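On the primary, the stuck commits are visible in pg_stat_activity (a sketch; SyncRep is the wait event PostgreSQL reports for sessions waiting on synchronous replication):

```sql
-- On the primary: sessions whose COMMIT is blocked waiting for the
-- synchronous standby's acknowledgment show the SyncRep wait event.
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event = 'SyncRep';
```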
How to Mitigate This?
In an OLTP production environment, the ultimate goal is to keep the primary database up, running, and available. To protect the primary from secondary-node abuse, we have to establish the following guardrails:
- Tune max_standby_streaming_delay: Set this to a reasonable limit (e.g., 30s or 1min). If a query on the replica blocks WAL replay for longer than this, PostgreSQL forcefully cancels the conflicting query on the secondary. The user's query fails, but the primary stays alive.
- Use statement_timeout on the Replica: Enforce a strict time limit for read-only queries on the secondary so that long-running queries and reports are killed automatically.
- Turn off hot_standby_feedback: If we can tolerate occasional query cancellations on the replica (the "conflict with recovery" error), turning hot_standby_feedback off ensures the primary's VACUUM is never blocked by standby queries.
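Put together, the guardrails above might look like this on the standby (illustrative values; tune the limits to your workload):

```sql
-- On the secondary: cancel conflicting queries after 30s so WAL replay
-- never falls far behind.
ALTER SYSTEM SET max_standby_streaming_delay = '30s';
-- Kill any statement running longer than 5 minutes.
ALTER SYSTEM SET statement_timeout = '5min';
-- Stop pinning the primary's VACUUM horizon.
ALTER SYSTEM SET hot_standby_feedback = off;
-- Apply the changes without a restart.
SELECT pg_reload_conf();
```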