DisCopy


Thursday, 2 April 2026

PostgreSQL Logical Replication: Configuration and Recovery Protocols!

Deploying and Rescuing PostgreSQL Logical Replication: Handling Dropped Publications & Subscriptions!

I’ve configured replication/high availability on every RDBMS from Oracle (GoldenGate) to MS SQL, Sybase, and MySQL. After all that trauma, setting up PostgreSQL replication is a walk in the park at sunrise. 😊

First things First:

Anytime logical replication breaks for any reason, we need to immediately check the replication slots on the primary. If replication is broken and we cannot fix it immediately, we must drop the slot manually to save the primary database from an out-of-disk outage, because an inactive slot forces the primary to retain WAL files indefinitely.

SQL to find orphaned slots (on the Primary Node)

 

SELECT slot_name, plugin, active, restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_lag_size
FROM pg_replication_slots;

If wal_lag_size is growing into hundreds of gigabytes and active is false, drop the slot (SELECT pg_drop_replication_slot('slot_name');) to save the primary, and accept that a full data re-sync (recreating the subscription with copy_data = true) will be needed later.

Quick glance at setting up logical replication:


1.     The Pre-requisite (Database Configuration requirements for logical replication)

Set the wal_level to logical in the Primary/Source Database.

On the Publisher (Primary Node), check the postgresql.conf file or run SHOW wal_level; to verify whether wal_level is set to logical (the default is replica). Note that changing wal_level requires a server restart.
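If wal_level still shows replica, it can be changed without hand-editing postgresql.conf; a minimal sketch (assuming superuser access; the restart command depends on how the instance is managed):

Sample SQL:

ALTER SYSTEM SET wal_level = 'logical';
-- wal_level is a restart-only parameter; a config reload is not enough.
-- Restart the server (e.g., pg_ctl restart or systemctl restart postgresql),
-- then verify:
SHOW wal_level;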

2.     The Publisher (Primary Node)

Log into the source database where Primary/active data lives.

a.     Create the Table: Create a test table, or skip this step to set up replication for an existing table.

Sample SQL:

CREATE TABLE TEST
(
    id SERIAL PRIMARY KEY,
    description VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

(Important: Logical replication requires a Primary Key or a Replica Identity on the table).
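For a table without a primary key, replicated UPDATEs and DELETEs will fail unless a replica identity is set. A hedged sketch (the table name test_nopk and column id are hypothetical):

Sample SQL:

-- Option 1 (preferred): add a unique index and use it as the replica identity
ALTER TABLE test_nopk ADD CONSTRAINT test_nopk_uq UNIQUE (id);
ALTER TABLE test_nopk REPLICA IDENTITY USING INDEX test_nopk_uq;

-- Option 2 (last resort): log the entire old row for every change (WAL-heavy)
ALTER TABLE test_nopk REPLICA IDENTITY FULL;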

 

b.     Create the Publication: This notifies PostgreSQL to start tracking changes for this table.

Sample SQL:

CREATE PUBLICATION test_pub FOR TABLE TEST;

 

3.     The Subscriber (Secondary/Target Database)

Log into the target database where the data is to be replicated.

a.     Create the Table in the Secondary: Logical replication does not create the target table or replicate schema changes. We must create the target table with the exact source table DDL before subscribing.

Sample SQL:

CREATE TABLE TEST
(
    id SERIAL PRIMARY KEY,
    description VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

 

b.     Create the Subscription: The CREATE SUBSCRIPTION command connects to the Primary/Publisher, materializes the initial data (full load), and establishes the continuous stream (CDC). Replace the connection string with the actual source database connection details.

Sample SQL:

CREATE SUBSCRIPTION test_sub
CONNECTION 'host=XXX.XX.XX.XX port=5432 dbname=source_db user=postgres password=XXXXXX'
PUBLICATION test_pub;

 

4.     Test, Validate and Verify the Replication: Let’s insert and perform some DMLs on the Primary/Source TEST table.

a.     Run a couple of Insert statements on Publisher:

Sample SQL:

INSERT INTO TEST (description) VALUES ('Hello from Kasi V Dogga!');

INSERT INTO TEST (description) VALUES ('Hope the Logical replication is active.');

 

b.     Check the data on the Subscriber:

Sample SQL:

SELECT * FROM TEST;

The two rows appearing on the target database confirm that we have successfully established logical replication for the table.
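Beyond eyeballing rows, replication health can be checked from the catalog views on both sides; a minimal sketch:

Sample SQL:

-- On the Subscriber: apply worker status and last received LSN
SELECT subname, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;

-- On the Publisher: the walsender feeding the subscriber
SELECT application_name, state, sent_lsn, replay_lsn
FROM pg_stat_replication;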

 

Fixing the replication issues (When Publication or Subscription Dropped):

When logical replication is disrupted, such as by an accidental drop of a publication or subscription, the subscriber's data becomes stale and inconsistent. Until replication is restored, only the primary node can be relied upon to handle both OLTP (DML operations) and OLAP (read/query workloads). Dropping a publication or subscription is a destructive administrative action, not just a "pause."

Here are the DBA's actions/options for fixing both these scenarios.

Scenario A: The Publication is Dropped (On the Primary Node)

If someone executes DROP PUBLICATION on the source database, the subscriber will immediately start throwing errors in the logs (e.g., ERROR: publication "my_table_pub" does not exist), and replication will halt. Unlike dropping a subscription, dropping a publication does not automatically drop the replication slot, as the publisher still thinks a subscriber is out there, so it will start aggressively hoarding Write-Ahead Logs (WAL) on the primary node. If we do not fix this quickly, the primary server's disk will fill up to 100% and the database will crash.

The Fix: If we identify this issue quickly and the replication slot is still intact, we can seamlessly resume replication without having to recopy all GB/TB of data.

  1. Recreate the Publication (On Primary Node): We need to recreate the publication exactly as it was created before.

Sample SQL:

CREATE PUBLICATION test_pub FOR TABLE test;

  2. Refresh the Subscription (On Secondary/Subscriber Node): Refresh the subscriber to re-establish replication against the newly created publication and resume pulling from the exact LSN (Log Sequence Number) where it left off.

Sample SQL:

ALTER SUBSCRIPTION test_sub REFRESH PUBLICATION;

This ensures the subscriber reconnects to the existing replication slot and quickly drains the hoarded WAL files. Replication is restored and data comes back in sync.

Scenario B: The Subscription is Dropped (On the Secondary/Subscriber Node)

As mentioned earlier, dropping the subscription is a destructive action that, by default, also drops the replication slot on the primary and releases the retained WAL history. So we cannot simply resume/refresh the replication.

The Fix:

  1. Verify the Slot is dropped (On Primary/Publisher Node): Ensure the slot was actually dropped to prevent disk bloat.

Sample SQL:

SELECT slot_name, active FROM pg_replication_slots;

-- If the old slot is still there and active=f, drop it:

-- SELECT pg_drop_replication_slot('slot_name');

  2. Re-establish Replication (On Secondary/Subscriber Node): Truncate the target table, then recreate the subscription so the initial sync (copy_data = true) materializes the entire table's data afresh. Note that CREATE SUBSCRIPTION does not truncate the target for us; skipping this step can cause conflicts or duplicates during the initial copy.

Sample SQL:

CREATE SUBSCRIPTION test_sub
CONNECTION 'CONNINFO'
PUBLICATION test_pub
WITH (copy_data = true);
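To confirm the full re-sync actually finished, the per-table sync state can be checked on the subscriber; a minimal sketch (state r means "ready", i.e. streaming normally):

Sample SQL:

SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;
-- srsubstate: i = initialize, d = data copy, s = synchronized, r = ready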


Tuesday, 31 March 2026

PostgreSQL hot_standby_feedback impact on Primary node!

Have you ever encountered a scenario where a long-running transaction on a PostgreSQL standby node caused performance degradation or table bloat on the primary node?

It seems strange: a secondary node is supposed to be read-only, so querying it shouldn't impact the primary at all, right? It is a common misconception that asynchronous replication protects the primary from everything happening on the secondary. Async mode protects the primary's write speed, but it does not protect the primary's disk space or vacuum processes.

However, in the PostgreSQL architecture, a long-running transaction (even a plain SELECT) on a secondary node can severely impact or even bring down the primary. This happens primarily through three mechanisms designed to keep the databases synchronized and preserve integrity.


Here is exactly why this strange thing happens:

The hot_standby_feedback Trap (Bloating on Primary Node):

If the primary node updates a row (under MVCC, an UPDATE writes a new row version and leaves the old one dead) or deletes it, VACUUM will eventually clean up the old "dead" row version. But what if a query on the secondary is currently reading that exact row? If the primary vacuums it away, the query on the secondary is cancelled with "canceling statement due to conflict with recovery". To prevent queries on the secondary from being cancelled, PostgreSQL provides hot_standby_feedback.

  • What happens if we enable hot_standby_feedback: The secondary node keeps messaging the primary: "I have a transaction open using transaction ID XXX, so don't vacuum anything newer than XXX."
  • What is the impact on the primary: If a long report runs on the secondary (typically 1-2 hours), the primary is forbidden from cleaning up dead tuples across the entire cluster for those 1-2 hours. Primary tables and indexes bloat heavily, disk I/O spikes, and overall performance degrades.
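The feedback horizon holding back VACUUM is visible directly on the primary; a minimal sketch:

Sample SQL:

-- On the Primary: a non-null backend_xmin means standby feedback is
-- pinning the vacuum horizon to that transaction ID.
SELECT application_name, state, backend_xmin
FROM pg_stat_replication;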

Replication Slots & Disk Space Exhaustion:

If the secondary node is running a massive transaction, it is consuming heavy CPU, Memory, and Disk I/O.

  • What happens: The secondary becomes so starved for resources that the startup process (which replays the Write-Ahead Logs from the primary) slows down to a grinding halt or stops entirely.
  • The Impact on Primary: If you are using physical replication slots, the primary is forced to hold onto all WAL files until the secondary confirms it has replayed them. If the secondary is bogged down by a large query and stops replaying, WAL files pile up in the primary's pg_wal directory until it runs out of disk space. When the primary hits 100% disk usage, the database crashes.

Synchronous Replication Blocking (synchronous_commit):

This is the most severe scenario. If the cluster is configured for high availability using synchronous replication (where synchronous_commit is set to on, remote_write, or remote_apply), the primary must wait for the secondary to acknowledge transactions.

  • What happens in the Secondary: A huge query on the secondary hogs the disk I/O or locks resources.
  • The Impact on Primary: Write transactions (INSERT, UPDATE, DELETE) on the primary will hang. The primary might successfully execute the write locally, but the COMMIT will sit in a waiting state until the sluggish secondary processes the WAL and acknowledges it back to the primary. Application connections then pile up on the primary, eventually exhausting max_connections and effectively taking the service down.

How to Mitigate This?

In an OLTP production environment, the ultimate goal is to make sure the primary database is always up, running, and available. To protect the primary from secondary-node abuse, we have to establish the following guardrails:

  • Tune max_standby_streaming_delay: Set this to a reasonable limit (e.g., 30s or 1min). If a query on the replica blocks replication for longer than this, PostgreSQL forcefully cancels the conflicting query on the secondary node. The user's query fails, but the primary stays alive.
  • Use statement_timeout on the Replica: Enforce a strict time limit for read-only queries on the secondary so long-running queries/reports are killed automatically.
  • Keep hot_standby_feedback off: If we can tolerate occasional query cancellations on the replica ("canceling statement due to conflict with recovery"), leaving hot_standby_feedback off ensures the primary's VACUUM is never blocked.
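The guardrails above can be sketched as settings on the replica (the values are illustrative, not tuned recommendations):

Sample SQL:

-- On the Replica: cancel conflicting queries after 30s rather than stall WAL replay
ALTER SYSTEM SET max_standby_streaming_delay = '30s';
-- Hard cap for any single statement on the replica (e.g., 15 minutes)
ALTER SYSTEM SET statement_timeout = '15min';
-- Do not let replica queries pin the primary's vacuum horizon
ALTER SYSTEM SET hot_standby_feedback = off;
-- These three settings take effect on a config reload (no restart needed)
SELECT pg_reload_conf();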

 

Friday, 27 March 2026

PostgreSQL Cluster Switch-Over streaming issues!

How to handle replication after a failover of PostgreSQL Cluster from Primary to Secondary and back to Primary for the data streaming using Debezium and Kafka?

In a standard PostgreSQL and Debezium architecture, Debezium will not automatically pick up the new WAL and resume seamlessly after a true failover and failback. It almost always requires manual intervention or a reset.

The reason for this comes down to how PostgreSQL handles logical replication.

The Core Problem: Logical Replication Slots

Debezium does not read the raw physical WAL files directly. It relies on a Logical Replication Slot (usually using the pgoutput plugin) created on the source database.

1.     Slots are Node-Specific: By default, logical replication slots are tied exclusively to the primary server. They are not physically replicated to the DR standby node alongside the data.

2.     The Failover (To DR): When you fail over to the DR node, that DR database does not have Debezium's logical replication slot. The Debezium connector immediately throws an error and fails.

3.     The Timeline Change: When a DR node is promoted to Primary, PostgreSQL increments its "timeline."

4.     The Failback (To Primary): When you failed back to the original primary, the primary had to reconcile with the DR node's new timeline. In most failback scenarios (especially if pg_rewind or a fresh backup was used), the original logical replication slot is destroyed or rendered invalid because the Log Sequence Numbers (LSNs) no longer perfectly align.

How to Resolve and Reset

If the Debezium connector is currently in a FAILED state, then we will likely need to reset it. Here is the standard recovery path:

Step 1: Check the Slot on the Primary. First, verify whether the slot survived the failback on the source PostgreSQL database:

Query:

SELECT slot_name, plugin, slot_type, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name = 'your_debezium_slot_name';

•       If the query returns zero rows, the slot was destroyed.

•       If it returns a row but active is false and Debezium refuses to connect, the LSNs are out of sync.

Step 2: Drop the Corrupted Slot (if it exists). If the slot is there but broken, drop it manually so Debezium can recreate it:

Query:

SELECT pg_drop_replication_slot('your_debezium_slot_name');

Step 3: Reset the Debezium Connector. Because the LSN pointers have changed and the slot is gone, we cannot just restart the connector. We have to force Debezium to take a new initial snapshot to ensure zero data loss.

•       Delete the connector's offsets in the Kafka Connect offsets topic (or via the Kafka Connect REST API if you are using a tool that supports offset deletion).

•       Reconfigure the Debezium connector with snapshot.mode set to initial (or always).

•       Restart the connector. It will recreate the replication slot, perform a fresh SELECT * baseline read of your tables, and then begin streaming the new WAL changes.
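Step 3 boils down to re-registering the connector with a fresh snapshot; a hedged sketch of the relevant Debezium PostgreSQL connector settings (the hostname, connector name, and topic prefix are placeholders, not values from this setup):

Sample config (JSON):

{
  "name": "pg-source-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "primary-host",
    "database.port": "5432",
    "database.dbname": "source_db",
    "slot.name": "your_debezium_slot_name",
    "publication.name": "test_pub",
    "snapshot.mode": "initial",
    "topic.prefix": "pgcdc"
  }
}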

 

 

Thursday, 26 March 2026

PostgreSQL Databases Historical Data Archival and Storage!

Hot, Warm, Cold: Architecting a 3-Tier Data Lifecycle for PostgreSQL




Managing multi-terabyte databases requires a planned separation of live data from warm and cold storage. Keeping terabytes of historical data on premium database disks, even when the partitions are detached and no longer add query-planner overhead, wastes expensive IOPS and inflates our backup/restore RTO windows.

Implementing a robust historical data archival strategy is one of the most impactful architectural improvements you can make for a large-scale PostgreSQL estate. It directly solves the pain points of bloated indexes, extended backup windows, and the aggressive locking issues we experience in day-to-day workloads.

I'm considering a use case of 13 servers averaging ~1 TB of raw data per database/server, of which ~750 GB is archivable historical data. The most robust architecture shifts this data off the relational engine into highly compressed, immutable object storage (AWS S3 or Azure Blob Storage), with a middle OLAP tier such as ClickHouse.

The options below provide a comprehensive blueprint for the extraction, storage, and retrieval phases, along with the estimated storage footprint and cost projections.

Phase 1: The Extraction & Transformation Strategy

Once a partition rolls past the historical retention period and is detached from the main table, it must be extracted efficiently without causing I/O spikes that compete with our active DML operations.

  • The Format (Parquet vs. CSV/ZSTD):

o   Zstandard (ZSTD) Compressed CSV: If this data is strictly for compliance and rarely touched, streaming the detached partition via COPY to a ZSTD-compressed CSV is the fastest and most CPU-efficient method.

o   Apache Parquet: If there is a chance the business or analytics teams will need to query this historical data, extract it into Parquet format. Parquet is columnar, highly compressed, and can be queried directly in object storage.

  • The Workflow:

    1. Detach the partition from the parent table (instantly removes it from live query plans).
    2. Execute a background worker script to export the detached table to the target format.
    3. Stream the output directly to Object Storage (e.g., Azure Blob Storage or AWS S3) to bypass local disk staging.
    4. Verify the checksum of the uploaded data before dropping from the source.
    5. DROP TABLE on the PostgreSQL server to reclaim ~500 GB of premium block storage on each DB server.
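Steps 1-2 can be sketched in SQL; a hedged sketch (the partition and path names are hypothetical, and COPY ... TO PROGRAM requires superuser or the pg_execute_server_program role):

Sample SQL:

-- 1. Detach the monthly partition (CONCURRENTLY avoids a blocking lock, PG 14+)
ALTER TABLE events DETACH PARTITION events_2024_01 CONCURRENTLY;

-- 2. Export the detached table as ZSTD-compressed CSV on the server
COPY events_2024_01 TO PROGRAM 'zstd -19 -o /staging/events_2024_01.csv.zst'
WITH (FORMAT csv, HEADER true);

-- 5. Only after checksum verification and upload, reclaim the space:
-- DROP TABLE events_2024_01;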

Phase 2: Storage & Compression Estimates

Relational database data is highly compressible. When you extract ~10 TB of raw PostgreSQL data and exclude the B-Tree indexes, the actual storage footprint drops dramatically.

Assuming a standard compression ratio of 4:1 (75% reduction) using ZSTD or Parquet:

Metric/Parameter                         Size Per Server    Total (13 Servers)
Raw PostgreSQL Data                      ~750 GB            ~10 TB
Index Overhead Removed                   ~100 GB            ~2.5 TB
Data Payload to Compress                 ~500 GB            ~7.5 TB
Estimated Target Storage (Compressed)    ~150 GB            ~2 TB

 

Phase 3: The Cost Estimates & Tiering

By shifting this data off Premium SSDs (which typically cost around $150 to $200 per provisioned TB per month depending on the cloud provider or SAN tier) and moving it to Object Storage, the ROI is immediate.

Here is the estimated monthly cost to store the resulting ~2 TB of compressed archive data across standard cloud object storage tiers:

Storage Tier                Best For                    Approx. Monthly Cost (2 TB)    Retrieval (per-GB) Characteristics
Hot / Standard Object       Data queried frequently     ~$50.00 / month                Millisecond access. Standard retrieval costs.
Cool / Infrequent Access    Queried once a month        ~$20.00 / month                Millisecond access. Higher retrieval fee.
Cold / Archive / Glacier    Audits only (1-2x a year)   ~$2.50 / month                 Hours to retrieve. Highest retrieval fees.

 

Phase 4: The Retrieval Architecture

When compliance or business users inevitably ask for data from this historical archival period, we need to serve it without restoring 1 TB back into the live transactional database.

  • Option A (The Native FDW): If the data was archived as CSV, you can temporarily create a file_fdw (Foreign Data Wrapper) or use s3_fdw/azure_storage_fdw to map the remote file as a read-only table in a staging PostgreSQL instance.
  • Option B (The Data Lake Approach): If the data was archived as Parquet, point an analytical query engine like DuckDB or a ClickHouse DB directly at the storage bucket. This allows you to run standard SQL against the deep archive without touching our primary PostgreSQL clusters.
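Option B in practice; a minimal DuckDB sketch (the bucket and column names are hypothetical):

Sample SQL (DuckDB):

-- Query the Parquet archive in place; nothing is restored into PostgreSQL
INSTALL httpfs;
LOAD httpfs;

SELECT date_trunc('month', created_at) AS month, count(*) AS row_count
FROM read_parquet('s3://my-archive-bucket/events_2024_01.parquet')
GROUP BY 1
ORDER BY 1;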


OLAP phase: 

To bridge the gap between active transactional data and deep archival storage, we have implemented ClickHouse as a dedicated OLAP tier. This ensures high-performance, low-latency querying for frequently accessed historical data, mitigating the latency, indexing, and functional limitations of querying cold data directly from S3 object storage.

This can be best achieved through an event-driven Change Data Capture (CDC) pipeline that transforms raw transactional logs into query-ready analytical states.

Here are the critical architectural steps to build this pipeline:

  • Source Configuration (PostgreSQL): Configure the database for logical decoding by setting wal_level = logical, establishing a publication for the specific tables, and creating a logical replication slot to track WAL consumption.
  • Event Capture (Debezium): Attach a CDC connector to the replication slot to read the WAL stream, translating every INSERT, UPDATE, and DELETE into a structured event payload (typically JSON or Avro).
  • The Streaming Buffer (Kafka): Route these event payloads into dedicated Apache Kafka topics. This decoupling layer provides durable storage and acts as a shock absorber during massive transactional bursts on the source.
  • The Ingestion Engine (ClickHouse): Provision a table using the Kafka engine within ClickHouse. This acts as an active consumer group, continuously pulling batches of raw events directly from the Kafka broker.
  • State Materialization (ClickHouse MVs): Deploy a Materialized View to act as the transformation pipeline. It automatically reads from the Kafka engine table, applies necessary transformations, and routes the final data into a persistent storage table—most often a ReplacingMergeTree or CollapsingMergeTree to efficiently process mutations and deduplicate records.
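The ingestion and materialization steps above can be sketched in ClickHouse SQL; a hedged sketch (the topic, broker, and column names are hypothetical):

Sample SQL (ClickHouse):

-- Kafka engine table: an active consumer, not durable storage
CREATE TABLE events_queue
(
    id UInt64,
    description String,
    created_at DateTime,
    _version UInt64
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker:9092',
         kafka_topic_list = 'pgcdc.public.test',
         kafka_group_name = 'ch_events_consumer',
         kafka_format = 'JSONEachRow';

-- Persistent store; ReplacingMergeTree deduplicates rows sharing the ORDER BY key
CREATE TABLE events
(
    id UInt64,
    description String,
    created_at DateTime,
    _version UInt64
)
ENGINE = ReplacingMergeTree(_version)
ORDER BY id;

-- Materialized view: the pipeline that moves rows from the queue into the store
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT id, description, created_at, _version
FROM events_queue;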