Hot, Warm, Cold: Architecting a 3-Tier Data Lifecycle for PostgreSQL
Implementing a robust historical data
archival strategy is one of the most impactful architectural improvements you
can make for a large-scale PostgreSQL estate. It directly solves the pain
points of bloated indexes, extended backup windows, and the aggressive locking
issues, we experience in our day-to-day work loads.
I'm considering a usecase of 13 servers with ~1TB of raw data and ~750GB of archival
data per each database/server on average, the most robust architecture shifts
this data off the relational engine and into highly compressed, immutable
Object Storage in Cloud storage like AWS S3 or Azure BLOB Storage using a middle tier for OLAP like ClickHouse.
Below mentioned options provides a
comprehensive blueprint for the extraction, storage, and retrieval phases,
along with the estimated storage footprint and cost projections.
Phase 1: The Extraction & Transformation Strategy
Once a partition rolls past
the historical retention period and is detached from the main table, it must be
extracted efficiently without causing I/O spikes that compete with our active
DML operations.
- The
Format (Parquet vs. CSV/ZSTD):
* Zstandard (ZSTD) Compressed CSV: If this data is strictly for
compliance and rarely touched, streaming the detached partition via COPY
to a ZSTD-compressed CSV is the fastest and most CPU-efficient method.
o Apache Parquet: If there is a chance the business or
analytics teams will need to query this historical data, extract it into
Parquet format. Parquet is columnar, highly compressed, and can be queried
directly in object storage.
·
The
Workflow:
- Detach
the partition from the parent table (instantly removes it from live query
plans).
- Execute
a background worker script to export the detached table to the target
format into AWS S3 or Azure blob storage.
- Stream
the output directly to Object Storage (e.g., Azure Blob Storage or AWS
S3) to bypass local disk staging.
- Verify
the checksum of the uploaded data before dropping from Source.
- DROP TABLE on the PostgreSQL server to reclaim the premium block storage of ~500GB on each DB Server.
Phase 2: Storage & Compression
Estimates
Relational database data is highly
compressible. When you extract 10TB of raw PostgreSQL data (excluding the
B-Tree indexes), the actual storage footprint drops dramatically.
Assuming a standard compression ratio of
4:1 (75% reduction) using ZSTD or Parquet:
|
Metric/Parameter |
Size Per Server |
Total (13 Servers) |
|
Raw PostgreSQL Data |
~750 GB |
~10 TB |
|
Index Overhead
Removed |
~100 GB |
2.5 TB |
|
Data Payload to
Compress |
~500 GB |
7.5 TB |
|
Estimated Target
Storage (Compressed) |
~150 GB |
~2 TB |
Phase 3: The Cost Estimates &
Tiering
By shifting this data off Premium SSDs
(which typically cost around $150 to $200 per provisioned TB per month
depending on the cloud provider or SAN tier) and moving it to Object Storage,
the ROI is immediate.
Here is the estimated monthly cost to
store the resulting ~2 TB of compressed archive data across standard
cloud object storage tiers:
|
Storage Tier |
Best For |
Approx. Monthly Cost (2 TB) |
Retrieval (per-GB) Characteristics |
|
Hot / Standard Object |
Data queried frequently |
~$50.00 / month |
Millisecond access. Standard
retrieval costs. |
|
Cool / Infrequent Access |
Queried once a month |
~$20.00 / month |
Millisecond access. Higher retrieval
fee. |
|
Cold / Archive / Glacier |
Audits only (1-2x a year) |
~$2.50 / month |
Hours to retrieve. Highest retrieval
fees. |
Phase 4: The Retrieval Architecture
When compliance or business users inevitably ask for data from this Historical Archival period, We need to provide the same without restoring 1TB back into our live transactional database.
- Option A (The Native FDW): If the data was archived as CSV, you can temporarily create a file_fdw (Foreign Data Wrapper) or use s3_fdw/azure_storage_fdw to map the remote file as a read-only table in a staging PostgreSQL instance.
- Option
B (The Data Lake Approach):
If the data was archived as Parquet, point an analytical query engine like
DuckDB or a ClickHouse DB directly at the storage bucket. This allows you
to run standard SQL against the deep archive without touching our primary
PostgreSQL clusters.
OLAP phase:
To bridge the gap between active
transactional data and deep archival storage, we have implemented ClickHouse as
a dedicated OLAP tier. This ensures high-performance, low-latency querying for
frequently accessed historical data, mitigating the latency, indexing, and
functional limitations of querying cold data directly from S3 object storage.
This can be best achieved through an
event-driven Change Data Capture (CDC) pipeline that transforms raw
transactional logs into query-ready analytical states.
Here are the
critical architectural steps to build this pipeline:
- Source Configuration (PostgreSQL): Configure the database for
logical decoding by setting wal_level = logical, establishing a
publication for the specific tables, and creating a logical replication
slot to track WAL consumption.
- Event Capture (Debezium): Attach a CDC connector to the
replication slot to read the WAL stream, translating every INSERT, UPDATE,
and DELETE into a structured event payload (typically JSON or Avro).
- The Streaming Buffer (Kafka): Route these event payloads into
dedicated Apache Kafka topics. This decoupling layer provides durable
storage and acts as a shock absorber during massive transactional bursts
on the source.
- The Ingestion Engine (ClickHouse): Provision a table using the Kafka
engine within ClickHouse. This acts as an active consumer group,
continuously pulling batches of raw events directly from the Kafka broker.
- State Materialization (ClickHouse
MVs): Deploy a
Materialized View to act as the transformation pipeline. It automatically
reads from the Kafka engine table, applies necessary transformations, and
routes the final data into a persistent storage table—most often a ReplacingMergeTree
or CollapsingMergeTree to efficiently process mutations and deduplicate
records.
No comments:
Post a Comment