Postgres Is the Gateway Drug

by vira28 on 3/14/2026, 8:17 PM with 3 comments

by glitterlabs on 3/15/2026, 3:13 PM

I’ve been thinking about how Postgres is quietly becoming a streaming backbone, not just a transactional database.

With logical replication and WAL-based CDC, Postgres can act as a real-time event source. Instead of introducing a separate log system early, many teams now stream changes directly out of Postgres into downstream systems.
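To make that concrete, here is a small sketch of the decoding side, assuming the wal2json logical-decoding output plugin (format version 1). The payload below is a hand-written example of its shape; in production it would be read from a replication slot, e.g. via `pg_logical_slot_get_changes()` or the streaming replication protocol.

```python
import json

# Example payload in the shape emitted by the wal2json logical-decoding
# plugin (format version 1). Hand-written here for illustration.
raw = '''{
  "change": [
    {"kind": "insert", "schema": "public", "table": "orders",
     "columnnames": ["id", "status"], "columnvalues": [42, "new"]}
  ]
}'''

def decode_wal2json(payload: str):
    """Turn one wal2json message into simple (op, table, row) events."""
    events = []
    for change in json.loads(payload).get("change", []):
        row = dict(zip(change.get("columnnames", []),
                       change.get("columnvalues", [])))
        events.append((change["kind"],
                       f'{change["schema"]}.{change["table"]}', row))
    return events

events = decode_wal2json(raw)
print(events)  # [('insert', 'public.orders', {'id': 42, 'status': 'new'})]
```

Each decoded event can then be handed to whatever downstream system consumes the change stream.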

The architecture is shifting from:

Traditional DB → app → message queue → stream processor

To something more like:

Postgres WAL → CDC → multiple sinks

What’s particularly interesting is using the WAL as the fan-out point. The same stream of changes can be written simultaneously into systems like Apache Kafka for event processing, stream processors like Apache Flink, and lakehouse tables such as Apache Iceberg.

In that model, WAL changes effectively become a unified change stream, with Iceberg acting as a long-lived analytical sink while other systems consume the same stream for real-time workflows.
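The fan-out itself is structurally simple: one decoded change event delivered to every registered sink. A toy sketch, where the three sinks are hypothetical stand-ins for Kafka, Flink, and Iceberg writers:

```python
from typing import Callable

# Hypothetical sinks standing in for real writers; here they just
# collect events into lists so the fan-out is easy to inspect.
kafka_out, flink_out, iceberg_out = [], [], []

sinks: list[Callable[[dict], None]] = [
    kafka_out.append,    # e.g. produce to a Kafka topic
    flink_out.append,    # e.g. push into a Flink source
    iceberg_out.append,  # e.g. buffer rows for an Iceberg commit
]

def fan_out(change: dict) -> None:
    """Deliver one WAL change event to every registered sink."""
    for sink in sinks:
        sink(change)

fan_out({"op": "insert", "table": "orders", "row": {"id": 1}})
```

The hard parts in practice are not the dispatch loop but per-sink delivery guarantees, ordering, and replay on failure.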

So the pattern starts looking like:

Postgres WAL → streaming pipeline → Kafka / Flink → Iceberg tables for analytics & historical replay

A few things I’m curious about:

• Are people treating Postgres WAL as a long-term system of event truth, or just an integration point?
• Does writing CDC streams directly into Iceberg tables change how we think about building data lakes?
• At what scale does this pattern start to break down compared to adopting a dedicated log system earlier?

Would love to hear how others are approaching Postgres + streaming + lakehouse architectures in production.

by Shyaamal11 on 3/16/2026, 2:15 AM

One thing I’ve seen with this pattern is that Postgres + CDC works really well as an early-stage streaming backbone, especially when the operational DB is already the source of truth.

Using WAL → CDC → downstream systems keeps the architecture simple at first, and tools like Debezium make it relatively straightforward to pipe those changes into Kafka or other processors.
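For reference, a minimal Debezium Postgres connector registration looks roughly like this (key names follow Debezium 2.x; the hostname, credentials, slot name, and table list are placeholders, not values from this thread):

```python
import json

# Minimal Debezium Postgres connector payload (Debezium 2.x key names;
# host, credentials, and table list are illustrative placeholders).
connector = {
    "name": "orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",          # Postgres's built-in logical decoding plugin
        "database.hostname": "db.example.internal",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "change-me",
        "database.dbname": "app",
        "topic.prefix": "app",              # Kafka topics become app.<schema>.<table>
        "table.include.list": "public.orders",
        "slot.name": "debezium_orders",
    },
}

# Registered by POSTing this JSON to Kafka Connect's REST API, e.g.:
#   curl -X POST -H "Content-Type: application/json" \
#        --data @connector.json http://connect:8083/connectors
print(json.dumps(connector, indent=2))
```

Once registered, Debezium manages the replication slot and emits one Kafka topic per captured table.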

Where things start getting interesting is the analytics side. Once the CDC stream lands in something like Iceberg tables, you effectively get a continuously updated analytical dataset that can be queried with engines like Spark or Trino.
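What “continuously updated” means logically is just upsert/delete by primary key as events arrive; real lakehouse sinks express the same thing via mechanisms like Iceberg’s MERGE INTO or equality-delete files. A toy, in-memory illustration:

```python
# Toy materialization of a CDC stream into a queryable table keyed by
# primary key: inserts/updates upsert, deletes remove. Lakehouse sinks
# do this logically via MERGE INTO or equality deletes.
table: dict[int, dict] = {}

cdc_stream = [
    {"op": "insert", "key": 1, "row": {"id": 1, "status": "new"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "status": "new"}},
    {"op": "update", "key": 1, "row": {"id": 1, "status": "shipped"}},
    {"op": "delete", "key": 2, "row": None},
]

for event in cdc_stream:
    if event["op"] == "delete":
        table.pop(event["key"], None)
    else:  # insert and update both reduce to an upsert
        table[event["key"]] = event["row"]

print(table)  # {1: {'id': 1, 'status': 'shipped'}}
```

The end state reflects the latest version of each row, which is exactly what engines like Spark or Trino then query.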

At that point the architecture starts to look less like a traditional “data warehouse pipeline” and more like a streaming-first lakehouse where operational data flows directly into analytical storage.

The main challenge I’ve seen is operational complexity once you start combining:

• CDC ingestion
• stream processing
• lakehouse storage (Iceberg/Delta)
• distributed query engines

That’s where platforms trying to package the open stack together (e.g. Spark + Iceberg + Trino) become interesting. Some newer platforms like IOMETE are basically trying to simplify running that type of lakehouse stack on Kubernetes so teams don’t have to glue everything together manually.

Curious where people think the breakpoint is: at what scale does Postgres+CDC stop being “good enough”, so that you need a dedicated log system as the primary event backbone?

by horizontech-devon 3/14/2026, 8:37 PM

Overall I agree with the sentiment. Still, there’s a long way to go before this open stack is production-ready.

I know places that are easily paying 6 to 7 figures for their “analytics” stack, and a big part of that is the cloud data warehouse cost.