Hacker News | tudorg's comments

Hi HN,

This is Tudor from Xata. You can think of Xata as an open-source, self-hosted alternative to Aurora/Neon. Highlight features are:

- Fast copy-on-write branching.

- Automatic scale-to-zero and wake-up on new connections.

- 100% Vanilla Postgres. We run upstream Postgres, no modifications.

- Production grade: high availability, read replicas, automatic failover/switchover, upgrades, backups with PITR, IP filtering, etc.

You can self-host it, or you can use our [cloud service](https://xata.io).

Background story: we have existed as a company for almost 5 years, have offered a Postgres service from the start, and have launched several different products and open source projects here on HN before, including pgroll and pgstream. About a year and a half ago, we started to rearchitect our core platform from scratch. It has been running in production for almost a year now, and it’s serving customers of all sizes, including many multi-TB databases.

One of our goals in designing the new platform was to make it cloud-independent, with a carefully selected set of dependencies. Part of the reason was to be able to offer it in any cloud, and the other part is the subject of today's announcement: we wanted it to be open source and self-hostable.

Use cases: We think Xata OSS is appropriate for two use cases:

- Quickly get preview / testing / dev / ephemeral environments with realistic data. We think for many companies this is a better alternative to seed or synthetic data, and allows you to catch more classes of bugs. Combined with anonymization, especially in the world of coding agents, this is an important safety and productivity enabler.

- Offer an internal PGaaS. The alternative we usually see at customers is that they use a Kubernetes operator to achieve this. But there’s more to a Postgres platform than just the operator. Xata is more opinionated and comes with APIs and a CLI.

Technical details: We wanted from the start to offer CoW branching and vanilla Postgres. This basically meant that we wanted to do CoW at the storage layer, under Postgres. We tested a bunch of storage systems for performance and reliability and ultimately landed on OpenEBS. OpenEBS is an umbrella project for several storage engines for Kubernetes, and the one that we use is the replicated storage engine (aka Mayastor).
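
To make "CoW at the storage layer" a bit more concrete, here is a toy sketch in Python. It is not how Mayastor or Xata implement branching; the `Layer` and `Volume` names are made up for illustration, and a real engine works on disk blocks and snapshots rather than dictionaries. The point is only that creating a branch copies no data: parent and branch share a frozen layer and each records its own later writes on top of it.

```python
class Layer:
    """One layer of block writes; falls back to its parent for missing blocks."""

    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}  # block number -> bytes written in this layer


class Volume:
    """Toy block volume with copy-on-write branching (illustrative only)."""

    def __init__(self, base=None):
        # Writes always land in a private top layer.
        self._top = Layer(parent=base)

    def write(self, block_no, data):
        self._top.blocks[block_no] = data

    def read(self, block_no):
        # Walk from the newest layer down to the oldest shared one.
        layer = self._top
        while layer is not None:
            if block_no in layer.blocks:
                return layer.blocks[block_no]
            layer = layer.parent
        return None

    def branch(self):
        # Freeze the current top layer as a shared snapshot. Nothing is
        # copied: both volumes get a fresh, empty top layer above it.
        frozen = self._top
        self._top = Layer(parent=frozen)
        return Volume(base=frozen)

    def private_blocks(self):
        # Number of blocks this volume has written since it last branched.
        return len(self._top.blocks)


main = Volume()
main.write(0, b"users-v1")
main.write(1, b"orders-v1")

dev = main.branch()          # instant, no blocks copied
dev.write(1, b"orders-dev")  # only the branch sees this write
main.write(0, b"users-v2")   # only the parent sees this write
```

After this runs, `dev` still reads the original `users-v1` block through the shared layer, while storing only the single block it changed. That is also why the storage volumes of a parent and its branches need to live together.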

Small side note on separation of storage from compute: since the introduction of PlanetScale Metal, there has been a lot of discussion about the performance of local storage. We had these discussions internally as well, and what’s nice about OpenEBS is that it actually supports both: there are local storage engines and over-the-network storage engines. For our purpose of running CoW branches, however, the advantages of the separation are pretty clear: it allows spreading the compute to multiple nodes, while keeping the storage volumes colocated, which is needed for CoW. So for now the Xata platform is focused on this, but it’s entirely possible to run Xata with local storage: basically a storage-class change away.

Another small side note: while Mayastor is serving us well, and it’s what we recommend for OSS installations, we have been working on our own storage engine in parallel (called Xatastor). It is the key to having sub-second branching and wake-up times and we’ll release it in a couple of weeks.

For the compute layer, we are building on top of CloudNativePG. It’s a stable and battle-tested operator covering all the production-grade concerns. We did add quite a lot of services around it, though: our custom SQL gateway, a “branch” operator, control plane and authentication services, etc.

The end result is what we think is an opinionated but flexible Postgres platform: higher level and easier to use than a K8s operator, with a lot of batteries-included goodies.

Let us know if you have any questions!


It's funny that this news showed up just as we (Xata) have gone in the other direction, also citing changes due to AI: https://xata.io/blog/open-source-postgres-branching-copy-on-...

We did consider arguments in both directions (e.g. easier to recreate the code, agents can understand better how it works), but I honestly think the security argument goes for open source: the OSS projects will get more scrutiny faster, which means bugs won't linger around.

Time will tell, I am in the open source camp, though.


Just wanted to appreciate the open-source work by Xata. I’ve been eyeing pgroll [1] for schema migrations after Liquibase license shenanigans (the only barrier for me is json-based migration instead of sql-based migrations)

[1] https://github.com/xataio/pgroll


This sounds cool, adding to my list of things to try out.

One note, this shouldn't be confused with https://github.com/xataio/pgstream which does logical replication/CDC with DDL changes.


We've renamed the project to pg_trickle: https://github.com/grove/pg-trickle


Thanks. Naming is hard, but we'll look into it.


> The Amp editor extensions will self-destruct on March 5 at 8pm Pacific Time. Time to switch to the Amp CLI.

That's an interesting way to tackle the removal of features.


> The Postgres extension model to capture the metrics (we also experimented with eBPF, but it causes too many kernel-user space context switches when you can do the same in an extension without them), and a small sidecar to push the metrics out via a standardized protocol like OTEL.

The extension model is great, but it doesn't work with existing postgres providers (RDS, Aurora, etc.). Unless one such extension becomes standard enough that all providers will support it. That would be ideal, IMO.

To be clear, I don't mean pg_stat_statements, that is standard enough, but an extension that pushes the actual queries in real-time.

> If it's a network hop, then adds milliseconds, and not microseconds.

Are you talking about connection establishment time or query delay? I think it should normally be under a millisecond for the latter.


> The extension model is great, but it doesn't work with existing postgres providers (RDS, Aurora, etc.). Unless one such extension becomes standard enough that all providers will support it. That would be ideal, IMO.

That's true, but that's a problem for the PGaaS providers to fix by providing the best functionality available. I'm planning on following this route on a purely OSS basis.

> Are you talking about connection establishment time or query delay? I think it should normally be under a millisecond for the latter.

Network trip. If the proxy is not co-located with the database but rather a network hop away, that's usually adding at least 1ms there, could be more.


Even then, though, it needs to run on the server, so it's hard to guarantee that it won't impact performance and availability. There are many Postgres/MySQL proxies used for connection pooling and such, so at least we understand their impact pretty well (and it tends to be minimal).


Others have mentioned similar solutions but I’d like to add one: a database solution with CoW branching and PII anonymisation solves the db part in a safe way.

Disclaimer: I work at Xata.io, which provides these features. We have a recent blog post with a demo of this: https://xata.io/blog/database-branching-for-ai-coding-agents


This is really cool and I love to see the interest in fast clones / branching here.

We've built Xata with this idea of using copy-on-write database branching for staging and testing setups, where you need to use testing data that's close to the real data. On top of just branching, we also do things like anonymization and scale-to-zero, so the dev branches are often really cheap. Check it out at https://xata.io/

> The source database can't have any active connections during cloning. This is a PostgreSQL limitation, not a filesystem one. For production use, this usually means you create a dedicated template database rather than cloning your live database directly.

This is a key limitation to be aware of. One way to work around it could be to use pgstream (https://github.com/xataio/pgstream) to copy from the production database to a production replica. Pgstream can also do anonymization on the way; this is what we use at Xata.


Thanks for the report!


Lots of good improvements. My favorites are OAuth, NOT NULL constraints with NOT VALID, uuidv7, and RETURNING with old/new values. And I think the async IO will bring performance benefits, although maybe not so much immediately.


PostgreSQL Gains a Built-in UUIDv7 Generation Function for Primary Keys (many interesting details)

https://habr.com/en/news/950340/
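
For readers wondering why UUIDv7 is attractive for primary keys: the layout defined in RFC 9562 puts a 48-bit Unix timestamp in milliseconds first, so values generated later sort after earlier ones and new rows land near the end of the index. Below is a minimal Python sketch of that bit layout. It is only an illustration, not Postgres's implementation; the `uuid7_sketch` helper and its `unix_ms` parameter are names invented for this example.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUID with the RFC 9562 version 7 bit layout (illustrative only)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: random
    value |= 0b10 << 62                           # bits 63..62: variant
    value |= rand_b                               # bits 61..0: random
    return uuid.UUID(int=value)


earlier = uuid7_sketch(unix_ms=1_700_000_000_000)
later = uuid7_sketch(unix_ms=1_700_000_000_001)
```

Because the timestamp occupies the most significant bits, `earlier < later` holds regardless of the random part. Two values generated within the same millisecond are not ordered by this sketch; implementations may use extra techniques for that.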

