
This is exciting:

> ...This log architecture is optimized for raft consensus-based replication allowing the implementation of active-active multimaster.

I'm a developer but manage two PostgreSQL instances, each with an async replica. I also manage one three-node CockroachDB cluster. It's night and day when it comes to any type of ops (e.g. upgrading a version). There are a lot of reasons to use PostgreSQL (e.g. stored procedures, extensions, ...), but if none of those apply to you, CockroachDB is a better default.

I just finished interviewing someone who went into detail about the difficulty they're having doing a large-scale migration of a mission-critical large PG cluster (from one DC to another)... it's a multi-team, year-long initiative that no one is comfortable with.

In my experience, PG is always the most out-of-date piece of any system. I still see 9.6 here and there, which is now EOL.



Raft doesn’t magically make a database multi master. It’s a consensus algorithm that has a leader election process. There’s still a leader, and therefore a single master to which writes must be directed. The problem it solves is ambiguity about who the active master is at the moment.


Right. The idea of active-active OrioleDB multimaster is to apply changes locally and in parallel send them to the leader, then sync on commit and ensure there are no conflicts.

The design document will be published later.


Please have it subjected to a Jepsen test suite ASAP!


My thinking is that since it's Postgres wire-compatible, the existing tests should work?


Jepsen does much more than basic end-to-end tests, including intentionally partitioning the cluster. Tests written for a non-distributed system are downright friendly compared to what Jepsen does to distributed systems.


Does Jepsen just run automated tests?


I mean they're automated but also hand crafted.

https://github.com/jepsen-io/jepsen


Those are very expensive and booked months in advance. Aphyr sometimes comments on HN and is of course a better source on this but that's what I recall reading.


It’s the price you pay to reassure customers that your newfangled distributed database won’t experience data loss or inconsistency.


Feature wise I like postgresql.

Operationally? MySQL. I pick MySQL 8 whenever I can because over the lifetime of a project it's easier.

Pragmatically I'm coming to the uncomfortable (as a Foss advocate) idea that Microsoft SQL Server is a good choice and worth paying for. Features and relative ease of use.


Can you explain more why MySQL is operationally easier? I personally don't see any difference.


Besides what's already mentioned by siblings:

- Forwards (and usually also backwards) compatible disk format, meaning version updates don't require more than a few minutes of downtime. On large datasets, PostgreSQL can require days or weeks.

- Replication works across version differences, making upgrades without any downtime at all easier.

- No need for a vacuum process that can run into trouble with a sustained high write load.

- Page-level compression, reducing storage needs for some types of data quite a bit.

- Clustered primary keys. Without these, some types of queries can become rather slow when your table doesn't fit in memory. (Example: a chat log containing many billions of messages, which you're querying by conversation ID; see the sketch after this list.)

- xtrabackup (innobackupex) is awesome.
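A minimal sketch of the page-compression and clustered-primary-key points in MySQL terms (table and column names are made up for illustration): InnoDB always stores rows clustered on the primary key, and page-level compression is opt-in per table.

  -- InnoDB keeps rows physically ordered by the primary key, so all messages of one
  -- conversation land on adjacent pages; ROW_FORMAT=COMPRESSED enables page-level
  -- compression for this table (requires innodb_file_per_table, the default since 5.6).
  CREATE TABLE chat_message (
      conversation_id BIGINT UNSIGNED NOT NULL,
      message_id      BIGINT UNSIGNED NOT NULL,
      body            TEXT,
      PRIMARY KEY (conversation_id, message_id)
  ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;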


> - No need for a vacuum process that can run into trouble with a sustained high write load.

MySQL has a similar mechanism called the purge job. It can lead to similar problems, for example DML slowdowns.

https://mariadb.com/kb/en/innodb-purge/


> - Forwards (and usually also backwards) compatible disk format, meaning version updates don't require more than a few minutes of downtime.

How so? Postgres' data formats have been forwards (and mostly backwards) compatible since release 8.4 in 2009; effectively only the catalogs need upgrading. Sure, that can be a lot of data, but no different from MySQL or any other RDBMS with transactional DDL.

> - Replication works across version differences

Logical replication has been available since at least 9.6


Not the dump format. The actual data on disk of the database itself.

If you try upgrading even one major Postgres version and you're not aware of this, you lose all your data (well, not really, since you can roll back to the previous version, but it doesn't even tell you that!)


That is what I'm talking about, yes.

Page format for tables hasn't changed since 8.4, tuple format hasn't changed in a backwards-incompatible manner since 8.4 (probably earlier).

BTree indexes have changed a bit, but ones created in 9.0.x can still be loaded and used in 14.x.

GIN/GiST/SP-GiST/BRIN all don't seem to have had backwards-incompatible changes, looking at https://why-upgrade.depesz.com/show?from=8.4&to=14.2

The only thing that changed is the catalogs, upgradable through pg_upgrade.


Pretty much every major PG release clearly states the on-disk format isn't fixed between major versions, and is subject to change:

https://www.postgresql.org/docs/14/upgrading.html

https://www.postgresql.org/docs/13/upgrading.html

https://www.postgresql.org/docs/12/upgrading.html

https://www.postgresql.org/docs/11/upgrading.html

https://www.postgresql.org/docs/10/upgrading.html

  For major releases of PostgreSQL, the internal data storage format is subject
  to change, thus complicating upgrades.


Yes, that's more 'keeping the option open' than 'we do this regularly', and I can't seem to find any documentation that MySQL gives any better guarantee.

Note that for all of 10, 11, 12, 13 and 14, no changes have been made to the storage format of tables that make it mandatory to rewrite any user-defined schema.

I will admit that some changes have been made that make the data not forward-compatible in those versions; but MySQL seems to do that at the minor release level instead of the major release level. MySQL 8 doesn't even support minor release downgrades; under PostgreSQL this works just fine.


pg_upgrade will take care of that. And you can still run an upgrade from 8.4 to 14 using the --link mode, which means no data will be copied - only the system catalogs need to be re-created.


You're totally missing the point. It's like you're saying "other than that, how did you like the play, Mrs. Lincoln?"

pg_upgrade should not be needed at all! It isn't for MySQL. Or if it is needed, it should be automatic.

Just the command line description and the manual of pg_upgrade make you want to run away screaming. It's not ok.


>> Just the command line description and the manual of pg_upgrade make you want to run away screaming. It's not ok.

Why is it not ok? This is pg_upgrade:

> Major PostgreSQL releases regularly add new features that often change the layout of the system tables, but the internal data storage format rarely changes. pg_upgrade uses this fact to perform rapid upgrades by creating new system tables and simply reusing the old user data files. If a future major release ever changes the data storage format in a way that makes the old data format unreadable, pg_upgrade will not be usable for such upgrades. (The community will attempt to avoid such situations.)

This is mysql_upgrade:

> Each time you upgrade MySQL, you should execute mysql_upgrade, which looks for incompatibilities with the upgraded MySQL server:

> * It upgrades the system tables in the mysql schema so that you can take advantage of new privileges or capabilities that might have been added.

> * It upgrades the Performance Schema, INFORMATION_SCHEMA, and sys schema.

> * It examines user schemas.

> If mysql_upgrade finds that a table has a possible incompatibility, it performs a table check and, if problems are found, attempts a table repair.

The description seems comparable, and if anything pg_upgrade looks saner by not attempting silly things like "table repair".

>> pg_upgrade should not be needed at all! It isn't for mysql. Or if it is needed it should be automatic.

It isn't for MySQL since 8.0.16, where this became an automatic process. Personally, I prefer having an explicit step to perform the upgrade, instead of running a new binary that starts the database process and also optionally performs the upgrade at the same time.


> Clustered primary keys. Without these, some types of queries can become rather slow when your table doesn't fit in memory. (Example: a chat log containing many billions of messages, which you're querying by conversion-id.)

https://use-the-index-luke.com/blog/2014-01/unreasonable-def...


Clustered indexes can be overused, but they are sorely missing from PG. Having had them forever in SQL Server, it was another surprise to see they don't exist in PG; there is even a confusingly named CLUSTER command that reorders the heap table but does not actually create a clustered index.

Clustered indexes are great for storage space and IO for common write and retrieval patterns. If your table is accessed one way most of the time, and that way needs to be fast and therefore requires an index, a clustered index saves write IO (only the table is written, no secondary index), disk space (no secondary index redundantly storing the indexed columns), and retrieval time (no indirection when querying the clustered index).

This is great for certain kinds of tables. For instance: log tables that need to be written quickly but also queried quickly (usually by time range), grow quite large, and are append-only. GIS tables, which can also get quite large and are keyed by points, can be packed really tightly and rows found quickly. Entity tables that are mostly retrieved by primary key ID during app usage: you trade a slight performance penalty on secondary-index lookups for maximum performance when retrieving by ID, which affects everything including foreign key checks, and again saves space by not writing the ID to disk twice. And tables that are really an index for something else, kept up to date with a trigger etc., which usually exist for a specific access pattern and its performance.
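To make the log-table case concrete, here's a hedged sketch in SQL Server syntax (all names invented), with the Postgres CLUSTER command shown for contrast:

  -- SQL Server: the clustered index IS the table. Rows are stored in
  -- (logged_at, source_id) order, so time-range queries read contiguous pages
  -- and there is no separate heap or secondary lookup.
  CREATE TABLE app_log (
      logged_at  DATETIME2      NOT NULL,
      source_id  INT            NOT NULL,
      message    NVARCHAR(4000) NOT NULL
  );
  CREATE CLUSTERED INDEX cx_app_log ON app_log (logged_at, source_id);

  -- Postgres: CLUSTER reorders the existing heap along an index once, but the
  -- table does not stay clustered as new rows arrive.
  CLUSTER app_log USING ix_app_log_logged_at;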


I have been working with Oracle for more than 20 years now. I think I only had very few situations where an "index organized table" (=clustered index) was useful or actually provided a major performance benefit over a "heap table". So I never really miss them in Postgres.


It is such a common access pattern that many database engines always have a clustered index (MySQL's InnoDB, SQLite), whether you use it directly or not.

I like having the choice, as you do in SQL Server or Oracle, but for many use cases it's a waste, in both space and time, to write to a heap and to an index (which is just a hidden IOT), then look up in the index and dereference to the heap.


> Well, you can’t do that with an equals filter. But how often do you use non-equals filters like > or < on the primary key?

You can; he's wrong. He missed the vastly more common scenario: your primary key is a composite key and you're filtering on a prefix of that composite key. We do that all the time. No < or > required; you get a range of keys with an equijoin. It's additionally common for the first key in the composite key to be some kind of timestamp or monotonically increasing value, so new writes always go together at the end of the table. This sort of key is absolutely begging to be used with a clustered primary key index.

We have a table with 100+ billion rows that uses this pattern. It uses partition switching to append new data. At this size it is absolutely imperative to reduce the number of indexes to keep insert performance high, and we are always querying multiple consecutive rows based on a prefix equijoin. I'd be a fool to follow the author's advice here.

I suspect the author and I live in very different worlds, and he doesn't know my world exists.
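A hedged illustration of that pattern (SQL Server syntax, invented names, not the actual schema described above): with a composite clustered primary key, equality on a prefix of the key selects one physically contiguous range of rows.

  CREATE TABLE readings (
      bucket_ts  DATETIME2 NOT NULL,  -- monotonically increasing, so inserts append at the end
      sensor_id  BIGINT    NOT NULL,
      seq        INT       NOT NULL,
      payload    VARBINARY(256),
      PRIMARY KEY CLUSTERED (bucket_ts, sensor_id, seq)
  );

  -- Equality on a prefix of the clustered key: no < or > needed, and the matching
  -- rows are adjacent on disk, so this is a single range scan with no extra index.
  SELECT payload
  FROM readings
  WHERE bucket_ts = '2022-03-01T12:00:00' AND sensor_id = 42;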


> Replication works across version differences, making upgrades without any downtime at all easier.

Postgres has logical replication these days, which can be used to achieve this
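A minimal sketch of that approach (object names invented): publish from the old-version cluster, subscribe from the new one, and cut clients over once the subscriber has caught up.

  -- On the old (publisher) cluster:
  CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

  -- On the new-version (subscriber) cluster, after restoring the schema:
  CREATE SUBSCRIPTION upgrade_sub
      CONNECTION 'host=old-primary dbname=app user=replicator'
      PUBLICATION upgrade_pub;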


Except logical replication doesn't handle DDL yet. Nor two-phase commit (prepared transactions).


Two phase commit for logical replication is now available in Postgres 14.

The biggest issue I see at the moment is that sequences are not replicated.
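The usual workaround at cutover time, sketched with invented names, is to bump each sequence on the new primary past the highest value used on the old one:

  -- Run after stopping writes on the old primary; repeat per sequence
  -- (assumes the table is non-empty).
  SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));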


I haven't used the features but I thought replication and failover were much easier (out of the box at least).


You also don’t usually need connection pooling with MySQL because it can handle thousands of simultaneous connections without falling over.


I know PG14 significantly improved this situation [0]. I'm not familiar enough with MySQL to know how it compares now though.

[0] https://pganalyze.com/blog/postgres-14-performance-monitorin...


There is Vitess for MySQL, but no such cluster manager for Postgres. Citus is now owned by Microsoft and getting harder to use outside of Microsoft's cloud. E.g., no more Citus on AWS.


SQL Server is good, but note the system is oriented differently than Postgres - Postgres is very friendly towards programming directly in the DB. In SQL Server world, you can do that, but you'd be much better off with a solution where the logic lives elsewhere (Not counting Reports or Bulk Imports which have MSSQL-native solutions).


Very much this - most of my experience is with Postgres but the rough impression I have is that SQL Server is the best of both Postgres and MySQL, and Postgres and MySQL are improving, but not in any great direction towards what SQL Server offers.


From the operations perspective, SQL Server is definitely way better that Postgres - monitoring is such a pain point in Postgres, for example. From application development/feature perspective, it's much closer, and server-side programming in Postgres (if that's what you like/need) is simply better that in SQL Server.


*better than, not better that.


> if none of those apply to you, CockroachDB is a better default.

Though it scales horizontally, you will need to throw at least 10x or more hardware at it to achieve the same throughput


Do you have benchmarks to back that up? The efficiency per core is worse, but I’ve not seen 10x worse. One thing to be careful of is making sure to have parallelism in your benchmark. Single-threaded Postgres may well be 10x faster on workloads of interest, but that’s not very representative of real use.


We switched to CockroachDB for some of our most heavily loaded tables a few years ago - and the scalability and operational simplicity has been great.

I don’t have a like for like benchmark, but at least for writes - we’re running with 5 replicas of each range so that’s 5x the disk write i/o just from that.

Add in some network overhead, running Raft, etc., and you are going to need a lot more resources than for PG to maintain a similar throughput - but adding resources when you can scale horizontally is so much easier that the tradeoff is worth it in my experience.
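For reference, the replication factor mentioned above is a per-zone setting in CockroachDB (the default is 3; 5 matches the setup described here), and it is where that 5x disk write amplification comes from:

  -- Every range is then kept on 5 nodes, so each logical write hits disk 5 times.
  ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5;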


I'd go for YugabyteDB instead of CockroachDB but your point of distributed SQL being better is spot on.


Maybe postgres is out of date because even an old version like 9.6 is performant, featureful, and stable?


> In my experience, PG is always the most out-of-date piece of any system.

Yes



