Oh, I thought you said 2PC (two-phase commit).

Yes, InfiniSQL uses two-phase locking. The Achilles' heel is deadlock management. By necessity, deadlock management will need to be single-threaded (at least as far as I can figure). No argument there, so deadlock-prone usage patterns can definitely be problematic.
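
For what it's worth, here's a rough sketch of the kind of single-threaded deadlock detector I have in mind (made-up names, not InfiniSQL's actual code): the lock-manager actors report waits-for edges to one detector, which just walks the graph looking for a cycle and picks a victim.

    // Toy single-threaded deadlock detector: walk the waits-for graph
    // looking for a cycle and pick a victim transaction to abort.
    #include <cstdint>
    #include <optional>
    #include <unordered_map>
    #include <unordered_set>

    using TxnId = uint64_t;

    class DeadlockDetector {
    public:
        // Called (via messages from the lock-manager actors) when txn
        // `waiter` blocks on a lock held by txn `holder`.
        void addEdge(TxnId waiter, TxnId holder) {
            waitsFor[waiter].insert(holder);
        }

        // Called when a txn commits or aborts.
        void removeTxn(TxnId t) {
            waitsFor.erase(t);
            for (auto& kv : waitsFor) kv.second.erase(t);
        }

        // Periodic single-threaded sweep: DFS for a cycle, return a victim.
        std::optional<TxnId> findVictim() const {
            std::unordered_set<TxnId> done;
            for (const auto& kv : waitsFor) {
                std::unordered_set<TxnId> path;
                if (auto v = dfs(kv.first, path, done)) return v;
            }
            return std::nullopt;
        }

    private:
        std::optional<TxnId> dfs(TxnId t, std::unordered_set<TxnId>& path,
                                 std::unordered_set<TxnId>& done) const {
            if (path.count(t)) return t;            // found a cycle: abort t
            if (done.count(t)) return std::nullopt;
            path.insert(t);
            auto it = waitsFor.find(t);
            if (it != waitsFor.end())
                for (TxnId h : it->second)
                    if (auto v = dfs(h, path, done)) return v;
            path.erase(t);
            done.insert(t);
            return std::nullopt;
        }

        std::unordered_map<TxnId, std::unordered_set<TxnId>> waitsFor;
    };

Since only one thread ever touches the graph, there's no locking inside the detector itself; the cost is that it becomes a serial chokepoint under deadlock-heavy workloads, which I think is exactly your point.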

I don't think there's a perfect concurrency management protocol. MVCC is limited by transaction ID generation. Predictive locking can't be rolled back once it starts, or it limits practical throughput to single partitions.

2PL works. It's not ground-breaking (though incorporating it in the context of inter-thread messaging might be unique), and it will scale fine except for workloads that tend to produce a lot of deadlocks.


HAT (http://www.bailis.org/papers/hat-vldb2014.pdf) looks pretty promising in terms of multi-item transactions. It turns out that you can push the problem off to garbage collection in order to make transaction ID generation easy, and garbage collection is easier to be sloppy and heuristic about. The catches are that it isn't clear yet whether HATs are as rich as 2PL-based approaches, and that nobody's built an industrial-strength implementation yet.
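
I haven't dug into an implementation, so take this as a toy illustration of that trade-off rather than anything from the paper: if each node stamps versions from a purely local counter (tagged with a node ID for uniqueness), ID generation needs no coordination, and cleanup only needs a conservative, possibly stale watermark, which is where the sloppiness is tolerable.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Toy illustration only (not the HAT paper's algorithm): version
    // stamps come from a purely local counter, so there's no coordinated
    // transaction ID generation on the hot path.
    struct NodeClock {
        uint64_t local = 0;
        uint64_t stamp() { return ++local; }  // no cross-node sync
    };

    // Garbage collection works off a conservative watermark computed from
    // possibly-stale snapshots of each node's clock. If the snapshots lag,
    // old versions just stick around a little longer -- safe, if sloppy.
    uint64_t gcWatermark(const std::vector<uint64_t>& staleSnapshots) {
        if (staleSnapshots.empty()) return 0;
        return *std::min_element(staleSnapshots.begin(), staleSnapshots.end());
    }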


Transaction ID generation, as you mentioned, is probably limited by the incredible expense of cross-core/cross-socket sync.
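
To make that concrete, the usual pattern is a single global counter along these lines (just a sketch, not anyone's actual code), and every transaction start on every core ends up bouncing the same cache line around the machine:

    #include <atomic>
    #include <cstdint>

    // The classic bottleneck: one counter, one cache line, every core.
    std::atomic<uint64_t> g_nextTxnId{1};

    uint64_t beginTxn() {
        // fetch_add is correct, but cross-core/cross-socket contention on
        // this single line caps throughput far below per-core speed.
        return g_nextTxnId.fetch_add(1, std::memory_order_relaxed);
    }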

Go single-threaded and divide a single hardware node (server) into one node per core, and your performance should go way up. You'd want to do something like this anyway, just to avoid NUMA penalties, but treating every core as a separate node is also easy and clean conceptually. I/O might go into a shared pool; you'd need to experiment.
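
If you carve the machine up that way, ID generation stops needing cross-core sync at all; a hypothetical striping scheme (not from InfiniSQL) would be something like:

    #include <cstdint>

    // One "node" per core: IDs are striped by node, so each core bumps a
    // plain local counter. Uniqueness comes from the stripe; any global
    // ordering you need has to come from somewhere else.
    struct NodeIdGen {
        uint64_t nodeId;    // 0 .. numNodes-1, one per core in the cluster
        uint64_t numNodes;  // total "nodes" (cores) across the cluster
        uint64_t counter = 0;

        uint64_t next() { return nodeId + counter++ * numNodes; }
    };

With 16 cores, node 3 hands out 3, 19, 35, ... and never talks to anyone else.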

I've seen this improvement on general-purpose software: running n instances, where n = number of cores, greatly outperformed running one instance across all cores.

The only major design change from one node per process is that your replicas need to be aware of node placement, so they end up on separate hardware. You might even consider taking this a level further, so you can make sure replicas are on separate racks. Some sort of "availability group" concept might be an easy way to wrap it up.
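
Something like the following is what I'm picturing for the placement rule (hypothetical names, just a sketch): spread replicas across availability groups first, and never land two copies on the same physical host.

    #include <cstdint>
    #include <string>
    #include <unordered_set>
    #include <vector>

    // Hypothetical placement sketch: each per-core "node" knows which host
    // and availability group (e.g. rack) it lives in.
    struct PlacementNode {
        uint32_t id;
        std::string host;   // physical server
        std::string group;  // availability group, e.g. rack or PDU
    };

    std::vector<PlacementNode> pickReplicas(const std::vector<PlacementNode>& nodes,
                                            size_t copies) {
        std::vector<PlacementNode> chosen;
        std::unordered_set<std::string> usedHosts, usedGroups;
        // First pass: one replica per availability group (and per host).
        for (const auto& n : nodes) {
            if (chosen.size() == copies) return chosen;
            if (usedGroups.count(n.group) || usedHosts.count(n.host)) continue;
            usedGroups.insert(n.group);
            usedHosts.insert(n.host);
            chosen.push_back(n);
        }
        // Fallback: relax to distinct hosts if there aren't enough groups.
        for (const auto& n : nodes) {
            if (chosen.size() == copies) break;
            if (usedHosts.count(n.host)) continue;
            usedHosts.insert(n.host);
            chosen.push_back(n);
        }
        return chosen;
    }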

Also: your docs page clearly says 2PC was chosen (it's in the footnote). Maybe I'm misreading what "basis" means.



