
This is what Clustrix (YC company) claims to do.


Hi, amalag. Yes, Clustrix is very similar to InfiniSQL (not to mention having been around longer). I believe that InfiniSQL has vastly higher performance, at least for the types of workloads that InfiniSQL is currently capable of. InfiniSQL is also open source.

I hope there's still room for competition in this space.


What do you base your performance claim vs Clustrix on?


Here is some back-of-the-napkin analysis:

Starting with this benchmark report: http://www.percona.com/files/white-papers/clustrix-tpcc-mysq...

Basically, InfiniSQL does not currently support complex indices, so it can't run a TPC-C-like transaction.
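
(By complex indices I mean composite, multi-column keys, which TPC-C-style schemas depend on. A toy Python sketch of the kind of lookup involved, with hypothetical column names modeled on TPC-C's ORDER_LINE:)

```python
# Toy illustration only: a TPC-C-style ORDER_LINE row is addressed by a
# composite key (warehouse, district, order, line number).  Without support
# for composite indices, a TPC-C-like transaction can't do these lookups.
order_line = {}  # maps composite key -> row (hypothetical column names)

# insert one row keyed on (w_id, d_id, o_id, ol_number)
order_line[(1, 5, 3001, 2)] = {"ol_i_id": 42, "ol_quantity": 5, "ol_amount": 9.99}

# a point lookup requires the full composite key
print(order_line[(1, 5, 3001, 2)]["ol_quantity"])  # -> 5
```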

The maximum throughput on 9 nodes is 128,114 transactions per node per second. I don't know whether those were 4- or 8-core nodes. If roughly 10% of transactions are multi-node transactions, that works out to 12,811/node/s for multi-node and 115,303/node/s for single-node transactions.

I don't know whether Clustrix was configured with full redundancy or a hot spare, so I don't know how many of the 9 nodes were actually usable; likely fewer than 9.


InfiniSQL maxed out at over 530,000 multi-node transactions per second on 12 x 4-core nodes. http://www.infinisql.org/blog/2013/1112/benchmarking-infinis...

That's 44,167 per node per second.
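
Put another way, here's the napkin math, with my assumed 10% multi-node share made explicit:

```python
# Back-of-the-napkin math using the figures quoted above.
# Assumption: roughly 10% of the Clustrix transactions span multiple nodes.

# Clustrix (Percona TPC-C report, 9 nodes)
clustrix_per_node_tps = 128_114          # quoted max throughput, per node per second
multi_node_share = 0.10                  # my assumption, not from the report
clustrix_multi = clustrix_per_node_tps * multi_node_share
clustrix_single = clustrix_per_node_tps - clustrix_multi
print(f"Clustrix multi-node:  {clustrix_multi:,.0f}/node/s")    # ~12,811
print(f"Clustrix single-node: {clustrix_single:,.0f}/node/s")   # ~115,303

# InfiniSQL (benchmark blog post: 12 x 4-core nodes, all transactions multi-node)
infinisql_total_tps = 530_000
infinisql_nodes = 12
print(f"InfiniSQL multi-node: {infinisql_total_tps / infinisql_nodes:,.0f}/node/s")  # ~44,167
```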

---------

These were not apples-to-apples benchmarks, but Clustrix performed about 12,000 multi-node transactions per node per second, along with a whole bunch more single-node transactions.

I don't know how it would perform on the benchmark I used. I intend to run a TPC-C benchmark once InfiniSQL supports complex keys (among whatever else it is currently missing).


Several problems here:

1. Unlike your dataset, the TPC-C dataset for the benchmark was not memory resident. The total dataset size was 786 GB. Just shrinking the dataset to fit in memory would substantially change the numbers.

2. The TPC-C workload is much more complex than your benchmark. It doesn't make sense to compare a TPC-C transaction, of which there are multiple flavors (see the mix sketch after this list), to your workload.

3. All Clustrix write transactions are multi-node transactions. There is 2x data redundancy in the test. We do not have a master-slave model for data distribution (a generic placement sketch follows this list). Docs on our data distribution model: http://docs.clustrix.com/display/CLXDOC/Data+Distribution.
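
On point 2, for reference, the approximate TPC-C transaction mix (these percentages are the spec's minimums with New-Order taking the remainder; they are not numbers from the Percona report):

```python
# Approximate TPC-C transaction mix (spec minimums; New-Order takes the rest).
# Each flavor touches several tables and mixes reads and writes, which is why
# a TPC-C "transaction" isn't comparable to a single point update or select.
tpcc_mix = {
    "new_order":    0.45,  # ~45%: insert order + order lines, update stock
    "payment":      0.43,  # >=43%: update warehouse/district/customer balances
    "order_status": 0.04,  # >=4%: read-only customer/order lookup
    "delivery":     0.04,  # >=4%: deferred batch update of undelivered orders
    "stock_level":  0.04,  # >=4%: read-only check of recent order lines vs. stock
}
assert abs(sum(tpcc_mix.values()) - 1.0) < 1e-9
```

And on point 3, a generic sketch of what 2x redundancy without a master/slave split can look like. This is simple hash-based placement for illustration only, not how Clustrix actually distributes data (the linked docs describe that):

```python
import hashlib

NODES = [f"node{i}" for i in range(9)]
REPLICAS = 2  # 2x redundancy: every row lives on two different nodes

def placement(key: str, nodes=NODES, replicas=REPLICAS):
    """Return the nodes holding copies of a row, chosen by hashing its key.

    Illustration only: the row hashes to a primary node and the second copy
    goes to the next node in the ring, so every write touches at least two
    nodes, i.e. every write is a multi-node transaction.
    """
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    start = h % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

print(placement("customer:42"))  # e.g. ['node3', 'node4']
```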

-----

Now, we've also done tests where the workload is simple point updates and selects on memory-resident datasets. For those kinds of workloads, a 20-node cluster can do over 1M TPS.
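
(For comparison with the per-node numbers above, that works out to roughly:)

```python
# Per-node rate implied by the point-query figure above (over 1M TPS on 20 nodes).
print(1_000_000 / 20)  # >= ~50,000 point operations per node per second
```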



