
I'm curious how QuestDB handles dimensions. OLAP support with a reasonably large number of dimensions, and cardinality in the range of at least thousands, is a must for a modern-day time-series database. Otherwise, what we get is only an incremental improvement to Graphite -- a darling among startups, I understand, but a non-scalable, extremely hard to use time-series database nonetheless.

A common flaw I see in many time-series DBs is that they store one time series per combination of dimensions. As a result, any aggregation can end up scanning potentially millions of time series. If a time-series DB claims that it is backed by a key-value store, say, Cassandra, then the DB will have this issue. For instance, Uber's M3 used to be backed by Cassandra, and therefore would give a mysterious warning that an aggregation function exceeded the quota of 10,000 time series, even though from the user's point of view the function dealt with a single time series with a number of dimensions.
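To illustrate why per-combination storage blows up, here's a toy sketch (all dimension names and cardinalities are made up):

    # Each unique combination of dimension values becomes its own
    # stored series, so one logical metric fans out multiplicatively.
    dimensions = {
        "region": 20,
        "host": 5_000,
        "endpoint": 200,
    }

    series_count = 1
    for name, cardinality in dimensions.items():
        series_count *= cardinality

    # 20 * 5,000 * 200 = 20,000,000 stored series for one metric.
    # An aggregation grouped only by "region" still has to fetch and
    # merge all 20M underlying series in a key-value layout.
    print(f"stored series per metric: {series_count:,}")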



FYI M3 is now backed by M3DB, a distributed, quorum read/write, replicated columnar time-series store specialized for real-time metrics. You can associate multiple values/time series with a single set of dimensions if you use Protobufs to write data; for more, see the storage engine documentation[0]. The current recommendation is not to limit your queries but to limit the global data queried per second[1] by a single DB node, using a cap on the number of datapoints (inferred from blocks of datapoints per series). M3DB also uses an inverted index built on mmap'd FST segments[2], similar to Apache Lucene and Elasticsearch, to make multi-dimensional searches on very large data sets fast (hundreds of trillions of datapoints, petabytes of data). This is a bit different from traditional columnar databases, which focus on column stores and are rarely accompanied by a full-text-search inverted index. (A toy sketch of this inverted-index lookup follows the links below.)

[0]: https://docs.m3db.io/m3db/architecture/engine/

[1]: https://docs.m3db.io/operational_guide/resource_limits/

[2]: https://fosdem.org/2020/schedule/event/m3db/, https://fosdem.org/2020/schedule/event/m3db/attachments/audi... (PDF)
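A toy sketch of the inverted-index lookup described above, with plain dicts and sets standing in for M3DB's mmap'd FST segments (all names here are invented for illustration):

    # Map each "label=value" term to the set of series IDs containing
    # it, then intersect posting sets for multi-dimensional queries.
    from collections import defaultdict

    postings: dict[str, set[int]] = defaultdict(set)

    def index_series(series_id: int, labels: dict[str, str]) -> None:
        for name, value in labels.items():
            postings[f"{name}={value}"].add(series_id)

    def query(**labels: str) -> set[int]:
        """Return series IDs matching all given label filters."""
        terms = [f"{k}={v}" for k, v in labels.items()]
        if not terms:
            return set()
        result = set(postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= postings.get(term, set())
        return result

    index_series(1, {"service": "api", "region": "us-east"})
    index_series(2, {"service": "api", "region": "eu-west"})
    index_series(3, {"service": "db", "region": "us-east"})
    print(query(service="api", region="us-east"))  # {1}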


Recommended reading on FST for the curious: https://blog.burntsushi.net/transducers/


Thank you for mentioning that. Andrew's post is really fantastic, covering fundamentals, the data structure, real-world impact, and examples all in one place.


I love FSTs and similar structures; they're just such a cool idea.

Anyone know of other interesting blog posts/articles about FSTs?


Thanks, @roskilli! Nice documentation.


We store "dimensions" as table columns with no artificial limits on column count. If you are able to send all dimensions in the same message, they will be stored on one row of data. If dimensions are sent as separate messages, the current implementation will store them on different rows, which makes the columns sparse. We can change that if need be and "update" the same row as dimensions arrive, as long as they have the same timestamp value.
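A minimal sketch of the two ingestion shapes described above, using InfluxDB Line Protocol over a socket (the table, column names, and port are assumptions for illustration, not guaranteed specifics):

    import socket

    # All dimensions in one message -> stored as columns on one row.
    one_row = b"sensors,region=us-east,host=web01 temperature=22.5,humidity=0.41 1600000000000000000\n"

    # The same dimensions split across messages -> separate (sparse)
    # rows under the current implementation described above.
    sparse_rows = (
        b"sensors,region=us-east,host=web01 temperature=22.5 1600000000000000000\n"
        b"sensors,region=us-east,host=web01 humidity=0.41 1600000000000000000\n"
    )

    with socket.create_connection(("localhost", 9009)) as sock:
        sock.sendall(one_row)
        sock.sendall(sparse_rows)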

There is also an option to store sets of dimensions in separate tables and combine them with ASOF/SPLICE joins.
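For clarity, an ASOF join pairs each left-hand row with the most recent right-hand row at or before its timestamp. A minimal, database-agnostic sketch of that logic (made-up data):

    import bisect

    def asof_join(left: list[tuple[int, str]], right: list[tuple[int, str]]):
        """Assumes both inputs are sorted by timestamp."""
        right_ts = [ts for ts, _ in right]
        out = []
        for ts, lval in left:
            # Index of the latest right row with timestamp <= ts.
            i = bisect.bisect_right(right_ts, ts) - 1
            rval = right[i][1] if i >= 0 else None
            out.append((ts, lval, rval))
        return out

    trades = [(100, "buy"), (205, "sell")]
    quotes = [(90, "1.10"), (200, "1.12"), (210, "1.13")]
    print(asof_join(trades, quotes))
    # [(100, 'buy', '1.10'), (205, 'sell', '1.12')]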


Thanks for the explanation.


Can you handle multiple time dimensions efficiently? We have three of them; can one get away without having to physically store "slices" on one of them?


If you can send all three in the same message (via the InfluxDB Line Protocol, for example), we will store them as three columns in one table. Does this help?
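A hypothetical example of such a message, with one designated timestamp and two extra time columns carried as integer fields (all names invented):

    event_time_ns = 1_600_000_000_000_000_000      # designated timestamp
    ingest_time_ns = 1_600_000_005_000_000_000     # extra time column 1
    effective_time_ns = 1_599_999_000_000_000_000  # extra time column 2

    # The trailing value is the designated timestamp; the other two
    # ride along as integer fields and land in their own columns.
    line = (
        f"events,source=feed1 "
        f"ingest_ts={ingest_time_ns}i,effective_ts={effective_time_ns}i "
        f"{event_time_ns}\n"
    )
    print(line)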



