
I didn't really understand what the product actually did after reading this blog post or the product page. I found the docs much more edifying:

> Materialize lets you ask questions about your data, and then get the answers in real time.

> Why not just use your database’s built-in functionality to perform these same computations? Because your database often acts as if it’s never been asked that question before, which means it can take a long time to come up with an answer, each and every time you pose the query.

> Materialize instead keeps the results of the queries and incrementally updates them as new data comes in. So, rather than recalculating the answer each time it’s asked, Materialize continually updates the answer and gives you the answer’s current state from memory.

> Importantly, Materialize supports incrementally updating a much broader set of views than is common in traditional databases (e.g. views over multi-way joins with complex aggregations), and can do incremental updates in the presence of arbitrary inserts, updates, and deletes in the input streams.

https://materialize.io/docs/
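To make the idea in the docs concrete, here's a toy sketch (in Python, not Materialize's actual implementation) of what "keep the result and update it incrementally" means for a simple grouped sum: writes adjust running totals in place, so reads never rescan the base data.

```python
from collections import defaultdict

class IncrementalSum:
    """Toy incrementally maintained view: SELECT key, SUM(val) ... GROUP BY key.

    Rather than recomputing the aggregate on every query, the running
    totals are updated as inserts/deletes arrive, so a read is O(1).
    """

    def __init__(self):
        self.totals = defaultdict(int)

    def insert(self, key, val):
        self.totals[key] += val

    def delete(self, key, val):
        # Handle retractions, not just appends.
        self.totals[key] -= val
        if self.totals[key] == 0:
            del self.totals[key]

    def read(self, key):
        return self.totals.get(key, 0)

view = IncrementalSum()
view.insert("a", 3)
view.insert("a", 4)
view.delete("a", 3)
print(view.read("a"))  # 4
```

The hard part Materialize tackles, per the docs quote above, is doing this for arbitrary views (multi-way joins, complex aggregations), not just a single sum.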



This reminds me a lot of Noria DB. Wonder if anyone familiar with both can shed any further light?


Indeed, Materialize is quite similar to Noria, and has the Frank McSherry stamp of awesomeness. [0] We know many of the Noria folks and have a lot of respect for them and their work. I also worked on the Noria project for a summer in college, and am a full-time engineer at Materialize now.

The biggest difference is one of intended use. Noria is, first and foremost, a research prototype, intended to explore new ideas in systems research. Materialize, by contrast, is intended to be a rock-solid piece of production infrastructure. (Much of the interesting research, in timely and differential dataflow, is already done.) We've invested a good bit in supporting the thornier bits of SQL, like full joins, nested subqueries, correlated subqueries, variable-precision decimals, and so on. Noria's support for SQL is less extensive; I think the decisions have been guided mostly by what's necessary to run its lobste.rs and HotCRP benchmarks. Make no mistake: Noria is an impressive piece of engineering, but, as an enterprise looking to deploy Noria, there's no one you can pay for support or to implement feature requests.

One area where Noria shines is in partial materialization. Details are in the Noria paper [1], but the tl;dr is that Noria has a lot of smarts around automatically materializing only the subset of the view that is actually accessed, while presently Materialize requires that you explicitly declare what subsets to materialize. We have some plans for how to bring these smarts to Materialize, but we haven't implemented them yet.

Also worth noting is that Materialize's underlying dataflow engine, differential dataflow, has the ability to support iterative computation, while Noria's engine requires an acyclic dataflow graph. We don't yet expose this power in Materialize, but will soon. Put another way: `WITH RECURSIVE` queries are a real and near possibility in Materialize, while (as I understand it) `WITH RECURSIVE` queries would require substantial retooling of Noria's underlying dataflow engine.

One of the creators of Noria, Jon Gjengset, did an interview on Noria [2] that covered some of the differences between Noria and differential dataflow from his perspective, which I highly recommend you check out as well!

[0]: https://twitter.com/frankmcsherry/status/1056957760435376129...

[1]: https://jon.tsp.io/papers/osdi18-noria.pdf

[2]: https://notamonadtutorial.com/interview-with-norias-creator-...


I forgot to mention: differential dataflow is capable of providing much stronger consistency guarantees than Noria is. Differential dataflow is consistency preserving—if your source of truth provides strict serializability, differential can also provide strict serializability—while Noria provides only eventual consistency.

Tapping into the consistency-preserving features in Materialize is a bit complicated at the moment, but we're actively working on improving the integration of our consistency infrastructure for MySQL and PostgreSQL upstreams.


Do you have an example of where one might use WITH RECURSIVE?


A classic example is graph reachability. You'd express reachability with WITH RECURSIVE, stating that a node is reachable if there is a path of arbitrary length to it. (Recursion is needed because a plain SQL query can only match paths up to some fixed length.)
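For instance, here's that reachability query run against SQLite (which also supports WITH RECURSIVE), with a made-up `edges` table and start node chosen just for the demo:

```python
import sqlite3

# Small directed graph: 1 -> 2 -> 3 -> 4, plus a disconnected edge 5 -> 6.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4), (5, 6)])

# Recursive CTE: start from node 1, repeatedly follow one more edge.
rows = conn.execute("""
    WITH RECURSIVE reachable(node) AS (
        SELECT 1                      -- base case: the start node
        UNION
        SELECT e.dst                  -- recursive case: one more hop
        FROM edges e JOIN reachable r ON e.src = r.node
    )
    SELECT node FROM reachable ORDER BY node
""").fetchall()

print([n for (n,) in rows])  # [1, 2, 3, 4]
```

Note nodes 5 and 6 don't appear: no path of any length reaches them from node 1.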


It also sounds a bit like InfluxDB to me.


That’s built on top of differential dataflow, the same thing underlying Materialize.


It’s not, actually. Noria has its own custom dataflow engine.


Oops, for some reason I thought the paper talked about using differential dataflow, but it seems I was mistaken.


I had the same response. So reading what you've posted here, it appears to be a smart(er) cache.

Do you think it's valuable to have this as a service rather than roll your own solution?


I think the main step forward is that it has an efficient means of calculating which views change when a new datum arrives. Which means that it could, in theory, maintain a very large number of views for relatively low overhead.

It's a surprisingly difficult problem. I wouldn't roll my own.
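To see why it's tricky even in a simple case, here's a toy Python sketch (nothing like Materialize's or differential dataflow's actual internals) of incrementally maintaining a two-way join: each side is indexed, and a new row only touches the matching rows on the other side instead of re-running the whole join.

```python
from collections import defaultdict

class IncrementalJoin:
    """Toy incrementally maintained equi-join.

    When one side receives a new row, only the matching rows on the
    other side contribute new output (the "delta"). Deletions, multiset
    counts, and consistency make the real problem much harder; this
    sketch handles inserts only.
    """

    def __init__(self):
        self.left = defaultdict(list)    # key -> left rows
        self.right = defaultdict(list)   # key -> right rows
        self.output = []                 # materialized join result

    def insert_left(self, key, row):
        self.left[key].append(row)
        for r in self.right[key]:        # delta: new left x existing rights
            self.output.append((key, row, r))

    def insert_right(self, key, row):
        self.right[key].append(row)
        for l in self.left[key]:         # delta: existing lefts x new right
            self.output.append((key, l, row))

j = IncrementalJoin()
j.insert_left("k", "a")
j.insert_right("k", "x")
j.insert_left("k", "b")
print(j.output)  # [('k', 'a', 'x'), ('k', 'b', 'x')]
```

Now imagine chaining several of these, with deletes and updates flowing through, while keeping every intermediate result consistent: that's the part you don't want to roll yourself.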


Yes, absolutely. Entire dissertations have been written on the topic: https://dash.harvard.edu/bitstream/handle/1/14226097/KATE-DI...

Efficiently maintaining views over arbitrarily complex computations is one of the two hard problems in computer science for a reason [0].

[0]: https://martinfowler.com/bliki/TwoHardThings.html


Never let HN haters (among whom I am frequently numbered, it ought to be noted) get you down.

After skimming a bit of the differential dataflow writing I am really impressed. This is deep computer science doing what it does best, which is to do much more with much less.


I’m glad to hear it! One of my favorite ways to view Materialize is as bringing differential dataflow to the masses. Differential dataflow is deep, elegant stuff, but it requires some serious CS chops to grok.

SQL is lacking in elegance but abundant in popularity. Building the SQL translation layer has been a fun exercise in bridging the two worlds.


This comment reminded me of this classic: https://news.ycombinator.com/item?id=9224


The differential dataflow codebase was really polished and optimized last I saw. When they say "demand milliseconds", I think they put in the effort to deliver that.

Additionally, given that it will be Apache-licensed in four years, I think it'd be good to ask "Can I wait?" before going all in.



