
I didn't really understand what the product actually did after reading this blog post or the product page. I found the docs much more edifying:

> Materialize lets you ask questions about your data, and then get the answers in real time.

> Why not just use your database’s built-in functionality to perform these same computations? Because your database often acts as if it’s never been asked that question before, which means it can take a long time to come up with an answer, each and every time you pose the query.

> Materialize instead keeps the results of the queries and incrementally updates them as new data comes in. So, rather than recalculating the answer each time it’s asked, Materialize continually updates the answer and gives you the answer’s current state from memory.

> Importantly, Materialize supports incrementally updating a much broader set of views than is common in traditional databases (e.g. views over multi-way joins with complex aggregations), and can do incremental updates in the presence of arbitrary inserts, updates, and deletes in the input streams.

https://materialize.io/docs/
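To make the idea in the docs concrete, here's a toy sketch (in Python, not Materialize's actual implementation) of what "keep the result and update it incrementally" means for a simple grouped sum: writes adjust running totals in place, so reads never rescan the base data.

```python
from collections import defaultdict

class IncrementalSum:
    """Toy incrementally maintained view: SELECT key, SUM(val) ... GROUP BY key.

    Rather than recomputing the aggregate on every query, the running
    totals are updated as inserts/deletes arrive, so a read is O(1).
    """

    def __init__(self):
        self.totals = defaultdict(int)

    def insert(self, key, val):
        self.totals[key] += val

    def delete(self, key, val):
        # Handle retractions, not just appends.
        self.totals[key] -= val
        if self.totals[key] == 0:
            del self.totals[key]

    def read(self, key):
        return self.totals.get(key, 0)

view = IncrementalSum()
view.insert("a", 3)
view.insert("a", 4)
view.delete("a", 3)
print(view.read("a"))  # 4
```

The hard part Materialize tackles, per the docs quote above, is doing this for arbitrary views (multi-way joins, complex aggregations), not just a single sum.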



This reminds me a lot of Noria DB. Wonder if anyone familiar with both can shed any further light?


Indeed, Materialize is quite similar to Noria, and has the Frank McSherry stamp of awesomeness. [0] We know many of the Noria folks and have a lot of respect for them and their work. I also worked on the Noria project for a summer in college, and am a full-time engineer at Materialize now.

The biggest difference is one of intended use. Noria is, first and foremost, a research prototype, intended to explore new ideas in systems research. Materialize, by contrast, is intended to be a rock-solid piece of production infrastructure. (Much of the interesting research, in timely and differential dataflow, is already done.) We've invested a good bit in supporting the thornier bits of SQL, like full joins, nested subqueries, correlated subqueries, variable-precision decimals, and so on. Noria's support for SQL is less extensive; I think the decisions have been guided mostly by what's necessary to run its lobste.rs and HotCRP benchmarks. Make no mistake: Noria is an impressive piece of engineering, but, as an enterprise looking to deploy Noria, there's no one you can pay for support or to implement feature requests.

One area where Noria shines is in partial materialization. Details are in the Noria paper [1], but the tl;dr is that Noria has a lot of smarts around automatically materializing only the subset of the view that is actually accessed, while presently Materialize requires that you explicitly declare what subsets to materialize. We have some plans for how to bring these smarts to Materialize, but we haven't implemented them yet.

Also worth noting is that Materialize's underlying dataflow engine, differential dataflow, has the ability to support iterative computation, while Noria's engine requires an acyclic dataflow graph. We don't yet expose this power in Materialize, but will soon. Put another way: `WITH RECURSIVE` queries are a real and near possibility in Materialize, while (as I understand it) `WITH RECURSIVE` queries would require substantial retooling of Noria's underlying dataflow engine.

One of the creators of Noria, Jon Gjengset, did an interview on Noria [2] that covered some of the differences between Noria and differential dataflow from his perspective, which I highly recommend you check out as well!

[0]: https://twitter.com/frankmcsherry/status/1056957760435376129...

[1]: https://jon.tsp.io/papers/osdi18-noria.pdf

[2]: https://notamonadtutorial.com/interview-with-norias-creator-...


I forgot to mention: differential dataflow is capable of providing much stronger consistency guarantees than Noria is. Differential dataflow is consistency preserving—if your source of truth provides strict serializability, differential can also provide strict serializability—while Noria provides only eventual consistency.

Tapping into the consistency-preserving features in Materialize is a bit complicated at the moment, but we're actively working on improving the integration of our consistency infrastructure for MySQL and PostgreSQL upstreams.


Do you have an example of where one might use WITH RECURSIVE?


A classic example is graph reachability. You'd express reachability with WITH RECURSIVE, stating that a node is reachable if there is a path of arbitrary length to it. (Recursion is needed because a plain SQL query can only match paths up to some fixed length.)
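For instance, here's that reachability query run against SQLite (which also supports WITH RECURSIVE), with a made-up `edges` table and start node chosen just for the demo:

```python
import sqlite3

# Small directed graph: 1 -> 2 -> 3 -> 4, plus a disconnected edge 5 -> 6.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4), (5, 6)])

# Recursive CTE: start from node 1, repeatedly follow one more edge.
rows = conn.execute("""
    WITH RECURSIVE reachable(node) AS (
        SELECT 1                      -- base case: the start node
        UNION
        SELECT e.dst                  -- recursive case: one more hop
        FROM edges e JOIN reachable r ON e.src = r.node
    )
    SELECT node FROM reachable ORDER BY node
""").fetchall()

print([n for (n,) in rows])  # [1, 2, 3, 4]
```

Note nodes 5 and 6 don't appear: no path of any length reaches them from node 1.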


It also sounds a bit like InfluxDB to me.


That’s built on top of differential dataflow, the same thing underlying Materialize.


It’s not, actually. Noria has its own custom dataflow engine.


Oops, for some reason I thought the paper talked about using differential dataflow, but it seems I was mistaken.


I had the same response. So reading what you've posted here, it appears to be a smart(er) cache.

Do you think it's valuable to have this as a service rather than roll your own solution?


I think the main step forward is that it has an efficient means of calculating which views change when a new datum arrives. Which means that it could, in theory, maintain a very large number of views for relatively low overhead.

It's a surprisingly difficult problem. I wouldn't roll my own.
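To see why it's tricky even in a simple case, here's a toy Python sketch (nothing like Materialize's or differential dataflow's actual internals) of incrementally maintaining a two-way join: each side is indexed, and a new row only touches the matching rows on the other side instead of re-running the whole join.

```python
from collections import defaultdict

class IncrementalJoin:
    """Toy incrementally maintained equi-join.

    When one side receives a new row, only the matching rows on the
    other side contribute new output (the "delta"). Deletions, multiset
    counts, and consistency make the real problem much harder; this
    sketch handles inserts only.
    """

    def __init__(self):
        self.left = defaultdict(list)    # key -> left rows
        self.right = defaultdict(list)   # key -> right rows
        self.output = []                 # materialized join result

    def insert_left(self, key, row):
        self.left[key].append(row)
        for r in self.right[key]:        # delta: new left x existing rights
            self.output.append((key, row, r))

    def insert_right(self, key, row):
        self.right[key].append(row)
        for l in self.left[key]:         # delta: existing lefts x new right
            self.output.append((key, l, row))

j = IncrementalJoin()
j.insert_left("k", "a")
j.insert_right("k", "x")
j.insert_left("k", "b")
print(j.output)  # [('k', 'a', 'x'), ('k', 'b', 'x')]
```

Now imagine chaining several of these, with deletes and updates flowing through, while keeping every intermediate result consistent: that's the part you don't want to roll yourself.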


Yes, absolutely. Entire dissertations have been written on the topic: https://dash.harvard.edu/bitstream/handle/1/14226097/KATE-DI...

Efficiently maintaining views over arbitrarily complex computations is one of the two hard problems in computer science for a reason [0].

[0]: https://martinfowler.com/bliki/TwoHardThings.html


Never let HN haters (among whom I am frequently numbered, it ought to be noted) get you down.

After skimming a bit of the differential dataflow writing I am really impressed. This is deep computer science doing what it does best, which is to do much more with much less.


I’m glad to hear it! One of my favorite ways to view Materialize is as bringing differential dataflow to the masses. Differential dataflow is deep, elegant stuff, but it requires some serious CS chops to grok.

SQL is lacking in elegance but abundant in popularity. Building the SQL translation layer has been a fun exercise in bridging the two worlds.


This comment reminded me of this classic: https://news.ycombinator.com/item?id=9224


The differential dataflow codebase was really polished and optimized last I saw. When they say "demand milliseconds", I think they put in the effort to deliver that.

Additionally, given that it will be Apache-licensed in four years, I think it'd be good to ask "Can I wait?" before going all in.



