Airbnb wrote a few great blog posts about why they built a standardized metrics layer - I think this one gives better technical context on the "what" and "why" than this announcement does:
The interesting bit for me about Minerva (predecessor to Transform) was the engineering rigor the pipeline allowed. It's pretty easy to write a query that brings a DB to its knees, so source-controlled and reviewed intermediary steps for producing metrics seemed like crucial features. If you then have a good catalog of what metrics exist, you can let BI tools or business types with their own tools mix and match performant metrics to their heart's delight.
I'm curious what this enables that can't be done with a thoughtfully designed data warehouse. Looking at the "With MetricFlow" and "Without MetricFlow" examples, it seems like it's designed for people who are willing to use a new query language, as long as it's not SQL?
On the one hand, the compile targets for MetricFlow and DBT are SQL. So strictly speaking, nothing can be done with these that can't be done with a data warehouse.
SQL doesn't really have a good way to define new aggregations. Take something like Daily Active Users / Monthly Active Users (DAU/MAU), a semi-complicated calculation over windowed data: there's no good way to define a reusable DAU/MAU aggregation concept you could apply at different levels without rewriting the SQL that defines the aggregation. E.g. DAU/MAU against the account, DAU/MAU by user type, or DAU/MAU by region would all be different queries/reports.
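A rough sketch of that repetition problem, using SQLite and a toy events table (the table name, columns, test dates, and the trailing-30-day MAU window are all my own assumptions, not anything from MetricFlow):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INT, day TEXT, region TEXT, user_type TEXT)")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-30", "us", "free"),
        (2, "2024-01-15", "us", "paid"),
        (3, "2024-01-30", "eu", "paid"),
    ],
)

# The aggregation logic (DAU on day :d over trailing-30-day MAU) has to be
# restated for every grouping level -- only the GROUP BY dimension changes.
DAU_MAU = """
SELECT {dim},
       CAST(COUNT(DISTINCT CASE WHEN day = :d THEN user_id END) AS REAL)
       / COUNT(DISTINCT user_id) AS dau_mau
FROM events
WHERE day BETWEEN date(:d, '-29 days') AND :d
GROUP BY {dim}
"""

for dim in ("region", "user_type"):
    # without a metrics layer, each of these is effectively a separate report
    print(dim, con.execute(DAU_MAU.format(dim=dim), {"d": "2024-01-30"}).fetchall())
```

Here the template trick hides the duplication, but in practice each variant tends to live as its own hand-maintained query or dashboard, which is exactly where definitions drift apart.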
Instead, if you have a layer that can be given the definition of how a particular aggregation is structured, and that knows how the data is related, you can ask for that metric against different concepts or levels without repeating yourself.
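For concreteness, dbt's (now-legacy) metrics spec expressed this roughly as a YAML definition - the model, column, and dimension names below are my own invention, but the keys follow the dbt v1.3-era metrics schema:

```yaml
# metrics.yml -- sketch of a single metric definition
metrics:
  - name: active_users
    label: Active Users
    model: ref('events')
    calculation_method: count_distinct
    expression: user_id
    timestamp: day
    time_grains: [day, month]
    dimensions:
      - region
      - user_type
```

Asking for active_users by region, by user_type, or at month grain then becomes a parameter to the query-generation layer rather than a new hand-written query.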
Extending that out, if BI tools can understand that layer, then you can get consistent WYSIWYG report building across multiple tools without having to redefine the calculations everywhere.
In practice - complicated time-series metrics, especially on top of derived temporal-logic attributes like funnels, activation, etc., are phenomenally difficult for most analysts to write by hand in SQL. We are in the process of switching to dbt metrics and it cuts the effort for this kind of thing down 5x-10x, and the SQL code dbt generates runs significantly faster too.
Looking at the DBT semantic layer documentation, I think I get it. The goal seems to be describing a metric once so it can be aggregated at multiple levels, instead of defining all of the aggregations separately.
For example, if you want to be able to aggregate revenue by subscription plan and by country over year, month, or day, then you would have 6+ different aggregations. The more dimensions and time periods you add, the more the combinations blow out. (I think the formula is dimension combinations multiplied by the number of time periods.) Having a single definition of each metric prevents errors from creeping into rote re-implementations of it. A bit like generics, but for queries.
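The blow-up is easy to count. One way to frame the formula hedged above: every non-empty subset of dimensions, at every time grain (the dimension and grain names here are just the ones from the revenue example):

```python
from itertools import combinations

dimensions = ["plan", "country"]
time_grains = ["day", "month", "year"]

# every non-empty subset of dimensions, crossed with every time grain
groupings = [
    (subset, grain)
    for r in range(1, len(dimensions) + 1)
    for subset in combinations(dimensions, r)
    for grain in time_grains
]
# (2^2 - 1) * 3 = 9 distinct aggregations; the "6+" above counts only
# single-dimension groupings (2 * 3). Add a third dimension and it's 21.
print(len(groupings))
```

Each of those groupings is a separate query if written by hand, and a single metric definition if generated.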
At this point, I'm curious how they manage to make the integrations with different platforms work. (Ultimately, I think that's why they acquired Transform.)
I think that’s still the outstanding question. Tools like Mode and Hex have already started supporting dbt’s metrics layer, but will big players like Looker and Tableau also move to support it? That’s harder to say.
One of Transform's strong suits is their Tableau integration - they have really good tooling for pushing metrics out to Tableau, rather than relying on Tableau to pull them in. Apparently AirBnb was/is still a very dedicated Tableau shop.
Minerva actually had really poor Tableau support as of a year ago; that's something Transform improved on. Most Minerva consumption was done via Superset, Google Sheets, and email.
Seems like a cool place to work, but I'm always a little wary of venture-backed open source companies. When the board wants a liquidity event, they'll win out over users every time: put more features behind a paywall, or force everyone onto a SaaS platform.
Anyhoo, all pure speculation. While I've never used it, my friends have great things to say about DBT.
That’s right, they’re solving the same problem, although metric/semantic layers have traditionally lived within BI tools. Looker’s LookML is a great example of a metric layer.
There you define the relationships between tables, which dimensions to expose to your users, and how measures are to be aggregated along which dimensions and at what roll ups. Filters, formatting, and other semantic information rounds out what’s possible, making it easier to build complex reporting without having to pre-build large joined tables in the warehouse.
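A minimal LookML sketch of what's described above (the view, table, and field names are made up for illustration):

```lookml
view: orders {
  sql_table_name: analytics.orders ;;

  dimension: country {
    sql: ${TABLE}.country ;;
  }

  dimension_group: ordered {
    type: time
    timeframes: [date, month, year]
    sql: ${TABLE}.ordered_at ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.amount ;;
  }
}
```

Anything built on this view can roll total_revenue up by country at date, month, or year grain without restating the aggregation - but only inside Looker, which is the decoupling argument below.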
In theory a semantic layer decoupled from BI can be useful too, as downstream systems can all query a metric. Maybe an operational workflow needs access to revenue numbers or a spreadsheet wants a feed of some fresh aggregates.
The real question is: will BI tools adopt a semantic layer they don’t own, giving up control over a key input into their visualizations? Time will tell.
Oracle BI has this separation in the RPD layer. So there is some history of BI tools separating out that layer, and the separation doesn't have to mean giving up control - it's more of an organizational decision. That said, it's probably not advisable to give the same users creating dashboards wide-open access to the semantic layer, as things can quickly get overcomplicated and hard to maintain.
Because when you have a DAG of 100s of CREATE statements that need to be run in dependency order, many of which are table materializations and not views, you will end up creating a poorly specified, half-baked version of DBT.
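Hand-rolling that is essentially re-implementing dependency ordering. A minimal sketch of the scheduling part of the problem, using Python's stdlib topological sorter (the model names and their ref() relationships are invented):

```python
from graphlib import TopologicalSorter

# each model maps to the set of models it selects from
# (in dbt, these edges come from ref() calls in the model SQL)
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "daily_revenue": {"orders_enriched"},
}

# a valid order to run the CREATE statements in: staging models first,
# daily_revenue last
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Ordering is only the start - dbt also handles incremental materializations, failure/retry, partial runs, and testing on top of this graph, which is where the homegrown versions tend to stay half-baked.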
- Not just views, but tables and incremental tables
- Testing framework
I've worked on a few data teams that have all been "copy/paste SQL from GitHub into the Query Editor" and it's (obviously) pretty bad. DBT is super low-lift and professionalizes your data pipeline basically instantly.
Create view is a great V0.5 for a data warehouse and what I recommend people do if possible so they can concentrate on building the right schema, naming standards, etc.
dbt is the V1. You get a lot of tooling, including a proper DAG, logging, and parametrization. You also get the ability to easily materialize your tables in a convenient format, which matters once you figure out (probably when, not if) that consistency is important. Views can take you far, but most orgs will eventually need more, and dbt is designed to be exactly that.
As a side note, moving from views to dbt is actually quite easy. I've done it several times and it's usually taken a couple of developer days to get started and maybe a couple weeks to fully transition.
https://medium.com/airbnb-engineering/how-airbnb-achieved-me...
(As the article notes, Transform's founders are Airbnb vets who worked on Minerva)