Still experimenting with data productivity tooling that doesn't rely on YAML. Recently refocused on CLI tooling as an interface for agentic use, since Claude Code has gotten so good. Still need to do some empirical evaluation of MCP vs. CLI success rates.
Ugh, sorry about that - I was trying to bundle the duckdb packages in the docker image and accidentally included them in the prod deploy, and my static hosting is too slow to serve those in time. (Works in GitHub CI, of course - I'll need an extra post-deploy validation on the serve latency.)
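Something like this is what I mean by a post-deploy check - a minimal sketch, assuming a CI step that can hit the deployed site; the asset path and latency budget are made up:

    # Hypothetical post-deploy smoke check: fail if the large bundled assets
    # (e.g. the duckdb packages) take too long to serve from the static host.
    # URL and threshold are placeholders, not the real site config.
    import time
    import urllib.request

    ASSETS = [
        "https://example.com/assets/duckdb-bundle.wasm",  # placeholder path
    ]
    MAX_SECONDS = 5.0  # assumed acceptable full-download budget


    def check_latency(url: str, budget: float) -> float:
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=budget) as resp:
            resp.read()  # force the full download, not just headers
        return time.monotonic() - start


    if __name__ == "__main__":
        for url in ASSETS:
            elapsed = check_latency(url, MAX_SECONDS)
            assert elapsed < MAX_SECONDS, f"{url} took {elapsed:.1f}s"
            print(f"ok: {url} in {elapsed:.1f}s")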
Hey! So sorry about that - thanks for saying something. Is that on the initial load of the dashboard link? What URL do you see it on in the console? If you're using the public resolver it will do some CORS validation, but that should pass for the trilogydata site - are you running locally?
Been exploring the amazing GCAT space dataset - it’s been a good way to drive some dashboard feature experimentation using fun data. Still need to work on my dashboard design skills, though.
Very much agree that this is the direction data orchestration platforms should go - the basic DAG creation can be straightforward, depending on how you do the authoring (parsing SQL is always the wrong answer, but it's tempting) - but backfills, code updates, etc. are when it starts to get spicy.
I think this is where it gets interesting. With partition dependency propagation, backfills are just “hey this range of partitions should exist”. Or, your “wants” partitions are probably still active, and you can just taint the existing partitions. This invalidates the existing partitions, so the wants trigger builds again, and existing consumers don’t see the tainted partitions as live. I think things actually get a lot simpler when you stop trying to reason about those data relationships manually!
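A toy sketch of that wants/taint model, in case it helps - names and structure here are illustrative, not from any particular scheduler:

    # Toy model of "wants" + tainting: a want declares that a partition range
    # should exist; tainting a partition invalidates it, so the want rebuilds
    # it and readers stop treating it as live.
    from dataclasses import dataclass, field

    @dataclass
    class Partition:
        key: str             # e.g. "2024-06-01"
        live: bool = True    # consumers only read live partitions
        tainted: bool = False

    @dataclass
    class Table:
        partitions: dict = field(default_factory=dict)
        wants: set = field(default_factory=set)  # partition keys that should exist

        def want(self, keys):
            self.wants |= set(keys)

        def taint(self, keys):
            for k in keys:
                if k in self.partitions:
                    p = self.partitions[k]
                    p.tainted, p.live = True, False

        def schedule(self):
            # anything wanted but missing-or-tainted gets (re)built
            return sorted(
                k for k in self.wants
                if k not in self.partitions or self.partitions[k].tainted
            )

        def build(self, key):
            self.partitions[key] = Partition(key)

    t = Table()
    t.want(["2024-06-01", "2024-06-02"])
    for k in t.schedule():
        t.build(k)
    t.taint(["2024-06-01"])  # "backfill": invalidate and let the want rebuild
    print(t.schedule())      # ['2024-06-01']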
This is true, but you can get combinatorial complexity explosions, especially with the efficiency-driven data modeling patterns common at some companies - e.g. a mix of latest dimensions and historical snapshots, without always having clear delineations about when you're using which. A common example is something like a recursive incremental table that needs to be rebuilt from the first partition seed (sketched below). Some SQL operations can also be very opaque (syntactically, or in terms of special DB features) about what partitions are being referenced, especially again when aggregates get involved.
It's absolutely solvable if you're building clean; retrofitting onto existing dataflow is when things get messy, and then you're also managing user/customer expectations of a stricter system. People like to be able to do wild things!
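To make the recursive incremental case concrete - a toy illustration with made-up data, not real scheduler code:

    # Toy recursive incremental table: each partition's state is a function of
    # the previous partition's state plus that day's new rows. Because of that
    # chain, a taint on day k cascades to every later day, and a logic change
    # means replaying from the day-0 seed.
    daily_rows = {0: [5], 1: [3, 2], 2: [7], 3: [1]}

    def build_partition(prev_state: int, rows: list[int]) -> int:
        # e.g. a running cumulative metric
        return prev_state + sum(rows)

    def rebuild(from_day: int, states: dict[int, int]) -> dict[int, int]:
        # partition N reads partition N-1, so the rebuild has to walk forward
        state = states.get(from_day - 1, 0)  # day 0 starts from the seed (0)
        for day in range(from_day, max(daily_rows) + 1):
            state = build_partition(state, daily_rows[day])
            states[day] = state
        return states

    states = rebuild(0, {})       # initial build from the seed
    print(states)                 # {0: 5, 1: 10, 2: 17, 3: 18}
    daily_rows[1] = [3, 2, 4]     # late-arriving data for day 1
    print(rebuild(1, states))     # {0: 5, 1: 14, 2: 21, 3: 22} - days 2 and 3 rebuilt too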
This has been super annoying! I just tell it to make sure the artifact is updated and it usually fixes it, but it's a pain to have to notice and keep an eye on it.
Yeah, minimizing the gap between semantic layer authoring and ad hoc work is what you need to do to close that - there has to be a progressive model both for consumption (take the semantic layer, extend/tweak it slightly in an ad hoc fashion) and for organically promoting that ad hoc work up into the layer.
Right now a lot of semantic tools introduce a big discontinuity in both workflows that keeps the two worlds separate.
Still iterating on Trilogy, a SQL variant with an embedded semantic layer that removes the need for joins and adds better typing/functions.
Spent some time last month improving array handling, error messages, and UX, and adding an MCP server option; Claude does pretty well already, but there are some syntax/error tweaks that would make it simpler for both it and humans.
Then pivoting back into scheduling + materialization optimizations (identifying common aggregates across several scripts and automatically building the common datasets for reuse).
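Very rough sketch of the aggregate-reuse idea - these tuples are stand-ins for whatever the real detection would pull out of the parsed queries:

    # Canonicalize the aggregates each script computes, count how many scripts
    # share each one, and promote the shared ones to a pre-built dataset.
    # Script names and fields here are hypothetical.
    from collections import Counter

    # (measure, function, grain) tuples from three hypothetical scripts
    script_aggregates = {
        "revenue_dashboard": [("order.amount", "sum", ("order.date", "store.region"))],
        "weekly_report":     [("order.amount", "sum", ("order.date", "store.region")),
                              ("order.id", "count", ("order.date",))],
        "finance_export":    [("order.amount", "sum", ("order.date", "store.region"))],
    }

    counts = Counter(agg for aggs in script_aggregates.values() for agg in aggs)

    # anything used by 2+ scripts is a candidate for a shared materialization
    for (measure, func, grain), n in counts.items():
        if n >= 2:
            print(f"materialize {func}({measure}) at grain {grain}")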
https://trilogydata.dev/