Still experimenting with data productivity tooling that doesn't rely on YAML. Recently refocused on CLI tooling as an interface for agentic use, since Claude Code has gotten so good. Still need to do some empirical evaluation of MCP vs. CLI success rates.
Ugh, sorry about that - I was trying to bundle the duckdb packages in the docker image and accidentally included them in the prod deploy, and my static hosting is too slow to serve those in time. (Works in GitHub CI, of course - I'll need an extra post-deploy validation on the serve latency.)
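Something like this is what I mean by a post-deploy check - a minimal sketch, assuming a CI step that can hit the deployed site; the asset path and latency budget are made up:

    # Hypothetical post-deploy smoke check: fail if the large bundled assets
    # (e.g. the duckdb packages) take too long to serve from the static host.
    # URL and threshold are placeholders, not the real site config.
    import time
    import urllib.request

    ASSETS = [
        "https://example.com/assets/duckdb-bundle.wasm",  # placeholder path
    ]
    MAX_SECONDS = 5.0  # assumed acceptable full-download budget


    def check_latency(url: str, budget: float) -> float:
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=budget) as resp:
            resp.read()  # force the full download, not just headers
        return time.monotonic() - start


    if __name__ == "__main__":
        for url in ASSETS:
            elapsed = check_latency(url, MAX_SECONDS)
            assert elapsed < MAX_SECONDS, f"{url} took {elapsed:.1f}s"
            print(f"ok: {url} in {elapsed:.1f}s")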
Hey! So sorry about that - thanks for saying something. Is that on the initial load of the dashboard link? What URL do you see it on in the console? If you're using the public resolver it will do some CORS validation, but that should pass for the trilogydata site - are you running locally?
Been exploring the amazing GCAT space dataset - it’s been a good way to drive some dashboard feature experimentation using fun data. Still need to work on my dashboard design skills, though.
Very much agree that this is the direction data orchestration platforms should go - the basic DAG creation can be straightforward, depending on how you do the authoring (parsing SQL is always the wrong answer, but it's tempting) - but backfills, code updates, etc. are when it starts to get spicy.
I think this is where it gets interesting. With partition dependency propagation, backfills are just “hey this range of partitions should exist”. Or, your “wants” partitions are probably still active, and you can just taint the existing partitions. This invalidates the existing partitions, so the wants trigger builds again, and existing consumers don’t see the tainted partitions as live. I think things actually get a lot simpler when you stop trying to reason about those data relationships manually!
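A toy sketch of that wants/taint model, in case it helps - names and structure here are illustrative, not from any particular scheduler:

    # Toy model of "wants" + tainting: a want declares that a partition range
    # should exist; tainting a partition invalidates it, so the want rebuilds
    # it and readers stop treating it as live.
    from dataclasses import dataclass, field

    @dataclass
    class Partition:
        key: str             # e.g. "2024-06-01"
        live: bool = True    # consumers only read live partitions
        tainted: bool = False

    @dataclass
    class Table:
        partitions: dict = field(default_factory=dict)
        wants: set = field(default_factory=set)  # partition keys that should exist

        def want(self, keys):
            self.wants |= set(keys)

        def taint(self, keys):
            for k in keys:
                if k in self.partitions:
                    p = self.partitions[k]
                    p.tainted, p.live = True, False

        def schedule(self):
            # anything wanted but missing-or-tainted gets (re)built
            return sorted(
                k for k in self.wants
                if k not in self.partitions or self.partitions[k].tainted
            )

        def build(self, key):
            self.partitions[key] = Partition(key)

    t = Table()
    t.want(["2024-06-01", "2024-06-02"])
    for k in t.schedule():
        t.build(k)
    t.taint(["2024-06-01"])  # "backfill": invalidate and let the want rebuild
    print(t.schedule())      # ['2024-06-01']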
This is true, but you can get combinatorial complexity explosions, especially with the efficiency-driven data modeling patterns common at some companies - e.g. a mix of latest dimensions and historical snapshots, without always having clear delineations about when you're using which. A common example is something like a recursive incremental table that needs to be rebuilt from the first partition seed (sketched below). Some SQL operations can also be very opaque (syntactically, or in terms of special DB features) about what partitions are being referenced, especially again when aggregates get involved.
It's absolutely solvable if you're building clean; retrofitting onto existing dataflow is when things get messy, and then you're also managing user/customer expectations of a stricter system. People like to be able to do wild things!
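To make the recursive incremental case concrete - a toy illustration with made-up data, not real scheduler code:

    # Toy recursive incremental table: each partition's state is a function of
    # the previous partition's state plus that day's new rows. Because of that
    # chain, a taint on day k cascades to every later day, and a logic change
    # means replaying from the day-0 seed.
    daily_rows = {0: [5], 1: [3, 2], 2: [7], 3: [1]}

    def build_partition(prev_state: int, rows: list[int]) -> int:
        # e.g. a running cumulative metric
        return prev_state + sum(rows)

    def rebuild(from_day: int, states: dict[int, int]) -> dict[int, int]:
        # partition N reads partition N-1, so the rebuild has to walk forward
        state = states.get(from_day - 1, 0)  # day 0 starts from the seed (0)
        for day in range(from_day, max(daily_rows) + 1):
            state = build_partition(state, daily_rows[day])
            states[day] = state
        return states

    states = rebuild(0, {})       # initial build from the seed
    print(states)                 # {0: 5, 1: 10, 2: 17, 3: 18}
    daily_rows[1] = [3, 2, 4]     # late-arriving data for day 1
    print(rebuild(1, states))     # {0: 5, 1: 14, 2: 21, 3: 22} - days 2 and 3 rebuilt too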
This has been super annoying! I just tell it to make sure the artifact is updated and it usually fixes it, but it's a pain to have to notice and keep an eye on it.
Yeah, minimizing the gap between semantic layer authoring and ad hoc work is what you need to do to close that - there has to be a progressive model both for consumption (take the semantic layer, extend/tweak it slightly in an ad hoc fashion) and for organically promoting that ad hoc work up into the layer.
Right now a lot of semantic tools introduce a big discontinuity in both workflows that keeps the two worlds separate.
Still iterating on Trilogy, a SQL variant with an embedded semantic layer that removes the need for joins and adds better typing/functions.
Spent some time last month improving array handling, error messages, and UX, and adding an MCP server option; Claude does pretty well already, but there are some syntax/error tweaks that would make it simpler for both it and humans.
Then pivoting back into scheduling + materialization optimizations (identifying common aggregates across several scripts and automatically building the common datasets for reuse).
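Very rough sketch of the aggregate-reuse idea - these tuples are stand-ins for whatever the real detection would pull out of the parsed queries:

    # Canonicalize the aggregates each script computes, count how many scripts
    # share each one, and promote the shared ones to a pre-built dataset.
    # Script names and fields here are hypothetical.
    from collections import Counter

    # (measure, function, grain) tuples from three hypothetical scripts
    script_aggregates = {
        "revenue_dashboard": [("order.amount", "sum", ("order.date", "store.region"))],
        "weekly_report":     [("order.amount", "sum", ("order.date", "store.region")),
                              ("order.id", "count", ("order.date",))],
        "finance_export":    [("order.amount", "sum", ("order.date", "store.region"))],
    }

    counts = Counter(agg for aggs in script_aggregates.values() for agg in aggs)

    # anything used by 2+ scripts is a candidate for a shared materialization
    for (measure, func, grain), n in counts.items():
        if n >= 2:
            print(f"materialize {func}({measure}) at grain {grain}")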
https://trilogydata.dev/