
Yes, it's Apache 2. Thanks for pointing this out; I'll fix it.


It's currently 3MB, and we've done almost nothing to reduce the file size yet, so we can expect it to get smaller.


Owner of Tonbo here. This critique makes sense in a classic web-app model.

What's shifting is workloads. More and more compute runs in short-lived sandboxes: WASM runtimes (browser, edge), Firecracker, etc. These are edge environments, but not just for web applications.

We're exploring a different architecture for these workloads: ephemeral, stateless compute with storage treated as a format rather than a service.

This also maps to how many AI agent services want per-user or per-workspace isolation at large scale, without operating millions of always-on database servers.

If you're happy running a long-lived Postgres service, Neon or Supabase are great choices.


This makes no sense. DB connections have been part of the "short-lived sandbox" model since the very beginning. CGI, PHP, ... all use database connections, and that's way faster and more correct (with proper transactions) than this approach.

And you use Rust ... so you care about speed and correctness. This seems like a very wrong approach.


CGI/PHP treated database connections as something that's always available. That pushes a lot of hidden complexity onto the database platform: it has to be reachable from anywhere, handle massive fan-out, survive bursty short-lived clients, and remain correct under constant connect/disconnect.

That model worked when you had a small number of stable app servers. It becomes much harder when compute fans out into thousands or millions of short-lived sandboxes.

We're already seeing parts of the data ecosystem move away from this assumption. Projects like Iceberg and DuckDB decouple storage from long-running database services, treating data as durable formats that many ephemeral compute instances can operate on. That's the direction we're exploring as well.
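
To make that concrete, here's a minimal sketch (in Rust, using the parquet crate's Arrow reader) of what a short-lived sandbox can do when data is a durable format instead of a service: fetch one Parquet object, scan it, and exit. Note fetch_object is a hypothetical stand-in for whatever object-store client you use, not Tonbo's API:

    use bytes::Bytes;
    use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

    // Hypothetical stand-in for an S3 GET; any object-store client fits here.
    fn fetch_object(_key: &str) -> Bytes {
        unimplemented!("GET the object from S3 / R2 / MinIO / ...")
    }

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // The sandbox holds no long-lived state: it reads the durable format,
        // does its work, and disappears. No connection pool, no server to run.
        let data = fetch_object("workspace-42/events.parquet");
        let reader = ParquetRecordBatchReaderBuilder::try_new(data)?.build()?;
        for batch in reader {
            let batch = batch?;
            println!("{} rows x {} cols", batch.num_rows(), batch.num_columns());
        }
        Ok(())
    }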


Yes, we'll provide a report explaining how we trade off these things. Please stay tuned.


You're asking for the source? I researched and calculated the cumulative percentage of global carbon emissions from major economies since the industrial revolution:

- United States: 24%
- China: 15%
- Russia: 6.7%
- Germany: 5.2%
- United Kingdom: 4.4%
- Japan: 3.8%
- India: 3.5%
- France: 2.2%
- Canada: 1.9%
- Ukraine: 1.7%

The source is the Global Carbon Project; is this reliable?



You must have made a mistake in doing that. If you add “world” to the selection, Our World in Data adds them up for you, and you only have to divide 27253/181000. That gives you 0.1506, very close to 15%.

https://ourworldindata.org/grapher/cumulative-co-emissions?t...


Oops. You're very right. Math error on my part :( thank you for the correction.


Sure, I'd love to. Could you share your setup, or what you're doing on top of Parquet?


My project hasn't really gotten to the level of actually exporting data to any database yet, but basically I'm just using standard OpenStreetMap data exported to GeoParquet format.

That is then combined with a bunch of time-series sensor readings and GPS data (I'm basically still working on the device firmware, so I don't have any actual data export happening yet)...

But it is a pretty basic combination of time-series and geospatial data.


typed-arrow avoids whole classes of runtime side effects: validation cost, schema-mismatch errors, etc.
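
For contrast, here is a sketch of the dynamic arrow-rs path that a compile-time-typed layer can rule out (this is plain arrow-rs, not typed-arrow's own API): the schema and the columns only have to agree at runtime, so the type mismatch below compiles fine and is caught only when the batch is built.

    use std::sync::Arc;
    use arrow_array::{ArrayRef, Int64Array, RecordBatch};
    use arrow_schema::{DataType, Field, Schema};

    fn main() {
        let schema = Arc::new(Schema::new(vec![
            Field::new("id", DataType::Int64, false),
            Field::new("name", DataType::Utf8, false),
        ]));
        let cols: Vec<ArrayRef> = vec![
            Arc::new(Int64Array::from(vec![1, 2])),
            // Wrong type for "name": the compiler cannot see it...
            Arc::new(Int64Array::from(vec![3, 4])),
        ];
        // ...so the mismatch surfaces only as a runtime error here.
        assert!(RecordBatch::try_new(schema, cols).is_err());
    }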


Tonbo IO | Product Engineer | Remote | https://tonbo.io/careers/product-engineer

Tonbo IO | Database Engineer | Remote | https://tonbo.io/careers/database-engineer

We are a startup founded in the fall of 2023, working on offering "headless" real-time data analytics under Postgres. We have released several open-source projects that you can check out at https://github.com/tonbo-io.

Back-End Core Stack: Rust, Apache Arrow/Parquet, OpenTelemetry

Front-End Core Stack: Python/JavaScript, Node.js/Deno, Next.js/React, Drizzle, D3.js, WebAssembly


We are building Tonbo: https://github.com/tonbo-io/tonbo , an embedded KV database that can use S3 as its storage backend, and we are implementing a SQLite virtual table on top of it: https://github.com/tonbo-io/sqlite-tonbo , a real pay-as-you-go DB.


That’s really cool! I’m personally really interested in serverless DB offerings. I’m not sure if yours scales well, but I always seem to hit the limits of a single RDBMS instance at some point as a product matures.

There are plenty of ways to scale out a traditional RDBMS, but serverless offerings make it so much easier.


Thanks! Easy scaling is the first thing we consider, and using S3 as a shared storage service makes this straightforward to achieve in the architecture.


Random reads and sequential writes are enough to build a log-structured database, but S3 does not really support the latter.


One can do sequential writes by simply writing chunk objects at keys with the offset in the name. For example, with sizes in MB:

tmp/uploads/object-0, tmp/uploads/object-1024, tmp/uploads/object-2048

would be a rolled-up object of size 2048MB plus whatever is in the object-2048 file.
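
A sketch of that pattern in Rust, assuming a hypothetical put_object wrapper around whatever S3 client you use; the key naming is the whole trick:

    // Emulate a sequential append log on S3 by writing fixed-size chunks
    // whose keys encode the starting offset in MB, as described above.
    const CHUNK_MB: usize = 1024;

    // Hypothetical stand-in for an S3 PUT.
    fn put_object(key: &str, chunk: &[u8]) {
        println!("PUT {} ({} bytes)", key, chunk.len());
    }

    fn append_log(prefix: &str, log: &[u8]) {
        for (i, chunk) in log.chunks(CHUNK_MB * 1024 * 1024).enumerate() {
            // Keys read back in offset order:
            // tmp/uploads/object-0, object-1024, object-2048, ...
            let key = format!("{}/object-{}", prefix, i * CHUNK_MB);
            put_object(&key, chunk);
        }
    }

    fn main() {
        append_log("tmp/uploads", &vec![0u8; 3 * 1024 * 1024]); // toy 3MB log
    }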

