
Isn’t that also like having two watches? You’ll never know the time

If you're on a desert island and you have 2 watches instead of 1, the probability of failure (defined as "don't know the time") within T years goes from p to p^2 + epsilon (where epsilon encapsulates things like correlated manufacturing defects).

So in a way, yes.

The main difference is that "don't know the time" is a trivial consequence, but "crash into a white truck at 70mph" is non-trivial.

But it's the same statistical reasoning.
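
To put rough numbers on it (a quick sketch in Python; the 10% failure rate and the correlation term are made up, purely illustrative):

    # Illustrative numbers only: nothing here is measured.
    p = 0.10      # chance a single watch fails within T years
    eps = 0.005   # extra risk from correlated defects (same batch, same flaw)

    one_watch = p
    two_watches = p**2 + eps   # both must fail independently, plus the correlated term

    print(f"one watch:   {one_watch:.3f}")    # 0.100
    print(f"two watches: {two_watches:.3f}")  # 0.015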


It's different because with self-driving the challenge is not to know the exact time. You win by simply noticing the discrepancy and stopping.

Imagine if the watch simply tells you whether it is safe to jump into the pool (depending on the time, it may or may not have water). If the watches conflict, you still win by not jumping.


I mean daemon was the previous winner before agent, and that had a solid mystical-djinni element to it. Monkey would have naturally gone the way of the daemon, as software development “matures” and undergoes corporate sterilization

If you discard the human-readability component of it, JSON is an incredibly inefficient choice of encoding. Other than its ubiquity, you should only be using JSON because it’s both human and machine readable (and being human-readable is mainly valuable for debugging)
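
As a rough illustration of that inefficiency (a toy Python comparison, not a benchmark; the array and sizes are just examples):

    import json, struct

    values = [i * 0.1 for i in range(1000)]               # 1000 floats

    as_json = json.dumps(values).encode("utf-8")          # human-readable text
    as_binary = struct.pack(f"{len(values)}d", *values)   # raw 8-byte doubles

    print(len(as_json), "bytes as JSON")     # roughly 19000 bytes, varies with digits
    print(len(as_binary), "bytes packed")    # exactly 8000 bytes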

The SQL standard defines more of an aesthetic than an actual language. Every database just extends it arbitrarily and anything beyond rudimentary queries is borderline guaranteed to be incompatible with other databases.

When it comes to procedural logic in particular… you have almost zero chance of dropping that into another database and it working — even for rudimentary usage.

SQL-land is utterly revolting if you have any belief in standards being important. Voting for Oracle (itself originally a shallow copy of IBM’s SQL dialect, which then deviated arbitrarily) as the thing to call “standard” is just offensive.


I was not aware that IBM copied Ada.

I was aware that EnterpriseDB developed "deep Oracle compatibility" and sold the resulting code to IBM for Db2 several years ago.

I think you are [more than] a bit behind the times?

https://www.cnet.com/culture/ibm-puts-oracle-to-the-sword-wi...


From-before-select has nothing to do with composition as far as I can think of? That’s to solve the auto-complete issue — put the table first so the column list can be filtered.

Things like allowing repeated clauses, computing SELECT before WHERE, etc are what solve the composition issues


That page has a reasonable re-creation, with trivial usage at call-sites, of each missing feature though? The only one that looks a bit revolting is the large pipe example


I’m not clear on how you’re deviating from a normal columnar/OLAP database?

> I found that these columnar stores could also be used to create regular relational database tables.

Doesn’t every columnar store do this? Redshift, IQ, Snowflake, ClickHouse, DuckDB etc

> but it proves that it is possible to structure relational data such that query speeds can be optimal without needing separate indexing structures that have to be maintained.

Doesn’t every columnar database already prove this?


I am not an expert on all the other columnar stores out there, but it is my understanding that they are used almost exclusively for OLAP workloads. By 'regular database tables', I meant those that handle transaction processing (inserts, updates, deletes) along with queries.

My system does analytics well, but it is also very fast with changing data.

I also think that some of those systems (e.g. Duckdb) also use indexes.


They’re used for OLAP workloads because the columnar properties fit better — namely, storing data column-wise obviously makes row-wise operations more expensive and column-wise operations cheaper; this usually corresponds to point look-ups vs aggregations. Which cascades into things like constraint maintenance being more expensive, row-level triggers becoming a psychotic pattern, etc. Column-wise (de-)compression also doubles down on this.

They still do all the regular CRUD operations and maintain transactional semantics; they just naturally prefer bulk operations.

Redshift is the purest take on this I’ve seen, to the point that they simply don’t support most constraints or triggers, and data is allocated in 2MB immutable chunks such that non-bulk operations undergo ridiculous amounts of write amplification and slow to a crawl. Afaik other OLAP databases are not this extreme, and support reasonable throughput on point operations (and triggers, constraints, etc) — in the sense that it’s definitely slower, but not comically slower. (Aside: Aurora is a similarly pure take on transactional workloads, such that bulk aggregations are comically slow)

> I also think that some of those systems (e.g. Duckdb) also use indexes.

I’m pretty sure they all use indexes, in the same fashion I expect you to (I’m guessing your system doesn’t do table-scans for every single query). Columnar databases just get indexes like zone-maps for “free”, in the sense that they can simply be applied on top of the actual dataset without having to maintain a separate copy of the data à la row-wise databases. So it’s an implicit index automatically generated on every column — not user-maintained or specified. I expect your system does exactly the same (because it would be unreasonable not to)
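
To make the zone-map point concrete, a minimal sketch (not any particular engine's implementation; block size and data are invented):

    # Per-block min/max lets a filter skip blocks without reading them.
    BLOCK_SIZE = 4
    column = [3, 7, 1, 9,  42, 40, 45, 41,  100, 98, 97, 99]

    blocks = [column[i:i + BLOCK_SIZE] for i in range(0, len(column), BLOCK_SIZE)]
    zone_map = [(min(b), max(b)) for b in blocks]   # built "for free" as data is written

    def scan_greater_than(threshold):
        hits = []
        for (lo, hi), block in zip(zone_map, blocks):
            if hi <= threshold:      # whole block can't match: skip without reading
                continue
            hits.extend(v for v in block if v > threshold)
        return hits

    print(scan_greater_than(50))     # only the last block is actually scanned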

> My system does analytics well, but it is also very fast with changing data.

Talk more, please & thank you. I expect everything above to be inherent properties/outcomes of the data layout so I’m quite curious what you’ve done


Several of your assumptions are correct.

My project Didgets (short for Data Widgets), started out as a file system replacement. I wanted to create an object store that would store traditional file data, but also make file searches much faster and more powerful than other file systems allow; especially on systems with hundreds of millions of files on them. To enhance this, I wanted to be able to attach contextual tags to each Didget that would make searches much more meaningful without needing to analyze file content during the search.

To facilitate the file operations, I needed data structures to support them. I decided that these data structures (used/free bitmaps, file records, tags, etc.) should be stored and managed within other Didgets that had special handling. Each tag was basically a key-value pair that mapped the Didget ID (key) to a string, number, or other data type (value).

Rather than rely on some external process like Redis to handle tags, I decided to build my own. Each defined tag has a data type and all values for that tag are stored together (like column values in a columnar store). I split the tag handling into two distinct pieces. All the values are deduplicated and reference counted and stored within a 'Values Didget'. The keys (along with pointers to the values) are stored within a 'Links Didget'.

This makes analytic functions fast (each unique value is stored only once) and allows for various mapping strategies (one-to-one, one-to-many, many-to-one, or many-to-many). The values and the links are stored within individual blocks that are arranged using hashes and other meta-data constraints. For any given query, usually only a small number of blocks need to be inspected.
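
A highly simplified sketch of the shape of that split (illustrative Python only; the real Values/Links Didgets are block-based structures on disk, not in-memory dicts):

    # Toy model of the idea: deduplicated, refcounted values plus key->value links.
    class Tag:
        def __init__(self):
            self.values = {}    # value -> [value_id, refcount]   ("Values Didget")
            self.links = {}     # key (Didget ID) -> value_id      ("Links Didget")
            self._next_id = 0

        def set(self, key, value):
            if value not in self.values:
                self.values[value] = [self._next_id, 0]
                self._next_id += 1
            entry = self.values[value]
            entry[1] += 1                 # reference count the shared value
            self.links[key] = entry[0]

        def distinct_values(self):
            return list(self.values)      # each unique value stored only once

    owner = Tag()
    owner.set(101, "alice")
    owner.set(102, "alice")               # deduplicated: no new value stored
    owner.set(103, "bob")
    print(owner.distinct_values())        # ['alice', 'bob']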

I expected analytic operations to be very fast, like with other OLAP systems; but I was pleasantly surprised at how fast I could make traditional OLTP operations run on it.

I have some short demo videos that show not only what it can do, but also benchmark many operations against other databases. Links to the videos are in my user profile.


Afterwards, you can run into a theater and yell “fire!”


Because getting any hardware out of the infra team on-premise is utterly miserable, across the board.


That's not the only alternative.

Rent your VPS and add in extra volumes for like $10 per 100GB.


Funny thing, but Netcup has $10 per 1 TB

Netcup is under-rated, but there are other providers at lowendbox/lowendtalk too, and I am interested in trying out Hetzner sometime as well.


And if you want to go even cheaper, check out Hetzner's EX63 (go to custom): 4x 7.68TB drives for like 140 Euro.

Not counting the fact that Netcup's storage is RAIDed (also, Netcup is limited to 8TB on a VPS).

That works out to roughly 4.6 Euro/TB raw (about $5/TB), or about 6 Euro/TB usable in a RAID 5 setup.

I do not understand why they are not using this new pricing model on their older servers. There, the best you can get is like 10 Euro/TB (for the single 15TB U.2).


> Funny thing but netcup has $10 per 1 TB

Nice to know, but I was just guessing at what a reasonable price would be :-)


I always just used it to confirm your last action in a POST -> GET sequence. E.g. confirming that your save went through or was rejected (the error itself embedded & persisted in the actual page). Or especially if saving doesn’t trigger a refresh, so success would otherwise be silent (and thus indistinguishable from failing to click).

You could have the button do some fancy transformation into a save button, but I prefer the core page being relatively static (and I really don’t like buttons having state).

It’s the only reasonable scenario for toasts that I can think of though.
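
The shape I mean, as a minimal sketch (Flask only as a stand-in here; the point is the POST -> redirect -> GET flow plus a one-shot message, not the framework):

    # Minimal POST -> redirect -> GET with a one-shot confirmation message.
    from flask import Flask, flash, redirect, url_for, get_flashed_messages

    app = Flask(__name__)
    app.secret_key = "dev-only"   # flash() stores the message in the session

    @app.route("/save", methods=["POST"])
    def save():
        # ... persist the form here ...
        flash("Saved.")                    # queued once, shown on the next GET
        return redirect(url_for("page"))   # the GET that the toast confirms

    @app.route("/")
    def page():
        msgs = get_flashed_messages()      # consumed here; gone after a refresh
        toast = f"<div class=toast>{msgs[0]}</div>" if msgs else ""
        return toast + '<form method="post" action="/save"><button>Save</button></form>'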


