
I am a C# dev by day and love working with it. I miss interfaces, LINQ, and the nicer pattern-matching features of C# when using GDScript, but overall GDScript is quite adequate for what it needs to do, and the game dev loop feels faster when using it. They can interop without too much friction too, so if you have the .NET version of Godot, you can have some code in C# where you need it and other code in GDScript where you don’t.


Might be Geometry Dash.


What OP is describing is Tomb of the Mask, or something in that genre. There’s a good GIF embedded here showing the navigation mechanic: https://en.wikipedia.org/wiki/Tomb_of_the_Mask


That’s the one!


This is Conway’s Law: You ship your org chart.


The shuttle’s solid rocket boosters splashed down in the ocean via parachute and were recovered and reused. The main engines and the RCS thrusters were also reused. Only the external tank was discarded. The issue with the shuttle (among many) was that the reuse was not actually economical, due to the maintenance required between each launch.


> the reuse was not actually economical

Well - yes :) That is what reusable actually means. Anything's "reusable" if you spend enough money. The idea is it's worth doing.


In some software platforms, the tooling makes it really easy to use a debugger to see what’s happening, so it’s common for everyone on the team to use them all the time.

The comment you’re responding to mentioned pulling code into a function. As an example, if there’s a clever algorithm or technique that optimizes a particular calculation, it’s fine to write code optimized more for the machine to be fast than for the human to read, as long as it’s tidied away in a function that a dev using a debugger can simply step over or out of.


More succinct than I managed.


If you want to experience the Neuromancer vibe (but not story line), in 2025, from a contemporaneous source, I can’t recommend the Commodore 64 adaptation enough.


Saw these guys on JoCo a few weeks ago. Great set. Birdhouse in Your Soul was one of our wedding songs. Hard to believe Flood is 35 years old. Thanks for linking this.


It's amused me for years that among all the songs I listen to that "can't possibly be that old", the album's lead-in track is self-documenting: "our brand new record // for nineteen ninety"...


Hi boat friend!


Excited for this release! Have you heard from Intel yet? “Core 2 Duo” was the name of one of their processors in the early days of multicore on a single package.

Edit: preordered!


Today, I very literally received a vintage laptop with a Core 2 Duo sticker on it.


This comment aged me 100 years.


Yeah, that lawsuit is probably going to bankrupt them here.


Also a weird way to make your product unsearchable.


Record selector to drive a data grid. Ex: Filter employees by location, or active/terminated, or salary/hourly, etc. and let the user choose one or many of these filters.


Agree. With patterns like this you are leaning on your DB server’s CPU - among your most scarce resources - instead of doing the work on a relatively cheap app server. At query time your app server knows whether $2 or $3 or $4 is null and can elide those query args. It feels bad to use a fast language like Rust on your app servers and still have poor perf because your single DB server is asked to contemplate every possibility on queries like this, instead of that simple work happening on your plentiful, cheap app servers.
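For illustration, a sketch of what eliding the args can look like (hypothetical employees table and filter names, per the data grid example above): the app server composes and sends only the conditions the user actually set.

    -- user picked only a location filter: send just
    SELECT * FROM employees WHERE location_id = $1;

    -- user picked location plus employment status: send
    SELECT * FROM employees WHERE location_id = $1 AND status = $2;

    -- no filters at all: send the bare query
    SELECT * FROM employees;

Each variant gives the planner only the predicates that actually apply, so it can pick the right index for each shape.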


I've seen many devs extrapolate this thinking too far, into sending only the simplest queries and doing all of the record filtering on the application end. That isn't what I think you're saying -- just piggybacking to try to explain further.

The key thing here is to understand that you want the minimal correct query for what you need, not to avoid "making the database work".

The given example is silly because the additional parameters must be either NULL or have a value before the query is ever sent to the DB. You shouldn't send queries like:

    SELECT * FROM users
    WHERE id = 1234
        AND (NULL IS NULL OR username = NULL)
        AND (NULL IS NULL OR age > NULL)
        AND (NULL IS NULL OR age < NULL)
But you should absolutely send:

    SELECT * FROM users
    WHERE id = 1234
        AND age > 18
        AND age < 35


While sub-optimal, your first example is probably fine to send; I'd expect it to be simplified early during query planning, at a negligible cost to the database server.

What you shouldn't send is queries like:

    SELECT * FROM users
    WHERE ($1 IS NULL OR id = $1)
        AND ($2 IS NULL OR username = $2)
        AND ($3 IS NULL OR age > $3)
        AND ($4 IS NULL OR age < $4)
because now the database (probably) doesn't know the value of the parameters during planning and needs to consider all possibilities.
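One way to see this for yourself (hedged: this relies on the GENERIC_PLAN option, which I believe PostgreSQL added in version 16) is to ask for the plan the server must use when it can't see the parameter values:

    EXPLAIN (GENERIC_PLAN)
    SELECT * FROM users
    WHERE ($1 IS NULL OR id = $1)
        AND ($2 IS NULL OR username = $2);

The resulting plan has to stay correct for every combination of NULL and non-NULL parameters, which typically means a sequential scan.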


Interesting. I try to use prepared statements to avoid redoing the planning. But since the database schema is small compared to the data, the cost of query planning quickly becomes negligible compared to running a generic query that is inefficient.
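For concreteness, a minimal PostgreSQL sketch of that pattern, reusing the users table from the examples above:

    -- plan once...
    PREPARE adults (int, int) AS
        SELECT * FROM users WHERE age > $1 AND age < $2;

    -- ...then execute many times without re-planning
    EXECUTE adults(18, 35);
    EXECUTE adults(30, 50);

(Though note the planner may still choose between generic and custom plans per execution, so this doesn't fully escape the problem described above.)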


One of my rules: don't send NULL over the wire, but of course always check for NULL on the server if you're using db functions.


While this is no excuse for sending sloppy queries to the database server, my rule of thumb with databases - as I was told by my elders - is "if it can be reasonably done in the database, it should be done by the database". Database engines are meant to be quite performant at what they do, possibly more so than your own code.


Databases aren't magic.

They have limited CPU and IO resources. They can only optimize within the bounds of the current table/index structure. And sometimes they make a bad optimization decision and need to be pushed to do the right thing.

Databases can, for example, sort things. However, if that thing being sorted isn't covered by an index then you are better off doing it in the application where you have a CPU that can do the n log n sort.
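A hypothetical sketch of that distinction, using the users table from upthread:

    -- with an index on the sort column, rows can come back already ordered
    CREATE INDEX idx_users_age ON users (age);
    SELECT id, age FROM users ORDER BY age LIMIT 100;

    -- with no index on username, the server must do the n log n sort itself
    SELECT id, username FROM users ORDER BY username LIMIT 100;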

Short quips lead to lazy thinking. Learn what your database can and can't do fast and work with it. If something will be just as fast in the application as it would be in the database you should do it in the application.

I've seen the end result of the "do everything in the database" thinking and it has created some of the worst performance bottlenecks in my company. You can do almost everything in the database. That doesn't mean you should.


That’s an optimization, and does not mean that the general rule is not valid.

If that happens to be a bottleneck and you can do better, you should definitely do it in code locally. But these are two ifs that need to evaluate to true.


Your bad query may not be a bottleneck itself, yet it can still negatively impact the performance of everyone else.

Databases are highly susceptible to the noisy neighbor problem.

Databases aren't magic. If you can do something better, or just as well, outside the database, you should. 1000 CPU cycles are better spent on the application than the database. You can easily add more application servers.

Your general rule is invalid because of this. Something doesn't have to be a bottleneck before it can be a problem. It's a "general rule" I particularly hate because I do performance tuning at my company and have OFTEN seen this be a root cause of negative outcomes: devs using these sorts of shorthands without understanding what their database is and isn't good at. It's right up there with "premature optimization", which gets parroted at me by jr devs who want to write an O(n^3) algorithm when an O(n) algorithm exists, if only they'd use a data structure besides `List`.

Don't shut your brain off when coding. Sometimes it is better to make the database do something, sometimes it isn't. When that is true is context and situation dependent.


I don't think you're disagreeing with OP. Seems like you both reached the same conclusion through different means and said it differently:

"Sometimes it is better to make the database do something, sometimes it isn't. When that is true is context and situation dependent."

"if it can be reasonably done in the database, it should be done by the database”"

In other words, sometimes it's reasonably better to make the database do something, and sometimes it's unreasonable. Context dependent, of course.


> If that happens to be a bottleneck and you can do better, you should definitely do it in code locally. But these are two ifs that need to evaluate to true

If the OP said what you are saying, I'd probably agree. However, the above statement makes it clear that the OP is saying "put it in the database unless you can prove it doesn't belong there".

That is what I disagree with. There are a lot of reasonable things you can do with a database that aren't the best thing to do from either a system or a performance perspective. It is, for example, reasonable to use the sort method on a database. It's also not something you should do without proper covering indexes, especially if the application can reasonably do the same sort.


Better off doing it in the application? Bro, I'm not pulling that much data just to do a top-N query.


> Short quips lead to lazy thinking. Learn what your database can and can't do fast and work with it.


This is actually an incredible way of articulating something that's been on my mind for quite a while. Thank you for this, I will use this.

The received wisdom is, of course, to lean on the DB as much as possible, put all the business logic in SQL because of course the DB is much more efficient. I myself have always been a big proponent of it.

But, as you rightly point out, you're using up one of your infrastructure's most scarce and hard-to-scale resources - the DB's CPU.


I think there are two different concerns here though:

The article recommends something that may lead to using the wrong query plans. In the "right" conditions, you will do full table scans of all your data for every query.

This is making the DB waste a lot of CPU (and IO).

Wasting resources like that is a different question from where to do work that has to be done anyway!

I am a proponent of shifting logic toward the DB, because the work likely ends up there anyway, and having as much logic as possible in the DB usually reduces the DB's own resource consumption too.

The extreme example is that you want sum(numbers) -- it is so much faster to sum it in one roundtrip to the DB than to do a thousand roundtrips to fetch the numbers and sum them on the client. The latter is also far more work for the DB server's own resources.
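A trivial sketch of the difference, with a hypothetical payments table:

    -- one roundtrip, one tiny result set
    SELECT SUM(amount) AS total FROM payments WHERE account_id = 42;

    -- versus shipping every row to the client just to add them up there
    SELECT amount FROM payments WHERE account_id = 42;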

My point is: Usually it is impossible to meaningfully shift CPU work to the client of the DB, because the client needs the data, so it will ask for it, and looking up the data is the most costly operation in the DB.


The answer is "it depends".

Sum is a good thing to do in the DB because it's low cost to the DB and reduces I/O between the DB and the app.

Sort can be (depending on indexes) a bad thing for a DB because that's CPU time that needs to be burned.

Conditional logic is also (often) terrible for the DB because it can break the optimizer in weird ways and is just as easily performed outside the DB.

The right action to take is whatever optimizes db resources in the long run. That can sometimes mean shifting to the db, and sometimes it means shifting out of the db.


It's hard to think of situations where you don't want to do the sorting on the DB. If you're sorting small numbers of rows it's cheap enough that it doesn't matter, and if you're sorting large numbers of rows you should be using an index which makes it vastly more efficient than it could be in your app.

And if your conditional logic is breaking the optimizer then the solution is usually to write the query more correctly. I can't think of a single instance where I've ever found moving conditional logic out of a query to be meaningfully more performant. But maybe there's a specific example you have in mind?


> if you're sorting large numbers of rows you should be using an index

Perhaps, depends on what the table is doing and needs to be optimized for.

Indexes are not free; they have a write penalty, as they need to be updated every time the data they cover is updated.

> I can't think of a single instance where I've ever found moving conditional logic out of a query to be meaningfully more performant. But maybe there's a specific example you have in mind?

Certainly.

In one of our applications we effectively represent the types as subtables with a root table for the parent type. There are roughly 10 different types, with different columns per type.

One way the queries were written, which was slow, was that on insertion the client app would send in (effectively) a block of data with all columns for all these types. In the database, conditional logic would pull the type id out of the input and decide, based on that type information, which subtable got the insert.

There's really no way to make that into something the SQL optimizer can consume well.

The right solution was to instead break this up in the application and per type do the insertions directly into the table type in question. It simplified both sides of the code and ran faster.
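Something like this, to sketch the shape of it (table and column names made up, not the actual schema):

    -- the application branches on the type, inserts the shared columns...
    INSERT INTO vehicles (id, type_id, name) VALUES (1001, 2, 'Wagon');

    -- ...then inserts the type-specific columns straight into that type's subtable
    INSERT INTO vehicles_car (vehicle_id, doors, trunk_litres) VALUES (1001, 5, 520);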


I agree that for a) inserts and b) single-entity access -- in both these cases the backend can do a lot of the preparation. And your example is both a) and b). We are then in O(1) optimization territory.

If you are only processing a single entity, the backend should tell the DB exactly what to do. And one shouldn't have if-statements in SQL, of course; that is "doing it wrong".

But if you have a chunk of 10000 entities like the ones in your example, all of different types, then you will have to insert some subset of the data into each of those tables (1000 into one table, 500 into another table, and so on). That logic is well suited to WHERE conditions without much overhead.

But yes, for inserts most of the logic can usually be shifted to the DB client, as that is where the data already resides. The problem I was talking about was meaningfully shifting work to the client for queries, where the client has no data to work with and must fetch it from the DB.

Let us take your example and turn it into "fetch 10000 such objects". Fetching the right rows for all of them at once using joins and where conditions (+temporary tables and multiple return sets in the same query roundtrip) is going to be more efficient for the DB than the backend first fetching the type, then branching on type, then fetching from another table and so on.
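Roughly, reusing the hypothetical subtable names from the sketch above, one roundtrip like:

    SELECT r.id, r.type_id, c.doors, t.payload_kg
    FROM vehicles r
    LEFT JOIN vehicles_car c ON c.vehicle_id = r.id
    LEFT JOIN vehicles_truck t ON t.vehicle_id = r.id
    WHERE r.batch_id = 7;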


> Fetching the right rows for all of them at once using joins and where conditions (+temporary tables and multiple return sets in the same query roundtrip) is going to be more efficient for the DB than the backend first fetching the type, then branching on type, then fetching from another table and so on.

Nope, not if done correctly.

Now, this isn't to say there's not valid reasons to do it all at once in the DB, the chief among them being ACID requirements. However, from an efficiency standpoint both for the application and the DB the most efficient action is to first request from the parent table and then turn around and, in parallel, send out requests for the child tables of the various types as needed.
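Sketching that with the same hypothetical tables as above:

    -- 1) fetch the parent rows first
    SELECT id, type_id FROM vehicles WHERE batch_id = 7;

    -- 2) then, per type and in parallel, hit only the subtables that are needed
    SELECT vehicle_id, doors FROM vehicles_car
    WHERE vehicle_id IN (1001, 1002, 1005);

    SELECT vehicle_id, payload_kg FROM vehicles_truck
    WHERE vehicle_id IN (1003, 1004);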

Assuming you have a connection pool, the overhead of doing multiple requests in parallel is small. The DB has less data to lookup. The DB has less temporary memory to store (which in our case was a problem). The response io is smaller (not a bunch of empty columns sent back) and both the DB and the downstream application are capable of querying against these tables in parallel.

There is a latency downside in needing to load up the parent table first; if the data size is large enough, you could overcome that by making batch requests to the DB as the parent dataset comes back. Say, every 1k values of a given type, start the parallel request to load that data.

Splitting the request into these smaller, parallel requests also has systemic benefits for the DB: new writers are able to sneak in, which isn't possible when you try to do everything in one go (another issue we had).

The added benefit here is the optimizer in the DB is more likely to do the right thing for the subtable requests than it is for the temp table mess request. A simple fetch is far easier to optimize than a complex one.


> However, from an efficiency standpoint both for the application and the DB the most efficient action is to first request from the parent table and then turn around and, in parallel, send out requests for the child tables of the various types as needed.

This is not true. It is generally considered an anti-pattern.

The fundamental reason it is not true is because it is generally orders of magnitude faster for the DB to do a join or subquery within the same query, rather than perform an entire query, output it, transfer potentially large amounts of data across the network, process that, do the whole thing in reverse, etc.

I don't know how you learned that queries should be split up like that, but it is generally horrendous from a performance standpoint. There is no "correct" way to do it that can compensate for the massive overhead. The correct way is to do it all in a single query with joins and subqueries (including possibly correlated subqueries) whenever possible.

Perhaps you learned this pattern from a services architecture, where it is correct because the pieces of data all sit in different services. But when all the data resides in the same database, splitting up queries is not generally something you want to do unless circumstances force you to.


> This is not true. It is generally considered an anti-pattern.

By who?

> it is generally orders of magnitude faster for the DB to do a join or subquery within the same query, rather than perform an entire query, output it, transfer potentially large amounts of data across the network, process that, do the whole thing in reverse, etc.

If you are talking about a straight relationship, then yes, this is true.

i.e.

    SELECT a FROM foo JOIN bar ON a = b JOIN baz ON b = c
However, when you start talking about the exact scenario I put forward, it becomes slower, primarily because the single thread/connection handling the request sometimes also has to hold a non-trivial amount of data in memory while it does all the query processing, especially with conditional logic in there that makes it hard for the DB to return anything until after it has collected the entire dataset.

This becomes extra true when you start dealing with wide datasets that have a large amount of reuse throughout the dataset.

If Foo has a Bar, and there are 10 million Foo and 1000 Bar used throughout them, then it's faster, lighter on the network, and less data-intensive to load up Bar separately from Foo.

> I don't know how you learned that queries should be split up like that

Profiling, benchmarking, and common sense. Furthermore, if you look at how NoSQL Dbs operate, you'll notice they all came to exactly the same conclusion WRT performance.

> but it is generally horrendous from a performance standpoint.

Says you.

> There is no "correct" way to do it that can compensate for the massive overhead. The correct way is to do it all in a single query with joins and subqueries (including possibly correlated subqueries) whenever possible.

You are VASTLY overestimating the "massive overhead" of separate queries. Assuming your DB isn't literally in another country, the round trip will be milliseconds at most. For some applications dealing with small amounts of data that may be a game changer, but for the apps I work on and their scenario, that's peanuts compared to the actual data-fetching time.

With our joining tables, we aren't sending back the full dataset. We are sending back the ids to load, which correspond to the subtypes to load. You can stuff a ton of 64-bit values into 1KB of a request. Further, with some RDBMSes there are ways to structure those 64-bit values so they correspond to the subtable's clustered index (which we do).

The only added overhead is resending the ids, which, again, I'll point out is peanuts in a modern networking setup. I've benchmarked it; what actually takes the most time in DB interaction is authentication when a connection needs to be re-established. Everything else pales in comparison.

> But when all the data resides in the same database, splitting up queries is not generally something you want to do unless circumstances force you to.

Correct, in the exact scenario I laid out circumstances force us to. These are not narrow tables, few types, or small datasets that we are dealing with.

But even if they were, there are legitimate reasons to consider this approach. For example, when the joined table is mostly static and highly cacheable, it makes sense to split it out of a general join and store it in a cache store. That's sort of the whole reason memcached exists as a product.


You seem to be taking some kind of extreme example of a particular use case of yours and trying to make it a general rule. But it's not.

And none of this is "says me", it's standard practice, it's relational databases 101. And none of this is about NoSQL, it's about relational databases. NoSQL performance can be abysmal for trying to do things relational databases are meant for.

And the overhead is not about network latency; it's about all of the overhead involved in serializing, transferring, and deserializing, and then doing it all over again in the other direction.

Your comment seems to boil down to:

> If Foo has a Bar, and there are 10 million foo and 1000 Bar used throughout them, then it's faster, less network, and less data intense to load up bar separately from foo.

I assume you're not retrieving 10 million Foo for the user, god forbid -- you're retrieving 20 or 50 or something user-friendly. Then you should join to Bar. It is slower and more overhead to load up Bar separate from Foo. It is an anti-pattern.

If you are getting results to the contrary, your query may not be written correctly -- e.g. you are joining 10 million rows of Foo to Bar in a subquery without a WHERE clause, and then only applying the WHERE at a higher level (in which case one solution is to move the WHERE clause into the subquery). Or your tables may not be architected suitably for the queries you need to perform, and you need to revisit your normalization strategy.

Again, there are super-super-complex queries where yes it becomes necessary to split them up. But that is not the "rule", it is not the starting point -- it is what you do only when you've exhausted all possible options of keeping it in the query. It is never a recommendation of how to use databases in a general sense, which is what you are suggesting.


> use case of yours and trying to make it a general rule

This is a fair critique. Definitely our system is a bit unique in what it works with and the amount of random data it needs to pull together.

> it's about all of the overhead involved in serializing, transferring, deserializing, and all the way back.

Serializing and deserializing are typically not a huge cost in DB communications. Most DB protocols are binary transfer protocols, which minimize the effort needed on the server or client side to transform the data into native language datatypes. It's not going to be a JSON protocol.

Transfer can be a problem, though, if the dataset is large.

> I assume you're not retrieving 10 million Foo for the user, god forbid

In our most extreme cases, yeah, we actually are pulling 10 million Foo. Though a lot of these big data requests happen in our ETL backend as the upstream data is being processed; that's primarily where I end up working, rather than the frontend service.

And I'll agree with you. If you are talking about requests which result in the order of 10 to 100 items then yes, it's faster to do that all within the database. It depends (which is what I've been trying to communicate throughout this thread).

> you are joining 10 million rows of Foo to Bar in a subquery without a WHERE clause

No, properly formed SQL. The issue is the mass of data being transferred and, as I mentioned earlier, the temporary memory being stored in the DB while it waits to transfer everything to the application.

Splitting things into the smaller and multiple queries ends up being faster for us because the DB doesn't end up storing as much temp data, nor does it end up serializing a bunch of `null` values which ultimately take up a significant chunk of the transfer.

Also, now that you are talking about query structure, you should recognize that there is no universal best/fastest way to structure a query. What's good for PostgreSQL might be bad for MSSQL.


> The received wisdom is, of course, to lean on the DB as much as possible, put all the business logic in SQL because of course the DB is much more efficient. I myself have always been a big proponent of it.

This is not received wisdom at all and the one edict I have when leading a project is no stored procedures for any OLTP functionality.

Stored Procs make everything about your standard DevOps and SDLC process harder - branching, blue green deployments and rolling back deployments.


>Stored Procs make everything about your standard DevOps and SDLC process harder - branching, blue green deployments and rolling back deployments.

There is a naming/namespacing strategy incorporating an immutable version identifier that makes this easier, which I have described here:

https://news.ycombinator.com/item?id=35648974

Note that this requires a strategy for cleaning up old procedures.

It also is possible to individually hash each procedure, which is more sophisticated, and would allow for incremental creation of new procedures.
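For a flavor of the naming idea, a minimal PostgreSQL sketch with hypothetical names (not the exact scheme from the link):

    -- each app release deploys its procedures under an immutable version id
    CREATE SCHEMA app_v42;

    CREATE FUNCTION app_v42.get_user(p_id integer)
    RETURNS TABLE (user_id integer, user_name text)
    LANGUAGE sql AS $$
        SELECT id, username FROM users WHERE id = p_id
    $$;

    -- app release 42 calls app_v42.get_user while release 41 keeps calling
    -- app_v41.get_user, so blue/green deploys and rollbacks never collide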


That’s actually an ingenious solution. I can’t find any flaws in it.


There's a different reason to lean on the DB: it's the final arbiter of your data.

Much harder to create bad/poisoned data if the DB has a constraint on it (primary, foreign, check, etc) than if you have to remember it in your application (and unless you know what serializable transactions are, you are likely doing it wrong).

Also you can't do indexes outside of the DB (well, you can try).
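For example, a hypothetical table where the DB itself refuses bad data, no matter which application wrote it:

    CREATE TABLE employees (
        id          integer PRIMARY KEY,
        location_id integer NOT NULL REFERENCES locations (id),
        status      text    NOT NULL CHECK (status IN ('active', 'terminated')),
        salary      numeric CHECK (salary >= 0)
    );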


Replying to this whole sub-thread, not just this post specifically;

All SQL advice has to take _context_ into account. In SQL, perhaps more than anywhere else, context matters. There's lots of excellent SQL advice, but most of it is bound to a specific context, and in a different context it's bad advice.

Take for example the parent comment above; in their context the CPU of the database server is the constraining resource. I'm guessing the database is "close" to the app servers (i.e. low network latency, high bandwidth), and I'm also guessing the app developers "own" the database. In this context moving CPU work to the app server makes complete sense. Client-side validation of data makes sense because they are the only client.

Of course if the context changes, then the advice has to change as well. If the network bandwidth to the server was constrained (cost, distance etc) then transporting the smallest amount of data becomes important. In this case it doesn't matter if the filter is more work for the server, the goal is the smallest result set.

And so it goes. Write-heavy systems prefer fewer indexes. Read-heavy systems prefer lots of indexes. Databases where the data client is untrusted need more validation, relation integrity, access control - databases with a trusted client need less of that.

In my career I've followed a lot of good SQL advice - advice that was good for my context. I've also broken a lot of SQL "rules" because those rules were not compatible, or were harmful, in my context.

So my advice is this - understand your own context. Understand where you are constrained, and where you have plenty. And tailor your patterns around those parameters.


The most sensible and pragmatic advice in this thread.


I think the conventional wisdom is to lean on the DB to maintain data integrity, using constraints, foreign keys, etc.; not to use it to run actual business logic.


This is Brent Ozar's old theme.

