Yeah, I see that a lot. People always leave off the last part of the quote, whic...

socialdemocrat · on Jan 14, 2020

So annoying. I would get people telling me not to make an obvious performance improvement that adds no complexity to the code. Yet some basically insist on using the least performant solution possible as if that were good software engineering. It is insane how rule-bound people can get. No wonder religion exists. People just love inventing rules and forcing others to follow them.

dahart · on Jan 14, 2020

I think you’re right, programmers (and people) can be blinded by rules. Knuth never meant to suggest people should use the least performant solution available. He’d be horrified by some of the things people justify with his quote. The idea was always to make good engineering choices from the start, including using tools and techniques that are known to give good performance, but to wait to get low-level, measure, and count cycles until the code is mostly done being written and not going to change significantly later. Choosing a less performant option for no reason is a bad choice. Choosing a slower option that is easier to refactor as you go, when it’s clear refactoring will happen, over a faster solution that will make refactoring harder is a perfectly acceptable engineering decision.

m_mueller · on Jan 14, 2020

There's tons of dogma in programming, likely more than in most professions, because it's not really a scientifically driven field. E.g., I'd love to just put all Emacs and vi zealots together in a room and show them Engelbart's 1968 demo, A Clockwork Orange style.

socialdemocrat · on Jan 14, 2020

Hahaha, that is a good one. I remember that as a C++ programmer there were several occasions where I saw that a goto statement would have given the cleanest and most maintainable code (typically for exiting nested loops). Yet I always picked more convoluted solutions because I knew what an immense shit-storm I would have caused if I had checked in code with even a single goto statement.

It would not have mattered that I could have provided a rational explanation for why it was the right choice in that instance. They would have just kept reciting scripture and called me a heretic.

Meanwhile, people will let you commit the worst, most unmaintainable code, as long as it doesn't break any of the 10 commandments of coding or whatever the equivalent would be.
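Python has no goto, so this is only a rough analogue of the C++ situation: a sketch of the two usual goto-free ways to leave nested loops early. The function names and the sample grid are hypothetical, made up for illustration.

```python
def find_cell(grid, target):
    """Return (row, col) of the first cell equal to target, or None.

    Wrapping the loops in a function lets `return` act as the
    multi-level exit a goto would provide in C++.
    """
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                return (r, c)  # leaves both loops at once
    return None


def find_cell_with_flag(grid, target):
    """Same search using a flag, the other common goto-free workaround."""
    found = None
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                found = (r, c)
                break  # exits only the inner loop...
        if found is not None:
            break  # ...so the outer loop needs its own check
    return found


grid = [[1, 2, 3], [4, 5, 6]]
print(find_cell(grid, 5))            # (1, 1)
print(find_cell_with_flag(grid, 5))  # (1, 1)
```

The flag version is the kind of "more convoluted solution" described above: one extra variable and one extra check per loop level.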

m_mueller · on Jan 15, 2020

I actually almost wanted to mention goto as an example of this kind of dogma.

davedx · on Jan 14, 2020

Good to include the whole quote.

I wonder if using string builders is really part of a critical 3%... and how many people who do practice premature optimization actually measure whether their optimization of choice is in their program's critical 3%.

kaslai · on Jan 14, 2020

String builders are often not a critical optimization; however, the additional cognitive burden on the reader is nearly zero. In some languages, string builders can even overload `operator +=`, which makes the type of the object the only visible distinction apart from the final conversion to a string.
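For example, Python lets a class overload `+=` through `__iadd__`. The `StringBuilder` class below is hypothetical (Python has no built-in class by that name); it is a minimal sketch of how a builder can keep the familiar `+=` syntax while appending to a list internally.

```python
class StringBuilder:
    """Hypothetical builder that overloads += via __iadd__.

    Each += appends to a list (amortized O(1)); the pieces are joined
    only once, when str() is called.
    """

    def __init__(self):
        self._parts = []

    def __iadd__(self, piece):
        self._parts.append(str(piece))
        return self  # __iadd__ must return the object rebound to the name

    def __str__(self):
        return "".join(self._parts)


sb = StringBuilder()
for i in range(3):
    sb += "item"
    sb += i
print(str(sb))  # item0item1item2
```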

In languages that have immutable strings, a chain of `+=` operators is basically O(n^2) vs O(n) for a string builder. For how easy the optimization is, there's little excuse to not use them for any bulk append operations.
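A minimal Python sketch of the two approaches, with `io.StringIO` standing in as the string builder. One caveat: CPython can sometimes extend a string in place for `s += p` when nothing else references it, so the quadratic case is not guaranteed there; PEP 8 describes that optimization as fragile and implementation-specific, so it is not something to rely on.

```python
import io


def concat_plus(pieces):
    # Naive approach: with truly immutable strings, each += copies the
    # whole accumulated string, giving O(n^2) total work.
    s = ""
    for p in pieces:
        s += p
    return s


def concat_builder(pieces):
    # io.StringIO plays the role of a string builder: writes append to an
    # internal buffer, and getvalue() produces the final string once.
    buf = io.StringIO()
    for p in pieces:
        buf.write(p)
    return buf.getvalue()


pieces = [str(i) for i in range(5)]
assert concat_plus(pieces) == concat_builder(pieces) == "01234"
```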

kragen · on Jan 14, 2020

The standard approach to this in Python is

    def render(y, f, g, h):
        # Collect pieces in a list and join once: O(N) instead of O(N²).
        rv = []
        for x in y:
            rv.append(f(x))
            if g(x):
                rv.append(h(x))
        return ''.join(rv)

This gives you the same O(N²) to O(N) speedup you would get from a StringBuilder.

More recently, though, I've often been preferring the following construction instead:

    def render(y, f, g, h):
        # Yield each piece lazily; the caller joins or writes them.
        for x in y:
            yield f(x)
            if g(x):
                yield h(x)

This is sometimes actually faster (for example, when you can pass the result to `somefile.writelines`, which, despite its name, does not append newlines to the items) and is usually less code. If you want to delegate part of this kind of string generation to another function, in Python 3.3+ you can use `yield from f(x)` rather than `for s in f(x): yield s` (or the plain `yield f(x)` you would use if `f` returns a string), and the delegation is cleaner and more efficient than appending to a list while the other function internally joins a list to give you a string.
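A small sketch of generator delegation with `yield from`, where the HTML-ish helpers `render_item` and `render_list` are made up for illustration. The same generator can be joined into one string or streamed to a file-like object with `writelines`, which adds no separators between items.

```python
import io


def render_item(x):
    # Hypothetical helper that produces several fragments for one item.
    yield "<li>"
    yield str(x)
    yield "</li>"


def render_list(items):
    # Delegate fragment generation to render_item (Python 3.3+).
    yield "<ul>"
    for x in items:
        yield from render_item(x)
    yield "</ul>"


# Join the fragments into one string...
html = "".join(render_list([1, 2]))
print(html)  # <ul><li>1</li><li>2</li></ul>

# ...or stream them to a file-like object; writelines adds no newlines.
out = io.StringIO()
out.writelines(render_list([1, 2]))
assert out.getvalue() == html
```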

However, if you're optimizing a deeply nested string generator, you're better off using the list approach and passing in the incomplete list to callee functions so they can append to it. Despite the suggestive syntax, at least last time I checked, `yield from` doesn't directly delegate the transmission of the iterated values; on this old netbook, it costs about 240 ns per item per stack level of `yield from`. (By comparison, a simple Python function call and return takes about 420 ns on the same machine.)
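A sketch of that list-passing alternative, using hypothetical helpers `render_item_into` and `render_list_into`: each callee appends to the list it is handed, so there is no generator frame per nesting level, and the join happens exactly once at the top. No timings are claimed here; whether it wins in practice is something to measure.

```python
def render_item_into(x, out):
    # Hypothetical callee: appends fragments to the caller's list
    # instead of yielding them or returning a joined string.
    out.append("<li>")
    out.append(str(x))
    out.append("</li>")


def render_list_into(items, out):
    out.append("<ul>")
    for x in items:
        render_item_into(x, out)  # plain call, no yield-from chain
    out.append("</ul>")


def render(items):
    parts = []
    render_list_into(items, parts)
    return "".join(parts)  # single join at the top level


print(render([1, 2]))  # <ul><li>1</li><li>2</li></ul>
```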

But if you really wanted your code to run fast you wouldn't have written it in Python anyway. You'd've used JS, LuaJIT, or Golang. Or maybe Scheme. Or C or Rust. But not Python.

davedx · on Jan 15, 2020

Okay, so I checked this for JavaScript, and it's not actually true -- in Chrome, a vanilla += is faster than pushing into an array and joining.

https://jsperf.com/javascript-concat-vs-join/2

This is why you really should always benchmark. In my view, "premature optimization" is not so much about optimizing too early in a project, it's about writing code a particular way you assume will make it faster without testing first.
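In that spirit, here is a minimal way to check the question for Python yourself with the standard `timeit` module. The function names and sizes are arbitrary choices for this sketch, and no results are implied: they depend on the interpreter (CPython's in-place `+=` trick, PyPy, etc.) and the machine.

```python
import timeit


def with_plus(n):
    s = ""
    for _ in range(n):
        s += "x"
    return s


def with_join(n):
    parts = []
    for _ in range(n):
        parts.append("x")
    return "".join(parts)


# Sanity check first: both approaches must build the same string.
assert with_plus(1000) == with_join(1000)

for fn in (with_plus, with_join):
    # Best of 3 runs of 10 calls each; take the minimum to reduce noise.
    t = min(timeit.repeat(lambda: fn(10_000), number=10, repeat=3))
    print(f"{fn.__name__}: {t:.4f} s")
```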

kaslai · on Jan 15, 2020

So that means JS strings aren't truly immutable in a modern environment (which is fine). The runtime environment is internally using an approach similar to a string builder, which is a good optimization.

I agree that you shouldn't operate on assumptions alone for a decision like whether or not you should use a string builder. That's where prior experience should come into play to guide your decisions. For instance, I am not a JS developer, so I have no prior experience to inform a decision to use a builder vs. concat in JS.

I cited that case in particular since the slowness of concatenation was called out in the article, and in some languages it actually does make a huge difference at a very small complexity cost.