What resonates with people, and why, is a rather deep question. Indicative of an arbitrage opportunity (lucrative), if you can really get to the bottom of it.
It would befit someone of your intellect to try to figure out why the post was so popular, instead of an arrogant dismissal.
What makes you think a highly upvoted online discussion is something that resonates with people, especially in an era of strong correlation between anonymous foul-mouthed posts and massive vote manipulation?
But let's discuss that post in case you couldn't assess the level of ignorance of that shit-bag:
- A universal claim, e.g., one starting with "every [javascript] project ...", is fairly easy to debunk (I guess it's fair that a high-school dropout like him didn't know that), and lo and behold, it did not take me more than 10 minutes of Google and GitHub searching to find JavaScript projects with a near-complete absence of code comments, and with variable names resembling the ones that moron was complaining about.
- He is a total hypocrite, as pointed out multiple times on reddit as well as HN, for pissing on other developers about short variable names and yet making a post and comments full of acronyms himself.
- If JS developers are 'inbred peasants' (his own characterization), the fact that one of those visits a machine-learning forum and throws a temper-tantrum at the whole community for variable naming and code comments, only goes further to confirm the impression that the JS community carries some of the least-educated, least-knowledgeable nasty teenagers who just discovered the developer console of a browser they use 24x7 to cast slurs on each other, and now they think they're the gods of computer science.
Even if you ignore all that, the biggest thrust of that shit-post is a wholly subjective one: that the variable names he's encountering while reading machine learning code are _not to his liking_. That is it. I could just as well go ahead and say that ctx_h is a perfectly fine variable name: 'ctx' stands for the word 'context' (a well-known shorthand), and the underscore is borrowed from the LaTeX convention for subscripts, hence the 'h' is a subscript. And while it is not clear from the name what 'h' should stand for, it's obvious that ctx_h is a special case of some 'context', and it's completely fair to expect the reader to understand this source code in light of the paper associated with it (which, by the way, is the source's documentation and, in a sense, a super-polished form of code comments). Not to mention, this naming convention is practised even more faithfully in the mathematics community, where you would find names like x_i and a_0 all over a theorem or proof (again, the underscore representing a subscript). And yet my whole argument would be based on a subjective opinion.
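To make the subscript convention concrete, here is a minimal Python sketch. The functions and names are hypothetical, made up purely for illustration, and are not taken from any paper or codebase discussed in this thread. It writes the same computation twice, once with math-style names such as a_0 and x_i and once with spelled-out names:

```python
def affine_paper_style(a_0, a_1, x):
    """Compute y_i = a_0 + a_1 * x_i, mirroring the notation of a (hypothetical) paper."""
    y = []
    for x_i in x:  # x_i reads as "x subscript i", as it would in LaTeX
        y.append(a_0 + a_1 * x_i)
    return y


def affine_descriptive(intercept, slope, inputs):
    """Same computation, with names that do not assume the reader has the paper open."""
    return [intercept + slope * value for value in inputs]
```

Both functions return the same values for the same arguments. Which one reads better depends on whether the reader has the accompanying paper at hand, and that is exactly the subjective part.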
While I completely admit that academics, by virtue of being domain experts first and software developers second, are more likely to suffer from a lack of clean coding and established software-engineering practices, it is far from being a black-and-white case. Not even close. Having spent half a decade in grad school after many years in the software industry, advocating the use of modern software-engineering practices all along, I recently took up work at one of the big software companies and was shocked to find that the quality of their C++ code was worse than that of any of the Fortran and C++ codebases I encountered at the university. And personally, I've found machine learning Python code to be a fair bit cleaner than most C++ code I've come across.
I'm not against criticism, and I think the machine learning community could use a lesson or two on software engineering, but if you're up for such an undertaking (criticizing the whole community) you'd better make sure you don't come across as a complete ignoramus and a hypocrite.
Because it quite obviously does, even by your own admission. Are you arguing against yourself now?
That reddit post is clearly tongue-in-cheek, written consciously in an exaggerated voice to spark interesting discussions (which it did) -- not a peer-reviewed journal article. But I have no doubt you're aware of that; please stop trolling.
https://news.ycombinator.com/item?id=14692691
Copy&pasting my response there:
---
Why is code coming out of research labs/universities so bad?
1. DON'T SEE WHY CLEAR CODE MATTERS
Academic projects are typically one-offs, not grounded in a wider context or value chain. Even if the researcher would like to build something long-term useful and robust, they don't have the requisite domain knowledge to go that deep. The problems are more isolated, there's little feedback from other people using your output.
2. DON'T WANT TO WRITE CLEAR CODE
Different incentives between academic research (publication count, citation count...) and industry (code maintainability, modularity, robustness, handling corner cases, performance...). Sometimes direct opposites (fear of being scooped if the research is too clear and accessible).
3. DON'T KNOW HOW TO WRITE CLEAR CODE
Lack of programming experience. Choosing the right abstraction boundaries and expressing them clearly and succinctly in code is HARD. Code invariants, dependencies, comments, naming things properly...
But it's a skill like any other. Many professional researchers have never participated in an industrial project, so they don't know the tools or how to share and collaborate (git, SSH, code dissemination...), and they haven't built that muscle.
The GOOD NEWS is, contrary to popular opinion, it doesn't cost any more time to write good code than bad code (even for a one-off code base). It's just a matter of discipline and experience, and choosing your battles.