This hack is supposed to be for huge data: 10kb or more, thus comfortably more than a page. If the >10kb wall o' code was wrapped in a parse-as-JSON-at-runtime function call which was preceded by a three-line comment describing a quick and dirty benchmark showing that it saves a useful number of milliseconds on page load in a fairly typical use case, and if the web resource was intended to be loaded many millions of times, I would nod and approve when reviewing the code. The way the original objector writes, it sounds as though nothing would suffice to justify this hack, and certainly not a mere benchmark and three lines of comments preceding it. That attitude seems like unreasonable blinkered zealotry, or some other kind of tunnel vision, e.g. someone who has just never thought seriously about the appropriate tradeoffs in maintaining a web resource which gets loaded millions of times a month.
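To make the shape of it concrete, here's roughly the pattern I'd approve of (a hypothetical sketch in TypeScript/JavaScript; the data, the names, and the numbers in the benchmark comment are all made up for illustration):

    // Quick & dirty benchmark (made-up numbers, just to show the shape of the comment):
    //   parsing this blob as a plain JS object literal: ~45 ms on page load
    //   JSON.parse of the same blob as a string literal: ~18 ms
    // Worth the ugliness only because this resource is loaded millions of times a month.
    const HUGE_CONFIG = JSON.parse(
      // In real life this string is >10 KB of machine-generated JSON, not three entries.
      '{"entries":[{"id":1,"label":"a"},{"id":2,"label":"b"},{"id":3,"label":"c"}]}'
    );
    console.log(HUGE_CONFIG.entries.length); // 3

The point is that the three comment lines carry the justification right next to the hack, so the next maintainer doesn't have to guess.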
Both proof and explanation seem to be important. Two experiences with explanation made a particular impression on me.
First, I found it hard to get much out of a Real Analysis course when I was a grad student. Only partly because of lack of explanations, admittedly. Probably even more of a problem was another kind of cultural mismatch between physical-scientist me and the math prof who taught the course. My interest was mostly that I was actually doing path integral Monte Carlo calculations (for my Chemistry thesis) and wanted to make sure I understood the fundamentals. The prof, like many (most?) mathematicians, seemed to be more interested in investigating ingeniously weird boundary cases. So the course didn't seem to teach me much about the gotchas that might come up in actual statistics or numerical analysis, and instead more about the ingenuity of mathematicians in constructing absurdly farfetched abuses of e.g. the axiom of choice. But besides that cultural mismatch, the lack of an explanatory framework sure didn't help. Thus, I was very happy decades later when I ran across Terence Tao's book on measure theory (available as a free manuscript online), which had a lot of the same kind of material with quite a good framework of motivation and explanation wrapped around its proofs.
I also like Vapnik's _The Nature of Statistical Learning Theory_, which as I understand it is highly parallel to a much longer, proof-heavy version of most of the same material. I much preferred this book to the approach in my undergraduate course in statistics. Again, the difference wasn't only lack of explanations (also, e.g., not enough grounding in proof or plausibly provable propositions, and too narrowly based in a frequentist worldview assumed without any explicit justification), but the lack of explanatory framework sure didn't help, so later I welcomed Vapnik's explanations of his approach. I have never been motivated to read the proof-heavy parallel book by Vapnik, but I do find it reassuring that it exists in case I ever work with problems weird enough to start raising red flags about breaking the comfortable assumptions in my informal understanding of the statistics.
"For no particular reason" may not be quite right. One candidate for a particular reason is the same grouchiness about orthodox[] technology that shows up in many double standards. E.g., chemicals that fail the Ames Test are alarmingly sinister if they are synthetic pesticides, vs. so perfectly fine that only a pedant would ever consider the issue if they are pesticidal toxins present at elevated levels in pest-resistant breeds of crops. Any given number of people killed in a coal mine or nuclear accident is a very big deal compared to the same number killed by bacterial infection related to organic farming. Low-tech paintlike or gluelike goo and gunk is not generally subject to witch hunt standards for sinister toxicity when used to caulk or paint charming fishing boats, but is fair game when used in fracking a formation which contains oil. Microquakes are so irrelevant that only a pedant would ask about them when the facility in question is geothermal, but scary as hell for an oil fracking facility. Bird kills are absolutely intolerable ecological atrocities for oil spills, irrelevant pedantry for wind farms.
[] I don't know a word for the distinction I'm trying to get at here with "orthodox". I mean the way that various important technological niches seem to get a pass: e.g. optics (eyeglasses, cameras), and selective breeding of crop species. It's hard for me to imagine persistent enthusiasm for rumors that eyeglasses cause brain cancer in the way that rumors about low-intensity low-frequency EM radiation (from power lines to cell phones) persist. The distinction seems to be roughly "stuff descended from the early Industrial Revolution, the Scottish Enlightenment, and/or sufficiently hardcore scientific method that the Royal Society would be respectfully impressed."
(responding not to what is in the article, but only to your comment on how difficult it is to study what is more nearly "advanced mathematics")
I got 800 on the 1980s-era math SATs, came in third in the Portland OR area in a math contest in high school, and did OK at Caltech (not in a math major), but I'm no Terry Tao, and I very much doubt I'd've been anything very special in a good math undergrad program. Some years after graduation, I found it challenging but doable to get my mind around a fair fraction of an abstract-algebra-for-math-sophomores textbook, including a reasonable amount of group theory (enough to formalize a significant amount of the proof of the Sylow theorems as an exercise in HOL Light, and also various parts of the basics of how to get to the famous result on the impossibility of a closed-form solution for roots of a quintic).
From what I've seen of real analysis and measure theory (a real analysis course in grad school motivated by practical path integral Monte Carlo calculations, plus various skimming of texts over the years), it'd be similarly manageable to self-learn it.
One problem is that some math topics tend to be poorly treated for self-learning, not because they are insanely difficult but because the author seems never to have stepped back and carefully figured out how to express what is going on in a precise, self-contained way, relying instead (I guess) on a lot of informal backup from a teaching assistant explaining things behind the scenes. On a small scale, some important bit of notation or terminology can be left undefined, which is usually not too bad with modern search engines but was a potential PITA before that. On a larger scale, the treatment of basic category theory in several introductory abstract algebra texts seemed to me prone to this kind of sloppiness, not taking adequate care to ground definitions and concepts in terms of ones that a self-studying student could be expected to know. That's harder to solve with a search engine, and it tends to lead into a tangle of much more category theory and abstraction than one needs for the purpose at hand. My impression is that mathematicians are worse at this than they need to be, in particular worse than physicists: various things in quantum mechanics seem as nontrivial and slippery as category theory to me, but the physicists seem better at introducing and grounding them. (Admittedly, though, physicists can ground such things in a series of motivating concrete experiments, an aid to keeping their arguments straight that the mathematicians have to do without.)
I have been much more motivated to study CS-related and machine-learning-related stuff than pure math, and I have been about as motivated to self-study other things (like electronics and history) as pure math, so I have probably put only a handful of man-months into math over the years. If I had put several man-years into it, it seems possible that I could have made progress at a useful fraction of the speed of progress I'd expect from taking college math courses in the usual way.
I think it would be particularly manageable to get up to speed on particular applications by self-study: not an overview of group theory in the abstract, but learning the part of group theory needed to understand the famous proof about roots of the quintic, or something hairier like (some manageable-size fraction of) the proof of the classification of finite simple groups. Still not easy, likely a level harder than teaching oneself programming, but not an incredible intellectual tour de force.
"Myself, only after 5 years of mathematics I'm somehow comfortable to study subjects by myself, and it's still hard."
Serious math seems to be reasonably difficult, self-study or not. Even people taking college courses in the ordinary way are seldom able to coast, right?
As someone self-studying measure theory right now, I completely agree on the quality of math textbooks for more esoteric subjects. It's like the authors expect the books to only be used in conjunction with TAs or classes.
Any advice on how to use those textbooks the best way?
I have used all-uppercase to make a distinction like class vs. instance in variables (in case-sensitive languages in which the class might be an ordinary held-in-a-variable value, like Javascript), and I might do it again. But it's very unusual, and it's also the kind of practice that is more suitable for a 1KLOC project that will receive 100 hours of effort from a single maintainer over its lifetime than for a bigger project with many maintainers and a highly motivated community of attackers.
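For concreteness, the kind of thing I mean (a made-up TypeScript sketch; the names are hypothetical):

    class Widget {
      constructor(public label: string) {}
    }

    // All-uppercase for a variable holding the class itself (an ordinary value here),
    // lowercase for an instance of it.
    function build(WIDGET: typeof Widget, label: string): Widget {
      const widget = new WIDGET(label);
      return widget;
    }

    console.log(build(Widget, "hello").label); // "hello"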
I don't think there was ever a case when I was tempted to name two functions with different cases, but if I ever had to write a modest-sized 1-maintainer system in which many functions came in exactly two different flavors, I might be tempted. (Perhaps threadsafe locked vs. raw? or some C++-like distinction between raw functions and closure-like class instances which can be used in a function-call context? or raw functions vs. wrappers with the extra plumbing required to let them be invoked from a scripting language?)
afterthought: And now that I think of it, in old C code I think I vaguely remember working with macro vs. function implementations of the same operation distinguished by capitalizing the name, and I don't think the name convention was an urgent problem. C macros can breed various errors, but I think bitbang_macro vs. bitbang would breed pretty much the same errors as BITBANG vs. bitbang.
In those situations, it would have been much more readable to have classFoo vs foo, foo() vs nonThreadSafeFoo(), etc.
The bigger point is that while you can come up with creative ways to take advantage of case-sensitivity, it's not as though you would have missed it if the language were case-insensitive. From that point of view, case-sensitivity has no benefit, only a cost: it leads to irritating errors from the compiler, or runtime errors in dynamically typed languages.
If something has no benefit, and only a cost, we should get rid of it.
Why would Carter know about QM? As far as I know, little in a nuclear reactor can be usefully analyzed at the quantum level even today. (And FWIW, even if some things in flight can be usefully analyzed in terms of simple limiting cases of the Navier-Stokes equations, I'd be surprised to hear someone assuming that a Republican president "knows a fair bit about fluid mechanics" just because he was an officer and pilot.)
Most of the nuclear fission stuff that is simple enough that you might hope to analyze with back-of-the-envelope QM has so much energy that it tends to act like a classical particle of negligibly short wavelength (but not enough energy to bring QM back into the picture by QED creating new particles upon collision). And while there is probably various quantum mechanical stuff deeply involved in a power reactor in one way or another --- e.g., electrical conductivity tends to involve band theory, and water is often used as a working fluid and at a fundamental level its thermal properties depend on stuff like hydrogen bonds --- I doubt it was helpful for a naval nuclear reactors guy to try to analyze things at that level in the middle of the last century.
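To put a rough number on "negligibly short wavelength" (my own back-of-the-envelope estimate, not anything from reactor practice): for a freshly emitted fission neutron at roughly 2 MeV,

    \lambda = \frac{h}{\sqrt{2 m_n E}}
            \approx \frac{6.6 \times 10^{-34}\ \mathrm{J\,s}}
                         {\sqrt{2 \times 1.7 \times 10^{-27}\ \mathrm{kg} \times 3.2 \times 10^{-13}\ \mathrm{J}}}
            \approx 2 \times 10^{-14}\ \mathrm{m},

about four orders of magnitude shorter than a typical interatomic spacing of ~10^-10 m, which is what I mean by its acting like a classical point particle.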
The US Naval Academy has a pretty good science program, and (IIRC) during Carter's era, all midshipmen were trained as engineers first, with the requisite science background, including (presumably) what QM was known in 1946. Moreover, as a nuke officer during that era he was personally vetted by Adm. Hyman Rickover, who had placed legendarily[0] high demands on the intellect, technical skill, moral integrity, and ingenuity of the Nuclear corps.
Rickover was promoted to the rank of vice admiral in 1958, the same year that he was awarded the first of two Congressional Gold Medals. Rickover exercised tight control for the next three decades over the ships, technology, and personnel of the nuclear Navy, interviewing and approving or denying every prospective officer being considered for a nuclear ship.
Adm. Rickover [0] certainly set the standard with personal vetting of nuclear officers, but with the stakes of a reactor mishap being so high, who can really blame him?
Contrast Adm. Rickover's tough stance with that of the managers in charge of the shuttle program [1] that led to the STS-51L Challenger disaster.
The Admiral in charge of the US Navy nuclear propulsion program still personally interviews and has final say on any individual applying for surface or subsurface nuke programs. From what I hear, it's a fairly nerve-wracking experience!
"gov't taxes corporations' profits and individuals' revenues"
Say what? As I understand it, a basic theme of income taxation for me as an individual in the USA, and indeed in most (perhaps all?) western countries, is that if I buy something for B and sell it for S, I get taxed on S - B of income. And as I understand it, that basic theme really is the way it works for a lot of businesspeople, though they may need to be quite careful to jump through certain hoops (e.g., particular kinds of recordkeeping) to ensure that it works that way reliably. And it is roughly the way it works for individuals not ordinarily considered businesspeople when they buy and sell things like residences and securities, although it's sometimes wrapped up in extra weirdness like special real estate tax categories and short-term vs. long-term capital gains on securities.
What country or countries are you referring to?
Or are you just referring to the fact that employees' employment expenses are not as eligible for deduction as many business expenses, securities transactions, and real estate transactions? (And, um, bringing in "corporations" for some rhetorical reason that I can't fathom?) That would make it roughly true to say "taxes business profits but taxes labor revenues." But to characterize that as "gov't taxes corporations' profits and individuals' revenues" seems more nearly false.
Also, that is a radically different tax treatment of labor revenues and business revenues, but you don't explain what you find particularly curious about that. For good or for ill, radically different economic policy treatment of labor revenues and business revenues is pretty widespread, not limited to tax policy. E.g., consider how business monopolistic collusion to restrict supply is broadly forbidden even when the collusion is wholly voluntary, while labor unions are not just allowed to collude voluntarily to restrict the supply of labor but supported in actively preventing rivals from providing a supply of labor.
but (1) it's obviously not a very precise term and (2) it would be surprising if all of the relevant techniques were available in an unclassified review from 2006.
"The real trick is to see how well your model extrapolates from the data you have out into the future."
That is the most common way to show the modeller is not shamelessly overfitting. :-| Another way, though, is less common but not vanishingly rare: the model may be so much simpler than the data it fits that overfitting is not a plausible explanation. (Roughly, there are too many bits of entropy in the match to the data to have been packed into the model no matter how careless or dishonest you might have been about overfitting.) E.g., quantum mechanics is fundamentally pretty simple --- I can't quantify it exactly, but I think 5 pages of LaTeX output, in a sort of telegraphic elevator-pitch cheat-sheet style, would suffice to explain it to 1903 Einstein or Planck well enough that they could quickly figure out how to do calculations. Indeed, one page might suffice. And there are only a few adjustable parameters (particle/nucleus masses, Planck's constant, and fewer than a dozen others). And it matches sizable tables of spectroscopic data to more than six significant figures. (Though admittedly I dunno whether the non-hydrogen calculations would have been practical in 1903.) For the usual information-theoretical reasons, overfitting is not a real possibility: even if you don't check QM with spectroscopic measurements on previously unstudied substances, you can be pretty sure that QM is a good model. (Of course you still have to worry about it potentially breaking down in areas you haven't investigated yet, but at least it impressively captures regularities in the area you have investigated.)
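To make the bit-counting slightly more explicit (crude made-up numbers, just a sketch of the argument): if the model has K adjustable parameters specified to b bits each, and it reproduces N independent measured values to d significant digits each, then roughly

    \underbrace{K b}_{\text{bits you could possibly have tuned}}
    \;\ll\;
    \underbrace{N d \log_2 10}_{\text{bits of data it matches}}

is what rules out memorization. With, say, K of about 10 parameters at 30 bits apiece against thousands of spectroscopic lines at six significant figures (roughly 20 bits each), there is simply nowhere in the model to have hidden the answers, however badly one wanted to overfit.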
It's not just a question of how the model extrapolates from the input data itself. The actual input data may be in question as well, because there are always judgments involved in deciding how to measure, what "unreasonable" datapoints will be discarded, etc.
Read, for example, here:
"It is indisputable that a theory that is inconsistent with empirical data is a poor theory. No theory should be accepted merely because of the beauty of its logic or because it leads to conclusions that are ideologically welcome or politically convenient. Yet it is naive in the extreme to suppose that facts – especially the facts of the social sciences – speak for themselves. Not only is it true that sound analysis is unavoidably a judgment-laden mix of rigorous reasoning (“theory”) with careful observation of the facts; it is also true that the facts themselves are in large part the product of theorizing. ..."
While the general gist of your argument is right, I think there are some non-trivial ways to overfit. There are some 25 constants in the Standard Model, apparently, that describe the world around us to enormous precision. This is so little information that of course the trivial 'overfitting by encoding observations directly' will fail, but we could still be overfitting by having an excess number of variables: perhaps there's really some mechanism in neutrino physics that explains neutrino oscillation without needing extra constants to describe how it happens. Finding that mechanism might in turn tremendously boost our predictive precision for neutrino oscillation, bringing it up to match the precision of the other, more fundamental variables in the model, for example. But I think you're right that it's so little data that we have some strong information-theoretic guarantees that at least the model will have predictive power matching the precision of previous measurements.
Well - that's true apart from coincidence. You can have a very simple theory which says "x is directly caused by y", and there is a lot of good data, and a great fit. But it's just a coincidence and breaks down immediately.
Occam's razor is a rule of thumb and an aesthetic boon, but nothing more.
The real test is that you have a theory that is meaningful and has explanatory power. If it grants insight into the mechanisms that are driving the relationships or generating the data, and these make sense - you are pretty golden.
Another one is that the theory makes unexpected predictions that you can then test. This is a real winner, and why complex physics is so well regarded.
I think the information theoretic approach to modeling concerns actually implies such "simpler is better" approaches as Occam's Razor. At least that's my take on [http://arxiv.org/abs/cond-mat/9601030], which derives a quantitative form of it.
I haven't read that paper, and the abstract makes my head spin! I'll have a look later, and try and figure out the argument. I agree with you that things like the I-measure are based on the idea that simpler is good, and it works well in practice - both in Machine Learning and in the real world - which is why humans tend to prefer it. But (the paper you cite aside) I don't know of a deep reason why simple is preferred by nature.
Also there is a deep cognitive bias here, perhaps we lack the machinery to understand the world as it really is!
> Occam's razor is a rule of thumb and an aesthetic boon, but nothing more.
Occam's razor is a bit more than that. It isn't just that given a theory X and a theory Y = X + ε, both of which fit the facts, you should prefer X because it's "cleaner" or more aesthetically pleasing or whatever. You should prefer X because you can prove it is more likely to be true.
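One way to spell that proof out (nothing original, just the conjunction rule): if Y asserts everything X asserts plus one additional claim ε, then

    P(Y) = P(X \wedge \varepsilon) = P(X) \, P(\varepsilon \mid X) \le P(X),

with strict inequality whenever ε isn't already certain given X. So before any aesthetic considerations, the theory with fewer independent claims is at least as probable.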
No, it was a thought experiment I made up, not an exercise I've ever seen performed: how abbreviated a description of quantum mechanics could I get away with and still convey the idea to on-the-eve-of-QM scientists?
The QM equations are naturally very short; the stuff I would worry about expressing concisely is concepts like what probability amplitude is, how it connects to prior-to-QM concepts of probability, the interpretation of what it means to make an ideal measurement, stuff like that. I don't know of any bright, concise formulation of that stuff, and I'm not sure how I'd do it. I am fairly sure, though, that 5 pages could get the job done well enough to connect to spectroscopic observations.
Note also in the original story it was intended to be given to Einstein and Planck, deeply knowledgeable in classical physics, so it'd be natural to use analogies that would be more meaningful to them than to the typical CS/EE-oriented HN reader. For example, I'd probably try to motivate the probability amplitude by detailed mathematical analogy to the wave amplitudes described by the classical wave PDEs that E. and P. knew backwards and forwards, and I don't think a concise version written that way would work as well for a typical member of the HN audience.
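As a hint of the kind of bridge I'd try to build (my sketch, nothing canonical): the free-particle Schrödinger equation

    i \hbar \, \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m} \nabla^2 \psi

looks, to someone fluent in classical wave PDEs, like a familiar wave/diffusion-type equation with an imaginary coefficient, and its plane-wave solutions ψ ∝ exp(i(kx - ωt)) carry the de Broglie/Planck relations p = ħk and E = ħω. That's the sort of handle Einstein and Planck would grab immediately, and it wouldn't land the same way for most of the HN audience.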
I believe he is referring to the 'postulates of quantum mechanics', you can find several formats from a quick google search.
Dirac, 1929:
"The fundamental laws necessary for the mathematical treatment of a large part of physics, and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved."
I think you can do it, but you'd probably want to start with density matrices, or use the Heisenberg picture to keep your wavefunction super-simple. If we're talking to geniuses then maybe we can include a one-off statement, 'if ρ² = ρ, so ρ = ψ ψ† for some "column vector" ψ, then the squared magnitudes of ψ's components are probabilities to be in that component's corresponding state,' to get the gist of it.
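Spelled out slightly (my paraphrase of that one-off statement, assuming a finite-dimensional state space): for a pure state

    \rho = \psi \psi^{\dagger}, \qquad \operatorname{tr} \rho = 1, \qquad
    P(\text{outcome } i) = |\psi_i|^2 = \rho_{ii},

and more generally expectation values come out as tr(ρA) for an observable A, which is about as compact as the measurement postulate gets and might plausibly survive a five-page cheat sheet.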