I'm not sure I understand the wild hype here in this thread then.
Seems exactly like the tests at my company, where even frontier models are revealed to be very expensive rubber ducks but completely fail with non-experts or anything novel or math-heavy.
I.e. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.
Yes, the contributions of the people prompting the AI should be considered, as well as those of the people who designed the Lean libraries used in-the-loop while the AI was writing the solution. Any talk of "AGI" is, as always, ridiculous.
But speaking as a specialist in theorem proving, this result is pretty impressive! It would have likely taken me a lot longer to formalize this result even if it was in my area of specialty.
How did you arrive at "ridiculous"? What we're seeing here is incredible progress over what we had a year ago. Even ARC-AGI-2 is now at over 50%. Given that this sort of process is also being applied to AI development itself, it's really not clear to me that humans would be a valuable component in knowledge work for much longer.
It requires constant feedback, critical evaluation, and checks. This is not AGI, it's cognitive augmentation. One that is collective, one that will accelerate human abilities far beyond what the academic establishment is currently capable of, but that is still fundamentally organic. I don't see a problem with this--AGI advocates treat machine intelligence like some sort of God that will smite non-believers and reward the faithful. This is what we tell children so that they won't shit their beds at night, otherwise they get a spanking. The real world is not composed of rewards and punishments.
It does seem that the Venn diagram of "Roko's basilisk" believers and "AGI is coming within our lifetimes" believers is nearly a circle. Would be nice if there were some less... religious... arguments for AGI's imminence.
I think the “Roko’s Basilisk” thing is mostly a way for readers of Nick Land to explain part of his philosophical perspective without the need for, say, an actual background in philosophy. But the simplicity reduces his nuanced thought to a call for a sheeplike herd—they don’t even need a shepherd! Or perhaps there is, but he is always yet to come…best to stay in line anyway, he might be just around the corner.
> It requires constant feedback, critical evaluation, and checks. This is not AGI, it's cognitive augmentation.
To me that doesn't sound qualitatively different from a PhD student. Are they just cognitive augmentation for their mentor?
In any case, I wasn't trying to argue that this system as-is is AGI, just that it's no longer "ridiculous", and that this looks to me like a herald of AGI, as the portion being done by humans gets smaller and smaller.
People would say the same thing about a calculator, or computation in general. Just like any machine it must be constructed purposefully to be useful, and once we require something which exceeds that purpose it must be constructed once again. Only time will tell the limits of human intelligence, now that AI is integrating into society and industry.
>AGI advocates treat machine intelligence like some sort of God that will smite non-believers and reward the faithful.
>The real world is not composed of rewards and punishments.
Most "AGI advocates" say that AGI is coming, sooner rather than later, and it will fundamentally reshape our world. On its own that's purely descriptive. In my experience, most of the alleged "smiting" comes from the skeptics simply being wrong about this. Rarely there's talk of explicit rewards and punishments.
I should be the target audience for this stuff, but I honestly can't name a single person who believes in this "Roko's basilisk" thing. To my knowledge, even the original author abandoned it. There probably are a small handful out there, but I've never seen 'em myself.
> it's really not clear to me that humans would be a valuable component in knowledge work for much longer.
To me, this sounds like when we first went to the moon, and people were sure we'd be on Mars by the end of the '80s.
> Even ARC-AGI-2 is now at over 50%.
Any measure of "are we close to AGI" is as scientifically meaningful as "are we close to a warp drive" because all anyone has to go on at this point is pure speculation. In my opinion, we should all strive to be better scientists and think more carefully about what an observation is supposed to mean before we tout it as evidence. Despite the name, there is no evidence that ARC-AGI tests for AGI.
> To me, this sounds like when we first went to the moon, and people were sure we'd be on Mars by the end of the '80s.
Unlike space colonisation, there are immediate economic rewards from producing even modest improvements in AI models. As such, we should expect much faster progress in AI than space colonisation.
But it could still turn out the same way, for all we know. I just think that's unlikely.
The minerals in the asteroid belt are estimated to be worth in the $100s of quintillions. I would say that’s a decent economic incentive to develop space exploration (not necessarily colonization, but it may make it easier).
Not going to happen as long as the society we live in has this big of a hard on for capitalism and working yourself to the bone is seen as a virtue. Every time there’s a productivity boost, the newly gained free time is immediately consumed by more work. It’s a sick version of Parkinson’s law where work is infinite.
Let me put it like this: I expect AI to replace much of human wage labor over the next 20 years and push many of us, and myself almost certainly included, into premature retirement. I'm personally concerned that in a few years, I'll find my software proficiency to be as useful as my chess proficiency today is useful to Stockfish. I am afraid of a massive social upheaval both for myself and my family, and for society at large.
Here “much of” is doing the heavy lifting. Are you willing to commit to a percentage or a range?
I work at an insurance company and I can’t see AI replacing even 10% of the employees here. Too much of what we do is locked up in decades-old proprietary databases that cannot be replaced for legal reasons. We still rely on paper mail for a huge amount of communication with policyholders. The decisions we make on a daily basis can’t be trusted to AI for legal reasons. If AI caused even a 1% increase in false rejections of claims it would be an enormous liability issue.
Yes, absolutely willing to commit. I can't find a single reliable source, but from what I gather, over 70% of people in the West do "pure knowledge work", which doesn't include any embodied activities. I am happy to bet that these jobs will start being fully taken over by AI soon (if they aren't already), and that by 2035, less than 50% of us will have a job that doesn't require "being there".
And regarding your example of an insurance company, I'm not sure about that industry, but seeing the transformation of banking over the last decade to fully digital providers like Revolut, I would expect similar disruption there.
I would easily take the other side of this bet. It just reminds me when everyone was sure back in 2010 that we’d have self driving cars within 10 years and human drivers would be obsolete. Today replacing human drivers fully is still about 10 years away.
Yes, getting the timelines right is near impossible, but the trajectory is clear to me, both on AI taking over pure knowledge work and on self-driving cars replacing human drivers. For the latter, there's a lot of inertia and legalities to overcome, and scaling physical things is hard in general, but Waymo alone crossed 450,000 weekly paid rides last month [0], and now that it's self-driving on highways too, and is slated to launch in London and Tokyo this year, it seems to me that there's no serious remaining technical barrier to it replacing human drivers.
As for a bet, yes, I'd really be happy to put my money where my mouth is, if you're familiar with any long bets platform that accepts pseudonymous users.
There are other bounds here at play that are often not talked about.
AI runs on computers. Consider the undecidability captured by Rice's theorem: whether compiled code satisfies any non-trivial property, such as being error-free, cannot be decided in general. Even an AI can't guarantee its compiled code is error-free; not because it couldn't write code that solves a problem, but because the code it writes is bounded by these other externalities. Undecidability in general makes the dream of generative AI considerably more challenging than how it's being 'sold'.
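To make the undecidability point concrete, here is a rough sketch of the classic argument (purely illustrative, nothing below comes from the comment above; `is_error_free` stands in for a hypothetical perfect checker):

    # Illustrative sketch: if a perfect "is this program error-free?" checker
    # existed, it could be used to decide the halting problem, which is
    # impossible. Rice's theorem generalises this to every non-trivial
    # semantic property of programs.

    def is_error_free(program):
        # Hypothetical perfect checker; no such thing can exist.
        raise NotImplementedError("no such checker can exist")

    def build_probe(program, input_value):
        """A program that is error-free exactly when `program` never halts on `input_value`."""
        def probe():
            program(input_value)       # runs forever if `program` doesn't halt on this input
            raise RuntimeError("bug")  # reached only if `program` halts
        return probe

    def halts(program, input_value):
        # If `is_error_free` actually worked, this would decide the halting problem.
        return not is_error_free(build_probe(program, input_value))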
You don’t even need AGI for that though, just unbounded investor enthusiasm and a regulatory environment that favors AI providers at the expense of everyone else.
My point is there are a number of things that can cause large-scale unemployment in the next 20 years, and it doesn't make sense to worry about AGI specifically while ignoring all of the other equally likely root causes (like a western descent into oligarchy and crony capitalism, just to name one).
This accurately mirrors my experience. It never - so far - has happened that the AI brought any novel insight at the level that I would see as an original idea. Presumably the case of TFA is different, but the normal interaction is that the solution to whatever you are trying to solve is a millimeter away from your understanding, and the AI won't bridge that gap until you do it yourself, and then it will usually prove to you that it was obvious. If it was so obvious then it probably should have made the suggestion...
Recent case:
I have a bar with a number of weights supported on either end:
|---+-+-//-+-+---|
What order and/or arrangement or of removing the weights would cause the least shift in center-of-mass? There is a non-obvious trick that you can pull here to reduce the shift considerably, and I was curious whether the AI would spot it or not, but even after lots of prompting it just circled around the obvious solutions rather than making a leap outside of that box and coming up with a solution that is better in every case.
I wonder what the cause of that kind of blindness is.
The problem is unclear. I think you have a labelled graph G=(V, E) with labels c:V->R, such that each node in V consists of a triple (L, R, S) where L is a sequence of weights that are on the left, R is a sequence of weights that are on the right, and S is a set of weights that have been taken off. Define c(L, R, S) to be the centre of mass. Introduce an undirected edge e={(L, R, S), (L', R', S')} between (L, R, S) and (L', R', S') either if (i) (L', R', S') results from taking the first weight off L and adding it to S, or (ii) (L', R', S') results from taking the first weight off R and adding it to S, or (iii) (L', R', S') results from taking a weight from W and adding it to L, or (iv) (L', R', S') results from taking a weight from W and adding it to R.
There is a starting node (L_0, R_0, {}) and an ending node ({}, {}, W), with the latter having L=R={}.
I think you're trying to find the path (L_n, R_n, S_n) from the starting node to the ending node that minimises the maximum absolute value of c(L_n, R_n, S_n).
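For what it's worth, here is a minimal brute-force sketch of a narrower reading of the puzzle (my own assumptions, not the original poster's: point weights at fixed positions on the bar, removed one at a time from whichever outer end you like, tracking how far the centre of mass drifts from its starting value):

    # Brute-force search over end-removal orders; positions/masses are made up.
    def centre_of_mass(weights):
        total = sum(m for _, m in weights)
        if total == 0:
            return None  # nothing left on the bar
        return sum(p * m for p, m in weights) / total

    def best_order(weights):
        """Find the removal order minimising the worst drift of the centre of mass."""
        start = centre_of_mass(weights)
        best = {"worst": float("inf"), "order": None}

        def rec(remaining, order, worst):
            if not remaining:
                if worst < best["worst"]:
                    best["worst"], best["order"] = worst, tuple(order)
                return
            for i in {0, len(remaining) - 1}:   # outermost weight on either side
                rest = remaining[:i] + remaining[i + 1:]
                com = centre_of_mass(rest)
                drift = abs(com - start) if com is not None else 0.0
                rec(rest, order + [remaining[i]], max(worst, drift))

        rec(tuple(weights), [], 0.0)
        return best

    # Example: four unit weights at symmetric positions, as (position, mass) pairs.
    print(best_order([(-3.0, 1.0), (-1.0, 1.0), (1.0, 1.0), (3.0, 1.0)]))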
That problem is not clearly stated, so if you’re pasting that into an AI verbatim you won’t get the answer you’re looking for.
My guess is: first move the weights to the middle, and only then remove them.
However “weights” and “bar” might confuse both machines and people into thinking that this is related to weight lifting, where there’s two stops on the bar preventing the weights from being moved to the middle.
The problem is stated clearly enough that humans who are asked the question will sooner or later see that there is an optimum, and that that optimum relies on understanding.
And no, the problem is not 'not clearly stated'. It is complete as it is and you are wrong about your guess.
And if machines and people think this is related to weight lifting then they're free to ask follow up questions. But even in the weight lifting case the answer is the same.
Illusion of transparency. You are imagining yourself asking this question, while standing in the gym and looking at the bar (or something like this). I, for example, have no idea how the weights are attached and which removal actions are allowed.
Yeah, LLMs have a tendency to run with some interpretation of a question without asking follow-up questions. Probably, it's a consequence of RLHFing them in that way.
And none of those details matter for solving the problem correctly. I'm purposefully not putting any answers here because I want to see if future generations of these tools suddenly see the non-obvious solution. But you are right about the fact that the details matter; one detail that is mentioned very explicitly holds the key.
Sure they do, the problem makes no sense as stated. The solution to the stated problem is to remove all weights all at once, solved. Or even two at a time, opposite the centre of gravity. Solved, but not what you're asking, I assume?
You didn't even label your ASCII art, so I've no clue what you mean. Are the bars at the end the supports or the weights? Can I only remove one weight at a time? Initially I assumed you meant a weightlifting bar, the weights on which can only be removed from its ends. Is that the case or what? What's the double slash in the middle?
Also: "what order and/or arrangement or of removing the weights" this isn't even correct English. Arrangement of removing the weights? State the problem clearly, from first principles, like you were talking to a 5 year old.
The sibling comment is correct, you're clearly picturing something in your mind that you're failing to properly describe. It seems obvious to you, but it's not.
> I.e. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.
This hits so true to home. Just today in my field, a manager without expertise in a topic gave me an AI solution to something I am an expert in. The AI was very plainly and painfully wrong, but it comes down to the user prompting really poorly. When I gave a well-formulated prompt on the same topic, I got the correct answer on the first go.
Do you have any idea how many people here have paychecks that depend on the hype, or hope to be in that position? They were the same way for Crypto until it stopped being part of the get-rich-quick dream.
Lots of users seem to think LLMs think and reason, so this sounds wonderful. A mechanical process isn't thinking, and it certainly does NOT mirror human thinking; the processes are altogether different.
"the more interesting capability revealed by these events is the ability to rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument." From the Tao thread. The ability to quickly iterate on research is a big change because "This is sharp contrast to existing practice where....large-scale reworking of the paper often avoided due both to the work required and the large possibility of introducing new errors."
"Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver"
Not saying it's not an amazing setup, I just don't understand the word "AI" being used like this when it's the setup / system that's brilliant, in conjunction with absolute experts.
Exactly "The Geordi LaForge Paradox" of "AI" systems. The most sophisticated work requires the most sophisticated user, who can only become sophisticated the usual way --- long hard work, trial and error, full-contact kumite with reality, and a degree of devotion to the field.
> There seems to be some confusion on this so let me clear this up. No, after the model gave its original response, I then proceeded to ask it if it could solve the problem with C=k/logN arbitrarily large. It then identified for itself what both I and Tao noticed about it throwing away k!, and subsequently repaired its proof. I did not need to provide that observation.
so it was literally "yo, your proof is weak!" - "naah, watch this! [proceeds to give full proof all on its own]"
I do / have done research in building deep learning models and custom / novel attention layers, architectures, etc., and AI (ChatGPT) is tremendously helpful in facilitating (semantic) search for papers in areas where you may not quite know the magic key words / terminology for what you are looking for. It is also very good at linking you to ideas / papers that you might not have realized were related.
I also found it can be helpful when exploring your mathematical intuitions on something, e.g. how a dropout layer might affect learned weights and matrix properties, etc. Sometimes it will find some obscure rigorous math that can be very enlightening or relevant to correcting clumsy intuitions.
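As a toy example of the sort of intuition check I mean (my own illustration, not from any paper): inverted dropout rescales the surviving activations by 1/(1-p) so that their expectation matches the no-dropout case.

    import random

    def inverted_dropout(x, p, rng=random):
        """Zero each activation with probability p, scaling survivors by 1/(1-p)
        so the expected value of the output equals the input."""
        return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]

    # Quick expectation check on a constant vector.
    random.seed(0)
    dropped = inverted_dropout([1.0] * 100_000, p=0.3)
    print(sum(dropped) / len(dropped))  # close to 1.0 on average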
Dunno why you are being downvoted, probably cope. It is well known by now that antidepressants are only marginally effective on average [1-2]. You're right they should probably only be prescribed for quite severe or treatment-resistant depression. Although the treatment-by-severity effect has been somewhat disputed [3-4], it has rough support [5], and makes sense since it is dubious that we should be giving ineffective medication with serious costs and side-effects to people with moderate depression.
My take is that pessimistic estimates of AD effectiveness assume you get one Rx and don't follow up to adjust dose and medication choice. I was lucky when I took ADs to have a good primary care doc who had a psychiatric nurse practitioner working at his office, and to be a good self-advocate myself.
The "sequential treatment" or "tailored treatment" approach is at least plausible and what is done in practice, yes, if the prescribing doctor is good, and if this is feasible for the patient.
However, since this takes time, and most depression is temporary, it is hard to know if you really are tailoring the medication to the person in many cases, or it has just been long enough you are seeing regression to the mean (or a placebo response, which is still strong even in treatment-resistant depression https://jamanetwork.com/journals/jamanetworkopen/fullarticle...).
There aren't really any double-blinded or even just properly placebo-controlled / no-treatment controlled studies to test this, but the closest thing to looking at the sequential approach also doesn't find very impressive results (https://bmjopen.bmj.com/content/13/7/e063095.abstract).
I do believe the drugs help some people, and almost certainly take some experimentation / tailoring. The average effects are just very weak.
This is great, there is still so much potential in AI once we move beyond LLMs to specialized approaches like this.
EDIT: Look at all the people below just reacting to the headline and clearly not reading the posts. Aristotle (https://arxiv.org/abs/2510.01346) is key here folks.
EDIT2: It is clear many of the people below don't even understand basic terminology. Something being a transformer doesn't make it an LLM (vision transformers, anyone?), and if you aren't training on language (e.g. AlphaFold, or Aristotle on LEAN stuff), it isn't a "language" model.
"Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver"
It is far more than an LLM, and math != "language".
> Aristotle integrates three main components (...)
The second one being backed by a model.
> It is far more than an LLM
It's an LLM with a bunch of tools around it, and a slightly different runtime than ChatGPT. It's "only" that, but people - even here, of all places - keep underestimating just how much power there is in that.
Transformer != LLM. See my edited top-level post. Just because Aristotle uses a transformer doesn't mean it is an LLM, just as Vision Transformers and AlphaFold use transformers but are not LLMs.
LLM = Large Language Model. "Large" refers to both the number of parameters (and in practice, depth) of the model, and implicitly the amount of data used for training, and "language" means human (i.e. written, spoken) language. A Vision Transformer is not an LLM because it is trained on images, and AlphaFold is not an LLM because it is trained on molecular configurations.
Aristotle works heavily with formalized LEAN statements and expressions. While you can certainly argue this is a language of sorts, it is not at all the same "language" as the "language" in LLMs. Calling Aristotle an "LLM" just because it has a transformer is more misleading than truthful, because every other single aspect of it is far more clever and involved.
Sigh. If I start with a pre-trained LLM architecture, and then do extensive further training / fine-tuning with different data and loss functions, custom similarity metrics for specialized search, specialized training procedures, and feedback from other automated systems, the result is far, far more than an LLM. That's the point. Calling something like this an LLM is as deeply misleading as calling AlphaFold an LLM. These tools go far beyond simple LLMs. The special losses and metrics are really that important here, and are why these tools can be so game-changing.
In this context, we're not even talking about "math" (as a broad, abstract concept). We're strictly talking about converting English to Lean. Both are just languages. Lean isn't just something that can be a language. It's a language.
There is no reason or framing where you can say Aristotle isn't a language model.
That's true, and a good fundamental point. But here it's much simpler than that: math is a language the same way code is, and if there's one thing LLMs excel at, it's reading and writing code and translating back and forth between code and natural language.
> It is clear many of the people below don't even understand basic terminology. Something being a transformer doesn't make it an LLM (vision transformers, anyone?), and if you aren't training on language (e.g. AlphaFold, or Aristotle on LEAN stuff), it isn't a "language" model.
I think it's because it comes off as if you are saying that we should move off of GenAI, and a lot of people use "LLM" when they mean GenAI.
Ugh, you're right. This was not intended. Conflating LLMs with GenAI is a serious error, but you're right, it is obviously a far more common error than I realized. I clearly should have said "move beyond solely LLMs" or "move beyond LLMs in isolation", perhaps this would have avoided the confusion.
This is a really hopeful result for GenAI (fitting deep models tuned by gradient descent on large amounts of data), and IMO this is possible because of specific domain knowledge and approaches that aren't there in the usual LLM approaches.
1. "The search algorithm is a highly parallel Monte Carlo Graph Search (MCGS) using a large transformer as its policy and value functon." ... "We use a generative policy to take progressively widened [7] samples from the large action space of Lean tactics, conditioning on the Lean proof state, proof history, and, if available, an informal proof. We use the same model and prompt (up to a task token) to compute the value function which guides the search."
See that 'large transformer' phrase? That's where the LLM is involved.
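For anyone unfamiliar with the term, "progressively widened" roughly means a search node is only allowed to sample more distinct actions as its visit count grows. A toy sketch (the constants and the Node structure are mine, not the paper's):

    import math, random
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        state: str
        visits: int = 1
        actions: list = field(default_factory=list)

    def widened_actions(node, sample_action, c=1.0, alpha=0.5):
        """Progressive widening: allow at most ceil(c * visits^alpha) sampled actions,
        so the branching factor grows slowly with the visit count."""
        limit = max(1, math.ceil(c * node.visits ** alpha))
        while len(node.actions) < limit:
            # In Aristotle this sample would come from the transformer policy.
            node.actions.append(sample_action(node.state))
        return node.actions

    # Toy usage: the "policy" just returns random tactic ids.
    node = Node(state="proof-state")
    for step in range(1, 101):
        node.visits = step
        widened_actions(node, lambda s: random.randrange(10_000))
    print(len(node.actions))  # grows roughly like sqrt(visits): 10 after 100 visits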
2. "A lemma-based informal reasoning system which generates informal proofs of mathematical state-ments, breaks these proofs down into lemmas, formalizes each lemma into Lean, and iterates this process based on formal feedback" ... "First, the actions it generates consist of informal comments in addition to Lean tactics. Second, it uses a hidden chain of thought with a dynamically set thinking budget before predicting an action."
Unless you're proposing that this team solved AGI, "chain of thought" is a specific term of art in LLMs.
3. "A geometry solver which solves plane geometry problems outside of Lean using an approach based on AlphaGeometry [45]." ... following the reference: "AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. "
AlphaGeometry, like all of DeepMind's Alpha tools, is an LLM.
Instead of accusing people of not reading the paper, perhaps you should put some thought into what the things in the paper actually represent.
If you think "transformer" = LLM, you don't understand the basic terminology of the field. This is like calling AlphaFold an LLM because it uses a transformer.
No, it isn't. They call out ExIt as an inspiration as well as AlphaZero, and the implementation of these things (available in many of their authors' papers) is almost indistinguishable from LLMs. The architecture isn't novel, which is why this paper is about the pipeline instead of about any of the actual processing tools. Getting prickly about meaningless terminology differences is definitely your right, but for anyone who isn't trying to define a policy algorithm for a transformer network, the difference is immaterial to understanding the computation involved.
Equating LLMs and transformers is not a meaningless terminology difference at all, Aristotle is so different from the things people call LLMs in terms of training data, loss function, and training that this is a grievous error.
"Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver"
It is far more than an LLM, and math != "language".
You cannot say if this is a substantial change or not, because you need to know by how much the groups actually differ on average, i.e. you need the unstandardized effect size, expressed as a mean difference in the scale sum scores, or as an actual percentage of symptoms reduced, etc. In general, there are monstrous issues with standardized mean differences, even setting aside the interpretability issues [1-3].
Good point. Would it be roughly accurate to say: "consider someone who's more depressed than 75% of the *study treated* population becoming completely average *among the study treated population*"?
Nope, you can't say how many people return to average from standardized effect sizes. I wish we had a standardized effect size that was more useful and actually meant something. Cohen actually proposed something called the U3 statistic, which tells us the proportion of one group that falls beyond the mean of the other, but that still doesn't tell us anything meaningful about practical significance.
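(For concreteness: under normality and equal variances, U3 is just the standard normal CDF evaluated at the standardized mean difference. The d of 0.3 below is purely illustrative, not a number from the papers above.)

    from statistics import NormalDist

    def cohens_u3(d):
        """Proportion of the treated group falling beyond the control-group mean,
        assuming both groups are normal with equal variance."""
        return NormalDist().cdf(d)

    print(cohens_u3(0.3))  # ~0.62, i.e. about 62% of treated score past the control mean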
You can't make decisions / determine clinical value from standardized effect sizes sadly, so when I see studies like this, my assumption is unfortunately that the researchers care only about publishing, and not about making their findings useful :(
Unfortunately, if exercise is only about as effective as therapy for depression, it may mean that the benefits of exercise are not actually clinically observable, if measured properly and not just based on arbitrary statistical significance.
Standardized effect sizes like the ones reported here have no clinical meaning; they are purely statistical. To measure if these kinds of changes matter, you need to determine the Minimal (Clinically) Important Difference [1-2]. I.e., can clinicians (or patients) even notice the observed statistical difference?
In practice, this is a change of about 3-5 points on most 20+ item rating scales, or a relative reduction of 20-30% of the total (sum) score of the scale [1-2]. Unfortunately, anti-depressants fall under or just barely reach this threshold [3-4], and so should widely be considered ineffective or only borderline effective, on average. Of course this is complicated by the fact that some people get worse on these treatments and some people experience dramatic improvements, but, still, the point is, depression is extremely hard to treat.
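To make the conversion concrete (illustrative numbers only, not taken from the cited studies): a standardized mean difference maps back to raw scale points by multiplying by the scale's standard deviation, and that raw difference is what you compare against an MCID.

    def smd_to_points(smd, scale_sd):
        """Convert a standardized mean difference back into raw scale points."""
        return smd * scale_sd

    # Illustrative only: an SMD of 0.3 on a scale whose SD is 8 points
    # is about 2.4 points, below a 3-5 point MCID.
    print(smd_to_points(0.3, 8.0))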
EDIT: There is less data on MCIDs for therapy, but at least one review suggests therapy effects can be in the 10+ point range [5]. But the way the exercise study is presented, with a standardized effect size, we can have no idea if the results matter at all [6].
[2] Masson, S. C., & Tejani, A. M. (2013). Minimum clinically important differences identified for commonly used depression rating scales. Journal of clinical epidemiology, 66(7), 805-807. [https://www.jclinepi.com/article/S0895-4356(13)00056-5/fullt...]
[5] Cuijpers, P., Karyotaki, E., Weitz, E., Andersson, G., Hollon, S. D., & van Straten, A. (2014). The effects of psychotherapies for major depression in adults on remission, recovery and improvement: a meta-analysis. Journal of affective disorders, 159, 118–126. https://doi.org/10.1016/j.jad.2014.02.026 [https://pubmed.ncbi.nlm.nih.gov/24679399/]
[6] Pogrow, S. (2019). How Effect Size (Practical Significance) Misleads Clinical Practice: The Case for Switching to Practical Benefit to Assess Applied Research Findings. The American Statistician, 73(sup1), 223–234. https://doi.org/10.1080/00031305.2018.1549101
It means nothing; standardized effect sizes have no clinical meaning here, they are purely statistical. To measure if these kinds of changes matter, you need to determine the Minimal (Clinically) Important Difference [1-2]. I.e., can clinicians (or patients) even notice the observed statistical difference?
In practice, this is a change of about 3-5 points on most 20+ item rating scales, or a relative reduction of 20-30% of the total (sum) score of the scale [1-2]. Unfortunately, anti-depressants fall under or just barely reach this threshold [3-4], and so should widely be considered ineffective or only borderline effective, on average. Of course this is complicated by the fact that some people get worse on these treatments and some people experience dramatic improvements, but, still, the point is, depression is extremely hard to treat.
Unfortunately, this also means that if exercise is only about as effective as therapy for depression, the benefits of exercise may not actually be clinically observable, if measured properly and not just based on arbitrary statistical significance.
EDIT: There is less data on MCIDs for therapy, but at least one review suggests therapy effects can be in the 10+ point range [5]. But the way the exercise study is presented, with standardized effect sizes, we have no idea if the results matter at all [6].
[2] Masson, S. C., & Tejani, A. M. (2013). Minimum clinically important differences identified for commonly used depression rating scales. Journal of clinical epidemiology, 66(7), 805-807. [https://www.jclinepi.com/article/S0895-4356(13)00056-5/fullt...]
[5] Cuijpers, P., Karyotaki, E., Weitz, E., Andersson, G., Hollon, S. D., & van Straten, A. (2014). The effects of psychotherapies for major depression in adults on remission, recovery and improvement: a meta-analysis. Journal of affective disorders, 159, 118–126. https://doi.org/10.1016/j.jad.2014.02.026 [https://pubmed.ncbi.nlm.nih.gov/24679399/]
[6] Pogrow, S. (2019). How Effect Size (Practical Significance) Misleads Clinical Practice: The Case for Switching to Practical Benefit to Assess Applied Research Findings. The American Statistician, 73(sup1), 223–234. https://doi.org/10.1080/00031305.2018.1549101
Nutrition science is not science in almost any of the ways a real science needs to be, and there is almost zero "real, good science" to be found in it. The reasons this statement is true (as well as the precise qualifications of the exceptions to this) are well laid out by tsimionescu in response to your post.
The measurement, control, confounds, and even basic concepts are atrocious here; this is possibly the only field as bad as or even worse than e.g. social psychology. And this is all ignoring the massive economic interests involved.
It is in fact only science illiteracy that would lead one to think nutrition science is a serious science. At its most charitable, it is a protoscience like alchemy (which did have some replicable findings that eventually led to real chemistry, but which was still mostly nonsense at its core).
This. The amount of faith in nutrition "science" indicates severe science illiteracy in the public.
In general there are way too many confounds, and measurement is far too poor and unreliable (self-report that is wrong in both quality and quantity; you can't track enough people for the length of time over which the supposed effects would manifest); there is almost zero control over what people eat (diets and available foods shift considerably over a decade even for whole countries, never mind within individuals); and much of what is being measured lacks even face/content validity in the first place (e.g. "fat" is not a valid category, and even "saturated vs. unsaturated" is a matter of degree, each again with different kinds within it).
We are missing so many of the basics required for a real science here that I think it is far more reasonable to view almost all long-term nutritional claims as pseudoscience, unless the effect is clear and massive (e.g. consumption of large amounts of alcohol, or extremely unusual / restrictive diets that have strong effects, or the rare results of natural experiments / famines), or so extremely general that it catches a sort of primary factor (too many calories is generally harmful, regardless of the source of those calories).
Maybe it'll become actual science one day, but that won't be for decades.