It comes from batching and multiple streams on a GPU. More people sharing 1 GPU makes everyone run slower but increases overall token throughput.
Mathematically it comes from the fact that this transformer block is this parallel algorithm. If you batch harder, increase parallelism, you can get higher tokens/s. But you get less throughput. Simultaneously there is also this dial that you can speculatively decode harder with fewer users.
It's true for basically all hardware and most models. You can draw this Pareto curve of throughput per GPU vs. tokens per second per stream. More tokens/s per stream means less total throughput.
See this graph for actual numbers:
Token Throughput per GPU vs. Interactivity
gpt-oss 120B • FP4 • 1K / 8K • Source: SemiAnalysis InferenceMAX™
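A toy model makes the shape of that curve concrete. The sketch below assumes each decode step costs a fixed time (think: streaming the weights once) plus a small increment per batched sequence. The function names and both timing constants are made up for illustration and are not taken from the InferenceMAX data.

```python
def decode_step_seconds(batch_size, fixed_s=0.010, per_seq_s=0.0005):
    """Time for one decode step that emits one token per sequence.

    fixed_s   : hypothetical cost paid once per step (e.g. reading weights)
    per_seq_s : hypothetical extra cost for each sequence in the batch
    """
    return fixed_s + batch_size * per_seq_s


def tradeoff_curve(batch_sizes):
    """Return (batch, tokens/s per stream, tokens/s per GPU) per batch size."""
    points = []
    for b in batch_sizes:
        step = decode_step_seconds(b)
        per_stream = 1.0 / step  # each user gets one token per step
        per_gpu = b / step       # the GPU emits b tokens per step
        points.append((b, per_stream, per_gpu))
    return points


if __name__ == "__main__":
    for b, per_stream, per_gpu in tradeoff_curve([1, 4, 16, 64, 256]):
        print(f"batch={b:4d}  per-stream={per_stream:7.1f} tok/s  "
              f"per-GPU={per_gpu:8.1f} tok/s")
```

In this model, tokens/s per stream only falls as the batch grows while tokens/s per GPU only rises, which is the Pareto trade-off described above. Real curves also depend on KV cache size, parallelism layout, and kernel efficiency, none of which are modeled here.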
> If you batch harder, increase parallelism, you can get higher tokens/s. But you get less throughput. Simultaneously there is also this dial that you can speculatively decode harder with fewer users.
I think you skipped the word “total throughput” there, right? Cause tok/s is a measure of throughput, so it’s clearer to say you increase throughput/user at the expense of throughput/GPU.
I’m not sure about the comment about speculative decode though. I haven’t served a frontier model but generally speculative decode I believe doesn’t help beyond a few tokens, so I’m not sure you can “speculatively decode harder” with fewer users.
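One way to put a number on the "few tokens" intuition: under the common simplifying assumption that each drafted token is accepted independently with probability alpha, the expected number of tokens emitted per verification pass with k drafted tokens is (1 - alpha^(k+1)) / (1 - alpha), which saturates at 1 / (1 - alpha). The sketch below is only that formula. The function name and the acceptance rate of 0.8 are hypothetical, and real acceptance rates depend on the draft model and the workload.

```python
def expected_tokens_per_pass(alpha, k):
    """Expected tokens emitted per target-model pass with k drafted tokens.

    Assumes every drafted token is accepted independently with probability
    alpha (a simplification), and that the target model always contributes
    one token of its own after the accepted prefix.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus the bonus token
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    alpha = 0.8  # hypothetical acceptance rate
    for k in (0, 1, 2, 4, 8, 16):
        print(f"k={k:2d}  expected tokens/pass={expected_tokens_per_pass(alpha, k):.3f}")
```

Each extra drafted token adds less than the one before it, so drafting further ahead runs into diminishing returns under this assumption.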
When you know a doctor and overhear conversations with some ranting doctor friends, you learn.
It's not a small problem in Canada. Funny was this patient who got rear ended like 8 times in a few years and needed time off and massage treatment every time.
Shameless grifters are everywhere my dude. This victimhood grifting in the article above has been obvious for over a decade. If you listened to those anecdotes and vibes you would have known this well in advance.
Anecdotal stuff / vibes are actually really useful. The "scientific" stuff isn't as formal as you might imagine. Going to conferences is a good way to learn that the vibes are what you are going to learn.
You'd think science is supposed to be this amazing rigorous way to do things. But the way you collect the data and the way you do the analysis and the reports you choose to write is anything but. Ultimately because, well, grifters are everywhere.
> It's not a small problem in Canada. Funny was this patient who got rear ended like 8 times in a few years and needed time off and massage treatment every time.
This seems … reasonable? Car crashes are the leading cause of life-altering injuries in North America and back pain is notoriously hard to prove to the point that a hostile audience can’t say they’re overstated. If you look at case studies from the American opioid crisis, a disturbing number of them start with someone getting in a road or workplace accident and not having a full pain management regimen.
You don't even need a study to prove you wrong, it should be common sense that being rear ended has a good chance of causing chronic neck injuries, let alone 8 of them. But I got you numbers anyway:
> NP pain is common after involvement in a motor vehicle collision (MVC) with 86% of injured occupants reporting NP pain.6 In Ontario 17.6% of those exposed to an MVC report a personal injury...
> Neck injury resulting from an MVC is associated with a high rate of chronicity. Prognosis studies indicate 50% of injured people continue to experience NP a year after the collision.
> Anecdotal stuff / vibes are actually really useful.
You can just say you've already made up your mind despite not having personal experience, your anecdotes being biased, and having no statistically relevant evidence - and nothing will change your mind.
Don't waste people's time pretending to be genuine and poison the intellectual well by trying to normalize "feelings based reality".
> When you know a doctor and overhear conversations with some ranting doctor friends, you learn.
What a joke. If a retail worker serves 300 people in a day and then comes home and complains about some guy who yelled at them, it doesn't mean that there's an epidemic of people yelling. That person is 0.3% of interactions but will make up 100% of the complaints because they stood out.
You don't even seem to have had the gall to ask the doctors you're eavesdropping on whether they concur, because surely that would have been your evidence instead.
I don't need a study to prove anything about a dude abusing the medical system. You don't just come in for an extension conveniently as things are winding down, multiple times, because whoops, turns out you got rear ended again.
>> Anecdotal stuff / vibes are actually really useful.
> You can just say you've already made up your mind despite not having personal experience, your anecdotes being biased, and having no statistically relevant evidence - and nothing will change your mind.
I absolutely can and do change my mind, lots of times. I'm not an old fart set in his ways. What I'm saying is that there is an overcorrection towards this "intellectual well" way of thinking. It's not that statistics is useless, it's that someone telling you of an issue based on vibes / personal experience, or from a single sample is useful even if editors would desk reject it.
People behave as though the numbers in papers come from God instead of from a study done with limitations by people who have agendas and make mistakes. A paper can be right. I love good papers, and despite what you might have concluded I actually love rigor (mathematics in particular is amazing). But when you also take the approach of rejecting any story or opinion because a stat said so, often ignoring how that stat was collected or the data analysed, this is where problems happen.
It’s kinda difficult to prove from an anecdotal observation whether someone is truly disabled, but three of my observations rubbed me the wrong way: the co-worker who ran regular marathons but was planning to seek disability pay for a physical injury; the co-worker who never seemed to have a hearing issue over the ten years I worked with her but suddenly claimed to have had damage from earlier; and the co-worker who got back injury disability but surfed at lunchtime.
I’m sure there are edge cases that can explain all of these. But I’d feel better knowing that the people who really need the assistance are the ones getting it.
You can have confounding effects. Specifically note Cochrane’s Aphorism.
"The correlation between any variable and smoking is likely to be higher than the correlation between that variable and the disease."
If you aren't controlling for substance use (which anyone who has walked by a construction site would know is common), you are going to misread an effect. Smoking in particular is actually just that bad for you.
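To see how an uncontrolled confounder can manufacture an effect, here is a minimal sketch with entirely made-up counts (not data from any study). In this invented cohort the job has no effect on disease within either smoking stratum, yet the crude comparison makes construction work look more than twice as risky, simply because more of the construction workers smoke.

```python
# Hypothetical cohort: (job, smoker) -> (group size, number with disease).
# All counts are invented purely to illustrate confounding.
COHORT = {
    ("construction", True): (80, 32),
    ("construction", False): (20, 2),
    ("office", True): (20, 8),
    ("office", False): (80, 8),
}


def risk(job, smoker=None):
    """Disease risk for a job, optionally restricted to one smoking stratum."""
    total = 0
    cases = 0
    for (j, s), (n, d) in COHORT.items():
        if j == job and (smoker is None or s == smoker):
            total += n
            cases += d
    return cases / total


def risk_ratio(smoker=None):
    """Risk in construction divided by risk in office work."""
    return risk("construction", smoker) / risk("office", smoker)


if __name__ == "__main__":
    print("crude risk ratio:       ", risk_ratio())
    print("among smokers only:     ", risk_ratio(smoker=True))
    print("among non-smokers only: ", risk_ratio(smoker=False))
```

Stratifying by smoking status is the simplest form of "controlling for" it. With these invented counts the apparent job effect disappears once you do.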
The confounding variable is probably wealth. Being rich is very important for longevity. The effect size for wealth is likely bigger than the effect size for strength training. So construction workers age badly because they are poor, despite all the strength training.
Not being rich per se, but probably stress. The body has no innate knowledge of how wealthy you are, outside of some information stored in the neocortex about financial details (which has little influence on the overall functioning and regulation of the organism as a whole). But it does keep track of a very important signal, and that is neuroception, or safety, absence of threats. And being wealthy, absence of sources of stress, or ability to avoid them, brings about that state of feeling secure, safe, which affects every cell of the body and leads to a good regulation of the whole organism.
Your body does keep track of your place in the social hierarchy with hormones like Vasopressin, Oxytocin, Testosterone and Estrogen. Social hierarchies are biology not culture. You can tell it's biology because all social animals have social hierarchies.
However, this is a very complicated and poorly understood field. Current research struggles with a chicken and egg problem. Does high testosterone cause high status, or do high status men produce more testosterone? The answer seems to be both simultaneously.
Your scientific study does not support your claim (body keeps track of social status) and the other is a men's health magazine article. Hardly the cutting edge of science
Wealth per se has nothing to do with longevity, as a minute's thought will make plain. What wealth does do is enable certain things that help with longevity, such as better medical care. If you're using wealth as a measure, you need to realize that it's only a proxy, and you'll get better data by looking at the actual behaviors that it's a proxy for.
Basically a good point.
Merely never ending up in situations where it's a struggle to make ends meet has a huge impact on stress though.
You often don't even have to use the wealth in order to benefit with respect to stress.
Just my experience but I have never found the medical industry useful for health. I have found they mostly tinker with feedback loops to give the illusion of health.
Eating right, exercise, supplementation of the things I am missing from my diet, clean air, avoiding chronic stressful situations and people are the only things I have found to benefit me. But that's just my own anecdotal experience. (n=1)
At minimum, the medical industry is good for providing various measurements regarding the state of your health and environment. This can get quite pricey quite fast.
Thing is, better food is available to the poor as well, you just have to be willing to put in the work for it. Buy vegetables and make salads instead of spending the same amount of money at McDonald's, for example. The price of fresh vegetables at Walmart has never been out of reach even for someone working 40 hours for minimum wage. Housing might be ridiculously expensive, and medical care if you don't have insurance? Good luck. But basic vegetables? Rice and beans? (Which make for a complete set of amino acids, BTW: there's a reason rice and beans is such a popular dish in Central America). Those have stayed affordable even when the price of other things has gone up.
Now, I'll grant that there are plenty of poor people who are drinking soda and eating junk food. Not going to deny that. But I have always been able to go to Walmart and buy lettuce and tomatoes for my salads, and I've never seen the price of those basics skyrocket like the price of eggs (at one point) or meat have. So the poor people who are drinking soda instead of water, and eating chips instead of salads? They're choosing those foods, not being forced into them by poverty.
There are plenty of areas where rich people have a big advantage over poor people in terms of access to things that provide longevity. But food, at least in America (the only country whose food prices I'm familiar enough with to talk intelligently about), just isn't one of them.
Now, you could argue that poor people didn't grow up with parents who taught them how to cook healthy food on a tight budget. Yes, that's true for many (not all) of the poor (again, at least in America, I don't know enough about other countries here). But there, it's not being poor that's keeping them from eating healthy, it's not being taught. Money isn't the limiting factor there.
Nutrition too. Not to paint everyone in the construction industry with the same brush, but there’s often a lot of cheap, high calorie, fast food and sugary drinks on site and in work trucks. This is manageable for younger workers, but by a certain age, the job responsibilities become less physically demanding, the metabolism slows down, and the eating habits remain.
The above is from the "sparks of AGI paper" on GPT-4, where they were floored that it could coherently reason through the 3 steps of inverting things (6 -> 9 -> 7 -> 4) while GPT 3.5 was still spitting out a nonsense argument of this form:
This is from March 2023, and it was genuinely very surprising at the time that these pattern matching machines trained on next token prediction could do this. Something like an LSTM can't do anything like this at all btw, nowhere close.
To me it's very surprising that the C compiler works. It takes a ton of effort to build such a thing. I can imagine the flaws actually do get better over the next year as we push the goalposts out.
Specifically this paragraph is what I find hilarious.
> According to the report, the issue became apparent in OpenAI’s Codex, an AI code-generation tool. OpenAI staff reportedly attributed some of Codex’s performance limitations to Nvidia’s GPU-based hardware.
OpenAI has a whole history of trying to scoop other providers. This was a whole thing for Google launches, where OpenAI regularly launched something just before Google to grab the media attention.
GPT-4o vs. Google I/O (May 2024): OpenAI scheduled its "Spring Update" exactly 24 hours before Google’s biggest event of the year, Google I/O. They launched GPT-4o voice mode.
Sora vs. Gemini 1.5 Pro (Feb 2024): Just two hours after Google announced its breakthrough Gemini 1.5 Pro model, Sam Altman tweeted the reveal of Sora (text-to-video).
ChatGPT Enterprise vs. Google Cloud Next (Aug 2023): As Google began its major conference focused on selling AI to businesses, OpenAI announced ChatGPT Enterprise.
There is a lot to discuss there but I find this paragraph interesting.
> What we are seeing here is the replacement of normal human relationships with virtual monetized relationships.
> It could be argued that virtual prostitutes have seen an increase in their living standards, but in reality, they have likely just replaced being in a relationship in which a man pays for gifts, trips, and other things with virtual prostitution.
The classic incel meme involves a sort of diagram where there is a small fraction of men getting the lion's share of the sexual attention from women.
But there is this sort of dual one where an extremely small number of women on OnlyFans, who themselves are a small fraction of all women in total, make up the majority of all income/attention.
If you think about what a plane does to keep itself up, it sweeps through a curtain of air which ends up blowing downwards.
In a second it must blow down a large volume of air with enough speed to equal the impulse created by gravity in a second.
Basically m_air × v_down = m_plane × gravity × time
The energy you need to do this is quadratic in the downwash speed: 1/2 × m_air × v_down^2
A larger volume of air with a smaller v_down (a huge curtain of air of a fast plane with very wide wings) is more efficient than the smaller disk of air with high velocity of a helicopter.
But if the plane isn't moving forward the curtain has no volume and the plane stalls and falls. But helicopters have no trouble lifting off vertically.
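Plugging numbers into the two relations above shows the effect. This is a minimal sketch of the idealized momentum argument only (uniform downwash, no drag or other losses). The function names, the 1000 kg mass, and the air flow rates are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def downwash_speed(mass_kg, air_kg_per_s):
    """Downwash speed from m_air * v_down = m_plane * g * time, per second."""
    return mass_kg * G / air_kg_per_s


def lift_power_watts(mass_kg, air_kg_per_s):
    """Kinetic energy given to the air each second: 1/2 * m_air * v_down^2."""
    v_down = downwash_speed(mass_kg, air_kg_per_s)
    return 0.5 * air_kg_per_s * v_down ** 2


if __name__ == "__main__":
    mass = 1000.0  # hypothetical aircraft mass in kg
    for flow in (250.0, 500.0, 1000.0, 2000.0):  # kg of air pushed down per second
        print(f"air flow={flow:7.1f} kg/s  "
              f"v_down={downwash_speed(mass, flow):6.2f} m/s  "
              f"power={lift_power_watts(mass, flow) / 1000:7.1f} kW")
```

Substituting the first relation into the second gives power = (m_plane × g)^2 / (2 × air flow), so in this idealized picture doubling the air you sweep through halves the power needed to stay up.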
Token Throughput per GPU vs. Interactivity gpt-oss 120B • FP4 • 1K / 8K • Source: SemiAnalysis InferenceMAX™
https://inferencemax.semianalysis.com/