Four independent Chinese companies released extremely good open source models in...

jjice · 2025-11-06T20:32:43 1762461163

I get what you mean, but OpenAI did release the gpt-oss in August, just three months ago. I've had a very good experience with those models.

https://openai.com/index/introducing-gpt-oss/ (August 5th)

I like Qwen 235 quite a bit too, and I generally agree with your sentiment, but this was a very large American open source model.

Unless we're getting into the complications on what "open source" model actually means, in which case I have no clue if these are just open weight or what.

pu_pe · 2025-11-07T07:06:42 1762499202

You're totally right. Ironically I am using gpt-oss for a project right now, I think its quality is comparable to the ones I mentioned.

seunosewa · 2025-11-06T17:53:04 1762451584

The Chinese are doing it because they don't have access to enough of the latest GPUs to run their own models. Americans aren't doing this because they need to recoup the cost of their massive GPU investments.

0xjmp · 2025-11-06T19:39:38 1762457978

I must be missing something important here. How do the Chinese train these models if they don't have access to the GPUs to train them?

barrell · 2025-11-06T19:44:16 1762458256

I believe they mean distribution (inference). The Chinese model is currently B.Y.O.GPU. The American model is GPUaaS

0xjmp · 2025-11-06T20:01:28 1762459288

Why is inference less attainable when it technically requires less GPU processing to run? Kimi has a chat app on their page using K2 so they must have figured out inference to some extent.

jychang · 2025-11-06T23:20:08 1762471208

That entirely depends on the number of users.

Inference is usually less gpu-compute heavy, but much more gpu-vram heavy pound-for-pound compared to training. General rule of thumb is that you need 20x more vram for training a model with X params, than for inference for that same size model. So assuming batch size b, then serving more than 20*b users would tilt vram use on the side of inference.

This isn't really accurate; it's an extremely rough rule of thumb and ignores a lot of stuff. But it's important to point out that inference is quickly adding to costs for all AI companies. Deepseek claims that they used $5.6mil to train Deepseek R1; that's about 10-20 trillion tokens at their current pricing- or 1 million users sending just 100 requests at full context size.

root_axis · 2025-11-06T23:04:36 1762470276

> it technically requires less GPU processing to run

Not when you have to scale. There's a reason why every LLM SaaS aggressively rate limits and even then still experiences regular outages.

throwaway314155 · 2025-11-06T20:12:30 1762459950

tl;dr the person you originally responded too is wrong.

Der_Einzige · 2025-11-07T06:10:43 1762495843

That's super wrong. A lot of why people flipped out about Deepseek V3 is because of how cheap and how fast their GPUaaS model is.

There is so much misinformation both on HN, and in this very thread about LLMs and GPUs and cloud and it's exhausting trying to call it out all the time - especially when it's happening from folks who are considered "respected" in the field.

riku_iki · 2025-11-06T21:17:38 1762463858

> How do the Chinese train these models if they don't have access to the GPUs to train them?

they may be taking some western models: llama, chatgpt-oss, gemma, mistral, etc, and do postraining, which required way less resources.

simonw · 2025-11-07T04:25:35 1762489535

If they were doing that I expect someone would have found evidence of it. Everything I've seen so far has lead me to believe that these Chinese AI labs are training their own models from scratch.

riku_iki · 2025-11-07T04:27:27 1762489647

not sure what kind of evidence it could be..

simonw · 2025-11-07T05:58:07 1762495087

Just one example: if you know the training data used for a model you can prompt it in a way that can expose whether or not that training data was used.

The NYT used tricks like this as part of their lawsuit against OpenAI: page 30 onwards of https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

riku_iki · 2025-11-07T06:02:15 1762495335

You either don't know which training data was used for say chatgpt oss, or training data can be included into some open dataset like pile or similar. I think this test is very unreliable, and even if someone come to such conclusion, not clear what is the value of such conclusion, and if that someone can be trusted.

simonw · 2025-11-07T07:43:27 1762501407

My intuition tells me it is vanishingly unlikely that any of the major AI labs - including the Chinese ones - have fine-tuned someone else's model and claimed that they trained it from scratch and got away with it.

Maybe I'm wrong about that, but I've never heard any of the AI training experts (and they're a talkative bunch) raise that as a suspicion.

There have been allegations of distillation - where models are partially trained on output from other models, eg using OpenAI models to generate training data for DeepSeek. That's not the same as starting with open model weights and training on those - until recently (gpt-oss) OpenAI didn't release their model weights.

I don't think OpenAI ever released evidence that DeepSeek had distilled from their models, that story seemed to fizzle out. It got a mention in a congressional investigation though: https://cyberscoop.com/deepseek-house-ccp-committee-report-n...

> An unnamed OpenAI executive is quoted in a letter to the committee, claiming that an internal review found that “DeepSeek employees circumvented guardrails in OpenAI’s models to extract reasoning outputs, which can be used in a technique known as ‘distillation’ to accelerate the development of advanced model reasoning capabilities at a lower cost.”

riku_iki · 2025-11-07T18:52:04 1762541524

Additionally, it would be interesting to know if there is dynamics in opposite directions, US corps (oai, xai) can now incorporate Chinese models into their core models as one/several expert towers.

riku_iki · 2025-11-07T16:44:12 1762533852

> That's not the same as starting with open model weights and training on those - until recently (gpt-oss) OpenAI didn't release their model weights.

there was obviously llama.

zackangelo · 2025-11-07T03:41:43 1762486903

What 1T parameter base model have you seen from any of those labs?

riku_iki · 2025-11-07T03:46:05 1762487165

its moe, each expert tower can be branched from some smaller model.

jychang · 2025-11-11T00:46:55 1762822015

That's not how MoE works, you need to train the FFN directly or else the FFN gate would have no clue how to activate the expert.

lossolo · 2025-11-06T21:35:46 1762464946

This is false. You can buy whole H100 clusters in China and Alibaba, Bytedance, Tencent etc have enough cards for training and inference.

Shenzhen 2025 https://imgur.com/a/r6tBkN3

the_mitsuhiko · 2025-11-06T17:58:26 1762451906

And Europeans don't it because quite frankly, we're not really doing anything particularly impressive with AI sadly.

abecode · 2025-11-06T21:18:06 1762463886

At ECAI conference last week there was a panel discussion and someone had a great quote, "in Europe we are in the golden age of AI regulation, while the US and China are in the actual golden age of AI".

speedgoose · 2025-11-06T18:03:03 1762452183

To misquote the French president, "Who could have predicted?".

https://fr.wikipedia.org/wiki/Qui_aurait_pu_pr%C3%A9dire

embedding-shape · 2025-11-06T19:35:53 1762457753

He didn't coin that expression did he? I'm 99% sure I've heard people say that before 2022, but now you made me unsure.

Sharlin · 2025-11-06T19:50:57 1762458657

"Who could've predicted?" as a sarcastic response to someone's stupid actions leading to entirely predictable consequences is probably as old as sarcasm itself.

speedgoose · 2025-11-07T08:00:52 1762502452

People said it before, but he said it without sarcasm about things that many people could in fact predict.

seydor · 2025-11-06T19:58:42 1762459122

We could add cookie warnings to AI, everybody loves those

DrNosferatu · 2025-11-06T20:13:14 1762459994

Europe should act and make its own, literal, Moonshot:

https://ifiwaspolitical.substack.com/p/euroai-europes-path-t...

imtringued · 2025-11-07T09:10:39 1762506639

>Moonshot 1: GPT-4 Parity (2027) >Objective: 100B parameter model matching GPT-4 benchmarks, proving European technical viability

This feels like a joke... Parity with a 2024 model in 2027? The Chinese didn't wait, they just did it.

The timeline for #1 LLM is also so far into the future that it is entirely plausible that by 2031, nobody uses transformer based LLMs as we know them today anymore. For reference: The attention paper is only 8 years old. Some wild new architecture could come out in that time that makes catching up meaningless.

DrNosferatu · 2025-11-07T15:32:15 1762529535

Note the EU-Moonshot project is based on own silicon / compute sovereignty.

GPT4 parity on a own silicon trained indigenous model is just an early goal.

Indeed, the ultimate goal is EU LLM supremacy - which means under democratic control.

toephu2 · 2025-11-07T03:27:22 1762486042

Europe gave us cookie popups on every single website.

Gigachad · 2025-11-07T06:43:03 1762497783

Only ones with invasive spyware cookies. Essential site function cookies do not require a consent banner.

alpineman · 2025-11-06T20:36:33 1762461393

actually Mistral is pretty good and catching up as the other leading models stagnate - the coding and OCR is particularly good

utopiah · 2025-11-06T21:06:00 1762463160

> we're not really doing anything particularly impressive with AI sadly.

Well, that's true... but also nobody else is. Making something popular isn't particularly impressive.

saubeidl · 2025-11-06T20:28:30 1762460910

Honestly, do we need to? If the Chinese release SOTA open source models, why should we invest a ton just to have another one? We can just use theirs, that's the beauty of open source.

hex4def6 · 2025-11-06T22:03:23 1762466603

For the vast majority, they're not "open source" they're "open weights". They don't release the training data or training code / configs.

It's kind of like releasing a 3d scene rendered to a JPG vs actually providing someone with the assets.

You can still use it, and it's possible to fine-tune it, but it's not really the same. There's tremendous soft power in deciding LLM alignment and material emphasis. As these things become more incorporated into education, for instance, the ability to frame "we don't talk about ba sing se" issues are going to be tremendously powerful.

uvaursi · 2025-11-06T19:53:58 1762458838

[flagged]

jacquesm · 2025-11-06T20:30:43 1762461043

What a load of tripe.

saubeidl · 2025-11-06T20:22:30 1762460550

I'm tired of this ol' propaganda trope.

* We're leading the world in fusion research. https://www.pppl.gov/news/2025/wendelstein-7-x-sets-new-perf...

* Our satellites are giving us by far the best understanding of our universe, capturing one third of the visible sky in incredible detail - just check out this mission update video if you want your mind blown: https://www.youtube.com/watch?v=rXCBFlIpvfQ

* Not only that, the Copernicus mission is the world's leading source for open data geoobservation: https://dataspace.copernicus.eu/

* We've given the world mRNA vaccines to solve the Covid crisis and GLP-1 antagonists to solve the obesity crisis.

* CERN and is figuring out questions about the fundamental nature of the universe, with the LHC being by far the largest particle accelerator in the world, an engineering precision feat that couldn't have been accomplished anywhere else.

Pioneering, innovation and drive forward isn't just about the latest tech fad. It's about fundamental research on how our universe works. Everyone else is downstream of us.

CamperBob2 · 2025-11-07T02:14:55 1762481695

Don't worry, we in the US are hot on your heels in the own-goal game ( https://www.space.com/space-exploration/nasa-is-sinking-its-... ).

All you have to do is wait by the Trump River and wait for our body to come floating by.

uvaursi · 2025-11-07T14:07:39 1762524459

I’m confused. Who is this “We”? Do you realize how behind in many respects most of Europe is? How it’s been parceled up and destroyed by the EU? Science projects led by a few countries doesn’t cut it.

It’s not propaganda at all. The standards of living there are shit. But enjoy the particle collider, I guess?

saubeidl · 2025-11-07T14:44:29 1762526669

We is Europe. Like everywhere else, we are behind in some aspects and ahead in others.

> The standards of living there are shit.

Now you're just trolling. I've lived in both the US and in multiple EU countries. Let me tell you, the standard of living in the US does not hold a candle to the one in the EU.

fspeech · 2025-11-07T03:32:59 1762486379

There is also Minimax M2 https://huggingface.co/MiniMaxAI/MiniMax-M2

lvl155 · 2025-11-06T18:23:28 1762453408

The answer is simply that no one would pay to use them for a number of reasons including privacy. They have to give them away and put up some semblance of openness. No option really.

tokioyoyo · 2025-11-06T18:26:12 1762453572

I know first hand companies paying them. Chinese internal software market is gigantic. Full of companies and startups that have barely made into a single publication in the west.

lvl155 · 2025-11-06T19:58:29 1762459109

Of course they are paying them. That’s not my point. My point is this is the only way for them to gain market share and they need Western users to train future models. They have to give them away. I’d be shocked if compute costs are not heavily subsidized by CCP.

overfeed · 2025-11-06T20:18:48 1762460328

> My point is this is the only way for them to gain market share and they need Western users to train future models.

And how would releasing open-weight models help with that? Open-weights invite self-hosting, or worse, hosting by werstern GPUaaS companies.

spwa4 · 2025-11-06T20:38:48 1762461528

But the CCP only has access to the US market because they joined the WTO, but when they joined the WTO they signed a treaty that they wouldn't do things like that.

janalsncm · 2025-11-06T19:09:32 1762456172

I don’t think there’s any privacy that OpenAI or Anthropic are giving you that DeepSeek isn’t giving you. ChatGPT usage logs were held by court order at one point.

It’s true that DeepSeek won’t give you reliable info on Tiananmen Square but I would argue that’s a very rare use case in practice. Most people will be writing boilerplate code or summarizing mundane emails.

nylonstrung · 2025-11-06T18:39:17 1762454357

There are plenty of people paying, the price/performance is vastly better than the Western models

Deepseek 3.2 is 1% the cost of Claude and 90% of the quality

thomashop · 2025-11-06T19:38:01 1762457881

Why is privacy a concern? You can run them in your own infrastructure

fastball · 2025-11-06T19:43:17 1762458197

Privacy is not a concern because they are open. That is the point.

thomashop · 2025-11-06T20:47:24 1762462044

Ah understood i misread

quleap · 2025-11-07T09:34:49 1762508089

ByteDance’s Volcengine is doing very well offering paid LLM services in China. Their Doubao Seed models are on par with other state-of-the-art models.

zbyforgotp · 2025-11-07T14:03:17 1762524197

The American labs are paranoid. The secrecy kills innovation. Open Source means ideas can meet and have sex and produce offsprings.

fungi · 2025-11-07T01:14:48 1762478088

microsofts phi models are very good smaller models under MIT license.

sampton · 2025-11-06T17:58:16 1762451896

Meta gave up on open weight path after DeepSeek.

gordonhart · 2025-11-06T18:09:42 1762452582

It’s more fair to say they gave up after the Llama 4 disaster.

RIMR · 2025-11-06T18:17:42 1762453062

Love their nonsense excuse they they are trying to protect us from misuse of "superintelligence".

>“We believe the benefits of superintelligence should be shared with the world as broadly as possible. That said, superintelligence will raise novel safety concerns. We’ll need to be rigorous about mitigating these risks and careful about what we choose to open source.” -Mark Zuckerberg

Meta has shown us daily that they have no interest in protecting anything but their profits. They certainly don't intend to protect people from the harm their technology may do.

They just know that saying "this is profitable enough for us to keep it proprietary and restrict it to our own paid ecosystem" will make the enthusiasts running local Llama models mad at them.

brandall10 · 2025-11-06T19:55:21 1762458921

Also, the Meta AI 'team' is currently retooling so they can put something together with a handful of Zuck-picked experts making $100m+ each rather than hundreds making ~$1m each.

Der_Einzige · 2025-11-07T06:11:53 1762495913

Too bad those experts are not worth their 300 million packages. I've seen the google scholars of the confirmed crazy comp hires and it's not Yann Lecun tier that's for sure.

raincole · 2025-11-06T21:08:02 1762463282

Do you think which one has the higher market share:

1) The four models you mentioned, combined

or

2) ChatGPT

?

What gives? Because if people are willing to pay you, you don't say "ok I don't want your money I'll provide my service for free."

pphysch · 2025-11-06T21:15:26 1762463726

Open-weight (Chinese) models have infinitely more market share in domains where giving your data to OpenAI is not acceptable

Like research labs and so on. Even at US universities

raincole · 2025-11-06T23:10:31 1762470631

Cool, and? If these models were hosted in China, the labs you mentioned wouldn't be paying them, right?

Now you have the answer to "what gives" above.

pphysch · 2025-11-07T00:06:47 1762474007

"And" therefore OpenAI has little to offer when it comes to serious applications of AI.

Best they can hope for is getting acquired by MS for pennies when this scheme collapses.