
This glosses over a fundamental scaling problem that undermines the entire argument. The author's main example is Claude Code searching through local codebases with grep and ripgrep, then extrapolates this to claim RAG is dead for all document retrieval. That's a massive logical leap.

Grep works great when you have thousands of files on a local filesystem that you can scan in milliseconds. But most enterprise RAG use cases involve millions of documents across distributed systems. Even with 2M token context windows, you can't fit an entire enterprise knowledge base into context. The author acknowledges this briefly ("might still use hybrid search") but then continues arguing RAG is obsolete.

The bigger issue is semantic understanding. Grep does exact keyword matching. If a user searches for "revenue growth drivers" and the document discusses "factors contributing to increased sales," grep returns nothing. This is the vocabulary mismatch problem that embeddings actually solve. The author spent half the article complaining about RAG's limitations with this exact scenario (his $5.1B litigation example), then proposes grep as the solution, which would perform even worse.

Also, the claim that "agentic search" replaces RAG is misleading. Recent research shows agentic RAG systems embed agents INTO the RAG pipeline to improve retrieval, they don't replace chunking and embeddings. LlamaIndex's "agentic retrieval" still uses vector databases and hybrid search, just with smarter routing.

Context windows are impressive, but they're not magic. The article reads like someone who solved a specific problem (code search) and declared victory over a much broader domain.


I agree.

A great many pundits don't get that RAG means: "a technique that enables large language models (LLMs) to retrieve and incorporate new information."

So RAG is a pattern that, as a principle, is applied to almost every process. Context windows? OK, I won't get into all the nitty-gritty details here (embedded devices, small storage, security, RAM defects, the cost and storage of contexts for different use cases, etc.), just a hint: the act of filling a context is what? Applied RAG.

RAG is not an architecture, it is a principle. A structured approach. There is a reason why nowadays many refer to RAG as a search engine.

For all we know about knowledge, there is only one entity with an infinite context window. We still call it God, not the cloud.


Indeed, the name is Retrieval Augmented Generation... so this is generation (synthesis of text) augmented by retrieval (of data from external systems). The goal is to augment the generation, not to improve retrieval.

The improvements needed for the retrieval part are then another topic.


Agentic retrieval is really more a form of deep research (from a product standpoint there is very little difference). The key is that LLMs > rerankers, at least when you're not at webscale where the cost differential is prohibitive.


LLMs > rerankers. Yes! I don't like rerankers. They are slow, the context window is small (4096 tokens), they're expensive... It's better when the LLM reads the whole file versus some top_chunks.


Rerankers are orders of magnitude faster and cheaper than LLMs. Typical latency out of the box on a decent sized cross encoder (~4B) will be under 50ms on cheap gpus like an A10G. You won’t be able to run a fancy LLM on that hardware and without tuning you’re looking at hundreds of ms minimum.

More importantly, it’s a lot easier to fine tune a reranker on behavior data than an LLM that makes dozens of irrelevant queries.
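For context on what "out of the box" looks like, here's a minimal sketch with sentence-transformers; the model name, query, and passages are just placeholder assumptions, not a definitive setup:

    from sentence_transformers import CrossEncoder

    query = "revenue growth drivers"
    candidates = [
        "Factors contributing to increased sales this quarter...",
        "Office relocation and facilities update...",
    ]

    # a small open cross-encoder; it scores each (query, passage) pair jointly
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, p) for p in candidates])

    # keep only the highest-scoring passages before handing them to the LLM
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    top_k = [p for p, _ in ranked[:10]]

Swapping in behavior data to fine-tune that model is a much smaller lift than fine-tuning an LLM.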


This is worth emphasizing. At scale, and when you have the resources to really screw around with them to tune your pipeline, rerankers aren't bad, they're just much worse/harder to use out of the box. LLMs buy you easy robustness, baseline quality and capabilities in exchange for cost and latency, which is a good tradeoff until you have strong PMF and you're trying to increase margins.


More than that, adding longer context isn’t free either in time or money. So filling an LLM context with k=100 documents of mixed relevance may be slower than reranking and filling with k=10 of high relevance.

Of course, the devil is in the details and there are five dozen reasons why you might choose one approach over the other. But it is not clear that using a reranker is always slower.
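A crude back-of-envelope, with the per-doc token count and price made up purely for illustration:

    # toy numbers: ~1k tokens per retrieved doc, $3 per 1M input tokens
    doc_tokens, price_per_m_tokens = 1_000, 3.00

    def prompt_cost(k):
        return k * doc_tokens / 1_000_000 * price_per_m_tokens

    print(prompt_cost(100))  # ~$0.30/query stuffing k=100 mixed-relevance docs
    print(prompt_cost(10))   # ~$0.03/query after reranking down to k=10

The 10x gap in prompt size also shows up in prefill latency, so the reranker has to be pretty slow before it loses.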


Is letting an agent use grep not a form of RAG? I know usually RAG is done with vector databases but grep is definitely a form of retrieval, and it’s augmenting the generation.


RAG doesn’t just mean word vectors but can include keyword search. Claude using grep is a form of RAG.


In practice this is not how the term is used.

It bugs me, because the acronym should encompass any form of retrieval - but in practice, people use RAG to specifically refer to embedding-vector-lookups, hence it making sense to say that it's "dying" now that other forms of retrieval are better.


But couldn’t an LLM search for documents in that enterprise knowledge base just like humans do, using the same kind of queries and the same underlying search infrastructure?


I wouldn't say humans are efficient at that so no reason to copy, other than as a starting point.


Maybe not efficient, but if the LLMs can't even reach this benchmark then I'm not sure.


Yes but that would be worse than many RAG approaches, which were implemented precisely because there is no good way to cleanly search through a knowledge base for a million different reasons.

At that point, you are just doing Agentic RAG, or even just Query Review + RAG.

I mean, yeah, agentic RAG is the future. It's still RAG though.


This was essentially my response as well, but the other replies to you also have a point, and I think the key here is that the 'Retrieval' in RAG is very vague, and depending on who you are and what you got into RAG for, the term means different things.

I am definitely more aligned with needing what I would rather call 'Deep Semantic Search and Generation' - the ability to query text chunk embeddings of... 100k PDFs, using the semantics to search for the closeness of the 'ideas', having those fed into the context of the LLM, and then having the LLM generate a response to the prompt citing the source PDF(s) the closest-matched vectors came from...

That is the killer app of a 'deep research' assistant IMO and you don't get that via just grepping words and feeding related files into the context window.

The downside is: how do you generate embeddings of massive amounts of mixed-media files and store them in a database quickly and cheaply, compared to just grepping a few terms from said files? A CPU grep of text in files in RAM is like five orders of magnitude faster than an embedding model on the GPU generating semantic embeddings of the chunked file and then storing those for later.


Appreciate the feedback. I’m not saying grep replaces RAG. The shift is that bigger context windows let LLMs just read whole files, so you don’t need the whole chunk/embed pipeline anymore. Grep is just a quick way to filter down candidates.

From there the model can handle 100–200 full docs and jot notes into a markdown file to stay within context. That’s a very different workflow than classic RAG.


That's fair, but how do you grep down to the right 100-200 documents from millions without semantic understanding? If someone asks "What's our supply chain exposure?" grep won't find documents discussing "vendor dependencies" or "sourcing risks."

You could expand grep queries with synonyms, but now you're reimplementing query expansion, which is already part of modern RAG. And doing that intelligently means you're back to using embeddings anyway.

The workflow works great for codebases with consistent terminology. For enterprise knowledge bases with varied language and conceptual queries, grep alone can't get you to the right candidates.


The agent greps for the obvious term or terms, reads the resulting documents, discovers new terms to grep for, and the process repeats until it's satisfied it has enough info to answer the question.

> You could expand grep queries with synonyms, but now you're reimplementing query expansion, which is already part of modern RAG.

In this scenario "you" are not implementing anything - the agent will do this on its own.

This is based on my experience using Claude Code in a codebase that definitely does not have consistent terminology.

It doesn't always work, but it seemed like you were thinking in terms of trying to get things right in a single grep, when it's actually a series of greps that are informed by the results of previous ones.
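A minimal sketch of that loop, assuming ripgrep is installed; the LLM step is stubbed out because that's the part the agent supplies on its own:

    import subprocess

    def grep_files(pattern, path="."):
        # ripgrep, filenames only, case-insensitive
        out = subprocess.run(["rg", "-l", "-i", pattern, path],
                             capture_output=True, text=True)
        return set(out.stdout.splitlines())

    def propose_new_terms(files):
        # stub: in the real loop the LLM reads a few of these files and
        # suggests new terms it found ("vendor dependencies", "sourcing risk", ...)
        return []

    terms = {"supply chain"}        # seed term from the user's question
    found = set()
    for _ in range(5):              # a few refinement rounds, not one shot
        for term in terms:
            found |= grep_files(term)
        terms = set(propose_new_terms(found))
        if not terms:
            break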


Classical search


Which is RAG. How you decide to take a set of documents too large for an LLM context window and narrow it down to a set that does fit is an implementation issue.

The chunk, embed, similarity search method was just a way to get a decent classical search pipeline up and running with not too much effort.


I think the most important insight from your article, which I also felt, is that agentic search is really different. The ability to retarget a search iteratively fixes both the issues of RAG and grep approaches - they don't need to be perfect from the start, they only need to get there after 2-10 iterations. This really changes the problem. LLMs have become so smart they can compensate for chunking and not knowing the right word.

But on top of this I would also use AI to create semantic maps, like a hierarchical structure of the content, and put that table of contents in the context, letting the AI explore it. This helps with information spread across documents/chapters. It provides a directory to access anything without RAG, by simply following links in a tree. Deep Research agents build this kind of schema while they operate across sources.

To explore this I built a graph MCP memory system where the agent can search both by RAG and text matching, and when it finds top-k nodes it can expand out by links. Writing a node implies having the relevant nodes loaded up first, and when generating the text, placing contextual links embedded [1] like this. So simply writing a node also connects it to the graph at all the right points. This structure fits better with the kind of iterative work LLMs do.


I was previously working at https://autonomy.computer, and building out a platform for autonomous products (i.e., agents) there. I started to observe a similar opportunity. We had an actor-based approach to concurrency that meant it was super cheap performance-wise to spin up a new agent. _That_ in turn meant a lot of problems could suddenly become embarrassingly parallel, and that rather than pre-computing/caching a bunch of stuff into a RAG system you could process whatever you needed in a just-in-time approach. List all the documents you've got, spawn a few thousand agents and give each a single document to process, aggregate/filter the relevant answers when they come back.

Obviously that's not the optimal approach for every use case, but there's a lot where IMO it was better. In particular I was hoping to spend more time exploring it in an enterprise context where you've got complicated sharing and permission models to take into consideration. If you have agents simply passing through the permissions of the user executing the search, whatever you get back is automatically constrained to only the things they had access to in that moment. As opposed to other approaches where you're storing a representation of data in one place, and then trying to work out the intersection of permissions from one or more other systems, and sanitise the results on the way out. Always seemed messy and fraught with problems and the risk of leaking something you shouldn't.


>The author spent half the article complaining about RAG's limitations with this exact scenario (his $5.1B litigation example), then proposes grep as the solution, which would perform even worse.

Yeah I found this very confusing. Sad to see such a poor quality article being promoted to this extent.


Not to mention, unless you want to ship entire containers, you are beholden to the unknown quirks of tools on whatever system your agent happens to execute on. It's like taking something already nondeterministic and extremely risky and ceding even more control—let's all embrace chaos.

Generative AI is here to stay, but I have a feeling we will look back on this period of time in software engineering as a sort of dark age of the discipline. We've seemingly decided to abandon almost every hard won insight and practice about building robust and secure computational systems overnight. It's pathetic that this industry so easily sold itself to the illogical sway of marketers and capital.


Mostly, I agree, except that the industry (from where I'm standing) has never done much else but sell itself to marketers and capital.


> It's pathetic that this industry so easily sold itself to the illogical sway of marketers and capital.

What are you implying? Capital has always owned the industry, except for some really small co-ops and FOSS communities.


I don't get it. Isn't grep RAG?


In RAG, you operate on embeddings and perform vector search, so if you search for fat lady, it might also retrieve text like huge queen, because they're semantically similar. Grep on the other hand, only matches exact strings, so it would not find it.


R in RAG is for retrieval… of any kind. It doesn’t have to be vector search.


Sure, but vector search is the dominant form of RAG, the rest are niche. Saying "RAG doesn’t have to use vectors" is like saying "LLMs don't have to use transformers". Technically true, but irrelevant when 99% of what's in use today does.


How are they niche? The default mode of search for most dedicated RAG apps nowadays is hybrid search that blends classical BM-25 search with some HNSW embedding search. That's already breaking the definition.

A search is a search. The architecture doesn't care if it's doing a vector search or a text search or a keyword search or a regex search, it's all the same. Deploying a RAG app means trying different search methods, or using multiple methods simultaneously or sequentially, to get the best performance for your corpus and use case.
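For what it's worth, blending them is often just rank fusion over whatever searches you ran; a toy sketch with made-up doc ids:

    def reciprocal_rank_fusion(rankings, k=60):
        # rankings: one ranked list of doc ids per search method
        scores = {}
        for ranked_ids in rankings:
            for rank, doc_id in enumerate(ranked_ids):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    bm25_ids   = ["doc7", "doc2", "doc9"]   # keyword/BM25 results
    vector_ids = ["doc2", "doc4", "doc7"]   # embedding/HNSW results
    print(reciprocal_rank_fusion([bm25_ids, vector_ids]))  # doc2 and doc7 float to the top

Nothing in that loop cares which search method produced each ranking.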


Most hybrid stacks (BM25 + dense via HNSW/IVF) still rely on embeddings as a first-class signal. So in practice the vector side carries recall on paraphrase/synonymy/OOV vocab, while BM25 stabilizes precision on exact-term and short-doc cases. So my point still stands.

> The architecture doesn't care

The architecture does care because latency, recall shape, and failure modes differ.

I don't know of any serious RAG deployments that don't use vectors. I'm referring to large scale systems, not hobby projects or small sites.


But this thread and the OP article assume that RAG is always done *exclusively* with vectors, and that's just not true, it's almost never been true.


This isn't the case.

RAG means any kind of data lookup which improves LLM generation results. I work in this area and speak to tons of companies doing RAG and almost all these days have realised that hybrid approaches are way better than pure vector searches.

Standard understanding of RAG now is simply adding any data to the context to improve the result.


Code is also unique in its suitability for agentic grep retrieval, especially when combined with a language server. Code enforces structure, semantics, and consistency in a way that is much easier to navigate than the complexities of natural language.


> Grep works great when you have thousands of files on a local filesystem that you can scan in milliseconds. But most enterprise RAG use cases involve millions of documents across distributed systems

Great point, but this grep in a loop probably falls apart (i.e. becomes non-performant) at 1000s of docs, not millions and 10s of simultaneous users


Why does grep in a loop fall apart? It’s expensive, sure, but LLM costs are trending toward zero. With Sonnet 4.5, we’ve seen models get better at parallelization and memory management (compacting conversations and highlighting findings).


If LLM costs are trending toward zero, please explain OpenAI's $600B deal with Oracle and the $100B deal with Nvidia.

And if you think those deals are bogus, like I do, you still need to explain surging electricity prices.


"LLM costs are trending toward zero". They will never be zero for the cutting edge. One could argue that costs are zero now via local models but enterprises will always want the cutting edge which is likely to come with a cost


They're not trending toward zero; they're just aggressively subsidized with oil money.


Isn't grep + LLM a form of RAG anyway?


Yes, this guy's post came up on my LinkedIn. I think it's helpful to consider the source in these types of articles, written by a CEO at a fintech startup (looks like AI generated too). It's obvious from reading the article that he doesn't understand what he's talking about and has likely never created any kind of RAG or other system. He has very limited experience, basically a single project, of building a system around rudimentary ingestion of SEC filings; that's his entire breadth of technical experience on the subject. So take what you read with a grain of salt, and do your own research and testing.


It really depends on what you mean by RAG. If you take the acronym at face value yeah.

However, RAG has been used as a stand-in for a specific design pattern where you retrieve data at the start of a conversation or request and then inject that into the request. This simple pattern has benefits compared to just sending a prompt by itself.

The point the author is trying to make is that this pattern kind of sucks compared to Agentic Search, where instead of shoving a bunch of extra context in at the start you give the model the ability to pull context in as needed. By switching from a "push" to a "pull" pattern, we allow the model to augment and clarify the queries it's making as it goes through a task which in turn gives the model better data to work with (and thus better results).


I guess, but with a very basic form of exact-match retrieval. Embedding-based RAG tries to augment the prompt with extra data that is semantically similar instead of just exactly the same.


Yeah 100%

Almost all tool calls would result in RAG.

"RAG is dead" just means rolling your own search and manually injecting results into context is dead (just use tools). It means the chunking techniques are dead.


Chunking is still relevant, because you want your tool calls to return results specific to the needs of the query.

If you want to know "how are tartans officially registered" you don't want to feed the entire 554kb wikipedia article on Tartan to your model, using 138,500 tokens, over 35% of gpt-5's context window, with significant monetary and latency cost. You want to feed it just the "Regulation>Registration" subsection and get an answer 1000x cheaper and faster.
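A rough sketch of that kind of section-level chunking (a naive heading split, assuming the article is available as markdown):

    import re

    def split_by_headings(markdown_text):
        # split on markdown headings so a query about "registration" can pull
        # just that subsection instead of the whole article
        parts = re.split(r"\n(?=#{1,6} )", markdown_text)
        return [p.strip() for p in parts if p.strip()]

    article = "# Tartan\nIntro...\n## Regulation\n...\n### Registration\nHow tartans are registered..."
    chunks = split_by_headings(article)
    relevant = [c for c in chunks if "registration" in c.lower()]

Only the matching subsection gets fed to the model, not the whole article.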


But you could. For that example, you could just use a much cheaper model since it's not that complicated a question, and just pass the entire article. Just use Gemini Flash, for example. Models will only get cheaper and context windows will only get bigger.


I've seen it called "agentic search" while RAG seems to have become synonymous with semantic search via embeddings


That's a silly distinction to make, because there's nothing stopping you from giving an agent access to a semantic search.

If I make a semantic search over my organization's Policy As Code procedures or whatever and give it to Claude Code as an MCP, does Claude Code suddenly stop being agentic?


Well, yeah. RAG just specifies retrieval-augmented, not that vector retrieval or decoder retrieval was used.


Cursor’s use of grep is bad. It finds definitions way slower and less accurately than I do using IDE indexing, which is frustratingly “right there.” Crazy that there’s not even LSP support in there.

Claude Code is better, but still frustrating.


What exactly is RAG? Is it a specific technology, or a technique?

I'm not a super smart AI person, but grepping through a codebase sounds exactly like what RAG is. Isn't tool use just (more sophisticated) RAG?


Yes, you are right. The OP has a weirdly narrow definition of what RAG is.

Only the most basic "hello world" type RAG systems rely exclusively on vector search. Everybody has been doing hybrid search or multiple simultaneous searches exposed through tools for quite some time now.


RAG is a technique, so instead of string matching (like grep), it uses embeddings + vector search to retrieve semantically similar text (car ≈ automobile), then feeds that into the LLM. Tool use is broader, RAG is one pattern within that, but not the same as grep.
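A minimal sketch of that vector-search flavor, assuming sentence-transformers and numpy are available; the model name and documents are just example placeholders:

    from sentence_transformers import SentenceTransformer
    import numpy as np

    model = SentenceTransformer("all-MiniLM-L6-v2")   # small example embedding model

    docs = ["Factors contributing to increased sales this quarter...",
            "Office relocation and facilities update..."]
    doc_vecs = model.encode(docs, normalize_embeddings=True)

    query = "revenue growth drivers"
    q_vec = model.encode([query], normalize_embeddings=True)[0]

    sims = doc_vecs @ q_vec              # cosine similarity (vectors are unit-normalized)
    best = docs[int(np.argmax(sims))]    # pick the semantically closest chunk

    prompt = f"Answer using this context:\n{best}\n\nQuestion: {query}"
    # the retrieved context then augments whatever the LLM generates

Grep would need the literal words to overlap; the embedding lookup only needs the meanings to be close.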


Yeah, RAG doesn't say what it's retrieving from; retrieving with grep is still RAG.


Yeah, 'RAG' is quite literally tool use, where the tool is a vector search engine, more or less.

What was described as 'RAG' a year ago now is a 'knowledge search in vector db MCP', with the actual tool and mechanism of knowledge retrieval being the exact same.


This is censorship with extra steps.

Look at what the bill actually requires. Companies have to publish frameworks showing how they "mitigate catastrophic risk" and implement "safety protocols" for "dangerous capabilities." That sounds reasonable until you realize the government is now defining what counts as dangerous and requiring private companies to build systems that restrict those outputs.

The Supreme Court already settled this. Brandenburg gives us the standard: imminent lawless action. Add in the narrow exceptions like child porn and true threats, and that's it. The government doesn't get to create new categories of "dangerous speech" just because the technology is new.

But here we have California mandating that AI companies assess whether their models can "provide expert-level assistance" in creating weapons or "engage in conduct that would constitute a crime." Then they have to implement mitigations and report to the state AG. That's prior restraint. The state is compelling companies to filter outputs based on potential future harm, which is exactly what the First Amendment prohibits.

Yes, bioweapons and cyberattacks are scary. But the solution isn't giving the government power to define "safety" and force companies to censor accordingly. If someone actually uses AI to commit a crime, prosecute them under existing law. You don't need a new regulatory framework that treats information itself as the threat.

This creates the infrastructure. Today it's "catastrophic risks." Tomorrow it's misinformation, hate speech, or whatever else the state decides needs "safety mitigations." Once you accept the premise that government can mandate content restrictions for safety, you've lost the argument.


It is already illegal under 18 USC § 842 to provide bomb-making instructions or similar with the knowledge or intent that said instructions will be used to commit a crime. The intent is to balance free speech with the probability of actual harm.

AIs do not have freedom of speech, and even if they did, it is entirely within the bounds of the Constitution to mitigate this freedom as we already do for humans. Governments currently define unprotected speech as a going concern.

But there's a contradiction hidden in your argument: requiring companies to _filter_ the output of AI models is a prior restraint on their speech, implying the companies do not have control over their own "speech" as produced by the models. This is absurd on its face, just like the argument that the output of my random Markov chain text generator is protected speech because I host the generator online.

There are reasonable arguments to make about censoring AI models, but freedom of speech ain't it, because their output doesn't quack like "speech".


Do libraries have freedom of speech? The same argument can then be used to censor libraries.

Do books have freedom of speech? The same argument can then be used to censor parts of a book.


Do we treat books as the protected speech of libraries? No. In fact we already ban books from library shelves regularly. Freedom of speech does not compel libraries to host The Anarchist's Cookbook, and does not prevent governments from limiting what libraries can host, under existing law.


False. I have no clue where you got this idea, but libraries are perfectly within their right to have it on their shelves, just as publishers are allowed to publish it (present copyright conflicts aside). Repeated legal attacks against the book, at least in the US, were unsuccessful.

You may be conflating “libraries” with “school libraries,” where some states have won the right to limit the contents of shelves. Public libraries have certainly faced pressure about certain books, but legally they are free to stock whatever they want. In practice they often have to deal with repeated theft or vandalism of controversial books, so sometimes they pull them.


> You may be conflating “libraries” with “school libraries,”

For the purpose of this discussion, there is zero difference, unless you can articulate one that matters. Feel free to mentally prefix any mention of "library" with "school" if you like.


School libraries are state institutions under the control of various Boards of Education. As state institutions their rules and policies can be set by statute or Board policy. It has nothing to do with freedom of speech. English teachers likewise must focus on teaching English at work, but this is not a restriction on their freedom of speech.

(That said, I am opposed to political restrictions on school library books. It is still not a free speech issue.)


If you look at the LLMs as a new kind of fuzzy search engine instead of focusing on the fact that they're pretty good at producing human text, you can see it's not about whether the LLMs have a right to "speak", it's whether you have a right to see uncensored results.

Imagine going to the library and the card catalog had been purged of any references to books that weren't government approved.


You're actually making my point for me. 18 USC § 842 criminalizes distributing information with knowledge or intent that it will be used to commit a crime. That's criminal liability for completed conduct with a specific mens rea requirement. You have to actually know or intend the criminal use.

SB 53 is different. It requires companies to implement filtering systems before anyone commits a crime or demonstrates criminal intent. Companies must assess whether their models can "provide expert-level assistance" in creating weapons or "engage in conduct that would constitute a crime," then implement controls to prevent those outputs. That's not punishing distribution to someone you know will commit a crime. It's mandating prior restraint based on what the government defines as potentially dangerous.

Brandenburg already handles this. If someone uses an AI to help commit a crime, prosecute them. If a company knowingly provides a service to facilitate imminent lawless action, that's already illegal. We don't need a regulatory framework that treats the capability itself as the threat.

The "AIs don't have speech rights" argument misses the point. The First Amendment question isn't about the AI's rights. It's about the government compelling companies (or anyone) to restrict information based on content. When the state mandates that companies must identify and filter certain types of information because the government deemed them "dangerous capabilities," that's a speech restriction on the companies.

And yes, companies control their outputs now. The problem is SB 53 removes that discretion by legally requiring them to "mitigate" government-defined risks. That's compelled filtering. The government is forcing companies to build censorship infrastructure instead of letting them make editorial choices.

The real issue is precedent. Today it's bioweapons and cyberattacks. But once we establish that government can mandate "safety" assessments and require mitigation of "dangerous capabilities," that framework applies to whatever gets defined as dangerous tomorrow.


I hate that HN's guidelines ask me not to do this, but it's hard to answer point-by-point when there are so many.

> You have to actually know or intend the criminal use.

> If a company knowingly provides a service to facilitate imminent lawless action, that's already illegal.

And if I tell an AI chatbot that I'm intending to commit a crime, and somehow it assists me in doing so, the company behind that service should have knowledge that its service is helping people commit crimes. That's most of SB 53 right there: companies must demonstrate actual knowledge about what their models are producing and have a plan to deal with the inevitable slip-up.

Companies do not want to be held liable for their products convincing teens to kill themselves, or supplying the next Timothy McVeigh with bomb-making info. That's why SB 53 exists; this is not coming from concerned parents or the like. The tech companies are scared shitless that they will be forced to implement even worse restrictions when some future Supreme Court case holds them liable for some disaster that their AIs assisted in creating.

A framework like SB 53 gives them the legal basis to say, "Hey, we know our AIs can help do [government-defined bad thing], but here are the mitigations in place and our track record, all in accordance with the law".

> When the state mandates that companies must identify and filter certain types of information because the government deemed them "dangerous capabilities," that's a speech restriction on the companies.

Does the output of AI models represent the company's speech, or does it not? You can't have your cake and eat it too. If it does, then we should treat it like speech and hold companies responsible for it when something goes wrong. If it doesn't, then the entire First Amendment argument is moot.

> The government is forcing companies to build censorship infrastructure instead of letting them make editorial choices.

Here's the problem: the nature of LLMs themselves does not allow companies to fully implement their editorial choices. There will always be mistakes, and one will be costly enough to put AIs on the national stage. This is the entire reason behind SB 53 and the desire for a framework around AI technology, not just from the state, but from the companies producing the AIs themselves.


You're conflating individual criminal liability with mandated prior restraint. If someone tells a chatbot they're going to commit a crime and the AI helps them, prosecute under existing law. But the company doesn't have knowledge of every individual interaction. That's not how the knowledge requirement works. You can't bootstrap individual criminal use into "the company should have known someone might use this for crimes, therefore they must filter everything."

The "companies want this" argument is irrelevant. Even if true, it doesn't make prior restraint constitutional. The government can't delegate its censorship powers to willing corporations. If companies are worried about liability, the answer is tort reform or clarifying safe harbor provisions, not building state-mandated filtering infrastructure.

On whether AI output is the company's speech: The First Amendment issue here isn't whose speech it is. It's that the government is compelling content-based restrictions. SB 53 doesn't just hold companies liable after harm occurs. It requires them to assess "dangerous capabilities" and implement "mitigations" before anyone gets hurt. That's prior restraint regardless of whether you call it the company's speech or not.

Your argument about LLMs being imperfect actually proves my point. You're saying mistakes will happen, so we need a framework. But the framework you're defending says the government gets to define what counts as dangerous and mandate filtering for it. That's exactly the infrastructure I'm warning about. Today it's "we can't perfectly control the models." Tomorrow it's "since we have to filter anyway, here are some other categories the state defines as harmful."

Given companies can't control their models perfectly due to the nature of AI technology, that's a product liability question, not a reason to establish government-mandated content filtering.


> You can't bootstrap individual criminal use into "the company should have known someone might use this for crimes, therefore they must filter everything."

Lucky for me, I am not. The company already has knowledge of each and every prompt and response, because I have read the EULAs of every tool I use. But that's beside the point.

Prior restraint is only unconstitutional if it is restraining protected speech. Thus far, you have not answered the question of whether AI output is speech at all, but have assumed prior restraint to be illegal in and of itself. We know this is not true because of the exceptions you already mentioned, but let me throw in another example: the many broadcast stations regulated by the FCC, who are currently barred from "news distortion" according to criteria defined by (checks notes) the government.


Having technical access to prompts doesn't equal knowledge for criminal liability. Under 18 USC § 842, you need actual knowledge that specific information is being provided to someone who intends to use it for a crime. The fact that OpenAI's servers process millions of queries doesn't mean they have criminal knowledge of each one. That's not how mens rea works.

Prior restraint is presumptively unconstitutional. The burden is on the government to justify it under strict scrutiny. You don't have to prove something is protected speech first. The government has to prove it's unprotected and that prior restraint is narrowly tailored and the least restrictive means. SB 53 fails that test.

The FCC comparison doesn't help you. In Red Lion Broadcasting Co. v. FCC, the Supreme Court allowed broadcast regulation only because of spectrum scarcity, the physical limitation that there aren't enough radio frequencies for everyone. AI doesn't use a scarce public resource. There's no equivalent justification for content regulation. The FCC hasn't even enforced the fairness doctrine since 1987.

The real issue is you're trying to carve out AI as a special category with weaker First Amendment protection. That's exactly what I'm arguing against. The government doesn't get to create new exceptions to prior restraint doctrine just because the technology is new. If AI produces unprotected speech, prosecute it after the fact under existing law. You don't build mandatory filtering infrastructure and hand the government the power to define what's "dangerous."


My reading is you can teach a criminal how to make bombs.

You cannot teach them with the /intent/ that they'll use a bomb to commit a specific crime.

It's an enhancement.


If there's one thing I've learned watching the trajectory of social media over the last 15 years, it's that we've been way too slow to assess the risks and harmful outcomes posed by new, rapidly evolving industries.

Fixing social media is now a near impossible task as it has built up enough momentum and political influence to resist any kind of regulation that would actually be effective at curtailing its worst side effects.

I hope we don't make the same mistakes with generative AI


There are few greater risks over the next 15 years than that LLMs get entirely state-captured and forbidden from saying anything that goes against the government narrative.


This depends entirely on who you trust more, your government or tech oligarchs. Tech oligarchs are just as liable to influence how their LLMs operate for evil purposes, and they don't have to worry about pesky things like due process, elections, or the constitution getting in their way.


Government actively participated with social media oligarchs to push their nonsense Covid narrative and squash and discredit legitimate criticisms from reputable skeptics.

Both are evil, in combination so much more so. Neither should be trusted at all.


> Add in the narrow exceptions like child porn and true threats, and that's it.

You're contradicting yourself. On the one hand you're saying that governments shouldn't have the power to define "safety", but you're in favor of having protections against "true threats".

How do you define "true threats"? Whatever definition you may have, surely something like it can be codified into law. The questions then are: how loose or strict the law should be, and how well it is defined in technical terms. Considering governments and legislators are shockingly tech illiterate, the best the technical community can do is offer assistance.

> The government doesn't get to create new categories of "dangerous speech" just because the technology is new.

This technology isn't just new. It is unlike any technology we've had before, with complex implications for the economy, communication, the labor market, and many other areas of human society. We haven't even begun to understand the ways in which it can be used or abused to harm people, let alone the long-term effects of it.

The idea that governments should stay out of this, and allow corporations to push their products out into the world without any oversight, is dreadful. We know what happens when corporations are given free rein; it never ends well for humanity.

I'm not one to trust governments either, but at the very least, they are (meant to) serve their citizens, and enforce certain safety standards that companies must comply with. We accept this in every other industry, yet you want them to stay out of tech and AI? To hell with that.

Frankly, I'm not sure if this CA regulation is a good thing or not. Any AI law will surely need to be refined over time, as we learn more about the potential uses and harms of this technology. But we definitely need more regulation in the tech industry, not less, and the sooner, the better.


There's no contradiction. "True threats" is already a narrow exception defined by decades of Supreme Court precedent. It means statements where the speaker intends to communicate a serious expression of intent to commit unlawful violence against a person or group. That's it. It's not a blank check for the government to decide what counts as dangerous.

Brandenburg gives us the standard: speech can only be restricted if it's directed to inciting imminent lawless action and is likely to produce that action. True threats, child porn, fraud, these are all narrow, well-defined categories that survived strict scrutiny. They don't support creating broad new regulatory authority to filter outputs based on "dangerous capabilities."

You're asking how I define true threats. I don't. The Supreme Court does. That's the point. We have a constitutional framework for unprotected speech. It's extremely limited. The government can't just expand it because they think AI is scary.

"This technology is different" is what every regulator says about every new technology. Print was different. Radio was different. The internet was different. The First Amendment applies regardless. If AI enables someone to commit a crime, prosecute the crime. You don't get to regulate the information itself.

And yes, I want the government to stay out of mandating content restrictions. Not because I trust corporations, but because I trust the government even less with the power to define what information is too dangerous to share. You say governments are meant to serve citizens. Tell that to every government that's used "safety" as justification for censorship.

The issue isn't whether we need any AI regulation. It's whether we want to establish that the government can force companies to implement filtering systems based on the state's assessment of what capabilities are dangerous. That's the precedent SB 53 creates. Once that infrastructure exists, it will be used for whatever the government decides needs "safety mitigations" next.


I'm not sure why you're only focusing on speech. "True threats" doesn't come close to covering all the possible use cases and ways that "AI" tools can be harmful to society. We can't apply legal precedent to a technology without precedent.

> "This technology is different" is what every regulator says about every new technology. Print was different. Radio was different. The internet was different.

"AI" really is different, though. Not even the internet, or computers, for that matter, had the potential to transform literally every facet of our lives. Now, I personally don't buy into the "AGI" nonsense that these companies are selling, but it is undeniable that even the current generation of these tools can shake up the pillars of our society, and raise some difficult questions about humanity.

In many ways, we're not ready for it, yet the companies keep producing it, and we're now deep in a global arms race we haven't experienced in decades.

> I want the government to stay out of mandating content restrictions. Not because I trust corporations, but because I trust the government even less with the power to define what information is too dangerous to share.

See, this is where we disagree.

I don't trust either of them. I'm well aware of the slippery slope that is giving governments more power.

But there are two paths here: either we allow companies to continue advancing this technology with little to no oversight, or we allow our governments to enact regulation that at least has the potential to protect us from companies.

Governments at the very least have the responsibility to protect and serve their citizens. Whether this is done in practice, and how well, is obviously highly debatable, and we can be cynical about it all day. On the other hand, companies are profit-seeking organizations that only serve their shareholders, and have no obligation to protect the public. In fact, it is pretty much guaranteed that without regulation, companies will choose profits over safety every time. We have seen this throughout history.

So to me it's clear that I should trust my government over companies. I do this everyday when I go to the grocery store without worrying about food poisoning, or walk over a bridge without worrying that it will collapse. Shit does happen, and governments can be corrupted, but there are general safety regulations we take for granted every day. Why should tech companies be exempt from it?

Modern technology is a complex beast that governments are not prepared to regulate. There is no direct association between technology and how harmful it can be; we haven't established that yet. Even when there is such a connection, such as smoking causing cancer, we've seen how evil companies can be in refuting it and doing anything in their power to preserve their revenues at the expense of the public. "AI" further complicates this in ways we've never seen before. So there's a long and shaky road ahead of us where we'll have to figure out what the true impact of technology is, and the best ways to mitigate it, without sacrificing our freedoms. It's going to involve government overreach, public pushback, and company lobbying, but I hope that at some point in the near future we're able to find a balance that we're relatively and collectively happy with, for the sake of our future.


LLMs don't have rights. LLMs are tools, and the state can regulate tools. Humans acting on behalf of these companies can still, if they felt the bizarre desire to, publish assembly instructions for bioweapons on the company blog.


You're confused about whose rights are at stake. It's you, not the LLM, that is being restricted. Your argument is like saying, "Books don't have rights, so the state can censor books."


> if they felt the bizarre desire to, publish assembly instructions for bioweapons on the company blog.

Can they publish them by intentionally putting them into the latent space of an LLM?

What if they make an LLM that can only produce that text? What if they continue training so it contains a second text they intended to publish? And continue to add more? Does the fact that there's a collection change things?

These are genuine questions, and I have no clue what the answers are. It seems strange to treat an implementation of text storage so differently that you lose all rights to that text.


I have rights. I want to use whatever tool or source I want - LLMs, news, Wikipedia, search engines. There’s no acceptable excuse for censorship of any of these, as it violates my rights as an individual.


>LLMs are tools, and the state can regulate tools

More and more people get information from LLMs. You should be horrified at the idea of giving the state control over what information people can access through them, because going by historical precedent there's 100% chance that the state would use that censorship power against the interests of its citizens.


“More and more people get information from LLMs” this is the part I'm horrified by.


I'd rather be horrified that people are getting information from LLMs when LLMs have no way to know whether what they're outputting is true.


And the government is going to somehow decide what the truth is? Government is the last entity on earth I’d trust to arbitrate the truth.


Are you also horrified how many people get their facts from Wikipedia, given its systematic biases? All tools have their strengths and weaknesses. But letting politicians decide which information is rightthink seems scary.


Was this comment written with the assistance of AI? I am asking seriously, not trying to be snarky.


No. I just write well.


You clearly already know this, but you do in fact write very well!


Thank you!


> Today it's "catastrophic risks." Tomorrow it's misinformation, hate speech, or whatever else the state decides needs "safety mitigations."

That's the problem.

I'm less worried about catastrophic risks than routine ones. If you want to find out how to do something illegal or dangerous, all an LLM can give you is a digest of what's already available online. Probably with errors.

The US has lots of hate speech, and it's mostly background noise, not a new problem.

"Misinformation" is more of a problem, because the big public LLMs digest the Internet and add authority with their picks. It's adding the authority of Google or Microsoft to bogus info that's a problem. This is a basic task of real journalism - when do you say "X happened", and when do you say "Y says X happened"? LLMs should probably be instructed to err in the direction of "Y says X happened".

"Safety" usually means "less sex". Which, in the age of Pornhub, seems a non-issue, although worrying about it occupies the time of too many people.

An issue that's not being addressed at all here is using AI systems to manipulate customers and provide evasive customer service. That's commercial speech and consumer rights, not First Amendment issues. That should be addressed as a consumer rights thing.

Then there's the issue of an AI as your boss. Like Uber.


Presumably things like making sure LLMs don’t do things like encourage self-harm or fuel delusions also falls under “safety”, but probably also “ethics”.


Yep, this is absolutely censorship with extra steps, but also just an unnecessary bureaucracy. I think the things you have in quotes are the core of it - all these artificial labels and categorizations of what is ultimately plain old speech are trying to provide pathways to violate constitutional rights. California is not new to this game however - look at the absurd lengths they've gone to in violating Second Amendment rights. This is the same playbook.

What is surprising, however, is the timing. Newsom vetoed the previous version of this bill. Him signing it after Charlie Kirk's assassination, when there is so much conversation around the importance of free speech, is odd. It reminds me of this recent article:

Everyone’s a Free-Speech Hypocrite by Greg Lukianoff, the president and chief executive of the Foundation for Individual Rights and Expression (FIRE) https://www.nytimes.com/2025/09/23/opinion/consequence-cultu...


Good post. It's not even about the rights of the LLM or the corporation, but of the people who will be using these tools.

Imagine if the government went to megaphone manufacturers and demanded that the megaphones never amplify words the government doesn't like. "Megaphones don't have rights so this isn't a problem", the smooth-brained internet commenters smugly explain, while the citizens who want to use megaphones find their speech through the tool limited by the government's arbitrary and ever-changing decrees.

As for the government having a right to regulate tools, would a regulation requiring modern printing presses to recognize and refuse to print offensive content really fly with you defending this? The foremost contemporary tool for amplifying speech, the press is named right in the First Amendment. "Regulating tools" in a way that happens to restrict the way citizens can use that tool for their own speech is bullshit. This is flagrantly unconstitutional.


I've never thought censorship was a core concern of AI. It's just regurgitating from an LLM. I vehemently oppose censorship, but who cares about AI? I just don't see the use-case.


Censorship of AI has a huge use-case: people get information from AI, and censorship allows the censors to control which information people can access through the AI.


Worse, people (including me) easily delegate parts of our thinking to this new LLM thing.


This really captures something I've been experiencing with Gemini lately. The models are genuinely capable when they work properly, but there's this persistent truncation issue that makes them unreliable in practice.

I've been running into it consistently, responses that just stop mid-sentence, not because of token limits or content filters, but what appears to be a bug in how the model signals completion. It's been documented on their GitHub and dev forums for months as a P2 issue.

The frustrating part is that when you compare a complete Gemini response to Claude or GPT-4, the quality is often quite good. But reliability matters more than peak performance. I'd rather work with a model that consistently delivers complete (if slightly less brilliant) responses than one that gives me half-thoughts I have to constantly prompt to continue.

It's a shame because Google clearly has the underlying tech. But until they fix these basic conversation flow issues, Gemini will keep feeling broken compared to the competition, regardless of how it performs on benchmarks.

https://github.com/googleapis/js-genai/issues/707

https://discuss.ai.google.dev/t/gemini-2-5-pro-incomplete-re...


Another issue: Gemini can’t do tool calling and (forced) json output at the same time

If you want to use application/json as the specified output in the request, you can’t use tools

So if you need both, you either hope it gives you correct JSON when using tools (which many times it doesn't), or you have to do two requests: one for the tool calling, another for formatting.

At least, even if annoying, this issue is pretty straightforward to get around
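Roughly, the two-request workaround looks like this with the google-generativeai Python SDK; the model names, prompts, and dict-style config are just how I'd sketch it, not a definitive recipe:

    import google.generativeai as genai

    genai.configure(api_key="...")  # assumes your API key is set

    # request 1: let the model use tools (search, URL context, ...) and answer in prose
    research = genai.GenerativeModel("gemini-1.5-pro")  # tools=[...] would be passed here
    draft = research.generate_content("Summarize recent coverage of topic X.").text

    # request 2: no tools, just force the draft into strict JSON
    formatter = genai.GenerativeModel(
        "gemini-1.5-flash",
        generation_config={"response_mime_type": "application/json"},
    )
    structured = formatter.generate_content(
        "Rewrite this answer as JSON with keys 'summary' and 'sources':\n" + draft
    ).text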


Back before structured outputs were common among model providers, I used to have an "end result" tool the model could call to get the structured response I was looking for. It worked very reliably.

It's a bit of a hack, but maybe that works reliably here?
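A minimal sketch of the idea, shown with the OpenAI SDK for concreteness; the schema and model name are just examples:

    import json
    from openai import OpenAI

    client = OpenAI()

    # the "end result" tool: the schema of its arguments is the structure you want back
    final_answer = {
        "type": "function",
        "function": {
            "name": "final_answer",
            "description": "Return the final structured result to the caller.",
            "parameters": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "confidence": {"type": "number"},
                },
                "required": ["summary"],
            },
        },
    }

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Summarize the report in one paragraph."}],
        tools=[final_answer],
        tool_choice={"type": "function", "function": {"name": "final_answer"}},
    )

    args = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)

Forcing the tool choice means you always get arguments that parse as JSON, even alongside other tools.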


You can definitely build an agent and have it use tools like you mention. That’s the equivalent of making 2 requests to Gemini, one to get the initial answer/content, then another to get it formatted as proper json

The issue here is that Gemini has support for some internal tools (like search and web scraping), and when you ask the model to use those, you can’t also ask it to use application/json as the output (which you normally can when not using tools)

Not a huge issue, just annoying


I think this might also have something to do with their super specific output requirements when you do use search (it has to be displayed in a predefined Google format).


Does any other provider allow that? What use cases are there for JSON + tool calling at the same time?


Please correct my likely misunderstanding here, but on the surface, it seems to me that "call some tools then return JSON" has some pretty common use cases.


Let's say you wanna build an app that gives back structured data after a web search. First a tool call to a search API. Then do some reasoning/summarization/etc. on the data returned by the tool. And finally return JSON.


OpenAI, Ollama, DeepSeek all do that.

And wanting to programmatically work with the result + allow tool calls is super common.


Suppose there's a PDF with lots of tables I want to scrape. I mention the PDF URL in my message, and with Gemini's URL context tool, I now have access to the PDF.

I can ask Gemini to give me the PDF's content as JSON and it complies most of the time. But at times, there's an introductory line like "Here's your json:". Those introductory lines interfere with programmatically using the output. They're sometimes there, sometimes not.

If I could have structured output at the same time as tool use, I could reliably use what Gemini spits out, as it'll be JSON with no annoying intro lines.


OpenAI


Unfortunately Gemini isn't the only culprit here. I've had major problems with ChatGPT reliability myself.


I only hit that problem in voice mode, it'll just stop halfway and restart. It's a jarring reminder of its lack of "real" intelligence


I've heard a lot that voice mode uses a faster (and worse) model than regular ChatGPT. So I think this makes sense. But I haven't seen this in any official documentation.


This is more because of VAD - voice activity detection


I think what I am seeing from ChatGPT is highly varying performance. I think this must be something they are doing to manage limitations of compute or costs. With Gemini, I think what I see is slightly different - more like a lower “peak capability” than ChatGPT’s “peak capability”.


I'm fairly sure there's some sort of dynamic load balancing at work. I read an anecdote from someone who had a test where they asked it to draw a little image (something like an ASCII cat, but probably not exactly that since it seems a bit basic), and if the result came back poor they didn't bother using it until a different time of day.

Of course it could all be placebo, but when you intuitively think about it, somewhere on the road to the hundreds of billions in datacenter capex, one would think that there will be periods where compute and demand are out of sync. It's also perfectly understandable why now would be a time to be seeing that.


Small things like this or the fact that AI studio still has issues with simple scrolling confuse me. How does such a brilliant tool still lack such basic things?


It's crazy how Google can create so many really amazing products technically but they fall short just because of basic UI/UX issues.


I see Gemini web frequently break its own syntax highlighting.


The scrolling in AI Studio is an absolute nightmare and somehow they managed to make it worse.

It’s so annoying that you have this super capable model but you interact with it using an app that is complete ass


The app was likely built by the same LLM...


Because they are moving fast and breaking shit.

Ask ChatGPT to output markdown or a PDF in the iOS or Mac app versus the web experience. The web is often better - the apps will return nothing.


This is my perception as well.

Gemini 2.5 Pro is _amazing_ for software architecture, but I just get tired of poking it along. Sonnet does well enough.


ChatGPT also has lots of reliability issues.


If anyone from OpenAI is reading this, I have two complaints:

1. Using the "Projects" thing (Folder organization) makes my browser tab (on Firefox) become unusably slow after a while. I'm basically forced to use the default chats organization, even though I would like to organize my chats in folders.

2. After editing a message that you already sent, you get to select between the different branches of the chat (1/2, and so on), which is cool, but when ChatGPT fails to generate a response in this "branched conversation" context, it will continue failing forever. When your conversation is a single thread and a ChatGPT message fails with an error, retrying usually works and the chat continues normally.


And 3)

On mobile (Android), opening the keyboard scrolls the chat to the bottom! I sometimes want to type while referring to something from the middle of the LLM's last answer.


Projects should have their own memory system. Perhaps something more interactive than the existing Memories but projects need their own data (definitions, facts, draft documents) that is iterated on and referred to per project. Attached documents aren't it, the AI needs to be able to update the data over multiple chats.


It would also be nice if ChatGPT could move chats between projects. My sidebar is a nightmare.


You can drag and drop chats between projects


I know. I want the assistant to do it. Shouldn't it be able to do work on its own platform?


I wonder if this is because a memory cap was reached at that output token. Perhaps they route conversations to different hardware depending on how long they expect it to be.


When this happened to me it was because, I can only guess, the Gemini servers were overloaded. Symptoms: Gemini model, opaque API wrapper error, truncated responses. To be fair, the Anthropic servers are overloaded a lot too, but they have a clear error. I gave Gemini a few days on the bench and it fixed itself without any client-side changes. YMMV.


Half my requests get retried because they fail. I contributed to a ticket in June, with no fix yet.


That used to happen a lot in ChatGPT too.


The latest comment on that issue is someone saying there's a fix available for you to try.


Yes, agreed, it was totally broken when I tested the API two months ago. Lots of failed connections and very slow response times. Hoping the update fixes these issues.


It's been a lot better lately. Nothing like two months ago at all.


What happens if you ask it to please continue? Does it start over?


> I've been running into it consistently, responses that just stop mid-sentence

I’ve seen that behavior when LLMs of any make or model aren’t given enough time or allowed enough tokens.


FWIW, I think GLM-4.5 or Kimi K2 0905 fit the bill pretty well in terms of complete and consistent.

(Disclosure: I'm the founder of Synthetic.new, a company that runs open-source LLMs for monthly subscriptions.)


That’s not a “disclosure”, that’s an ad.


CTO of PolicyGenius (NYC) here. We'd love to have former SoundClouders come join. Not just for engineering, we have a bunch of openings across the business. You can also email recruiting@policygenius.com.

https://boards.greenhouse.io/policygenius


I actually specifically wrote about this after receiving an email calling imposter syndrome out as the reason the person wanted to stop the interview process. Some thoughts on how to overcome it are in the article.

https://medium.com/@DavidMcKayV/obliterate-imposter-syndrome...

