I’ve noticed a strong negative streak in the security community around LLMs. Lots of comments about how they’ll just generate more vulnerabilities, “junk code”, etc.
It seems very short-sighted.
I think of it more like self driving cars. I expect the error rate to quickly become lower than humans.
Maybe in a couple of years we’ll consider it irresponsible not to write security and safety critical code with frontier LLMs.
I've been watching a twitch streamer vibe-code a game.
Very quickly he went straight to, "Fuck it, the LLM can execute anything, anywhere, anytime, full YOLO".
Part of that is his risk appetite, but it's also partly because anything else is just really frustrating.
Someone who doesn't themselves code isn't going to understand what they're being asked to allow or deny anyway.
To the pure vibe-coder, who not only doesn't read the code but couldn't read it if they tried, there's no difference between "Can I execute grep -e foo */*.ts" and "Can I execute rm -rf /".
Both are meaningless to them. How do you communicate real risk? Asking vibe-coders to understand the commands isn't going to cut it.
So people just full allow all and pray.
That's a security nightmare: it's back to the kind of default-allow, permissive environment we haven't really seen on mass-use, general-purpose, internet-connected devices since Windows 98.
The wider PC industry has gotten very good at UX, to the point where most people don't need to think about how their computer works at all; it hides most of the security machinery from them and still keeps the device secure.
Meanwhile the AI/LLM side is so rough it basically forces the layperson to open a huge hole they don't understand to make it work.
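To make the grep-vs-rm example concrete, here's a minimal sketch (hypothetical, not any real agent's permission API) of what a deny-by-default gate could do: classify the command into a coarse risk tier and describe it in plain language, instead of showing the raw command to someone who can't read it.

    # Hypothetical sketch, not any existing agent harness: classify a shell
    # command into a coarse risk tier so the permission prompt can describe it
    # in plain language, and treat anything unrecognized as high risk.
    import shlex

    READ_ONLY   = {"grep", "rg", "ls", "cat", "find"}
    WRITES      = {"sed", "mv", "cp", "touch"}
    DESTRUCTIVE = {"rm", "dd", "mkfs", "curl"}  # curl: can download or exfiltrate

    def classify(command: str) -> str:
        """Return a plain-language risk description (illustration only)."""
        tool = shlex.split(command)[0]
        if tool in READ_ONLY:
            return "reads files only"
        if tool in WRITES:
            return "modifies files in this project"
        if tool in DESTRUCTIVE:
            return "can delete data or reach the network"
        return "unknown command: treat as high risk"

    print(classify("grep -e foo */*.ts"))  # -> reads files only
    print(classify("rm -rf /"))            # -> can delete data or reach the network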
I know exactly the streamer you're referring to, and this is the first time I've seen an overlap between these two worlds! I bet there are quite a few of us. Anyway, agreed on all counts; watching someone like him has been really eye-opening on how some people use these tools ... and it's not pretty.
Yeah, it does sound a lot like self-driving cars. Everyone talks about how they're amazing and will do everything for you, but you actually have to constantly hold their hand because they aren't as capable as they're made out to be.
You're talking about a theoretical problem in the future, while I assure you vibe coding and agent-based coding are causing major issues today.
Today, LLMs make development faster, not better.
And I'd be willing to bet a lot of money they won't be significantly better than a competent human in the next decade, let alone the next couple years. See self-driving cars as an example that supports my position, not yours.
Does it matter, though? Programming was already terrible. A few companies do good work; the rest have been shipping garbage for decades. No one cares about their data being exposed as long as things are cheap (well, consumers don't care, and companies just carry insurance for when it happens, so they don't really care either; it's just a necessary line item). People work daily with systems that are terrible in every way, and then those systems get hacked, for ransom or otherwise.

Now we can just make things cheaper and faster, and people will like it. Even at the current level, software will be vastly easier and faster to make; sure, it will suck, but I'm not sure anyone outside HN cares in any way, shape, or form. I know our clients don't: they're shipping garbage faster than ever, and they see our service as a necessary business expense IF something breaks or messes up.

Which means it won't matter whether LLMs get better; what matters is that they get a lot cheaper, so we can run massive numbers of them on every device committing code 24/7, and that we keep up our tooling to find the minefields faster and band-aid them until the next issue pops up.
It kind of reminds me of when Uber launched, and taxi drivers were talking about how it was the death of safe, clean, responsible, and trustworthy taxi service.
Except taxi drivers were not any safer, cleaner, more responsible, or more trustworthy than the general population, and the primary objective of most taxi companies seemed to be to actively screw with their customer base.
In short, their claims were inaccurate and motivated by protecting their existing racket.
> Today, LLMs make development faster, not better.
You don't have to use them this way. It's just extremely tempting and addictive.
You can choose to talk to them about code rather than features, using them to develop better code at a normal speed instead of worse code faster. But that's hard work.
Perhaps I'm doing something wrong, but I can't use them that way, hard work or no. It feels like trying to pair program with a half-distracted junior developer.
What's the point of that? A skilled developer can already develop high quality code at a normal speed?
I will use AI for suggestions when using an API I'm not familiar with, because it's faster than reading all the documentation to figure out the specific function call I need, but I then follow up on the example to verify it's correct and that I can confidently read the code. Is that what you're talking about?
A vibe coder without 20+ years of experience can't do that, but they can publish an app or website just the same.
What metric would you measure to determine whether a fully AI-based flow is better than a competent human engineer? And how much would you like to bet?
In this context: fewer security vulnerabilities in a real-world vibe-coded application (not a demo or some sort of toy app) than in one created by a subject-matter expert working without LLM agents.
I'd be willing to bet 6 figures that doesn't happen in the next 2 years.
The current models cannot be made better than humans who are good at their job. Many humans are not good at their job, though, and I think (and see) we've already crossed that line. Certain outsourcing countries could end up with millions of people out of work (not yet, but it will happen), because they won't be able to steer the LLMs into making anything usable; they never understood the work to begin with.
For people here on HN, I agree with you: not in the next 2 years, and if no one invents something beyond the transformer-based model, not at any point until that happens.
Agreed. I think the parent poster meant it differently, but I think self driving cars are an excellent analogy.
They've been "on the cusp" of widespread adoption for around 10 years now, but in reality they appear to have hit a local optimum and another major advance is needed in fundamental research to move them towards mainstream usage.
Analogous to the way I think of self-driving cars is the way I think of fusion: perpetually a few years away from a 'real' breakthrough.
There is currently no reason to believe that LLMs cannot acquire the ability to write secure code for the most prevalent use cases, though this is contingent on the availability of appropriate tooling, likely a Rust-like compiler. On the other hand, there's no reason to think that LLMs will become useful tools for validating the security of applications at either the model or implementation level, beyond detecting quick wins.
Yeah, there's a massive difference between a system that can handle a specific number of well-defined situations, and a system that can handle everything.
I don't know what the current state of self-driving cars is. Do they already understand the difference between a plastic bag blowing onto the street, and a football rolling onto the street? Because that's a massive difference, and understanding that is surprisingly hard. And even if you program them to recognize the ball, what if it's a different toy?
Let's maybe cross that bridge when (more important, if!) we come to it then? We have no idea how LLMs are gonna evolve, but clearly now they are very much not ready for the job.
For now we train LLMs on next-token prediction and fill-in-the-middle for code. That's reflected exactly in the experience of using them: over time they produce more and more garbage.
It's optimistic, but maybe once we start training them on "remove the middle" instead, it could help make code better.
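For anyone unfamiliar with fill-in-the-middle, here's a rough sketch of how a FIM training example is typically built from a source file. The sentinel token names are placeholders for illustration, not any specific model's vocabulary.

    # Rough illustration of fill-in-the-middle (FIM) training data: pick a
    # random span, move it to the end, and mark the pieces with sentinel
    # tokens. The model is trained to predict the middle given prefix + suffix.
    import random

    def make_fim_example(code: str) -> str:
        i, j = sorted(random.sample(range(len(code)), 2))
        prefix, middle, suffix = code[:i], code[i:j], code[j:]
        return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}"

    print(make_fim_example("def add(a, b):\n    return a + b\n"))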
There are plenty of security people on the other side of this issue; they're just not making news, because the way you make news in security is by announcing vulnerabilities. By way of example, last I checked, Dave Aitel was at OpenAI.
This would be a cool place for LLMs to store a summary of the prompts used to generate the code in order to make it easier for other LLMs and humans to pick up where they left off.
I've been thinking about this exact same problem. It would be great to store a log of how much time/tokens were spent reasoning, what was the reasoning path, why were certain decisions made, etc.
I don't know if rationale is something better suited for the git commit log, or tagged by code function to an external "rationale" system of record.
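As a sketch of what that could look like (the field names here are made up for illustration, not from any existing tool), a rationale record could be a small sidecar file keyed by commit hash, or the same fields folded into git commit trailers:

    # Hypothetical "rationale" record attached to a commit as a sidecar file.
    # Every field name below is an assumption for illustration only.
    import json

    rationale = {
        "commit": "abc1234",                      # placeholder hash
        "prompt_summary": "Add retry logic to the upload client",
        "model": "example-model",                 # placeholder model name
        "reasoning_tokens": 4521,
        "decisions": [
            "Chose exponential backoff over fixed delay to avoid thundering herd",
            "Capped retries at 5 to match the existing config convention",
        ],
    }

    # Store it keyed by commit hash so humans and other LLMs can find it later.
    with open(f"{rationale['commit']}.rationale.json", "w") as f:
        json.dump(rationale, f, indent=2)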
If you're buying modern phones and expect the charger to also be used with your future phone, I'd look for a USB PD capable power supply with PPS support. (Edit: Many of the phone makers that are listed as having proprietary technologies support PD on newer phones. Since the EU mandates USB PD, I would expect the vast majority of new devices to support it at least well enough that you won't need anything else.)
Rather than 10 of a given charger, consider a smaller number of GaN chargers with multiple ports, but be aware that many of the "smart" ones will reset all ports if any port is reconnected or renegotiates. I have a "smart" charger capable of outputting 100 W on one port or some mix of wattages on multiple ports (mainly for travel), and a "dumb" multi-port charger that I use both for slow charging of phones and for powering IoT devices that I don't want to be reset. The latter simply has multiple USB-A ports, which lets me charge almost anything - either with an A-to-C cable, or A-to-whatever-that-device-needs (either Micro-USB, Mini-USB, or something proprietary).
This would make a lot of sense for both sides I think. Owning the part of the stack that decides where the inference requests go is like Google owning the browser.