I think it is great for experimenting and proving concepts: alphas and personal projects, not shipped code.
I've been working on wasm sandboxing and automatic verification that code doesn't have the lethal trifecta and got something working in a couple of days.
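For concreteness, here is a minimal sketch of what such a check could look like, assuming each tool or module declares its capabilities in a manifest; the names and the manifest format are illustrative, not the actual implementation.

```python
# Hypothetical sketch: reject any configuration that combines all three legs of
# the "lethal trifecta" (access to private data, exposure to untrusted content,
# ability to communicate externally). Manifest format and names are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class CapabilityManifest:
    reads_private_data: bool          # e.g. filesystem, secrets, user documents
    ingests_untrusted_content: bool   # e.g. web pages, third-party emails
    communicates_externally: bool     # e.g. outbound HTTP, writes to shared sinks

def has_lethal_trifecta(m: CapabilityManifest) -> bool:
    """True only when all three capabilities are present at once."""
    return (m.reads_private_data
            and m.ingests_untrusted_content
            and m.communicates_externally)

def verify(manifests: list[CapabilityManifest]) -> None:
    for m in manifests:
        if has_lethal_trifecta(m):
            raise ValueError(f"rejected: manifest combines all three legs: {m}")

if __name__ == "__main__":
    safe = CapabilityManifest(True, True, False)   # no external egress: allowed
    risky = CapabilityManifest(True, True, True)   # all three: rejected
    verify([safe])
    try:
        verify([risky])
    except ValueError as e:
        print(e)
```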
I've wondered if LLMs can help match people. People give the LLM some public context about their lives, and two LLMs can have a chat about availability and world views.
Use AI to scaffold relationships, not replace them.
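A rough sketch of how that two-agent chat could be wired up; `call_llm`, the prompts, and the verdict labels are all placeholders for illustration, not a real matching service.

```python
# Hypothetical sketch: each person's agent sees only the public context its
# owner chose to share, the two agents chat for a few turns, and each reduces
# the conversation to a coarse verdict rather than handing back a transcript.
def call_llm(system_prompt: str, transcript: list[str]) -> str:
    raise NotImplementedError("plug in your model client here")

def match_chat(context_a: str, context_b: str, turns: int = 4) -> tuple[str, str]:
    sys_a = (f"You represent person A. Public context: {context_a}. "
             "Discuss availability and world views; be honest, not salesy.")
    sys_b = (f"You represent person B. Public context: {context_b}. "
             "Discuss availability and world views; be honest, not salesy.")
    transcript: list[str] = []
    for i in range(turns):
        speaker_sys = sys_a if i % 2 == 0 else sys_b
        transcript.append(call_llm(speaker_sys, transcript))
    # Each side independently answers with a bounded verdict for its own human.
    verdict_prompt = "Answer with exactly one of: MEET, MAYBE, PASS."
    verdict_a = call_llm(sys_a + " " + verdict_prompt, transcript)
    verdict_b = call_llm(sys_b + " " + verdict_prompt, transcript)
    return verdict_a, verdict_b
```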
How does one make sure the implementation is sufficient and complete? It feels like assuming total knowledge of the world, which is never true. How many false positives and false negatives do we tolerate? How does it impact a person?
I'm not sure. We can use LLMs to try out different settings/algorithms and see what it is like to have it on a social level before we implement it for real.
Perhaps, but I am not entirely optimistic about LLMs in this context. Then again, the freedom to do this, and then actually doing it, might make a dent after all; one can never know until they experiment, I guess.
Fair, I don't know how valuable it would be. I think LLMs would only get you so far. They could be tried in games or small human contexts. We would need a funding model that rewarded this, though.
I've been thinking about using LLMs to help triage security vulnerabilities.
If done in an auditably unlogged environment (with a limited output to the company, just saying escalate) it might also encourage people to share vulns they are worried about putting online.
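A minimal sketch of that narrow interface, assuming the triage model runs entirely inside the sealed environment; the enum values and `triage_model` are made up, and the "auditably unlogged" guarantee is an operational property of the environment that code alone cannot enforce.

```python
# Hypothetical sketch: the report text and the model's free-form reasoning
# never leave the sealed environment; the only value returned to the company
# is one of two signals.
from enum import Enum

class Signal(Enum):
    ESCALATE = "escalate"
    NO_ACTION = "no_action"

def triage_model(report_text: str) -> str:
    raise NotImplementedError("LLM call, run inside the sealed environment")

def triage(report_text: str) -> Signal:
    verdict = triage_model(report_text)  # full reasoning stays in memory here
    return Signal.ESCALATE if "escalate" in verdict.lower() else Signal.NO_ACTION
    # note: neither report_text nor verdict is logged or returned
```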
I definitely think it's a viable idea! Someone like HackerOne or Bugcrowd would be especially well poised to build this, since they can look at historical reports, see which ones ended up being investigated or getting bounties, and use that to validate or inform the LLM system.
The second-order effects of this, once reporters expect an LLM to be validating their reports, may get tricky. But ultimately, if it's only passing a "likely warrants investigation" signal and has very few false negatives, it sounds useful.
With trust and security, though, I still feel like some human needs to be ultimately responsible for closing each bad report as "invalid", never purely relying on the LLM. But it sounds useful for elevating valid high-severity reports and assisting the human who is ultimately responsible.
It does feel like a hard product to build from scratch, though, but easy for existing bug bounty systems to add.
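As a rough illustration of the historical-data idea above, a platform could tune the escalation threshold so that almost no known-valid reports are missed; `score_report` is a stand-in for whatever LLM or classifier produces the score, and the numbers are assumptions.

```python
# Hypothetical sketch: score past reports, then choose the lowest cutoff that
# keeps false negatives (missed valid reports) under a target rate.
def score_report(text: str) -> float:
    raise NotImplementedError("LLM/classifier score: P(warrants investigation)")

def pick_threshold(history: list[tuple[str, bool]], max_fn_rate: float = 0.02) -> float:
    """history: (report_text, was_valid) pairs from past triage outcomes."""
    scored = [(score_report(text), was_valid) for text, was_valid in history]
    valid_scores = sorted(s for s, was_valid in scored if was_valid)
    if not valid_scores:
        return 0.0  # nothing to calibrate against; escalate everything
    # Allow at most max_fn_rate of known-valid reports to fall below the cutoff.
    k = int(len(valid_scores) * max_fn_rate)
    return valid_scores[k]

def should_escalate(text: str, threshold: float) -> bool:
    return score_report(text) >= threshold
```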
Echoresponse - a tool for responsible disclosure. Security researchers and companies encode some of their secret knowledge in LLMs, the LLMs have a discussion, and each can say one word from an agreed-upon list back to the party that programmed it.
You joke, but that's a very real approach that AI pentesting companies do take: an agent that creates reports, and an agent that 'validates' reports with 'fresh context' and a different system prompt that attempts to reproduce the vulnerability based on the report details.
*Edit: the paper seems to suggest they had a 'Triager' for vulnerability verification, and obviously that didn't catch all the false positives either, ha.
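A bare-bones sketch of that generate-then-validate pattern; `call_llm`, `run_poc`, and the prompts are placeholders, not any vendor's actual pipeline.

```python
# Hypothetical sketch: one agent drafts a finding, a second agent with no
# shared context and a different system prompt tries to reproduce it from the
# report text alone.
def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("model client goes here")

def run_poc(poc_script: str, target: str) -> bool:
    raise NotImplementedError("execute the proof of concept in a sandbox")

REPORTER_SYS = ("You are a pentest agent. Write a vulnerability report with "
                "exact reproduction steps.")
VALIDATOR_SYS = ("You are a skeptical triager with no prior context. From the "
                 "report alone, produce a minimal proof-of-concept script, or "
                 "reply REJECT if the report is not actionable.")

def find_and_validate(target: str, recon_notes: str) -> dict:
    report = call_llm(REPORTER_SYS, f"Target: {target}\nNotes: {recon_notes}")
    # Fresh context: the validator only ever sees the report text.
    poc = call_llm(VALIDATOR_SYS, report)
    reproduced = poc.strip() != "REJECT" and run_poc(poc, target)
    return {"report": report, "reproduced": reproduced}
```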
At my first job, every application the data people developed was compulsorily run through Fortify (I assume this was HP Fortify), and to this day I have no idea what the security team was actually doing with the product, or what the product does. All I know is that they never made us change anything, even though we were mostly fresh grads and were certainly shipping total garbage.
It's like, when you say agents will largely be relegated to "triage": well, a surprisingly large amount of nuts-and-bolts infosec work is basically just triage!
There is strong reason to expect evolution to have produced a control system that is complex and constantly changing, for this very reason: so it can't easily be gamed (and eaten).
I'd like to do a clean rewrite of the sandboxing work at some point.