
> They know all the flags and are generally better at interpreting tool output than I am.

In the toy example, you explicitly restrict the agent to supply just a `host`, and hard-code the rest of the command. Is the idea that you'd instead give a `description` something like "invoke the UNIX `ping` command", and a parameter described as constituting all the arguments to `ping`?



Honestly, I didn't think very hard about how to make `ping` do something interesting here, and in serious code I'd give it all the `ping` options (and also run it in a Fly Machine or Sprite where I don't have to bother checking to make sure none of those options gives code exec). It's possible the post would have been better had I done that; it might have come up with an even better test.
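
For illustration, a version that hands the model the whole argument list might look roughly like this; the `PING_TOOL`/`run_ping` names and the exact schema shape are mine, not from the post:

  import subprocess

  PING_TOOL = {
      "name": "ping",
      "description": "Invoke the UNIX ping command with arbitrary arguments.",
      "input_schema": {
          "type": "object",
          "properties": {
              "args": {
                  "type": "array",
                  "items": {"type": "string"},
                  "description": "Arguments passed verbatim to ping, e.g. ['-c', '3', 'example.com']"
              }
          },
          "required": ["args"]
      }
  }

  def run_ping(args):
      # No option filtering on purpose: the assumption is this runs inside a
      # disposable sandbox, so an option that gives code exec can't do lasting damage.
      return subprocess.run(["ping", *args], capture_output=True, text=True).stdout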

I was telling a friend online that they should bang out an agent today, and the example I gave her was `ps`; like, I think if you gave a local agent every `ps` flag, it could tell you super interesting things about usage on your machine pretty quickly.
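
A minimal loop for that, sketched here with the Anthropic Python SDK as one example (the tool name, prompt, and model choice are placeholders; any tool-capable model API works the same way):

  import subprocess, anthropic

  client = anthropic.Anthropic()
  tools = [{
      "name": "ps",
      "description": "Run ps with any flags to inspect processes on this machine.",
      "input_schema": {
          "type": "object",
          "properties": {"flags": {"type": "array", "items": {"type": "string"}}},
          "required": ["flags"]
      }
  }]
  messages = [{"role": "user", "content": "What's hogging resources on this box?"}]

  while True:
      resp = client.messages.create(model="claude-sonnet-4-20250514",
                                    max_tokens=1024, tools=tools, messages=messages)
      messages.append({"role": "assistant", "content": resp.content})
      if resp.stop_reason != "tool_use":
          break
      results = []
      for block in resp.content:
          if block.type == "tool_use":
              out = subprocess.run(["ps", *block.input["flags"]],
                                   capture_output=True, text=True).stdout
              results.append({"type": "tool_result",
                              "tool_use_id": block.id, "content": out})
      messages.append({"role": "user", "content": results})

  # Assuming the final block is text once the model stops asking for tools.
  print(resp.content[-1].text)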


Or have the agent strace a process and describe what's going on as if you're a 5-year-old (because I actually need that to understand strace output)


Iterated strace runs are also interesting because they generate large amounts of data, which means you actually have to do context programming.
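
One way that context programming shakes out in practice, as a sketch (the budget number and helper name are made up): cap what one tool call is allowed to put into the context and tell the model what was cut, so it can ask for a narrower re-run.

  MAX_CHARS = 20_000   # arbitrary per-result budget; tune to your context window

  def strace_tool_result(raw_output):
      # Rather than dumping a multi-megabyte trace into the context, keep the
      # head and tail and tell the model what was dropped, so it can ask for a
      # narrower re-run (e.g. strace -e trace=network).
      if len(raw_output) <= MAX_CHARS:
          return raw_output
      head = raw_output[:MAX_CHARS // 2]
      tail = raw_output[-(MAX_CHARS // 2):]
      dropped = len(raw_output) - MAX_CHARS
      return head + f"\n... [{dropped} characters elided; re-run with a filter to see more] ...\n" + tail

The head-and-tail trick is only one option; summarizing the trace with a cheaper model or grepping for syscall families before it hits the context are others.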


What is Sprite in this context?


I'm guessing the Fly Machine they're referring to is a container running on fly.io; perhaps the sprite is what the Spritely Institute calls a goblin.


Also to be clear: are the schemas for the JSON data sent and parsed here specific to the model used? Or is there a standard? (Is that the P in MCP?)


It's JSON Schema, which is well standardized and predates LLMs: https://json-schema.org/
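
Concretely, the parameters a tool accepts are described with an ordinary JSON Schema object; for the host-only example upthread it would look something like this (an illustration, not the post's exact schema):

  {
    "type": "object",
    "properties": {
      "host": {"type": "string", "description": "Hostname or IP to ping"}
    },
    "required": ["host"]
  }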


Ah, so I can specify how I want it to describe the tool request? And it's been trained to just accommodate that?


Most LLMs have tool-calling patterns trained into them now, which are then managed for you by the API layer the model developers run on top of the models.

But... you don't have to use that at all. You can use pure prompting with ANY good LLM to get your own custom version of tool calling:

  Any time you want to run a calculation, reply with:
  {{CALCULATOR: 3 + 5 + 6}}
  Then STOP. I will reply with the result.
Before LLMs had tool calling, we called this the ReAct pattern - I wrote up an example of implementing that in March 2023 here: https://til.simonwillison.net/llms/python-react-pattern
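
For anyone who wants the other half of that, a minimal sketch of the plumbing (the regex, `ask_model`, and the toy `eval` are mine; any chat API slots in behind `ask_model`):

  import re

  SYSTEM_PROMPT = (
      "Any time you want to run a calculation, reply with:\n"
      "{{CALCULATOR: 3 + 5 + 6}}\n"
      "Then STOP. I will reply with the result."
  )
  CALC_RE = re.compile(r"\{\{CALCULATOR:\s*(.+?)\}\}")

  def run_conversation(ask_model, user_question):
      # ask_model(list_of_strings) -> assistant text; stands in for whatever
      # chat API you're using. Everything else is just string plumbing.
      history = [SYSTEM_PROMPT, user_question]
      while True:
          reply = ask_model(history)
          history.append(reply)
          m = CALC_RE.search(reply)
          if not m:
              return reply                        # no tool request: final answer
          # Toy evaluator for the demo; don't eval model output in real code.
          result = eval(m.group(1), {"__builtins__": {}}, {})
          history.append(f"Result: {result}")     # hand the result back, let it continue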



