In my experience as a software engineer, we often solve production bugs in this order:
1. On-call notices an issue in Sentry, Datadog, or PagerDuty.
2. We figure out which PR it is associated with.
3. We blame the person who made the PR.
4. We tell them to fix it and update the unit tests.
The key issue is that PRs tell you where a bug landed; with agentic code, they often don’t tell you why the agent made that change.
With agentic coding, a single PR is now the final output of:
- prompts + revisions
- wrong/stale repo context
- tool calls that failed silently (auth/timeouts)
- constraint mismatches (“don’t touch billing” not enforced)
So I’m starting to think incident response needs “agent traceability”: prompt/context references, a tool-call timeline with results, key decision points, and a mapping from edits back to session events. (A rough sketch of what that could look like is below.)
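To make that concrete, here is a minimal sketch of what one session's trace could look like. Every type and field name here is hypothetical, just to illustrate the kind of data an audit trail would need to capture:

```typescript
// Hypothetical shape of an "agent traceability" record: one session,
// a timeline of prompts and tool calls, and a mapping from file edits
// back to the events that produced them. None of these names come
// from a real API.
interface AgentSession {
  sessionId: string;
  prompts: PromptEvent[];      // every prompt/revision the user sent
  toolCalls: ToolCallEvent[];  // every tool invocation, including failures
  edits: EditEvent[];          // every file change, linked to its cause
}

interface PromptEvent {
  timestamp: string;           // ISO 8601
  text: string;
  contextRefs: string[];       // files/docs the agent had in context
}

interface ToolCallEvent {
  timestamp: string;
  tool: string;                // e.g. "read_file", "run_tests"
  args: Record<string, unknown>;
  result: "ok" | "error" | "timeout";
  error?: string;              // silent failures become visible here
}

interface EditEvent {
  timestamp: string;
  file: string;
  diff: string;
  causedBy: string[];          // ids of the prompt + tool-call events
                               // that led to this edit (the "why")
}
```

With something like this attached to a PR, on-call could walk backwards from a bad diff to the exact prompt, stale context, or failed tool call that produced it.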
Essentially, to debug better we need the underlying reasoning for why the agent developed the code the way it did, not just the code itself.
The alternative is what most companies are doing right now: flying blind, with no record of what is being sent to these LLMs, MCP tools, etc.
There needs to be an audit trail for these agents. Without it we have a severe visibility gap, not only for security but also for improving the process itself.
Yeah, the space is still developing, but my bet is that we’re going to see more conversations about how to improve this across the board, and it starts with shifting the solution left.
Setup is pretty easy. If you use Cursor or VS Code, you just click the button in the README. If you’re using anything else, you copy and paste the MCP server config (rough example below).
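For reference, an MCP server entry usually looks something like this. The command and package name here are my guesses for illustration, so check the README for the real values:

```json
{
  "mcpServers": {
    "a24z-memory": {
      "command": "npx",
      "args": ["-y", "a24z-memory"]
    }
  }
}
```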
The second step is to add a rule that tells it to use the a24z-memory MCP server (rough example below). After that it should automatically do everything for you. I’m having some trouble where it doesn’t always call the tool, but I get around that by adding “use a24z-memory MCP” as part of my prompt.
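Something like this in your rules file works for me; the wording is just my own, so tweak it to taste:

```
When you make code changes in this repo, always call the a24z-memory
MCP server to record the reasoning behind the change, the context you
used, and any tool calls that failed along the way.
```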
I’ve only been building it for the past week, so I’m trying to get feedback on what is and isn’t working. I’m building some evaluation tools to measure a baseline with and without a24z. I’ll share those and write another post as the project evolves.