How to Write an Agent
Hello friends. This blog post was supposed to be the second part of this research; however, I didn't have enough time (or interest, really) to dedicate to it, and when the 120-day disclosure window expired I went f**k-it mode and started working on other, more interesting things. Today we'll talk about agents and Nerve, a project I started a few months ago that makes implementing an agent simple and intuitive.
What Is An Agent?
The term “agent” has been increasingly used in recent times to describe various technologies, including OpenAI DeepResearch, Operator, and Grok 3. According to Wikipedia’s definition of “intelligent agent”:
In artificial intelligence, an intelligent agent is an entity that perceives its environment, takes actions autonomously to achieve goals, and may improve its performance through machine learning or by acquiring knowledge.
In similar terms, we can think of an agent as a model (be it an LLM or an older class of algorithms) that, after making an observation, decides which tool to use from its "toolbox" to complete the given task, step by step, in a loop. This reframing, in my opinion, better aligns with the chat-based, sequential experience we have nowadays with large language models, and gives us a clue about what an agent might look like in software.
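To make this concrete, here's a minimal sketch of that loop in Python. Every name in it is illustrative, not any particular framework's API:

```python
# a minimal, illustrative sketch of the agent loop: observe, decide, act, repeat
def run_agent(model, tools, task):
    history = [task]
    while True:
        # the model observes the history so far and decides the next action
        action = model.next_action(history, tools)
        if action.task_complete:
            # the model decided the task is done: break out of the loop
            break
        # execute the chosen tool and feed its output back as a new observation
        output = tools[action.tool_name](**action.arguments)
        history.append(str(output))
```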

Aren’t we all living in our own loop, trying to make the best of what we have, one day at a time? Anyways …
What Is A Tool?
In the context of LLMs, models can be made aware of tools, and use them, through a mechanism called, unsurprisingly, function calling. Most of the models you have ever interacted with have been trained to understand a "here's tool A that does X" message (the tool definition) in their prompts.

For instance, here’s how llama3.3 understands its toolbox as part of its system prompt:
```
When you receive a tool call response, use the output to format an answer to the orginal user question.
```
Each tool definition is a JSON document that looks something like this:
```json
{
  "name": "get_current_weather",
  "description": "Get the current weather for a given location.",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      }
    },
    "required": ["location"]
  }
}
```
The model receives all of these tool definitions as part of its system prompt. When it is later asked in a chat, "What's the weather in San Francisco?", it can use one or more of these tools by responding with an "I want to use tool A with these parameters: …" message that looks like this:
```json
{
  "name": "get_current_weather",
  "parameters": {
    "location": "San Francisco, CA"
  }
}
```
But how do we go from JSON to providing tools with actual functionality to the model? How do we execute these tool calls, and how do we return their output to the model?
This part of the job - that “chat -> execute tools -> observe state -> loop or break if task done” loop from the earlier diagram - is delegated to whatever agentic framework you are going to use (and a lot of your good will).
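Mechanically, that execution step boils down to parsing the model's tool-call message and dispatching it to a plain function. Here's a toy sketch, with a made-up registry and weather function purely for illustration:

```python
import json

# a toy registry mapping tool names to plain Python functions
TOOLS = {
    "get_current_weather": lambda location: f"It's sunny in {location}.",
}

def execute_tool_call(message: str) -> str:
    # parse the model's "I want to use tool A with these parameters" message ...
    call = json.loads(message)
    # ... and dispatch it to the matching function with the model's arguments
    return TOOLS[call["name"]](**call["parameters"])
```

The returned string is then appended to the chat history as a tool-response message, and the loop continues.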
There are several of these frameworks, each taking a different approach. Some provide a minimal API to control predefined agent types, while others use similar concepts with different abstractions. Some can feel overly intricate, making it challenging to navigate their design choices. They all come with certain limitations and, most importantly, require you to write code, learn their abstractions, and adapt your agent ideas to fit their structure.
In my personal view, many of them are more complex than necessary. If agent = a model executing tools in a loop, we can have something better - something that abstracts away the agent's inner mechanics and lets us focus on the actual agent logic.
Why Nerve?
Nerve was written (and rewritten, now twice 😬) as an ADK (Agent Development Kit) with the KISS principle in mind: anything that can be generalized as part of the framework should not be something the user has to handle unless they choose to, so that they can focus on the actual agent logic rather than the agent loop and other abstractions.

This is what a Nerve agent looks like:
```yaml
agent: You are an helpful assistant using pragmatism and shell commands to perform tasks.
# any natural language task works here
task: Find which process is using the most RAM.
using:
  - shell
  - task
```
And this is how you run it in a terminal: `nerve run agent.yml`.
Nerve includes a built-in library of agent-oriented tools, organized in namespaces, that can be included via the `using` directive (in this example, the `shell` namespace allows the agent to execute shell commands). You can think of this as a "standard library" for agents, with functionality that can be reused across them.
At the time of writing, the existing namespaces are (check the documentation for the list of tools in each one):
- 💻 shell - Let the agent execute shell commands.
- 📂 filesystem - Read-only access primitives to the local filesystem.
- 🔧 anytool - Let the agent create its own tools in Python.
- 💻 computer - Computer use primitives for mouse, keyboard, and screen.
- 🌐 browser - Browser use primitives. (very experimental but promising)
- 💬 inquire - Let the agent interactively ask questions to the user in a structured way.
- 🧠 reasoning - Simulates the reasoning process at runtime for models that have not been trained for it.
- ✅ task - Let the agent autonomously set the task as complete or failed.
- 🕒 time - Tools for getting the current date and time and waiting for a given number of seconds.
What Can I Do With It?
Nerve is generic enough to be used for a variety of tasks. Before we start creating our first agent, I want to spend a few words on some of the existing examples that interest me the most and/or that I use on a daily basis, and why. In the majority of cases you'll see that the YAML implementation is rather simple despite what these agents can do :)
code-audit
This agent has existed in Nerve's examples folder since its first iteration, and I have been using it one way or another for all sorts of things. The agent is given read-only access to a folder and the task to:
- review the source code in it
- append any potential vulnerability to an audit.jsonl report file
- keep going until all files are processed
```bash
nerve run code-audit --target-path /path/to/src
```
Nerve ( https://t.co/wNopPIX7fu ) and the code_auditor example tasklet ( https://t.co/KjwSi6q2BE ) using GPT-4o to find a RCE vulnerability in the widget-options v4.0.7 Wordpress Plugin 🧠
— Simone Margaritelli (@evilsocket) December 2, 2024
Zero code, fully autonomous agent as a simple YAML file. pic.twitter.com/SkaI7ijGPx
changelog
This little utility is a lifesaver for generating a high-level, nicely formatted changelog like this one from a list of commits like these.
```bash
# -q is the quiet mode, logs are disabled and only the changelog
# markdown (and fatal errors) will be printed to stdout
nerve run changelog -q > CHANGELOG.md
```
webcam
There are four pets in my apartment and several security cameras: I've always wanted a "bot" that could check the video feed, detect custom events I can describe with natural language, such as "the doggo being cute" or "the kitten breaking something" (or "my son is taking his first steps", for another use case :D), and alert me via Telegram or whatever. When good open source vision models started being a thing, I could not not write this one :D
```bash
# set the webcam rtsp url
# (variable name illustrative, check the example for the exact one)
export WEBCAM_RTSP_URL='rtsp://user:password@192.168.1.10:554/stream'
nerve run webcam
```
i always wanted a system that could check my ipcams and inform me when my pets are being cute ... pic.twitter.com/qvd52SrUdq
— Simone Margaritelli (@evilsocket) February 6, 2025
computer-use
An experimental agent that leverages vision models to operate your computer and perform tasks:
```bash
nerve run computer-use --task 'open the browser and check the news on cnn.com'
```
bettercap-agent
This is an example of how the API of a service can be used as a tool. The agent can interact with a running bettercap instance via its REST API and perform tasks like ‘find the most vulnerable WiFi access point’ or:
```bash
nerve run bettercap-agent --task 'deauth all the apple devices'
```
nerve run bettercap-agent --task 'find the oldest wifi access point'
— Simone Margaritelli (@evilsocket) March 11, 2025
What gets me every time is how the model uses the help menu to determine the best commands to execute ... AI can RTFM ❤️ pic.twitter.com/b2YbFyzuyK
android-agent
Very experimental but promising Android automation agent. An ADB shell is all an agent needs:
```bash
nerve run android-agent --task 'take a selfie and send it to jessica'
```
"open YouTube and search for ‘cats’” 🧠 pic.twitter.com/PfApM2zKqk
— Simone Margaritelli (@evilsocket) March 5, 2025
ab-problem
The agent is given a logic puzzle and its answer is evaluated at runtime. This example shows how a tool can alter the runtime state and set the task as complete or let the agent loop continue. This is a foundational feature for agent evaluations.
```bash
nerve run ab-problem --program 'A# A# #A #A B# #B #B #A'
```
recipe-workflow
Used to showcase the concept of workflows. Pick any food and use multiple agents to write a tasty and nicely formatted recipe for you:
```bash
nerve run recipe-workflow --food 'spaghetti alla carbonara (con guanciale, non pancetta!)'
```
Creating an Agent
Let’s start with the fun stuff! First, to install and use Nerve, you’ll need Python 3.10 or newer.
Use pip to install (or upgrade) Nerve:

```bash
pip install --upgrade nerve-adk
```
Then, if you don’t feel like creating the agent YAML file from scratch, you can use the guided procedure for agent creation.
Start creating an agent by executing the command `nerve create weather.yml` and, when prompted, use these values:

- Path: leave the default.
- System prompt: leave the default for now (however, it is important to define the "persona" of your agent as accurately as possible; for examples of simple system prompts you can reuse, check the agents in the examples folder).
- Task: `What's the weather in {{ place }}?` (the `{{ place }}` jinja2 syntax allows us to create parametric agents).
- Tools: select `task` and `time`; deselect (for now) `shell`.

At the end of the procedure, the `weather.yml` file that Nerve generated will look like this (except for the comments and formatting I added here for clarity):

```yaml
# the agent system prompt, defining its persona
agent: You are an helpful assistant.

# the task to perform, with a jinja2 template variable
task: What's the weather in {{ place }}?

# the tool namespaces the agent can use
using:
  - task
  - time
```
Since we are referencing the `place` variable in the prompt, we'll need to provide it as a command line argument, otherwise the agent will exit with the message:

```
Command line argument --place is required in non interactive mode.
```
Moreover, our agent doesn't have weather-forecast-related tools (yet), therefore if we run it with `nerve run weather.yml --place rome` we'll likely see it "giving up" like this:

Adding Tools
Via YAML
We could easily extend the agent's tooling by adding a `tools` section with a `get_current_weather` tool that will use curl to read wttr.in and return the forecast information to the model:
```yaml
# we added the second part to let the agent use the task namespace effectively
agent: You are an helpful assistant. Report the weather to the user and then set your task as complete.

task: What's the weather in {{ place }}?

using:
  - task
  - time

# a command-line based tool (the exact schema is in Nerve's documentation)
tools:
  - name: get_current_weather
    description: Get the current weather for a given place.
    arguments:
      - name: place
        description: The place to get the weather of.
        example: rome
    tool: curl -s 'wttr.in/{{ place }}?format=3'
```
If we run this agent again with `nerve run weather.yml --place rome`, the agent will now execute and use the output of the new tool:

Via (LLM’s) Common Sense
This, however, is unnecessary. These models are smart enough to figure out the right tool to use on their own, so most of the time we can just do this:
```yaml
agent: You are an helpful assistant, use the shell to get the weather, report it in a nice format and then set your task as complete.

task: What's the weather in {{ place }}?

using:
  - task
  - shell
```
This version of the agent simply relies on the `shell` standard namespace. The model will use it to execute the curl command `curl -s 'http://wttr.in/Rome?format=3'`:

This demonstrates how, when provided with simple tools to complete a task, a model will naturally determine how to use them effectively. The emergent behavioral complexity these models exhibit when equipped with a robust tooling framework and a state machine to operate within suggests that we are only scratching the surface of what’s possible with existing models.
In Python
Tools that require more complex logic can be implemented in Python. Creating a `tools.py` file in the same folder as the agent will automatically provide its functions as tools:
```python
import typing as t

# the function name, docstring and type annotations are used to build
# the tool definition that the model sees
def read_file(path: t.Annotated[str, "The path of the file to read."]) -> str:
    """Read a file from disk and return its contents."""
    with open(path) as f:
        return f.read()
```
A perfect example of this feature is the ab-problem agent, which relies on Python tools to evaluate the model's response to a dynamically generated logic puzzle and complete the task when the evaluation is successful.
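Conceptually, such an evaluation tool looks something like this. This is only a sketch: in the real example the expected solution is computed from the generated puzzle, and the task is completed through Nerve's own primitives rather than a global flag:

```python
import typing as t

EXPECTED_SOLUTION = "#B A#"  # sketch: the real example derives this from the generated puzzle
task_done = False  # sketch: a stand-in for Nerve's actual runtime state

def submit_solution(solution: t.Annotated[str, "The fully reduced program."]) -> str:
    """Submit the reduced program for evaluation."""
    global task_done
    if solution.strip() == EXPECTED_SOLUTION:
        task_done = True  # a correct answer ends the agent loop
        return "correct!"
    # otherwise the loop continues and the model can try again
    return "wrong, keep reducing"
```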
However, keep in mind that I managed to have an agent use my Android phone with just the shell namespace and adb. So, when in doubt, KISS! :D
As an SDK / ADK
Ultimately, if you really want to write code, Nerve can be used as a Python package to fully customize your agent loop, down to the single step.
```python
import asyncio

# NOTE: an illustrative sketch, not the exact API; the real import paths
# and method names are in the examples/adk folder
from nerve import Agent

async def main():
    # create the agent from its YAML definition ...
    agent = await Agent.from_yaml("weather.yml", place="rome")
    # ... and drive the agent loop manually, one step at a time
    while not agent.done:
        await agent.step()

asyncio.run(main())
```
Check the examples/adk folder for more examples.
Supported Models
Nerve uses LiteLLM, therefore any inference provider in this list is supported, and different models can be used either via the `-g` / `--generator` command line argument, or by setting the `NERVE_GENERATOR` environment variable.
For instance, to run the weather agent via a local ollama inference server for free:
```bash
nerve run -g 'ollama/qwq' weather.yml --place rome
```
To use another server:
```bash
nerve run -g 'ollama/qwq?api_base=http://your-ollama-server:11434' weather.yml --place rome
```
Other Features
Agents can be made of a single YAML file, like in the `weather.yml` example, or of a folder with an `agent.yml` inside, so that `foobar/agent.yml` is detected as an agent called `foobar`. If you want to access an agent from anywhere in the terminal, you can copy it to the `~/.nerve/agents` folder:
```bash
# create the folder if it doesn't exist
mkdir -p ~/.nerve/agents
# copy the agent into the load path
cp weather.yml ~/.nerve/agents/
```
Now you can use it with `nerve run weather` from anywhere. My favorite agents to keep in this load path so far are the changelog generator and code-audit, which I run regularly on my code changes.
Sessions can be recorded with `--trace`:

```bash
nerve run weather.yml --place rome --trace trace.jsonl
```
And replayed with:

```bash
# -f for fast forward
nerve play trace.jsonl -f
```
As usual, for more features and information, read the f…antastic manual :D
Getting to 1.0.0
Nerve 1.x.x is a distillation of lessons learned. The first iteration, which I wrote during the summer of 2024, was an explorative, soon-to-be-reimplemented Python PoC. Back then, I thought that reimplementing it in Rust would be a good idea, so I wrote what now lives on the legacy-rust branch.
Boy, was I wrong.
There's no single Rust crate for talking to any LLM via a unified interface … and the ones that exist for specific providers don't support most of the functionality needed for an agent. Python is simply the language of AI, so by leaving it behind I also left behind all those convenient libraries like LiteLLM and LangChain, and I had to reimplement a lot of stuff (and more stuff = more technical debt). Plus a dozen other relatively minor issues. Most importantly, after a few months of writing agents and Rust refactorings, I consolidated a set of cleaner, simpler abstractions that allowed me to reimplement everything (again) in Python as a more elegant solution.
This new version can talk to any LLM of any provider, offers a very powerful templating engine (jinja2) for the prompts, can be extended with endless libraries, and makes defining, chaining and running complex agents very simple.
Embrace change.
Future
I'll be allocating most of my time to this project. It's a lot of fun and I believe it has great potential.
- Interactive Mode: a debugger-like mode that allows you to pause the agent execution, inspect the state, change things around, ask questions, give feedback and new directives, and then continue.
- Workflows 2.0: a rewrite of the workflows system around an event bus (both for IPC and network) that agents can advertise their presence on and talk to each other with. Each agent is its own process attached to this shared bus.
- Browser-use: so hard to get right for real, work in progress …
- Project-MU: an open source evaluation framework built with Nerve.
- More Tools: the standard library will probably keep growing with more capabilities.
- More Examples: the more the better! Some of the agents will probably be moved to their own repository as mini projects.
- More Integrations: bettercap is just the beginning.
BTW, I'm looking for a (remote) job, so if you have any openings for a good software engineer with hands-on experience in AI/ML and cybersecurity, feel free to check my resume, ping me on LinkedIn, or drop me a line at evilsocket AT gmail DOT com.
♥ 🙏