ability-eval

Code evaluation ability

ability-eval is a kadi ability that provides an evaluation engine for code diffs, test results, logs, and behavior traces. It is designed to run as a native ability on a local broker, joined to the “eval” network, and uses an LLM backend configured in config.toml. Within the AGENTS ecosystem it is the specialized component that analyzes execution traces and test outcomes and returns structured evaluation results to other agents or orchestrators.

  • Entrypoint: dist/index.js (built from index.ts). The ability is started as a kadi ability and connects to a local broker as configured in config.toml.
  • Broker connectivity: the ability connects to a broker URL configured under [broker.local]. It joins NETWORKS = [“eval”], so it receives and responds to messages on the “eval” network.
  • Model backend: evaluation tasks that require LLM reasoning use the model settings under [model] (EVAL_MODEL, MAX_TOKENS).
  • Dependencies: this ability uses @kadi.build/core and agents-library to register as a kadi ability and to interact with brokers and other agents.
  • Ability dependencies: agent.json declares a required ability “secret-ability”. That implies the ability expects secret management (credentials, keys) to be provided by another ability at runtime.
  • High-level data flow:
    1. Ability bootstraps and connects to broker (broker.local.URL).
    2. It listens for evaluation requests on the “eval” network.
    3. For LLM-backed evaluations, it consults model config (EVAL_MODEL, MAX_TOKENS) and any secrets provided by “secret-ability”.
    4. It produces structured evaluation payloads (covering diffs, test summaries, logs, and traces) and publishes the responses back over the broker.
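
The request-to-response part of this flow can be sketched as below. This is a minimal illustration, not the code in index.ts: the EvalRequest and EvalResult shapes, the verdict values, and the Complete callback (a stand-in for whatever LLM client the ability uses) are all assumptions introduced here. Only the EVAL_MODEL and MAX_TOKENS names come from config.toml.

```typescript
// Hypothetical shapes: the real message schemas live in index.ts and are not shown in this doc.
type EvalKind = "diff" | "test-results" | "logs" | "trace";
type Verdict = "pass" | "fail" | "inconclusive";

interface EvalRequest {
  id: string;
  kind: EvalKind;
  content: string;
}

interface EvalResult {
  requestId: string;
  kind: EvalKind;
  verdict: Verdict;
  summary: string;
}

interface ModelConfig {
  EVAL_MODEL: string;
  MAX_TOKENS: number;
}

// Stand-in for an LLM client call; a real one would contact the model backend.
type Complete = (prompt: string, model: string, maxTokens: number) => Promise<string>;

// Build the prompt sent to the model for one request.
function buildPrompt(req: EvalRequest): string {
  return `Evaluate the following ${req.kind}. Answer "pass" or "fail" on the first line.\n${req.content}`;
}

// Read the verdict from the first line of the model output; anything else is inconclusive.
function parseVerdict(raw: string): Verdict {
  const first = (raw.split("\n")[0] ?? "").trim().toLowerCase();
  if (first === "pass") return "pass";
  if (first === "fail") return "fail";
  return "inconclusive";
}

// Steps 3 and 4 of the flow: consult the model config, call the LLM, return a structured result.
async function handleEvalRequest(
  req: EvalRequest,
  model: ModelConfig,
  complete: Complete,
): Promise<EvalResult> {
  const raw = await complete(buildPrompt(req), model.EVAL_MODEL, model.MAX_TOKENS);
  return { requestId: req.id, kind: req.kind, verdict: parseVerdict(raw), summary: raw.trim() };
}
```

Passing the LLM call in as a callback keeps the handler testable without a broker or a model key; publishing the result back over the broker is left out because that API is not documented here.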

This repository does not include an explicit tool registration snapshot in the provided source artifacts (no tool registry JSON found). The key interface points you should expect to inspect in the codebase (dist/index.js / index.ts) are:

  • Entrypoint/registration: dist/index.js — registers the kadi ability with the broker and hooks message handlers.
  • Declared ability dependency: “secret-ability” (agent.json) — used to access secrets at runtime.

Tools / exported functions (from available metadata)

| Name / Surface | Description | Key parameters / config |
| --- | --- | --- |
| kadi ability entrypoint (dist/index.js) | Main process that connects to the broker and registers message handlers for evaluation tasks | broker.local.URL, broker.local.NETWORKS |
| secret-ability (declared ability) | Declared dependency; expected to provide secrets (credentials/API keys) | Provided by another ability at runtime |
| Model configuration | Controls which LLM is used for evaluation and the token limit | model.EVAL_MODEL, model.MAX_TOKENS |

If you need to extend or inspect exported functions, read the source index.ts (project root) and the compiled dist/index.js. Look for the calls into @kadi.build/core that register the ability and connect to the broker. Names such as registerAbility(…) or broker.connect(…) are illustrative guesses at what those calls might look like, not confirmed API.

All configuration lives in config.toml (provided excerpt). Key fields:

  • [broker.local]

    • URL: string — WebSocket URL to the broker (e.g., “ws://localhost:8080/kadi”)
    • NETWORKS: array[string] — broker networks to join (e.g., [“eval”])
    • MODE: string — runtime mode; present value “native”
  • [model]

    • EVAL_MODEL: string — model identifier for LLM-based evaluation (e.g., “claude-sonnet-4-20250514”)
    • MAX_TOKENS: integer — token cap for LLM calls (e.g., 4096)

There are no explicit environment variables or secrets vault configuration in the provided config.toml. The ability depends on an external ability named “secret-ability” (agent.json -> abilities), which is the intended provider for any sensitive credentials (API keys, tokens). Inspect the runtime registration code (index.ts) to see how secrets are requested/consumed (likely via the kadi core ability runtime).
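
One way to consume secrets from a provider such as “secret-ability” is to resolve everything the ability needs at startup and fail fast when something is missing. The sketch below assumes a hypothetical SecretLookup callback standing in for the actual call to “secret-ability”; the real request mechanism is defined by the kadi runtime and index.ts, and the secret name in the usage note is likewise made up.

```typescript
// Hypothetical lookup function: in the real ability this would wrap a request to "secret-ability".
type SecretLookup = (name: string) => Promise<string | undefined>;

// Resolve every required secret, collecting the names of any that are absent or empty.
async function loadRequiredSecrets(
  names: readonly string[],
  lookup: SecretLookup,
): Promise<Map<string, string>> {
  const found = new Map<string, string>();
  const missing: string[] = [];
  for (const name of names) {
    const value = await lookup(name);
    if (value) {
      found.set(name, value);
    } else {
      missing.push(name);
    }
  }
  // Report all missing names at once so the operator can fix them in one pass.
  if (missing.length > 0) {
    throw new Error(`Missing secrets: ${missing.join(", ")}`);
  }
  return found;
}
```

A caller might run `await loadRequiredSecrets(["LLM_API_KEY"], lookup)` before accepting evaluation requests, where LLM_API_KEY is a placeholder name rather than one taken from this repository.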

Provided config.toml excerpt:

# Ability Eval Configuration
[broker.local]
URL = "ws://localhost:8080/kadi"
NETWORKS = ["eval"]
MODE = "native"
[model]
EVAL_MODEL = "claude-sonnet-4-20250514"
MAX_TOKENS = 4096
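
The fields above can be checked at startup once the TOML has been parsed into a plain object. The following is a sketch under that assumption: it does not parse TOML itself, the checkConfig name is hypothetical, and the specific rules (WebSocket scheme, positive integer token cap, “eval” present in NETWORKS) are inferred from the field descriptions in this doc rather than taken from index.ts.

```typescript
// Narrow an unknown value to a plain object so its keys can be read safely.
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Return a list of human-readable problems; an empty list means the config looks usable.
function checkConfig(raw: unknown): string[] {
  const errors: string[] = [];
  const root = isRecord(raw) ? raw : {};
  const broker = isRecord(root["broker"]) ? root["broker"] : {};
  const local = isRecord(broker["local"]) ? broker["local"] : {};
  const model = isRecord(root["model"]) ? root["model"] : {};

  const url = local["URL"];
  if (typeof url !== "string" || !/^wss?:\/\//.test(url)) {
    errors.push("broker.local.URL must be a ws:// or wss:// URL");
  }

  const networks = local["NETWORKS"];
  if (!Array.isArray(networks) || !networks.every((n) => typeof n === "string")) {
    errors.push("broker.local.NETWORKS must be an array of strings");
  } else if (!networks.includes("eval")) {
    errors.push('broker.local.NETWORKS should include "eval"');
  }

  if (typeof local["MODE"] !== "string") {
    errors.push("broker.local.MODE must be a string");
  }

  const evalModel = model["EVAL_MODEL"];
  if (typeof evalModel !== "string" || evalModel.length === 0) {
    errors.push("model.EVAL_MODEL must be a non-empty string");
  }

  const maxTokens = model["MAX_TOKENS"];
  if (typeof maxTokens !== "number" || !Number.isInteger(maxTokens) || maxTokens <= 0) {
    errors.push("model.MAX_TOKENS must be a positive integer");
  }
  return errors;
}
```

Returning a list instead of throwing on the first problem lets a startup log show every misconfigured field at once.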

Common runtime configuration patterns to check in the source:

  • Loading of config.toml via @kadi.build/core or a TOML parser
  • Overriding config via environment variables (not present in config.toml — check index.ts)
  • Secret fetch calls to “secret-ability” at startup or on demand
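
If index.ts does support environment overrides, the pattern usually looks like the sketch below. The environment variable names (EVAL_MODEL, MAX_TOKENS) are assumptions chosen to mirror the config keys; nothing in the provided artifacts confirms that the ability reads them.

```typescript
interface ModelSettings {
  EVAL_MODEL: string;
  MAX_TOKENS: number;
}

// Merge optional environment overrides onto the values loaded from config.toml.
// The variable names are hypothetical; check index.ts for the ones actually used, if any.
function applyEnvOverrides(
  base: ModelSettings,
  env: Record<string, string | undefined>,
): ModelSettings {
  // An unset or blank model name leaves the configured model in place.
  const rawModel = env["EVAL_MODEL"]?.trim();
  const evalModel = rawModel ? rawModel : base.EVAL_MODEL;

  // Accept only positive integers; anything else falls back to the configured cap.
  const parsed = Number(env["MAX_TOKENS"] ?? "");
  const maxTokens = Number.isInteger(parsed) && parsed > 0 ? parsed : base.MAX_TOKENS;

  return { EVAL_MODEL: evalModel, MAX_TOKENS: maxTokens };
}
```

In a Node process the second argument would typically be process.env. Taking it as a parameter keeps the function pure and easy to test.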

The repository’s package metadata and scripts are useful to run and debug the ability. Below are the exact script excerpts from agent.json so you can run or build the ability as intended.

agent.json scripts excerpt:

{
  "scripts": {
    "preflight": "node --version",
    "setup": "npm install && npm run build",
    "build": "npx tsc",
    "start": "node dist/index.js",
    "dev": "npx tsx index.ts",
    "serve": "npx tsx index.ts stdio",
    "serve:broker": "npx tsx index.ts broker",
    "clean": "rm -rf node_modules abilities agent-lock.json package-lock.json dist"
  }
}

Key usage patterns:

  • Development: npx tsx index.ts — runs the TypeScript entry directly (index.ts must exist).
  • Production: npm run build && npm start — compile to dist/ and run dist/index.js.
  • Broker mode: npx tsx index.ts broker — starts in broker-facing mode (behavior depends on index.ts implementation).
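
The scripts pass an optional mode word (“stdio” or “broker”) as the first argument after the script path. How index.ts interprets it is not shown in this doc, so the following is only a sketch of one plausible dispatch: the parseRunMode name, the “default” mode for a bare invocation, and the choice to reject unknown words are all assumptions.

```typescript
type RunMode = "stdio" | "broker" | "default";

// Pick the run mode from a process.argv-style array.
// argv[0] is the node binary and argv[1] the script path, so the mode word is argv[2].
function parseRunMode(argv: readonly string[]): RunMode {
  const arg = argv[2];
  if (arg === undefined) return "default"; // e.g. "npx tsx index.ts" with no mode word
  if (arg === "stdio" || arg === "broker") return arg;
  throw new Error(`Unknown mode "${arg}"; expected "stdio" or "broker"`);
}
```

An entrypoint could call `parseRunMode(process.argv)` once at startup and branch on the result.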

Note: The actual handler functions and exported evaluation utilities are implemented in index.ts / dist/index.js. Inspect these files to see message names, handler signatures, and data schemas (requests/response payloads).

Declared package dependencies (from provided dependency manifest):

{
  "dependencies": {
    "@kadi.build/core": "*",
    "agents-library": "*",
    "tsx": "^4.21.0"
  },
  "devDependencies": {
    "@types/node": "^25.3.1",
    "typescript": "^5.9.3"
  }
}

  • Runtime dependencies:

    • @kadi.build/core — core kadi runtime primitives for abilities and broker integration.
    • agents-library — shared types/utilities across AGENTS platform (message envelopes, schemas).
    • tsx — used for running TypeScript entry during development.
  • Dev dependencies:

    • typescript, @types/node — build and type checking.

What this ability depends on:

  • The “secret-ability” (declared in agent.json abilities) — used to access secrets required for LLM or external services.
  • A running broker at broker.local.URL reachable on network “eval”.

What depends on this ability:

  • No explicit consumers are declared in this repository, but other agents that need evaluation of code diffs, test results, or traces are expected to join the “eval” network and call this ability via the broker.

Notes for developers:

  • Inspect index.ts (source) and dist/index.js (compiled) to see the exact message names and handler signatures. This doc intentionally avoids inventing function names; the implementations live in index.ts.
  • When adding LLM calls, honor model.EVAL_MODEL and model.MAX_TOKENS from config.toml, and fetch API keys from the “secret-ability”.
  • Ensure that broker.local.NETWORKS contains “eval” so the ability is reachable by agents expecting evaluation services.
  • Use the provided npm scripts for development (“dev”, “serve”, “serve:broker”) and CI-friendly builds (“build”, “start”).

If you need deeper details about message schemas, handler functions, or exported tools, open index.ts and the compiled dist/index.js — those files contain the concrete implementation and registration calls to @kadi.build/core.