ability-docs-memory

Documentation search engine — 4-signal hybrid recall

ability-docs-memory is an ability for the AGENTS ecosystem that crawls, chunks, indexes, and searches documentation using a 4-signal hybrid recall (semantic, keyword, graph, structural). It stores documentation as DocNode vertices in a graph database and creates NextSection and References structural edges between them for navigable context. It exposes four tools: searching (docs-search), reindexing (docs-reindex), reading pages (docs-page), and index inspection (docs-index-status). Its purpose is to provide a documentation-first memory and search layer optimized for structural navigation and fused ranking.
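
To make "fused ranking" concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge several ranked lists into one. This is an illustration only: the source does not state which fusion method graph-recall uses, and the function name, the input shape, and the constant k = 60 are assumptions.

```typescript
// Hypothetical input: for each signal, vertex ids ordered best-first.
type RankedLists = Record<string, string[]>;

interface FusedHit {
  id: string;
  score: number;
}

/**
 * Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank),
 * with a 1-based rank. Ids missing from a list contribute nothing for it.
 */
function fuseRankings(lists: RankedLists, k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const signal of Object.keys(lists)) {
    lists[signal].forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  const hits: FusedHit[] = [];
  scores.forEach((score, id) => hits.push({ id, score }));
  // Highest fused score first; ties broken by id for a stable order.
  return hits.sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Example with the four signal names used by docs-search.
const fused = fuseRankings({
  semantic: ['doc-a', 'doc-b', 'doc-c'],
  keyword: ['doc-b', 'doc-a'],
  graph: ['doc-b'],
  structural: ['doc-c', 'doc-b'],
});
```

In this example doc-b ranks first because it appears near the top of all four lists, which is the behavior a fused ranking is meant to reward.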

  • Inputs: raw documentation pages (markdown or llms-guides.txt), or live crawls performed during reindex.
  • Pipeline (docs-reindex):
    1. Crawl / split pages into PageDocument items (splitIntoPages).
    2. Delegate chunking + embedding + storage to ability-graph’s graph-index / graph-batch-store (bulk ingestion).
    3. Post-process stored DocNode vertices to create NextSection and References edges via graph queries.
    4. Optional LLM-based extraction of entities/topics (extraction_model).
  • Querying (docs-search):
    • Delegates recall to ability-graph’s graph-recall tool with vertexType fixed to 'DocNode', default signals ['semantic', 'keyword', 'graph', 'structural'], and structuralEdges ['NextSection', 'References'], so results can be expanded along those two edge types.
    • Embedding options (model, transport, apiUrl, apiKey) are forwarded from config.
  • Key components:
    • KadiClient: registers the docs-* tools and serves them in-process (native), via a broker, or over stdio.
    • ability-graph: used as a loaded ability (native load if available) for graph-query, graph-recall, graph-index, graph-batch-store, etc.
    • secret-ability / vault: used to load secrets (API keys, model manager) via loadDocsConfigWithVault.
    • DocNode vertex schema (DOCNODE_SCHEMA) — canonical representation of documentation chunks.
  • How it fits in AGENTS:
    • Can be loaded natively for in-process tools, the same way this ability itself loads ability-graph via client.loadNative('ability-graph').
    • Can be started as a broker-served ability so other agents invoke docs-* tools remotely.
    • Other agents call docs-search/docs-page to retrieve documentation context during tasks, reasoning, or answer grounding.
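
Pipeline step 3 (post-processing DocNode vertices into structural edges) can be sketched as a pure function. This is not the ability's implementation, which issues graph queries: the ChunkRef and Edge shapes, the function name, and the rule that a References edge points at the first chunk of the linked page are all assumptions made for illustration.

```typescript
// Hypothetical, simplified view of a stored DocNode chunk.
interface ChunkRef {
  rid: string;        // record id of the stored vertex
  slug: string;       // page the chunk belongs to
  chunkIndex: number; // position of the chunk within its page
  content: string;    // markdown content of the chunk
}

interface Edge {
  type: 'NextSection' | 'References';
  from: string;
  to: string;
}

function deriveStructuralEdges(chunks: ChunkRef[]): Edge[] {
  const edges: Edge[] = [];
  // Group chunks by page slug.
  const byPage = new Map<string, ChunkRef[]>();
  for (const chunk of chunks) {
    const list = byPage.get(chunk.slug) ?? [];
    list.push(chunk);
    byPage.set(chunk.slug, list);
  }
  // NextSection: link each chunk to its successor on the same page.
  byPage.forEach((list) => {
    list.sort((a, b) => a.chunkIndex - b.chunkIndex);
    for (let i = 0; i + 1 < list.length; i++) {
      edges.push({ type: 'NextSection', from: list[i].rid, to: list[i + 1].rid });
    }
  });
  // References: markdown links whose target resolves to another known slug.
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  for (const chunk of chunks) {
    const seen = new Set<string>();
    linkPattern.lastIndex = 0;
    let match: RegExpExecArray | null;
    while ((match = linkPattern.exec(chunk.content)) !== null) {
      // Drop any #fragment, then leading and trailing slashes.
      const targetSlug = match[1].replace(/#.*$/, '').replace(/^\/+|\/+$/g, '');
      const target = byPage.get(targetSlug);
      if (!target || targetSlug === chunk.slug || seen.has(targetSlug)) continue;
      seen.add(targetSlug);
      edges.push({ type: 'References', from: chunk.rid, to: target[0].rid });
    }
  }
  return edges;
}
```

Keeping the derivation pure makes it easy to check the edge set before anything is written to the graph database.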

The ability registers four tools. The registration functions are exported/used internally:

  • registerSearchTool
  • registerReindexTool
  • registerPageTool
  • registerIndexStatusTool
| Tool | Registration function | Description | Key input params |
| --- | --- | --- | --- |
| docs-search | registerSearchTool | Search documentation using 4-signal hybrid recall; structural expansion follows NEXT_SECTION and REFERENCES. | query (string), collection? (string), limit? (number), mode? ('semantic' … |
| docs-reindex | registerReindexTool | Full reindex pipeline: crawl → chunk → embed → store; creates NextSection & References edges. | pages? (PageDocument[]), llmsContent? (string), llmsTxt? (string), collection? (string), clearExisting? (boolean), background? (boolean), skipExtraction? (boolean) |
| docs-page | registerPageTool | Fetch a single documentation page by slug, return chunks and metadata, optional related pages. | slug (string), collection? (string), includeRelated? (boolean) |
| docs-index-status | registerIndexStatusTool | Query index statistics: total DocNodes, per-collection counts, topics/entities, edge counts, last indexed time. | collection? (string) |
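
As a rough illustration of how a caller might type these inputs, the sketch below mirrors the parameters listed above for three of the tools. The Invoke transport type and callDocsTool are hypothetical stand-ins (the real client API is not shown in this document), and mode is typed as a plain string because its allowed values are not reproduced here.

```typescript
// Input shapes taken from the parameter lists above.
type DocsSearchInput = {
  query: string;
  collection?: string;
  limit?: number;
  mode?: string; // allowed values are not listed here, so left as string
};
type DocsPageInput = { slug: string; collection?: string; includeRelated?: boolean };
type DocsIndexStatusInput = { collection?: string };

// A discriminated union keeps tool names and their inputs in sync.
type DocsToolCall =
  | { tool: 'docs-search'; input: DocsSearchInput }
  | { tool: 'docs-page'; input: DocsPageInput }
  | { tool: 'docs-index-status'; input: DocsIndexStatusInput };

// Hypothetical transport: native, broker, or stdio invocation would sit behind this.
type Invoke = (tool: string, input: Record<string, unknown>) => Promise<unknown>;

function callDocsTool(invoke: Invoke, call: DocsToolCall): Promise<unknown> {
  // Copy the input so the transport cannot mutate the caller's object.
  return invoke(call.tool, { ...call.input });
}
```

With this shape, passing a slug to docs-search or a query to docs-page is rejected at compile time rather than at the tool's schema validation step.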

Notes:

  • Tools are registered on a KadiClient instance via client.registerTool(…) with zod input schemas. See registration code for exact typing and descriptions.
  • Internally, tools call abilities.invoke('graph-recall' | 'graph-query' | 'graph-batch-store' | …) on the loaded ability-graph.

Primary config file: config.toml, located by walking up from the current working directory. Secrets go in secrets.toml (encrypted vault) and are resolved by loadDocsConfigWithVault.

Provided config.toml fields (the [docs] section):

  • database: string (e.g., “agents_memory”)
  • default_collection: string (e.g., “agents-docs”)
  • embedding_model: string (e.g., “text-embedding-3-small”)
  • extraction_model: string (e.g., “gpt-5-nano”)
  • max_tokens: number (e.g., 500)
  • base_url: string (e.g., “http://localhost:3333”)
  • domain: string (e.g., “localhost”)
  • embedding_transport: string (“api” or other transport)
  • chat_transport: string

Broker config section:

  • [broker.remote]
    • URL (string)
    • NETWORKS (string[])
    • MODE (string)
Environment variables (overrides):

  • BROKER_URL — overrides broker URL resolution used by the client (used in index.ts resolveBrokerUrl()).
  • The config loader gives environment variables precedence over file-based values (see loadDocsConfigWithVault). It also looks up the model-manager secrets (MODEL_MANAGER_BASE_URL, MODEL_MANAGER_API_KEY) in the vault if configured.
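
The BROKER_URL override can be pictured as a simple first-match lookup. The sketch below is not the actual resolveBrokerUrl() from index.ts: the function name, the assumption that the [broker.remote] URL is the second choice, and the placeholder fallback address are all hypothetical. The environment is passed in as an argument so the function stays easy to test.

```typescript
type Env = Record<string, string | undefined>;
type BrokerRemoteSection = { URL?: string };

// Placeholder only; the real fallback, if any, is not shown in this document.
const PLACEHOLDER_BROKER_URL = 'ws://localhost:8080';

function resolveBrokerUrlSketch(
  env: Env,
  remote?: BrokerRemoteSection,
  fallback: string = PLACEHOLDER_BROKER_URL,
): string {
  // 1. An explicit environment override wins.
  const fromEnv = env.BROKER_URL?.trim();
  if (fromEnv) return fromEnv;
  // 2. Otherwise use the [broker.remote] URL from config.toml (assumed order).
  const fromConfig = remote?.URL?.trim();
  if (fromConfig) return fromConfig;
  // 3. Finally fall back to a built-in value.
  return fallback;
}
```

In a Node process this would typically be called as resolveBrokerUrlSketch(process.env, config.broker?.remote), where the config shape is again an assumption.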

Secrets / Vault:

  • Secrets such as API keys (embedding / model manager) should be stored in the encrypted vault (secrets.toml) and are retrieved by loadDocsConfigWithVault via secret-ability. The config loader will expose config.apiUrl and config.apiKey for embedding/tooling use (used when invoking graph-recall).

agent.json (ability metadata) declares runtime dependencies:

  • abilities: ability-graph (^0.1.2), secret-ability (^0.9.4)
  • brokers: remote broker address (wss://broker.dadavidtseng.com/kadi)

Key patterns and actual excerpts from the source.

  • Client bootstrap and ability-graph loading (src/index.ts):
    const brokerUrl = resolveBrokerUrl();
    const agentJson = loadAgentJson();
    const client = new KadiClient({
      name: agentJson.name ?? 'ability-docs-memory',
      version: agentJson.version ?? '0.0.1',
      brokers: {
        default: { url: brokerUrl },
      },
    });
    console.log('[ability-docs-memory] Loading configuration…');
    const config: DocsConfig = await loadDocsConfigWithVault(client);
    console.log('[ability-docs-memory] Loading ability-graph…');
    let graphAbility: LoadedAbility | null = null;
    try {
      graphAbility = await client.loadNative('ability-graph', { timeout: 0 });
      console.log('[ability-docs-memory] ability-graph loaded');
    } catch (err: any) {
      console.warn(
        '[ability-docs-memory] ability-graph native load failed, continuing:',
        err?.message ?? err,
      );
    }
    const LONG_RUNNING_TOOLS = new Set(['graph-batch-store']);
    const abilities: SignalAbilities = {
      invoke: <T>(tool: string, params: Record<string, unknown>) => {
        if (!graphAbility) {
          throw new Error('ability-graph not loaded — cannot invoke tools');
        }
        const opts = LONG_RUNNING_TOOLS.has(tool) ? { timeout: 0 } : undefined;
        return graphAbility.invoke<T>(tool, params, opts);
      },
    };
  • docs-search tool delegating to graph-recall (src/tools/search.ts):
    const result = await abilities.invoke<Record<string, unknown>>('graph-recall', {
      query: input.query,
      vertexType: 'DocNode',
      mode,
      signals,
      structuralEdges: ['NextSection', 'References'],
      structuralDepth: input.structuralDepth ?? 1,
      structuralTopK: input.structuralTopK ?? 5,
      filters,
      limit,
      database: config.database,
      embedding: {
        model: config.embeddingModel,
        transport: config.embeddingTransport,
        apiUrl: config.apiUrl,
        apiKey: config.apiKey,
      },
    });
  • docs-page example query pattern (src/tools/page.ts):
    const queryResult = await abilities.invoke<{
      success: boolean;
      result?: Array<Record<string, unknown>>;
    }>('graph-query', {
      database: config.database,
      query:
        `SELECT @rid, content, title, slug, pageUrl, source, chunkIndex, tokens, importance, metadata, indexedAt` +
        ` FROM DocNode` +
        ` WHERE slug LIKE '${safeSlug}%'` +
        ` AND collection = '${safeCollection}'` +
        ` ORDER BY chunkIndex ASC`,
    });
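
Because the query matches slugs by prefix (LIKE '…%'), a request for one slug may also return chunks of pages whose slugs merely start with it. The sketch below shows one way the returned rows could be turned into a single page. It is hypothetical: assemblePage, its return shape, and the choice to prefer exact slug matches are assumptions, not the behavior of src/tools/page.ts.

```typescript
type Row = Record<string, unknown>;

// Small helpers to read loosely typed query results safely.
const asString = (value: unknown, fallback: string): string =>
  typeof value === 'string' ? value : fallback;
const asNumber = (value: unknown, fallback: number): number =>
  typeof value === 'number' ? value : fallback;

function assemblePage(rows: Row[], slug: string) {
  // Prefer rows whose slug matches exactly; otherwise keep the prefix matches.
  const exact = rows.filter((row) => row.slug === slug);
  const selected = exact.length > 0 ? exact : rows;

  const chunks = selected
    .map((row) => ({
      chunkIndex: asNumber(row.chunkIndex, 0),
      content: asString(row.content, ''),
      tokens: asNumber(row.tokens, 0),
    }))
    // Re-sort defensively even though the query already orders by chunkIndex.
    .sort((a, b) => a.chunkIndex - b.chunkIndex);

  return {
    slug,
    title: selected.length > 0 ? asString(selected[0].title, '') : '',
    totalTokens: chunks.reduce((sum, chunk) => sum + chunk.tokens, 0),
    markdown: chunks.map((chunk) => chunk.content).join('\n\n'),
    chunks,
  };
}
```
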
  • docs-reindex (registration header and pipeline description) (src/tools/reindex.ts):
    client.registerTool(
      {
        name: 'docs-reindex',
        description:
          'Reindex documentation into the graph database. Crawls pages, chunks by markdown headings, ' +
          'extracts entities/topics, generates embeddings, and creates DocNode vertices with ' +
          'NextSection and References edges. Uses graph-batch-store for bulk ingestion.',
        input: z.object({
          pages: z.array(z.object({
            title: z.string().describe('Page title'),
            slug: z.string().describe('URL slug for identification'),
            pageUrl: z.string().describe('Full URL of the page'),
            source: z.string().optional().describe('Source identifier'),
            content: z.string().describe('Raw markdown content of the page'),
          })).optional().describe('Pre-loaded page documents to index'),
          ...
        }),
      },
      async (input) => {
        const startTime = Date.now();
        const collection = input.collection ?? config.defaultCollection;
        try {
          // ── Step 1: Resolve pages ──────────────────────────────────────
          let pages: PageDocument[] = [];
          console.error(`[docs-reindex] Step 1: Resolving pages…`);
          ...
        }
      }
    );
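
The tool description mentions chunking by markdown headings. In the real pipeline that work is delegated to ability-graph, so the following is only a simplified sketch of the idea: the Chunk shape and chunkByHeadings are hypothetical, the PageDocument fields follow the zod schema above, and no max_tokens limit is enforced here.

```typescript
// Fields follow the pages schema shown in the registration excerpt.
interface PageDocument {
  title: string;
  slug: string;
  pageUrl: string;
  source?: string;
  content: string;
}

// Hypothetical chunk shape for illustration.
interface Chunk {
  slug: string;
  chunkIndex: number;
  heading: string | null; // null for text that precedes the first heading
  content: string;
}

function chunkByHeadings(page: PageDocument): Chunk[] {
  const sections: Array<{ heading: string | null; lines: string[] }> = [
    { heading: null, lines: [] },
  ];
  let inFence = false;
  for (const line of page.content.split(/\r?\n/)) {
    // Track fenced code blocks so '#' comments inside them are not headings.
    if (/^\s*(`{3,}|~{3,})/.test(line)) inFence = !inFence;
    const heading = inFence ? null : /^(#{1,6})\s+(.*\S)\s*$/.exec(line);
    if (heading) {
      sections.push({ heading: heading[2], lines: [line] });
    } else {
      sections[sections.length - 1].lines.push(line);
    }
  }
  return sections
    .map((section) => ({ heading: section.heading, content: section.lines.join('\n').trim() }))
    .filter((section) => section.content.length > 0)
    .map((section, index) => ({
      slug: page.slug,
      chunkIndex: index,
      heading: section.heading,
      content: section.content,
    }));
}
```

The resulting chunkIndex values are what the docs-page query orders by and what NextSection edges would follow.
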
  • Abilities:
    • depends on: ability-graph (core graph tools: graph-query, graph-recall, graph-index, graph-batch-store)
    • depends on: secret-ability (vault/secrets retrieval via loadDocsConfigWithVault)
  • Packages:
    • @kadi.build/core — KadiClient, tool registration, z (zod) types used across tools.
    • Node built-ins: fs, path, url for configuration and startup resolution.
  • What depends on ability-docs-memory:
    • Any agent that needs documentation recall/search should invoke docs-search/docs-page via broker or load this ability natively. It is designed to be called by other agents (native or remote) to provide documentation grounding and navigable context.
  • Internal dependency notes:
    • LONG_RUNNING_TOOLS contains ‘graph-batch-store’ so invocations to that graph tool are run with no timeout.
    • The search tool expects config.apiUrl and config.apiKey to be present when using embedding transport ‘api’ (these are resolved by loadDocsConfigWithVault).

Implementation / Modification Notes (for developers)

  • To change structural edges used by docs-search, update structuralEdges: ['NextSection', 'References'] in src/tools/search.ts.
  • To add additional index metrics, extend registerIndexStatusTool’s SQL queries (safeQuery usage).
  • The ability attempts to load ability-graph natively; if you require remote graph usage, ensure the broker exposes ability-graph or adjust abilities.invoke to route to a remote ability.
  • Configuration resolution order is environment → vault → config.toml → defaults. Use secrets.toml/vault for API keys rather than placing them in config.toml.
  • When adding long-running graph operations, add their tool names to LONG_RUNNING_TOOLS to bypass default invocation timeouts.
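
One possible shape for routing abilities.invoke to a remote ability-graph when the native load fails, while keeping the LONG_RUNNING_TOOLS behavior, is sketched below. The ToolBackend interface and makeAbilities are hypothetical: how a remote backend is actually obtained from the broker is not covered in this document, so it is passed in as a parameter.

```typescript
type InvokeOptions = { timeout: number };

// Minimal surface assumed for both a natively loaded and a remote ability.
interface ToolBackend {
  invoke<T>(tool: string, params: Record<string, unknown>, opts?: InvokeOptions): Promise<T>;
}

function makeAbilities(
  native: ToolBackend | null,
  remote: ToolBackend | null,
  longRunningTools: ReadonlySet<string>,
) {
  return {
    invoke<T>(tool: string, params: Record<string, unknown>): Promise<T> {
      // Prefer the in-process ability; fall back to the remote one if present.
      const backend = native ?? remote;
      if (!backend) {
        throw new Error('ability-graph is not available natively or remotely');
      }
      // timeout: 0 disables the invocation timeout for long-running tools.
      const opts = longRunningTools.has(tool) ? { timeout: 0 } : undefined;
      return backend.invoke<T>(tool, params, opts);
    },
  };
}
```

Adding a new long-running tool then only requires extending the set passed as longRunningTools, for example new Set(['graph-batch-store', 'my-bulk-tool']), where the second name is made up.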
