Call kensa eval from Claude Code, Cursor, Codex, OpenCode, Gemini CLI, Claude Desktop, or any other MCP-aware client. Tools are thin adapters over the CLI surface, so mcp.call_tool("eval") runs the same pipeline as kensa eval.
Use uvx kensa-mcp for the cleanest zero-install setup. Use uv run kensa mcp when you want the server version to exactly match the project dependency.
Connect your MCP client
Quick install (Claude Code)
Run this from your project root:
Project root
uvx pulls kensa-mcp from PyPI into an isolated environment on first launch and reuses it afterward. Nothing to pre-install. The server inherits cwd from Claude Code and reads .kensa/ relative to that directory, so always invoke claude mcp add from the repo that contains your scenarios.
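The registration step described above can be sketched in Python. This is a minimal sketch, not the documented command: the server name kensa and both helper names are assumptions for illustration, while claude mcp add <name> -- <command> is the general Claude Code form for registering a stdio server.

```python
import shlex
import subprocess
from pathlib import Path


def claude_mcp_add_command(server_name: str = "kensa") -> list[str]:
    """Build the argv that registers the uvx-launched server with Claude Code.

    The server name is an assumption; pick whatever label you prefer.
    """
    return ["claude", "mcp", "add", server_name, "--", "uvx", "kensa-mcp"]


def register(project_root: Path) -> None:
    """Run the command from the repo that contains .kensa/ (hypothetical helper).

    The working directory matters because the server reads .kensa/
    relative to the directory it inherits.
    """
    subprocess.run(claude_mcp_add_command(), cwd=project_root, check=True)


# Print the command so it can be pasted into a shell at the project root.
print(shlex.join(claude_mcp_add_command()))
```

Printing the command rather than running it keeps the sketch free of side effects; call register() only on a machine where the claude CLI is installed.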
Manual JSON config
For Cursor, Codex, Claude Desktop, and other MCP clients, add this to the client config (e.g. ~/.claude.json or a project-local .mcp.json):
Client config via uvx
If kensa is already a project dependency, use the kensa mcp subcommand (it matches the version you have installed):
Client config via project dependency
This requires the mcp extra (uv add "kensa[mcp]"). Without it, kensa mcp prints a one-line install hint.
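The two launch variants above can be sketched as generated JSON. This is a sketch under assumptions: the mcpServers layout is the convention many MCP clients use for JSON configs, the server name kensa is illustrative, and the function names are hypothetical. Check your client's documentation for the exact file and schema it expects.

```python
import json


def kensa_server_entry(use_project_dependency: bool = False) -> dict:
    """Return the launch command for the server.

    uvx kensa-mcp      -> zero-install, isolated environment
    uv run kensa mcp   -> matches the version pinned in the project
    """
    if use_project_dependency:
        return {"command": "uv", "args": ["run", "kensa", "mcp"]}
    return {"command": "uvx", "args": ["kensa-mcp"]}


def client_config(use_project_dependency: bool = False,
                  server_name: str = "kensa") -> dict:
    """Wrap the entry in the (assumed) mcpServers layout."""
    return {"mcpServers": {server_name: kensa_server_entry(use_project_dependency)}}


# Emit both variants so they can be compared side by side.
print(json.dumps(client_config(), indent=2))
print(json.dumps(client_config(use_project_dependency=True), indent=2))
```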
Source checkout (kensa contributors)
Contributor setup from a source checkout
Verify manually
The MCP client starts the stdio server for you. Run it manually only to verify setup or use HTTP mode:
Manual verification
HTTP mode binds to 127.0.0.1 by default. Do not expose the HTTP transport on a public interface without bearer-token authentication in front of it. The run and eval tools execute subprocesses and have no authentication of their own.
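A launcher script can enforce the loopback rule before starting HTTP mode. The following is a minimal sketch of such a pre-check; it is not part of kensa, and the function names and the has_bearer_proxy flag are assumptions for illustration.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True when the bind address only accepts local connections."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name): treat it as non-local.
        return False


def check_bind(host: str, has_bearer_proxy: bool = False) -> None:
    """Refuse a non-loopback bind unless something authenticates in front."""
    if not is_loopback_host(host) and not has_bearer_proxy:
        raise ValueError(
            f"refusing to bind {host!r}: the run and eval tools execute "
            "subprocesses and have no authentication of their own"
        )


check_bind("127.0.0.1")  # the default bind address passes silently
```

Unknown hostnames are rejected deliberately: resolving them would be needed to prove they are local, and failing closed is the safer default for a sketch like this.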
Tools
| Tool | Purpose | Returns |
|---|---|---|
| init | Scaffold .kensa/ (idempotent) | InitResponse or MCPError |
| doctor | Pre-flight diagnostics | DoctorResponse with ready flag |
| run | Execute scenarios, capture traces | RunSummary with manifest_uri |
| judge | Score a run (checks + LLM judge) | JudgeSummary with results_uri |
| eval | run + judge + HTML report | EvalSummary with results_uri |
| report | Render results in a chosen format | ReportResponse |
| analyze | Cost/latency stats + anomaly flags | Analysis |
Long-running tools (run, judge, eval) report progress over ctx.report_progress when the client provides a Context, and return a compact summary plus a resource URI pointing at full detail. Fetch the resource only when you need it.
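The "summary first, detail on demand" pattern can be sketched as follows. Only results_uri comes from the table above; the failed counter, the read_resource callable, and the function name are hypothetical stand-ins for whatever your client and the real summary schema provide.

```python
from typing import Any, Callable, Optional


def detail_if_needed(
    summary: dict[str, Any],
    read_resource: Callable[[str], Any],
    uri_field: str = "results_uri",
) -> Optional[Any]:
    """Fetch the full resource only when the compact summary warrants it.

    "failed" is an assumed field name; substitute the real signal
    your summary exposes.
    """
    if summary.get("failed", 0) == 0:
        return None  # the summary is enough, skip the extra round trip
    return read_resource(summary[uri_field])


# Usage with a fake reader standing in for an MCP client's resource call.
fetched: list[str] = []


def fake_reader(uri: str) -> dict:
    fetched.append(uri)
    return {"uri": uri}


detail_if_needed({"failed": 0, "results_uri": "kensa://runs/r1/results"}, fake_reader)
detail_if_needed({"failed": 2, "results_uri": "kensa://runs/r2/results"}, fake_reader)
print(fetched)
```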
Resources
Read-only data under the kensa:// namespace.
| URI | What it returns |
|---|---|
| kensa://runs | List of the 50 most recent eval runs (newest first; capture-only manifests are filtered out before the cap) |
| kensa://runs/{run_id} | Manifest plus summary for one run |
| kensa://runs/{run_id}/results | Full judged results for one run |
| kensa://runs/{run_id}/trace/{scenario}/{index} | Spans for one scenario execution (index is 0-based; dataset-backed scenarios produce one entry per row) |
| kensa://scenarios | List of scenarios in .kensa/scenarios/ |
| kensa://scenarios/{scenario_id} | Full scenario definition |
| kensa://judges | Names of structured judge prompt specs |
| kensa://judges/{name} | A single JudgePromptSpec |
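The URI templates above are easy to build and take apart with the standard library. This sketch assumes identifiers contain no slashes and need no percent-encoding; the helper names are illustrative and not part of kensa.

```python
from urllib.parse import urlsplit


def trace_uri(run_id: str, scenario: str, index: int) -> str:
    """Build kensa://runs/{run_id}/trace/{scenario}/{index} (index is 0-based)."""
    if index < 0:
        raise ValueError("index is 0-based and must not be negative")
    for part in (run_id, scenario):
        if not part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return f"kensa://runs/{run_id}/trace/{scenario}/{index}"


def split_kensa_uri(uri: str) -> list[str]:
    """Split a kensa:// URI into its segments, e.g. ['runs', 'abc', 'results']."""
    parts = urlsplit(uri)
    if parts.scheme != "kensa":
        raise ValueError(f"not a kensa:// URI: {uri!r}")
    # The first segment lands in netloc; the rest are in the path.
    return [parts.netloc] + [p for p in parts.path.split("/") if p]


uri = trace_uri("abc123", "login", 0)
print(uri)
print(split_kensa_uri(uri))
```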
Errors
Tools never raise across the protocol boundary. Failures return a stable envelope whose code is one of the following:
| code | When |
|---|---|
| scenarios_missing | Scenario directory does not exist |
| scenario_not_found | Requested scenario ID not in the directory |
| scenario_invalid | YAML syntax or schema error in a scenario file |
| run_not_found | Referenced run has no manifest on disk |
| run_not_evalable | Targeted run is a capture, not an eval (use kensa generate instead) |
| no_judge_key | No judge API key (ANTHROPIC_API_KEY / OPENAI_API_KEY) set |
| invalid_run_id | run_id failed path-safety validation |
| path_escape | A path argument tried to escape the workspace |
| unknown_format | Requested report format is not one of terminal / markdown / json / html |
| subprocess_failed | A scenario subprocess crashed before the runner could collect spans |
| internal | Uncategorised failure, surface the message to the user |
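Because the codes are stable, a client can branch on them instead of parsing messages. The sketch below assumes the envelope is a mapping with code and message keys; that shape and the helper names are assumptions, while the codes and the remediation hints are taken from the table above.

```python
from typing import Any, Mapping

KNOWN_CODES = {
    "scenarios_missing", "scenario_not_found", "scenario_invalid",
    "run_not_found", "run_not_evalable", "no_judge_key", "invalid_run_id",
    "path_escape", "unknown_format", "subprocess_failed", "internal",
}

# Hints drawn directly from the table; other codes fall back to the message.
HINTS = {
    "no_judge_key": "Set ANTHROPIC_API_KEY or OPENAI_API_KEY.",
    "unknown_format": "Use one of: terminal, markdown, json, html.",
    "run_not_evalable": "The run is a capture, not an eval; use kensa generate instead.",
}


def is_error(result: Mapping[str, Any]) -> bool:
    """Treat a result as an error envelope when it carries a known code."""
    return result.get("code") in KNOWN_CODES


def describe_error(envelope: Mapping[str, Any]) -> str:
    """Turn an (assumed) {'code', 'message'} envelope into a user-facing line."""
    code = envelope.get("code", "internal")
    message = envelope.get("message", "")
    hint = HINTS.get(code)
    return f"{code}: {message} ({hint})" if hint else f"{code}: {message}"


print(describe_error({"code": "no_judge_key", "message": "no judge key configured"}))
```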