New: kensa capture

Zero to evals in minutes.

Your coding agent drafts evals. You approve. Kensa instruments and runs them.

$ uvx kensa init

Adds the kensa CLI, scaffolds your project, and drops skills into whichever coding agent you use. Requires Python 3.10+.

Works with
Claude Code, Codex CLI, Cursor, Gemini CLI, OpenCode

How it works

Kensa turns agent behavior into repeatable evals: scenarios in, traces captured, checks run, reports out.

01

Zero to eval

Ask your coding agent to inspect the codebase and draft the first scenarios. You review evals instead of starting from a blank file.

02

Runs become traces

Kensa captures LLM calls, tool use, tokens, cost, and latency while your agent runs each scenario.
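To make "trace" concrete, here is a minimal sketch of what one captured run could hold and how per-call numbers roll up into per-run totals. The class names, field names, and sample figures are hypothetical and chosen for illustration; this is not kensa's actual trace schema.

```python
from dataclasses import dataclass, field


@dataclass
class LLMCall:
    """One model call inside a run (hypothetical shape, not kensa's schema)."""
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_s: float


@dataclass
class Trace:
    """Everything recorded while one scenario ran."""
    scenario: str
    llm_calls: list[LLMCall] = field(default_factory=list)
    tool_calls: list[str] = field(default_factory=list)

    def summary(self) -> dict:
        """Roll per-call numbers up into per-run totals."""
        return {
            "scenario": self.scenario,
            "llm_calls": len(self.llm_calls),
            "tool_calls": len(self.tool_calls),
            "tokens": sum(c.input_tokens + c.output_tokens for c in self.llm_calls),
            "cost_usd": round(sum(c.cost_usd for c in self.llm_calls), 6),
            "latency_s": round(sum(c.latency_s for c in self.llm_calls), 3),
        }


# Made-up sample data: two model calls and two tool calls in one run.
trace = Trace(
    scenario="refund_request",
    llm_calls=[
        LLMCall("example-model", 1200, 300, 0.0045, 1.8),
        LLMCall("example-model", 1600, 150, 0.0051, 1.2),
    ],
    tool_calls=["lookup_order", "issue_refund"],
)
print(trace.summary())
```

Keeping tokens, cost, and latency on every call is what lets a later step set bounds on them per scenario.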

03

Checks gate judges

Assertions run before LLM judges, catching obvious regressions without spending tokens.
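The gating idea can be sketched in a few lines: run the cheap deterministic checks first, and call the judge only if every check passes. The function and check names below are hypothetical, and the judge is a local stand-in rather than a real LLM call; kensa's own internals may differ.

```python
from typing import Callable

Check = Callable[[str], bool]


def evaluate(output: str, checks: dict[str, Check], judge: Callable[[str], bool]) -> dict:
    """Run deterministic checks first; only pay for the judge if all pass."""
    failed = [name for name, check in checks.items() if not check(output)]
    if failed:
        # Short-circuit: the judge is never called, so no tokens are spent.
        return {"verdict": "fail", "failed_checks": failed, "judge_called": False}
    passed = judge(output)
    return {
        "verdict": "pass" if passed else "fail",
        "failed_checks": [],
        "judge_called": True,
    }


judge_calls: list[str] = []  # records every output the stand-in judge sees


def fake_judge(output: str) -> bool:
    """Stand-in for an LLM judge (an assumption for this sketch)."""
    judge_calls.append(output)
    return "refund" in output.lower()


checks: dict[str, Check] = {
    "non_empty": lambda out: bool(out.strip()),
    "under_500_chars": lambda out: len(out) < 500,
}

empty_result = evaluate("", checks, fake_judge)
good_result = evaluate("Your refund has been issued.", checks, fake_judge)
print(empty_result)
print(good_result)
print("judge calls:", len(judge_calls))
```

The empty output fails `non_empty` and never reaches the judge, so only the second evaluation costs a judge call.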

04

Ship with evidence

Get verdicts, traces, cost, latency, and failure details in terminal, Markdown, JSON, or HTML.

Each run leaves traces that kensa can turn into sharper scenarios.

Skills

Five skills take you from zero to eval, or from traces to targeted iteration.

/audit-evals

Assess readiness, identify testable behaviors, prepare the environment. The default entry point.

/generate-scenarios

Happy paths, edge cases, tool usage, error handling, cost bounds. One command.

/generate-judges

Binary pass/fail definitions with few-shot examples, ready to reuse across scenarios.

/validate-judge

Test judge accuracy against human labels. Iterates until the true positive rate (TPR) and true negative rate (TNR) both meet the threshold.
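For reference, the two rates are simple to compute once you have human labels next to judge verdicts. TPR is the share of human-labeled passes the judge also passes, and TNR is the share of human-labeled fails the judge also fails. This is a minimal sketch with made-up labels and a hypothetical 0.9 threshold. The skill's real threshold and loop are not specified here.

```python
def judge_rates(human: list[bool], judge: list[bool]) -> tuple[float, float]:
    """Return (TPR, TNR) of judge verdicts measured against human labels."""
    if len(human) != len(judge):
        raise ValueError("human and judge label lists must be the same length")
    positives = sum(human)
    negatives = len(human) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one human pass and one human fail")
    true_pos = sum(h and j for h, j in zip(human, judge))
    true_neg = sum((not h) and (not j) for h, j in zip(human, judge))
    return true_pos / positives, true_neg / negatives


# Made-up labels: True = pass, False = fail.
human_labels = [True, True, True, True, False, False, False, False]
judge_labels = [True, True, True, False, False, False, False, True]

THRESHOLD = 0.9  # hypothetical value, chosen for illustration
tpr, tnr = judge_rates(human_labels, judge_labels)
meets_threshold = tpr >= THRESHOLD and tnr >= THRESHOLD
print(f"TPR={tpr:.2f} TNR={tnr:.2f} meets_threshold={meets_threshold}")
```

Tracking both rates matters because a judge that passes everything scores a perfect TPR while catching no failures at all.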

/diagnose-errors

Categorize failures, identify patterns, recommend next action.

CLI (Python 3.10+)

Works standalone for CI and local iteration. Checks run before the judge, so obvious failures stop early without spending tokens.

kensa init      Scaffold .kensa/ (bare; --example for a demo)
kensa capture   Record one real agent invocation as a trace
kensa generate  Synthesize scenarios from captured traces
kensa eval      Run + judge + report in one shot
kensa run       Execute scenarios in subprocesses
kensa judge     Deterministic checks + LLM judge
kensa report    Terminal, Markdown, JSON, or HTML output
kensa analyze   Cost/latency stats + anomaly flagging
kensa doctor    Pre-flight environment checks
kensa mcp       Serve kensa over MCP for LLM clients
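As an illustration of what "anomaly flagging" on latency or cost can mean, here is one common approach: a modified z-score based on the median and the median absolute deviation (MAD), which a single extreme run does not skew. This is a generic sketch and an assumption for illustration. The method kensa analyze actually uses is not stated here, and the function name, cutoff, and sample data are hypothetical.

```python
from statistics import median


def flag_anomalies(values: list[float], cutoff: float = 3.5) -> list[int]:
    """Return indices whose modified z-score exceeds the cutoff."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median; this sketch flags nothing.
        return []
    # 0.6745 scales MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > cutoff
    ]


# Made-up per-run latencies in seconds; the sixth run is the obvious outlier.
latencies_s = [1.1, 1.3, 1.2, 1.0, 1.4, 9.8, 1.2]
flagged = flag_anomalies(latencies_s)
print(flagged)
```

A median-based score is used here instead of mean and standard deviation because one slow run inflates both of those and can hide itself.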

FAQ

What agents does kensa work with?

Any Python agent that makes LLM calls. Auto-instrumentation covers Anthropic, OpenAI, and LangChain out of the box. Other providers work with manual OTel config.

Do I need to modify my agent code?

No. kensa auto-instruments your agent at startup. Zero code changes needed.

Can I run kensa in CI?

Yes. kensa eval --format markdown is all you need. Deterministic checks need no API keys. Add judge keys as secrets for LLM-judged criteria.

Can I drive kensa from an MCP client?

Yes. In Claude Code, run claude mcp add kensa -- uvx kensa-mcp to register the stdio server. uvx fetches the kensa-mcp package from PyPI on first run, so no pre-install is needed. Every CLI action is exposed as a tool, and runs, scenarios, and judges are readable as resources under kensa://.

Is kensa free?

Yes, it is MIT-licensed. The only cost is your own LLM API calls for judged criteria, and those are optional.