Quickstart

Get kensa running in under a minute.

Option 1: Skills and CLI (recommended)

npx skills add satyaborg/kensa   # install eval skills
uv add kensa                     # or: pip install kensa

This is the recommended setup for Codex, Cursor, OpenCode, Gemini CLI, and other coding agents. It installs five skills (audit-evals, generate-scenarios, generate-judges, validate-judge, diagnose-errors) plus the CLI runtime. Then tell your coding agent: "evaluate my agent".

On their first run, the skills add kensa as a project dependency automatically and use the CLI to drive the eval workflow. No server or extra configuration is needed.

Option 2: Claude Code plugin

If you primarily use Claude Code, install kensa as a plugin instead:

/plugin marketplace add satyaborg/kensa
/plugin install kensa

The plugin provides the same skills as the npx install and is updated through the marketplace.

Provider extras

Install the extra that matches your stack for auto-instrumentation:

uv add "kensa[anthropic]"
uv add "kensa[openai]"
uv add "kensa[langchain]"
uv add "kensa[all]"

See Tracing & Instrumentation for passive trace collection and OTel backend setup.
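If you are unsure which extra matches your stack, a small script can check which provider SDKs are already importable in the current environment. The following is a minimal sketch, not part of kensa: the helper names detect_extras and install_command are hypothetical, and the mapping assumes each extra corresponds to the SDK's usual top-level module name.

```python
from importlib.util import find_spec

# Assumed mapping from a kensa extra to the top-level module of its SDK.
# The extra names come from the install commands above; the module names
# are the usual import names for those SDKs (an assumption).
EXTRA_TO_MODULE = {
    "anthropic": "anthropic",
    "openai": "openai",
    "langchain": "langchain",
}


def detect_extras(mapping=EXTRA_TO_MODULE):
    """Return the extras whose SDK module is importable in this environment."""
    return sorted(
        extra for extra, module in mapping.items() if find_spec(module) is not None
    )


def install_command(extras):
    """Build a uv command for the given extras (plain kensa if there are none)."""
    if not extras:
        return "uv add kensa"
    return 'uv add "kensa[{}]"'.format(",".join(extras))


if __name__ == "__main__":
    print(install_command(detect_extras()))
```

Run it inside your project's environment and paste the printed command into your shell. Installing "kensa[all]" remains the simpler choice if you do not mind the extra dependencies.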

Try an example

git clone https://github.com/satyaborg/kensa.git && cd kensa
uv sync --extra openai   # or --extra anthropic
cd examples/sql-analyst

Then, inside any coding agent (Claude Code, Codex, Cursor, OpenCode, Gemini CLI, …), say:

> evaluate this agent

No pre-written scenarios or setup are needed: kensa generates the scenarios from your code.

Add instrumentation if needed

The coding-agent workflow runs kensa doctor and helps add missing instrumentation. Manual setup mainly applies if you use kensa without the skills flow:

from kensa import instrument

instrument()  # must run before the SDK imports below

# Your existing imports below
from anthropic import Anthropic
# ...

instrument() must be called before your SDK imports. It configures OpenTelemetry, writes spans as JSONL, and auto-instruments any detected SDK. It is a no-op when KENSA_TRACE_DIR is unset, so it is safe to leave in production code.
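To confirm that instrumentation is actually producing output, you can count the JSONL records in the trace directory. This is a rough sketch under stated assumptions: it assumes the spans land in files ending in .jsonl somewhere under KENSA_TRACE_DIR with one JSON object per line. The file naming and span schema are not specified in this quickstart, and summarize_traces is a hypothetical helper, not a kensa API.

```python
import json
import os
from pathlib import Path


def summarize_traces(trace_dir):
    """Count JSONL records per file under trace_dir.

    Assumes one JSON object per line in files ending in .jsonl; the
    actual layout kensa uses may differ.
    """
    counts = {}
    for path in sorted(Path(trace_dir).rglob("*.jsonl")):
        n = 0
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                json.loads(line)  # raises ValueError on a malformed record
                n += 1
        counts[path.name] = n
    return counts


if __name__ == "__main__":
    trace_dir = os.environ.get("KENSA_TRACE_DIR")
    if trace_dir is None:
        print("KENSA_TRACE_DIR is unset, so instrument() is a no-op")
    else:
        print(summarize_traces(trace_dir))
```

An empty result with KENSA_TRACE_DIR set suggests that instrument() ran after the SDK imports or was never called.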