CLI Reference

All kensa commands and their options.

kensa works standalone, without a coding agent. It requires Python 3.10 or later.

kensa init

Scaffold .kensa/ with an example scenario and agent.

kensa init           # create .kensa/ with example files
kensa init --force   # overwrite the existing example
kensa init --blank   # scaffold directories only, skip the example

Detects your API keys and scaffolds accordingly:

  • ANTHROPIC_API_KEY set → Anthropic agent using claude-haiku-4-5
  • OPENAI_API_KEY set → OpenAI agent using gpt-5.4-mini
  • Neither → stub agent (no API call)

Creates .kensa/agents/example.py and .kensa/scenarios/example.yaml. Run kensa eval immediately after to verify your setup. Use --blank if you want an empty .kensa/ and plan to write scenarios yourself or via a coding agent.
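The key detection described above can be pictured with a short sketch. This is not kensa's actual implementation: the function name pick_example_agent is hypothetical, and the precedence when both keys are set (Anthropic first, following the order of the list above) is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple


def pick_example_agent(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Return (provider, model) for the scaffolded example agent.

    Hypothetical helper. Assumes ANTHROPIC_API_KEY takes precedence
    when both keys are present.
    """
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        return ("anthropic", "claude-haiku-4-5")
    if env.get("OPENAI_API_KEY"):
        return ("openai", "gpt-5.4-mini")
    # Neither key is set: fall back to a stub agent that makes no API call.
    return ("stub", None)
```

Passing a plain dict instead of os.environ makes the three branches easy to exercise without touching your real environment.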

kensa eval

Run, judge, and report in one shot.

kensa eval                            # all scenarios
kensa eval -s classify_ticket         # specific scenario (repeatable)
kensa eval --format markdown          # CI-friendly output
kensa eval --timeout 600              # 10-minute per-scenario timeout
kensa eval --model claude-sonnet-4-6  # override judge model

Flag                Default           Description
--scenario-dir      .kensa/scenarios  Where scenario YAMLs live
-s, --scenario-id   all               Run a specific scenario (repeatable)
--timeout           300               Per-scenario timeout in seconds
--model             resolved          Judge model override
--format            terminal          terminal, markdown, or json

kensa run

Run scenarios and capture traces. No judging.

kensa run                              # all scenarios
kensa run -s classify_ticket           # specific scenario
kensa run --dry-run                    # list what would run, don't execute
kensa run --format json                # machine-readable manifest

Flag                Default           Description
--scenario-dir      .kensa/scenarios  Where scenario YAMLs live
-s, --scenario-id   all               Run a specific scenario (repeatable)
--timeout           300               Per-scenario timeout in seconds
--dry-run           off               List scenarios that would run, without executing
--format            text              text or json

Each scenario runs in its own subprocess with KENSA_TRACE_DIR set. Traces are written as JSONL to .kensa/traces/.
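To make the subprocess model concrete, here is a minimal sketch of launching one agent command with KENSA_TRACE_DIR set and a per-scenario timeout. The function name run_scenario and its parameters are hypothetical, and kensa's own runner may differ in how it builds the command and handles output.

```python
import os
import subprocess
from pathlib import Path
from typing import Sequence


def run_scenario(
    command: Sequence[str],
    trace_dir: str = ".kensa/traces",
    timeout: float = 300,
) -> subprocess.CompletedProcess:
    """Run one agent command in its own subprocess (illustrative only)."""
    trace_path = Path(trace_dir)
    trace_path.mkdir(parents=True, exist_ok=True)

    # Copy the parent environment and add the trace directory, so the child
    # process knows where to write its JSONL spans.
    env = dict(os.environ, KENSA_TRACE_DIR=str(trace_path))

    # subprocess.run raises subprocess.TimeoutExpired if the child
    # exceeds the timeout (in seconds).
    return subprocess.run(
        list(command),
        env=env,
        timeout=timeout,
        capture_output=True,
        text=True,
    )
```

Because each scenario gets a fresh process, one scenario's crash or timeout does not affect the others.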

kensa judge

Score the latest run with checks and an LLM judge.

kensa judge                            # default model, latest run
kensa judge --model claude-haiku-4-5   # override model
kensa judge --run-id abc123            # specific run
kensa judge --format json              # machine-readable

Flag        Default   Description
--run-id    latest    Which run to judge
--model     resolved  Judge model override
--format    text      text or json

Checks run first. If all pass, the LLM judge evaluates criteria. If any check fails, the judge is skipped (fail-fast).
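The fail-fast ordering can be sketched as follows. The names are hypothetical and this is not kensa's code. It assumes that every check is evaluated before the decision is made; the source states only that the judge is skipped when any check fails.

```python
from typing import Callable, Iterable


def judge_with_fail_fast(
    checks: Iterable[Callable[[], bool]],
    llm_judge: Callable[[], str],
) -> dict:
    """Run deterministic checks, then call the LLM judge only if all pass."""
    # Checks are cheap, so they run first.
    check_results = [check() for check in checks]

    if not all(check_results):
        # At least one check failed: skip the (slower, paid) judge call.
        return {"checks": check_results, "verdict": None, "judge_called": False}

    return {"checks": check_results, "verdict": llm_judge(), "judge_called": True}
```

The design choice this illustrates is cost control: a scenario that already fails a deterministic check does not spend judge tokens.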

kensa report

Generate reports from the latest run.

kensa report                          # rich terminal output
kensa report --format markdown        # CI-friendly
kensa report --format json            # machine-readable
kensa report --format html            # standalone HTML file
kensa report -o results.md --format markdown  # write to file
kensa report --run-id abc123 -v       # full reasoning for a past run

Flag            Default   Description
--run-id        latest    Which run to render
--format        terminal  terminal, markdown, json, or html
-o, --output    stdout    Write to file instead of stdout
-v, --verbose   off       Show full check details and judge reasoning

kensa report always writes a standalone HTML report to .kensa/reports/ as a side effect, regardless of --format.

kensa analyze

Surface cost, latency, and anomalies across runs.

kensa analyze                         # text summary
kensa analyze --format json           # machine-readable
kensa analyze -o analysis.json --format json

Flag           Default        Description
--trace-dir    .kensa/traces  Where trace JSONL files live
--format       text           text or json
-o, --output   stdout         Write to file instead of stdout

Outputs per-scenario stats (cost percentiles, latency percentiles, token usage, and tool frequencies) and flags anomalies such as cost outliers, latency outliers, repeated tool calls, and high turn counts.
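As a rough illustration of this kind of analysis, the sketch below computes nearest-rank percentiles and flags values far above the median. Both the percentile method and the "3x the median" outlier rule are assumptions chosen for the example; the source does not say how kensa defines percentiles or anomalies.

```python
import math
import statistics
from typing import List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence, with 0 < pct <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def flag_outliers(values: Sequence[float], factor: float = 3.0) -> List[int]:
    """Return indices of values greater than factor times the median.

    The threshold is an illustrative assumption, not kensa's rule.
    """
    median = statistics.median(values)
    return [i for i, v in enumerate(values) if v > factor * median]


# Hypothetical per-run costs in dollars for one scenario.
costs = [0.010, 0.012, 0.011, 0.013, 0.090]
p50 = percentile(costs, 50)
p95 = percentile(costs, 95)
outlier_runs = flag_outliers(costs)
```

The same two functions apply unchanged to latency values or turn counts.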

kensa doctor

Verify your setup is ready to run.

kensa doctor

Checks:

  • Python version (3.10+)
  • Package manager detection (uv, pipenv, pip)
  • .kensa/scenarios/ directory exists
  • .env file loaded
  • API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY)
  • Trace directory writable
  • SDK instrumentation (scans agent scripts for openai/anthropic/langchain imports, verifies instrumentor packages)
  • Judge provider instantiation
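A few of the checks above are simple enough to approximate with the standard library. The sketch below is a hypothetical subset, not the actual kensa doctor logic; the function name, the result keys, and the choice to treat a missing trace directory as not writable are all assumptions.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Mapping, Optional


def basic_checks(root: str = ".", env: Optional[Mapping[str, str]] = None) -> dict:
    """Approximate a subset of setup checks (illustrative only)."""
    env = os.environ if env is None else env
    kensa_dir = Path(root) / ".kensa"
    trace_dir = kensa_dir / "traces"

    return {
        # Python version (3.10+)
        "python_ok": sys.version_info >= (3, 10),
        # First package manager found on PATH, or None
        "package_manager": next(
            (name for name in ("uv", "pipenv", "pip") if shutil.which(name)),
            None,
        ),
        # .kensa/scenarios/ directory exists
        "scenarios_dir": (kensa_dir / "scenarios").is_dir(),
        # At least one provider key is set
        "api_key": any(
            env.get(key) for key in ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")
        ),
        # Trace directory exists and is writable
        "trace_dir_writable": trace_dir.is_dir() and os.access(trace_dir, os.W_OK),
    }
```

The .env loading, SDK instrumentation scan, and judge provider instantiation are left out because they depend on kensa internals not described here.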

Environment variables

Variable            Purpose
KENSA_TRACE_DIR     Directory for JSONL span output. Set automatically during kensa run.
KENSA_JUDGE_MODEL   Override the default judge model.
ANTHROPIC_API_KEY   Anthropic API key for judge and/or agent.
OPENAI_API_KEY      OpenAI API key for judge and/or agent.