Start with deterministic checks in CI even if you do not want to spend judge tokens on every push. Add judge keys only when you want natural-language gating.

GitHub Actions

# .github/workflows/eval.yml
name: Evals
on: [push]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - name: Install
        run: uv sync --extra anthropic
      - name: Run evals
        run: uv run kensa eval --format markdown
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
Exit codes: 0 = pipeline ran end-to-end, 1 = kensa itself errored (config, missing keys, scenario load failure). Failed scenarios do not change the exit code; to fail the build on a failed scenario, gate on the report output instead.
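
Because the exit code stays 0 when scenarios fail, a gate has to read the report itself. The following is a minimal sketch, not kensa's documented schema: it assumes the JSON report is the only thing written to stdout, and that it has a top-level `scenarios` array whose entries carry a `status` field equal to `"pass"` on success. Both field names are hypothetical; check a real report and adjust the `jq` filter before relying on it.

```yaml
- name: Run evals (JSON)
  run: uv run kensa eval --format json > eval-report.json

- name: Gate on scenario results
  # Hypothetical schema: "scenarios" and "status" are assumed names,
  # not taken from the kensa docs. jq -e exits non-zero when the
  # filter's result is false, which fails this step.
  run: jq -e 'all(.scenarios[]; .status == "pass")' eval-report.json
```

Keeping the gate in its own step separates "kensa errored" (the eval step fails) from "a scenario failed" (the gate step fails) in the CI log.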

What needs API keys

Deterministic checks need no API keys. They run entirely locally.

Judge criteria need an API key for the LLM provider. If any scenario sets criteria or judge and no API key is available, kensa eval exits 1. Either add a provider key as a secret, or remove judge criteria from the scenarios you run in CI.

This means you can run cost, latency, tool ordering, and output matching checks in CI for free, and add LLM judging only when keys are wired up.
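
As an illustration, a key-free variant of the steps might look like the sketch below. It only works if none of the scenarios being run set criteria or judge. Whether the `anthropic` extra is still needed in that case is an assumption left for you to verify.

```yaml
# Key-free variant: no env block, so no provider key is read.
- name: Install
  # Assumption: the extra may be unnecessary without judge criteria.
  run: uv sync --extra anthropic
- name: Run deterministic evals
  run: uv run kensa eval --format markdown
```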

Output formats

| Format | Flag | Use case |
| --- | --- | --- |
| Terminal | (default) | Local development |
| Markdown | --format markdown | PR comments, CI logs |
| JSON | --format json | Machine-readable, dashboards |

kensa eval also writes a standalone HTML report to .kensa/reports/{run_id}.html automatically on every run; upload it as a CI artifact for a shareable view.
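
One way to upload the HTML report is with `actions/upload-artifact`. This is a sketch: the artifact name `kensa-report` is arbitrary. Because `.kensa` is a dot-directory, recent v4 releases of the action treat its contents as hidden and skip them unless `include-hidden-files` is enabled.

```yaml
- name: Upload HTML report
  # Run even if an earlier step failed, so the report is still available.
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: kensa-report            # arbitrary artifact name
    path: .kensa/reports/*.html
    include-hidden-files: true    # .kensa/ counts as a hidden path
    if-no-files-found: warn
```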

PR comment integration

Redirect the markdown output to a file, then post that file as a PR comment:
PR comment step
- name: Run evals
  run: uv run kensa eval --format markdown > eval-report.md

- name: Comment on PR
  uses: marocchino/sticky-pull-request-comment@v2
  with:
    path: eval-report.md
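
A comment can only be posted when the workflow runs in the context of a pull request and its token may write to pull requests. The sketch below puts the steps above into a complete workflow under those assumptions; the trigger and the `permissions` block are additions to illustrate the setup, so adapt them to your repository's policy.

```yaml
# Sketch: the same job, triggered by pull requests instead of pushes.
name: Evals
on: [pull_request]

# sticky-pull-request-comment posts with the workflow token, which
# needs write access to pull requests.
permissions:
  contents: read
  pull-requests: write

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - name: Install
        run: uv sync --extra anthropic
      - name: Run evals
        run: uv run kensa eval --format markdown > eval-report.md
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Comment on PR
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          path: eval-report.md
```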
Last modified on May 1, 2026