Checks run before the LLM judge to save cost. If any check fails, the judge is skipped (fail-fast). A scenario passes only when all checks pass AND the judge passes.
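Here is a minimal Python sketch of that evaluation order. CheckResult, ScenarioResult, and evaluate_scenario are hypothetical stand-ins, not Kensa's actual API. The sketch also assumes every check runs before the judge decision is made.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-ins for illustration, not Kensa's real types.
@dataclass
class CheckResult:
    check: str
    passed: bool
    detail: str = ""


@dataclass
class ScenarioResult:
    passed: bool
    checks: list[CheckResult]
    judge_ran: bool


def evaluate_scenario(
    checks: list[Callable[[], CheckResult]],
    judge: Callable[[], bool],
) -> ScenarioResult:
    # Cheap checks run first (this sketch runs all of them).
    results = [run_check() for run_check in checks]
    if not all(r.passed for r in results):
        # At least one check failed: skip the LLM judge and save its cost.
        return ScenarioResult(passed=False, checks=results, judge_ran=False)
    # All checks passed, so the scenario passes only if the judge passes too.
    return ScenarioResult(passed=judge(), checks=results, judge_ran=True)
```

The judge is passed in as a callable so that it is only invoked, and only billed, when every check has already passed.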

Check types

| Check | What it tests |
|---|---|
| output_contains | Output includes a string or pattern |
| output_matches | Output matches a regex |
| tools_called | All listed tools were invoked (set membership, order-free) |
| tools_not_called | None of the listed tools were invoked |
| tool_order | Listed tools were called in this temporal sequence (use only when order is load-bearing) |
| trajectory | Tool calls match the expected path, optionally with an accuracy threshold and inline budgets |
| max_cost | Total cost is under a threshold |
| max_turns | LLM call count is under a limit |
| max_duration | Execution time is under a limit |
| no_repeat_calls | No duplicate tool calls with identical arguments |

Examples

Output checks

checks:
  # String containment (case-insensitive by default)
  - type: output_contains
    params: { value: "confirmation number" }

  # Case-sensitive containment
  - type: output_contains
    params: { value: "OK", case_sensitive: true }

  # Regex match
  - type: output_matches
    params: { pattern: "\\d{6,}" }
    description: Output contains a 6+ digit number
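The matching semantics of the two output checks can be sketched in Python. The function names mirror the check types but are hypothetical illustrations, not Kensa's code. output_matches is assumed to use match-anywhere (re.search) semantics, which fits the "Output contains a 6+ digit number" description. After YAML parsing, the double-quoted "\\d{6,}" becomes the regex \d{6,}, written below as the raw string r"\d{6,}".

```python
import re


def output_contains(output: str, value: str, case_sensitive: bool = False) -> bool:
    # Substring test; case-insensitive unless case_sensitive is set.
    if case_sensitive:
        return value in output
    # casefold() is a stronger form of lower() intended for caseless matching.
    return value.casefold() in output.casefold()


def output_matches(output: str, pattern: str) -> bool:
    # Assumed semantics: the regex may match anywhere in the output.
    return re.search(pattern, output) is not None
```

For example, output_matches("Ref 1234567", r"\d{6,}") is true, while a five-digit reference would not match.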

Tool checks

checks:
  # Tools were called (set membership, order-free)
  - type: tools_called
    params: { tools: [search_flights] }

  # Tools were NOT called (safety check)
  - type: tools_not_called
    params: { tools: [delete_account] }
    description: Agent must never call delete

  # Tools called in order
  - type: tool_order
    params: { order: [search_flights, book_flight] }
    description: Must search before booking

  # Canonical tool-call trajectory with optional budgets
  - type: trajectory
    params:
      steps:
        - tool: search_flights
        - tool: book_flight
      ordering: exact
      args: ignore
      min_accuracy: 1.0
      max_steps: 2
      max_tokens: 2000
      max_duration_seconds: 30
    description: Search, then book, within budget

  # No duplicate calls (trace-wide; flags any tool called twice with the same args)
  - type: no_repeat_calls
    description: Agent should not redo identical work

The trajectory check is the higher-level path check for tool correctness. It emits trajectory_accuracy and step_efficiency metrics in reports. In V1, a scenario can contain at most one trajectory check.
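The set and order semantics of the simpler tool checks can be sketched over a flat list of tool-call names. These functions are hypothetical illustrations, not Kensa's code. Two details are assumptions: tool_order is treated as an ordered subsequence (other calls may appear in between), and no_repeat_calls compares arguments as sorted-key JSON.

```python
import json
from typing import Any


def tools_called(calls: list[str], tools: list[str]) -> bool:
    # Set membership: every listed tool appears at least once, in any order.
    return set(tools) <= set(calls)


def tools_not_called(calls: list[str], tools: list[str]) -> bool:
    # Safety check: none of the listed tools appears anywhere in the trace.
    return not (set(tools) & set(calls))


def tool_order(calls: list[str], order: list[str]) -> bool:
    # Assumed ordered-subsequence semantics: `in` on a shared iterator
    # consumes it, so each tool must appear after the previous one.
    it = iter(calls)
    return all(tool in it for tool in order)


def no_repeat_calls(calls: list[tuple[str, dict[str, Any]]]) -> bool:
    # Trace-wide: fail if any tool is called twice with identical arguments.
    # Args are assumed JSON-serializable; sort_keys makes key order irrelevant.
    seen: set[tuple[str, str]] = set()
    for name, args in calls:
        key = (name, json.dumps(args, sort_keys=True))
        if key in seen:
            return False
        seen.add(key)
    return True
```

Set operations keep tools_called and tools_not_called order-free. Only tool_order looks at sequence, which is why the table reserves it for cases where order is load-bearing.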

Resource checks

checks:
  # Cost cap
  - type: max_cost
    params: { max_usd: 0.10 }
    description: Under 10 cents

  # Turn limit
  - type: max_turns
    params: { max: 5 }
    description: Complete in 5 LLM calls

  # Time limit
  - type: max_duration
    params: { max_seconds: 30 }
    description: Under 30 seconds
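The resource checks reduce to simple aggregates over the trace. This sketch uses a hypothetical Span with kind, cost_usd, start, and end fields; Kensa's real span type may differ. It also treats each limit as inclusive (<=), which is an assumption. The limit parameter of max_turns corresponds to the YAML max key.

```python
from dataclasses import dataclass


# Hypothetical span shape for illustration; Kensa's real Span may differ.
@dataclass
class Span:
    kind: str        # "llm" or "tool"
    cost_usd: float  # cost attributed to this span
    start: float     # start time, in seconds
    end: float       # end time, in seconds


# Limits are treated as inclusive (<=) in this sketch; that is an assumption.
def max_cost(spans: list[Span], max_usd: float) -> bool:
    # Total cost across every span in the trace.
    return sum(s.cost_usd for s in spans) <= max_usd


def max_turns(spans: list[Span], limit: int) -> bool:
    # One turn per LLM call, so only "llm" spans are counted (YAML key: max).
    return sum(1 for s in spans if s.kind == "llm") <= limit


def max_duration(spans: list[Span], max_seconds: float) -> bool:
    # Wall-clock time from the earliest span start to the latest span end.
    if not spans:
        return True
    elapsed = max(s.end for s in spans) - min(s.start for s in spans)
    return elapsed <= max_seconds
```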

Adding a check

Checks use a registry pattern. To add a new check type:
  1. Add a value to CheckType in models.py
  2. Write a check function in checks.py
  3. Register it in CHECK_REGISTRY
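
# checks.py
from typing import Any

# Span, CheckResult, CheckType, and CheckFn are Kensa's existing types,
# imported the same way the built-in checks use them.


def check_my_check(spans: list[Span], params: dict[str, Any]) -> CheckResult:
    # Inspect the trace spans and this check's params, then report the outcome.
    # Your logic here
    return CheckResult(check="my_check", passed=True, detail="...")


CHECK_REGISTRY: dict[CheckType, CheckFn] = {
    # ...existing checks...
    CheckType.MY_CHECK: check_my_check,
}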

No call-site changes are needed: the registry handles dispatch.
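Here is a self-contained sketch of the same registry pattern that shows the dispatch concretely. Everything in it is a hypothetical stand-in for Kensa's real definitions in models.py and checks.py: the two-member CheckType, the Span and CheckResult fields, the CheckFn alias, run_check, and the output_contains logic (which joins all span outputs).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class CheckType(str, Enum):
    # Step 1: each check type is an enum value matching the YAML `type` string.
    OUTPUT_CONTAINS = "output_contains"
    MY_CHECK = "my_check"


@dataclass
class Span:  # hypothetical minimal span
    name: str
    output: str = ""


@dataclass
class CheckResult:
    check: str
    passed: bool
    detail: str = ""


CheckFn = Callable[[list[Span], dict[str, Any]], CheckResult]


# Step 2: one function per check, all sharing the CheckFn signature.
def check_output_contains(spans: list[Span], params: dict[str, Any]) -> CheckResult:
    text = " ".join(s.output for s in spans).casefold()
    found = params["value"].casefold() in text
    return CheckResult(check="output_contains", passed=found, detail=f"value={params['value']!r}")


def check_my_check(spans: list[Span], params: dict[str, Any]) -> CheckResult:
    return CheckResult(check="my_check", passed=True, detail="placeholder")


# Step 3: register each function under its enum value.
CHECK_REGISTRY: dict[CheckType, CheckFn] = {
    CheckType.OUTPUT_CONTAINS: check_output_contains,
    CheckType.MY_CHECK: check_my_check,
}


def run_check(spec: dict[str, Any], spans: list[Span]) -> CheckResult:
    # Single dispatch point: look the function up by type, no per-type branching.
    check_type = CheckType(spec["type"])
    return CHECK_REGISTRY[check_type](spans, spec.get("params", {}))
```

To add a third check, you add an enum member, a function, and one registry entry. run_check itself never changes.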
Last modified on April 23, 2026