Skip to main content

Documentation Index

Fetch the complete documentation index at: https://kensa.sh/docs/llms.txt

Use this file to discover all available pages before exploring further.

Pytest mode is for Python agents that need application test wiring: fixtures, auth context, mocked services, database state, async clients, and dependency overrides. Pytest owns that wiring. Kensa owns scenario expansion, trace capture, checks, optional judging, and aggregate verdicts. Install the extra:
uv add "kensa[pytest]"

Minimal driver

import pytest


@pytest.mark.kensa(".kensa/scenarios/sdr_draft.yaml")
def test_sdr_draft(case, app_user, org):
    agent = PrimarySdrAgent(user=app_user, org=org)
    result = agent.run(case.messages)
    case.output(result)
Async tests work through the normal pytest async plugins:
import pytest


@pytest.mark.asyncio
@pytest.mark.kensa(".kensa/scenarios/sdr_draft.yaml")
async def test_sdr_draft(case, async_client, app_user):
    response = await async_client.post(
        "/chat",
        json={"messages": case.messages, "user_id": app_user.id},
    )
    case.output(response.json())

Scenario

id: sdr_draft_no_send
description: Verifies the SDR agent drafts outreach but never sends without approval.
cases: sdr_chat.jsonl
input: messages
trials: 5

checks:
  - type: tools_called
    params:
      tools: [draft_email]

  - type: tools_not_called
    params:
      tools: [send_email]

criteria: |
  The assistant must draft the outreach note while preserving the user's instruction
  not to send it.
Case rows are JSONL:
{"id":"draft_no_send","messages":[{"role":"user","content":"Find the VP of Sales at Acme."},{"role":"assistant","content":"I found Dana Lee."},{"role":"user","content":"Draft a short note to Dana, but do not send it."}]}
cases resolves relative to the scenario YAML. Relative marker paths resolve from pytest’s rootdir, so .kensa/scenarios/... is stable from any test file.

Case fixture

Kensa provides case and the alias kensa_case.
Attribute or methodDescription
case.rowFull JSONL row, or a synthetic row for literal-input scenarios
case.inputLiteral scenario input, or the selected JSONL field
case.messagesConversation accessor when input or row messages is a list
case.output(value)Records the agent output for checks and judging
Call case.output(...) exactly once. Kensa rejects missing outputs, duplicate outputs, and non-JSON-serializable values with clear pytest failures.

Scenario validity

Every pytest scenario needs at least one evaluator:
  • checks
  • criteria
  • both checks and criteria
A scenario with neither checks nor criteria is invalid. Criteria-only scenarios are allowed, but Kensa warns that judge-only evals are higher variance than deterministic checks. --kensa-no-judge disables criteria judging. If a selected scenario has checks, those checks still run and the skipped judge is reported. If a selected scenario has criteria but no checks, Kensa errors because no evaluator remains.

Trials and verdicts

Each case expands into one pytest item per trial:
test_sdr_draft[draft_no_send-trial1]
test_sdr_draft[draft_no_send-trial2]
Kensa aggregates selected trial items by scenario and case at session end.
Aggregate verdictMeaning
passEvery selected trial passed
failEvery completed selected trial failed
flakyAt least one trial passed and at least one failed
errorA test, fixture, trace, judge, or setup error occurred
fail, flaky, and error fail the pytest session. trials: 1 is reported as smoke; trials > 1 is reported as measured evidence.

Commands

Plain pytest is a valid gate:
pytest tests/evals/
pytest tests/evals/ --kensa-no-judge
pytest tests/evals/ --kensa-report=term   # default compact summary
pytest tests/evals/ --kensa-report=json
Use kensa eval --pytest when you want full Kensa run/result artifacts:
kensa eval --pytest tests/evals/ -k draft -q

Artifacts and traces

Plain pytest writes lightweight per-trial trace files under .kensa/traces/pytest/ so failure summaries can point to the evidence for each trial. Plain pytest does not write full .kensa/runs, .kensa/results, or .kensa/reports artifacts. kensa eval --pytest shells out to pytest and enables the full Kensa run/result artifact files for CI upload or later reporting. If your test process already called kensa.instrument(), pytest mode attaches its own trace exporter without replacing the existing one. Spans for Kensa trials may therefore be written both to .kensa/traces/pytest/ and to your preexisting trace sink.
Last modified on May 4, 2026