Pytest mode is for Python agents that need application test wiring: fixtures, auth context, mocked services, database state, async clients, and dependency overrides. Pytest owns that wiring. Kensa owns scenario expansion, trace capture, checks, optional judging, and aggregate verdicts. Install the extra:
Minimal driver
Scenario
cases resolves relative to the scenario YAML. Relative marker paths resolve from pytest’s rootdir, so .kensa/scenarios/... is stable from any test file.
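The two resolution rules above can be sketched with pathlib. This is only an illustration of the described behavior, not Kensa's implementation: the helper names, the /repo rootdir, and the refunds.yaml and refunds.jsonl file names are all hypothetical.

```python
from pathlib import PurePosixPath


def resolve_marker_path(rootdir: PurePosixPath, marker_path: str) -> PurePosixPath:
    """Resolve a relative marker path from pytest's rootdir (hypothetical helper)."""
    return rootdir / marker_path


def resolve_cases(scenario_yaml: PurePosixPath, cases: str) -> PurePosixPath:
    """Resolve the cases value against the directory that holds the scenario YAML."""
    return scenario_yaml.parent / cases


# Hypothetical project layout, for illustration only.
rootdir = PurePosixPath("/repo")
scenario = resolve_marker_path(rootdir, ".kensa/scenarios/refunds.yaml")
cases_file = resolve_cases(scenario, "refunds.jsonl")

print(scenario)    # /repo/.kensa/scenarios/refunds.yaml
print(cases_file)  # /repo/.kensa/scenarios/refunds.jsonl
```

Because the marker path is anchored at the rootdir rather than at the test file, moving the test module to another directory would not change which scenario it points to.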
Case fixture
Kensa provides case and the alias kensa_case.
| Attribute or method | Description |
|---|---|
| case.row | Full JSONL row, or a synthetic row for literal-input scenarios |
| case.input | Literal scenario input, or the selected JSONL field |
| case.messages | Conversation accessor when input or row messages is a list |
| case.output(value) | Records the agent output for checks and judging |

Call case.output(...) exactly once. Kensa rejects missing outputs, duplicate outputs, and non-JSON-serializable values with clear pytest failures.
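To make the exactly-once contract concrete, here is a rough stand-in that enforces the three rejections listed above. SketchCase and its finish() method are hypothetical names invented for this sketch; the real fixture's internals and error messages may differ.

```python
import json

_UNSET = object()  # sentinel so that None can still be a recorded output


class SketchCase:
    """Hypothetical stand-in for the case fixture; not Kensa's implementation."""

    def __init__(self, row: dict):
        self.row = row
        self.input = row.get("input")
        self._output = _UNSET

    def output(self, value):
        """Record the agent output, rejecting duplicates and non-JSON values."""
        if self._output is not _UNSET:
            raise AssertionError("case.output(...) was called more than once")
        try:
            json.dumps(value)  # serializability probe; the result is discarded
        except (TypeError, ValueError) as exc:
            raise AssertionError(f"output is not JSON-serializable: {exc}") from exc
        self._output = value

    def finish(self):
        """Assumed teardown step: fail if the test never recorded an output."""
        if self._output is _UNSET:
            raise AssertionError("case.output(...) was never called")
        return self._output
```

In this sketch a value such as a set or an open file handle fails at the output() call, while a forgotten call only surfaces at teardown.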
Scenario validity
Every pytest scenario needs at least one evaluator:
- checks
- criteria
- both checks and criteria

A scenario with neither checks nor criteria is invalid. Criteria-only scenarios are
allowed, but Kensa warns that judge-only evals are higher variance than deterministic
checks.
--kensa-no-judge disables criteria judging. If a selected scenario has checks, those
checks still run and the skipped judge is reported. If a selected scenario has criteria
but no checks, Kensa errors because no evaluator remains.
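The validity rules in this section reduce to a small decision function. The sketch below assumes a scenario is a plain dict with optional checks and criteria keys; plan_evaluators and its return shape are hypothetical and only mirror the rules as stated here.

```python
def plan_evaluators(scenario: dict, no_judge: bool = False) -> dict:
    """Decide which evaluators run for one scenario (illustrative sketch only)."""
    has_checks = bool(scenario.get("checks"))
    has_criteria = bool(scenario.get("criteria"))

    if not has_checks and not has_criteria:
        raise ValueError("scenario needs checks, criteria, or both")

    warnings = []
    if has_criteria and not has_checks:
        if no_judge:
            # Disabling the judge would leave this scenario with no evaluator.
            raise ValueError("criteria-only scenario cannot run with judging disabled")
        warnings.append("judge-only evals are higher variance than deterministic checks")

    return {
        "run_checks": has_checks,
        "run_judge": has_criteria and not no_judge,
        "judge_skipped": has_criteria and no_judge,
        "warnings": warnings,
    }
```

For a scenario with both evaluators, passing no_judge=True here yields run_checks=True and judge_skipped=True, which corresponds to the "checks still run and the skipped judge is reported" case.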
Trials and verdicts
Each case expands into one pytest item per trial:

| Aggregate verdict | Meaning |
|---|---|
| pass | Every selected trial passed |
| fail | Every completed selected trial failed |
| flaky | At least one trial passed and at least one failed |
| error | A test, fixture, trace, judge, or setup error occurred |

fail, flaky, and error fail the pytest session. trials: 1 is reported as smoke; trials > 1 is reported as measured evidence.
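As a sketch of how per-trial outcomes could roll up into the verdicts in the table, consider the function below. The function names and the outcome strings are hypothetical, and letting error take precedence over the other verdicts is an assumption of this sketch rather than something the table states.

```python
def aggregate_verdict(outcomes: list) -> str:
    """Fold per-trial outcomes ("pass", "fail", "error") into one verdict."""
    if not outcomes:
        raise ValueError("at least one trial outcome is required")
    unknown = set(outcomes) - {"pass", "fail", "error"}
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")

    if "error" in outcomes:  # assumption: any error dominates the verdict
        return "error"
    passed = outcomes.count("pass")
    failed = outcomes.count("fail")
    if passed and failed:
        return "flaky"
    return "pass" if passed else "fail"


def evidence_label(trials: int) -> str:
    """trials: 1 is reported as smoke; more trials count as measured evidence."""
    return "smoke" if trials == 1 else "measured"


def session_fails(verdict: str) -> bool:
    """fail, flaky, and error all fail the pytest session."""
    return verdict != "pass"


print(aggregate_verdict(["pass", "fail", "pass"]))  # flaky
print(evidence_label(3))                            # measured
```

Treating flaky as a session failure means a scenario that passes two trials out of three still blocks the gate, which is the behavior the paragraph above describes.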
Commands
Plain pytest is a valid gate:

Use kensa eval --pytest when you want full Kensa run/result artifacts:
Artifacts and traces
Plain pytest writes lightweight per-trial trace files under .kensa/traces/pytest/
so failure summaries can point to the evidence for each trial. Plain pytest does not
write full .kensa/runs, .kensa/results, or .kensa/reports artifacts.
kensa eval --pytest shells out to pytest and enables the full Kensa run/result
artifact files for CI upload or later reporting.
If your test process already called kensa.instrument(), pytest mode attaches its own
trace exporter without replacing the existing one. Spans for Kensa trials may therefore
be written both to .kensa/traces/pytest/ and to your preexisting trace sink.
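The "attach without replacing" behavior can be pictured as a fan-out over trace sinks. The sketch below is a generic pure-Python illustration: FanOutTracer and its methods are hypothetical and do not represent Kensa's exporter API or any particular tracing library.

```python
class FanOutTracer:
    """Hypothetical tracer that forwards every span to all attached sinks."""

    def __init__(self):
        self._sinks = []

    def add_sink(self, sink):
        """Attach another sink; sinks that are already attached stay in place."""
        self._sinks.append(sink)

    def emit(self, span: dict):
        """Deliver one span to every attached sink, in attachment order."""
        for sink in self._sinks:
            sink(span)


existing_sink = []  # stands in for the sink your own instrumentation configured
pytest_sink = []    # stands in for the files under .kensa/traces/pytest/

tracer = FanOutTracer()
tracer.add_sink(existing_sink.append)  # set up before pytest mode starts
tracer.add_sink(pytest_sink.append)    # added by pytest mode, nothing replaced

tracer.emit({"name": "trial-1"})
print(len(existing_sink), len(pytest_sink))  # 1 1
```

If your preexisting sink bills per span or feeds dashboards, the duplicated trial spans are worth keeping in mind when sizing eval runs.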