The open source agent evals harness

Tell your coding agent to evaluate an agent and get a working eval suite in minutes. No platform needed.

$ npx skills add satyaborg/kensa

Installs eval skills for coding agents; the CLI auto-installs on first use.

Works with
Claude Code · Cursor · Codex CLI · Gemini CLI · GitHub Copilot · Kiro · OpenCode · Pi

How it works

Your coding agent reasons: it reads your codebase, identifies failure modes from traces, and writes scenarios. The CLI computes: it instruments, executes, judges, and reports. Skills orchestrate the workflow between them.

01

Zero to eval

The coding agent bootstraps your evals to solve the cold-start problem. You review, not scaffold.

02

Checks gate the judge

Deterministic checks run before the LLM judge. If a check fails, no tokens are spent.
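The gating pattern can be sketched in plain Python. This is a conceptual illustration only, not kensa's actual API; all names here are hypothetical:

```python
def run_checks(output, checks):
    """Run cheap deterministic checks; return the names of any that fail."""
    return [name for name, check in checks if not check(output)]

def evaluate(output, checks, llm_judge):
    failures = run_checks(output, checks)
    if failures:
        # A deterministic check failed: short-circuit, spend no judge tokens.
        return {"verdict": "fail", "failed_checks": failures, "judge_called": False}
    return {"verdict": llm_judge(output), "failed_checks": [], "judge_called": True}

# Example: plain-text output fails the JSON check, so the judge never runs.
checks = [
    ("non_empty", lambda s: bool(s.strip())),
    ("is_json", lambda s: s.lstrip().startswith("{")),
]
result = evaluate("plain text", checks, llm_judge=lambda s: "pass")
```

The design choice is the usual cheap-filter-first one: deterministic checks are fast and free, so they absorb the obvious failures before any paid call.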

03

Trace everything

Auto-instruments Anthropic, OpenAI, and LangChain via OpenTelemetry (OTel).

04

Dataset-driven evals

Point at a JSONL file; each row becomes a run with its own trace and verdict. Re-run for variance stats, flakiness detection, and anomaly flagging.
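Conceptually, the per-row fan-out and re-run statistics look like this. The sketch below is illustrative only, not kensa's internals, and `run_agent` is a stand-in for executing the real agent:

```python
import json
import statistics

# A tiny inline JSONL dataset: one eval run per row.
dataset = "\n".join(json.dumps(r) for r in [
    {"input": "refund order 123", "expected": "refund_issued"},
    {"input": "cancel order 456", "expected": "cancelled"},
])

def run_agent(row, attempt):
    # Hypothetical stand-in for the real agent; returns (passed, latency_seconds).
    return (True, 1.0 + 0.1 * attempt)

rows = [json.loads(line) for line in dataset.splitlines()]
report = []
for row in rows:
    results = [run_agent(row, attempt) for attempt in range(3)]  # re-run for variance
    latencies = [lat for _, lat in results]
    passes = sum(ok for ok, _ in results)
    report.append({
        "input": row["input"],
        "pass_rate": passes / len(results),
        "latency_stdev": round(statistics.stdev(latencies), 3),
        "flaky": 0 < passes < len(results),  # mixed pass/fail across re-runs
    })
```

A row that sometimes passes and sometimes fails across re-runs is what gets flagged as flaky.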

05

Structured judges

Define judge criteria in YAML with pass/fail definitions and few-shot examples. Reuse specs across scenarios for consistent grading.
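As a rough sketch of the shape such a spec might take (the field names below are illustrative only, not kensa's actual schema):

```yaml
# Hypothetical judge spec: field names are illustrative, not kensa's schema.
criterion: answers_politely
definition:
  pass: "Response is courteous and addresses the user's question."
  fail: "Response is dismissive, rude, or ignores the question."
examples:
  - output: "Happy to help! Your refund was issued."
    verdict: pass
  - output: "Not my problem."
    verdict: fail
```

Binary definitions plus few-shot examples are what make a spec reusable: the same criterion grades consistently across scenarios.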

06

No platform

uv or pip install, BYO API keys, all data stays local. Same CLI on your laptop and in CI.

Skills

Five skills take you from zero to eval, or from traces to targeted iteration.

/audit-evals

Assess readiness, identify testable behaviors, prepare the environment. The default entry point.

/generate-scenarios

Happy paths, edge cases, tool usage, error handling, cost bounds. One command.

/generate-judges

Binary pass/fail definitions with few-shot examples, ready to reuse across scenarios.

/validate-judge

Test judge accuracy against human labels. Iterates until TPR and TNR meet your threshold.
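The accuracy target is just true positive rate and true negative rate against a human-labeled set. A minimal sketch of the computation (generic, not kensa's code):

```python
def judge_accuracy(judge_verdicts, human_labels):
    """TPR: fraction of human-labeled passes the judge also passes.
    TNR: fraction of human-labeled fails the judge also fails."""
    pairs = list(zip(judge_verdicts, human_labels))
    on_positives = [j for j, h in pairs if h == "pass"]
    on_negatives = [j for j, h in pairs if h == "fail"]
    tpr = sum(j == "pass" for j in on_positives) / len(on_positives)
    tnr = sum(j == "fail" for j in on_negatives) / len(on_negatives)
    return tpr, tnr

# Judge agrees on 2 of 3 human passes and 1 of 2 human fails.
tpr, tnr = judge_accuracy(
    judge_verdicts=["pass", "pass", "fail", "fail", "pass"],
    human_labels=["pass", "pass", "fail", "pass", "fail"],
)
```

Tracking both rates matters: a judge that passes everything has perfect TPR and useless TNR.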

/diagnose-errors

Categorize failures, identify patterns, recommend next action.

CLI (Python 3.10+)

Works standalone for CI and local iteration. Checks run before the judge, so obvious failures stop early without spending tokens.

kensa init     Scaffold with an example agent
kensa eval     Run + judge + report in one shot
kensa run      Execute scenarios, capture traces
kensa judge    Deterministic checks + LLM judge
kensa report   Terminal, markdown, JSON, or HTML output
kensa analyze  Cost/latency stats + anomaly flagging
kensa doctor   Pre-flight environment checks

FAQ

What agents does kensa work with?

Any Python agent that makes LLM calls. Auto-instrumentation covers Anthropic, OpenAI, and LangChain out of the box. Other providers work with manual OTel config.

Do I need to modify my agent code?

Two lines, added before your SDK imports: from kensa import instrument; instrument(). kensa runs your agent in a subprocess and captures traces automatically. Coding agents add these lines for you.

Can I run kensa in CI?

Yes. kensa eval --format markdown is all you need. Deterministic checks need no API keys. Add judge keys as secrets for LLM-judged criteria.
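A minimal GitHub Actions step might look like the following sketch. The workflow layout and package name are assumptions; only the kensa eval --format markdown command comes from the answer above:

```yaml
# Hypothetical CI job; adapt names and secrets to your repo.
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install kensa
      - run: kensa eval --format markdown >> "$GITHUB_STEP_SUMMARY"
        env:
          # Judge key: only needed for LLM-judged criteria, not deterministic checks.
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```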

Is kensa free?

Yes, it is MIT licensed. The only cost is the LLM API calls behind judge criteria, and those are optional.