Agentura
Eval CI/CD for AI agents
Agentura runs your eval suite automatically on every pull request and tells you exactly what got better, worse, or slower before you merge. Works with any AI agent that has an HTTP endpoint. No SDK required.
What Agentura does
- Runs evals locally with no signup required (`npx agentura@latest run --local`)
- Runs golden dataset, LLM judge, and performance evals automatically on every PR
- Posts results as a GitHub Check Run and PR comment
- Compares scores against the main branch baseline so you always know if a PR made things worse
- Works with any HTTP endpoint — Python, Node, Go, anything
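A golden dataset eval from the list above boils down to replaying fixed input/expected-output pairs against the agent and scoring the responses. Here is a minimal sketch of that idea; the `stub_agent`, the example cases, and the substring-match scoring rule are illustrative assumptions, not Agentura's actual implementation:

```python
from typing import Callable

# Golden dataset: fixed inputs paired with expected answers.
# These cases are made up for illustration.
GOLDEN_CASES = [
    ("What is your refund window?", "30 days"),
    ("Do you ship internationally?", "yes"),
]

def golden_eval(call_agent: Callable[[str], str],
                cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases whose response contains the
    expected answer (substring scoring is an assumption; real
    suites often use stricter or fuzzier matchers)."""
    passed = sum(
        1 for question, expected in cases
        if expected.lower() in call_agent(question).lower()
    )
    return passed / len(cases)

# In CI the callable would POST to the agent's HTTP endpoint;
# a stub stands in here so the sketch is self-contained.
def stub_agent(question: str) -> str:
    return "We offer a 30 days refund window." if "refund" in question else "No."

print(golden_eval(stub_agent, GOLDEN_CASES))  # one of two cases passes -> 0.5
```

Because the agent is just a callable behind an HTTP endpoint, the same loop works regardless of what language the agent itself is written in.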
Why teams use it
PROMPT CHANGE
A prompt tweak made answers shorter, but your support bot quietly stopped including critical policy details.
MODEL UPGRADE
A model swap looked fine in spot checks, but edge-case quality dropped and complex requests started failing.
CONTEXT CHANGE
A new context field changed tone and behavior, causing customer-facing responses to drift from expectations.
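The regressions above share a shape: a per-eval score quietly drops relative to the main-branch baseline. The comparison posted on each PR can be sketched roughly like this; the eval names, scores, threshold, and report format are illustrative assumptions:

```python
def diff_scores(baseline: dict[str, float],
                current: dict[str, float],
                threshold: float = 0.01) -> dict[str, list[str]]:
    """Classify each eval as better, worse, or unchanged versus
    the main-branch baseline; the threshold filters out noise."""
    report: dict[str, list[str]] = {"better": [], "worse": [], "unchanged": []}
    for name, base in baseline.items():
        delta = current.get(name, 0.0) - base
        if delta > threshold:
            report["better"].append(name)
        elif delta < -threshold:
            report["worse"].append(name)
        else:
            report["unchanged"].append(name)
    return report

# Hypothetical scores for the prompt-tweak scenario above:
# answers got shorter (brevity up) but policy coverage regressed.
baseline = {"policy_coverage": 0.92, "answer_brevity": 0.60}
pr_branch = {"policy_coverage": 0.71, "answer_brevity": 0.85}
print(diff_scores(baseline, pr_branch))
```

A report like this is what surfaces the "quietly stopped including critical policy details" failure before merge instead of after.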