Agentura
Eval CI/CD for AI agents
Agentura runs your eval suite automatically on every pull request and tells you exactly what got better, worse, or slower before you merge. Works with any AI agent that has an HTTP endpoint. No SDK required.
What Agentura does
- Runs evals locally with no signup required (`npx agentura@latest run --local`)
- Runs golden dataset, LLM judge, and performance evals automatically on every PR
- Posts results as a GitHub Check Run and PR comment
- Compares scores against the main branch baseline so you always know if a PR made things worse
- Works with any HTTP endpoint — Python, Node, Go, anything
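A golden dataset eval from the list above boils down to replaying fixed input/expected-output pairs against the agent and scoring the responses. Here is a minimal sketch of that idea; the `stub_agent`, the example cases, and the substring-match scoring rule are illustrative assumptions, not Agentura's actual implementation:

```python
from typing import Callable

# Golden dataset: fixed inputs paired with expected answers.
# These cases are made up for illustration.
GOLDEN_CASES = [
    ("What is your refund window?", "30 days"),
    ("Do you ship internationally?", "yes"),
]

def golden_eval(call_agent: Callable[[str], str],
                cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases whose response contains the
    expected answer (substring scoring is an assumption; real
    suites often use stricter or fuzzier matchers)."""
    passed = sum(
        1 for question, expected in cases
        if expected.lower() in call_agent(question).lower()
    )
    return passed / len(cases)

# In CI the callable would POST to the agent's HTTP endpoint;
# a stub stands in here so the sketch is self-contained.
def stub_agent(question: str) -> str:
    return "We offer a 30 days refund window." if "refund" in question else "No."

print(golden_eval(stub_agent, GOLDEN_CASES))  # one of two cases passes -> 0.5
```

Because the agent is just a callable behind an HTTP endpoint, the same loop works regardless of what language the agent itself is written in.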
Why teams use it
PROMPT CHANGE
A prompt tweak made answers shorter, but your support bot quietly stopped including critical policy details.
MODEL UPGRADE
A model swap looked fine in spot checks, but edge-case quality dropped and complex requests started failing.
CONTEXT CHANGE
A new context field changed tone and behavior, causing customer-facing responses to drift from expectations.
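The regressions above share a shape: a per-eval score quietly drops relative to the main-branch baseline. The comparison posted on each PR can be sketched roughly like this; the eval names, scores, threshold, and report format are illustrative assumptions:

```python
def diff_scores(baseline: dict[str, float],
                current: dict[str, float],
                threshold: float = 0.01) -> dict[str, list[str]]:
    """Classify each eval as better, worse, or unchanged versus
    the main-branch baseline; the threshold filters out noise."""
    report: dict[str, list[str]] = {"better": [], "worse": [], "unchanged": []}
    for name, base in baseline.items():
        delta = current.get(name, 0.0) - base
        if delta > threshold:
            report["better"].append(name)
        elif delta < -threshold:
            report["worse"].append(name)
        else:
            report["unchanged"].append(name)
    return report

# Hypothetical scores for the prompt-tweak scenario above:
# answers got shorter (brevity up) but policy coverage regressed.
baseline = {"policy_coverage": 0.92, "answer_brevity": 0.60}
pr_branch = {"policy_coverage": 0.71, "answer_brevity": 0.85}
print(diff_scores(baseline, pr_branch))
```

A report like this is what surfaces the "quietly stopped including critical policy details" failure before merge instead of after.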