Quick Start

Get your first eval running in under 5 minutes. No signup required.

Step 1 — Run locally first

bash

npx agentura@latest init
npx agentura@latest run --local

init creates an agentura.yaml config and a starter eval dataset in your project. run --local runs your evals on your machine — no GitHub App, no cloud, no login required.

Step 2 — Point it at your agent

Open agentura.yaml and update the endpoint:

yaml

version: 1
agent:
  type: http
  endpoint: https://your-agent.example.com/invoke
  timeout_ms: 10000

evals:
  - name: accuracy
    type: golden_dataset
    dataset: ./evals/accuracy.jsonl
    scorer: fuzzy_match
    threshold: 0.8

ci:
  block_on_regression: false
  compare_to: main
  post_comment: true

Your agent needs to accept POST requests with {"input": "..."} and return {"output": "..."}. No SDK required.

Step 3 — Create your eval dataset

jsonl

{"input": "what is the return policy", "expected": "30 days"}
{"input": "what is the capital of France", "expected": "Paris"}
{"input": "what color is the sky", "expected": "blue"}

Save this as evals/accuracy.jsonl.

Step 4 — Run and see results

bash

npx agentura@latest run --local

You'll see pass/fail per case, scores against your baseline, and any regressions flagged.

Step 5 (optional) — Add to CI with the GitHub App

Once local evals are working, install the GitHub App to run evals automatically on every pull request:

Install Agentura GitHub App →

After install, push agentura.yaml to your repo and open a PR. Results appear as a GitHub Check Run and PR comment within 30 seconds.

Next steps

agentura.yaml reference →

Eval strategies →

CLI reference →