Agentura Docs

agentura generate

Generate eval test cases using AI — the fastest way to get started

Why this exists

Writing eval test cases manually is time-consuming. agentura generate uses an LLM to create a realistic eval dataset tailored to your agent in seconds.

Most eval tools require you to write test cases from scratch before you've seen the product work. With agentura generate, you go from zero to a full eval suite in under 2 minutes.

Usage

bash
agentura generate

What happens

  1. You describe your agent in one sentence
  2. The CLI optionally probes your live agent endpoint with 3 test messages
  3. An LLM generates 15 realistic test cases as JSONL
  4. An LLM generates a quality rubric tailored to your agent
  5. Files are written to ./evals/
  6. agentura.yaml is updated with all 3 eval strategies
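
Because the last step rewrites agentura.yaml, it can help to snapshot the file first and review the change afterwards. The following is a minimal sketch, not part of Agentura itself: with_config_diff is a hypothetical helper name, and it assumes you run it from the directory that contains agentura.yaml.

```shell
# Hypothetical helper (not an Agentura command): snapshot agentura.yaml,
# run the given command, then print a unified diff of what changed.
with_config_diff() {
  backup=$(mktemp) || return 1
  cp agentura.yaml "$backup" || return 1
  rc=0
  "$@" || rc=$?
  # diff exits 1 when the files differ, which is the expected case here.
  diff -u "$backup" agentura.yaml || true
  rm -f "$backup"
  # Pass through the exit status of the wrapped command.
  return "$rc"
}

# Usage (requires the agentura CLI):
#   with_config_diff agentura generate
```

The helper returns the exit status of the wrapped command, so a failed generation is still visible to the calling script.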

What is probing?

When you answer "y" to probe your agent, Agentura sends 3 generic messages to your live endpoint and observes real responses. Those responses are included in the LLM prompt, resulting in test cases that match your agent's behavior instead of generic examples.

Generated files

evals/
├── accuracy.jsonl     ← 15 test cases (golden_dataset)
├── quality.jsonl      ← 15 test cases (llm_judge)
└── quality-rubric.md  ← scoring rubric for LLM judge
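
As a quick sanity check after generation, you can count the test cases in each file. This sketch assumes only that each JSONL file holds one test case per non-empty line. The record schema is not described on this page, so the check does not inspect fields. count_cases is a hypothetical helper name.

```shell
# Hypothetical helper: count non-empty lines (one test case per line).
count_cases() {
  grep -c . "$1"
}

# Report the number of test cases in each generated JSONL file.
for f in evals/accuracy.jsonl evals/quality.jsonl; do
  if [ -f "$f" ]; then
    echo "$f: $(count_cases "$f") test cases"
  else
    echo "$f: missing"
  fi
done
```

With the default count, each file should report 15 test cases.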

Flags

| Flag | Description | Default |
|---|---|---|
| --description <text> | Skip interactive description prompt | |
| --no-probe | Skip live agent probing | false |
| --count <n> | Number of generated test cases | 15 |
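
The flags can be combined to run generation without the description and probe prompts, for example from a script. This is a sketch under assumptions: generate_unattended is a hypothetical wrapper name, the example description is made up, and the CLI may still prompt for a Groq API key if none is configured.

```shell
# Hypothetical wrapper: run agentura generate using only the documented flags.
# $1: one-sentence agent description
# $2: number of test cases (falls back to the documented default of 15)
generate_unattended() {
  description=$1
  count=${2:-15}
  if command -v agentura >/dev/null 2>&1; then
    agentura generate --description "$description" --no-probe --count "$count"
  else
    echo "agentura not found on PATH" >&2
    return 127
  fi
}

# Usage (requires the agentura CLI):
#   generate_unattended "Customer support agent that answers billing questions" 30
```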

Groq API key

agentura generate uses Groq's free LLaMA API to generate test cases. You need a free Groq API key:

  1. Go to console.groq.com
  2. Create an account (free, no credit card)
  3. Generate an API key
  4. Set it as an environment variable:
bash
export GROQ_API_KEY=your_key_here

Or just run agentura generate: if the key is not set, it will prompt you for it and save it to ~/.agentura/config.json.

Requirements

  • agentura.yaml must exist (run agentura init first)
  • GROQ_API_KEY (free at console.groq.com)
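
Both requirements can be checked before running the command. This is mainly useful in scripts, where the interactive key prompt is not available. The sketch below is illustrative: preflight is a hypothetical helper name, and it checks only the environment variable, not a key saved in ~/.agentura/config.json.

```shell
# Hypothetical preflight check for the two requirements listed above.
# Only the GROQ_API_KEY environment variable is checked, not a saved key.
preflight() {
  status=0
  if [ ! -f agentura.yaml ]; then
    echo "agentura.yaml not found; run 'agentura init' first" >&2
    status=1
  fi
  if [ -z "${GROQ_API_KEY:-}" ]; then
    echo "GROQ_API_KEY is not set in the environment" >&2
    status=1
  fi
  return "$status"
}

if preflight; then
  echo "ready to run agentura generate"
fi
```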

Next steps

Editing AI-generated evals →