# agentura generate

Generate eval test cases using AI: the fastest way to get started.
## Why this exists
Writing eval test cases by hand is time-consuming. `agentura generate` uses an LLM to create a realistic eval dataset tailored to your agent in seconds.

Most eval tools require you to write test cases from scratch, before you've even seen the product work. With `agentura generate`, you go from zero to a full eval suite in under two minutes.
## Usage

```shell
agentura generate
```

## What happens
- You describe your agent in one sentence
- CLI optionally probes your live agent endpoint with 3 test messages
- An LLM generates 15 realistic test cases as JSONL
- An LLM generates a quality rubric tailored to your agent
- Files are written to ./evals/
- agentura.yaml is updated with all 3 eval strategies
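A generated JSONL test case might look something like the line below. The field names (`input`, `expected_output`) are an assumption for illustration only; inspect your generated `evals/*.jsonl` files for the actual schema.

```json
{"input": "How do I reset my password?", "expected_output": "Walks the user through the password-reset flow"}
```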
## What is probing?
When you answer "y" to probe your agent, Agentura sends 3 generic messages to your live endpoint and observes real responses. Those responses are included in the LLM prompt, resulting in test cases that match your agent's behavior instead of generic examples.
## Generated files

```
evals/
├── accuracy.jsonl       ← 15 test cases (golden_dataset)
├── quality.jsonl        ← 15 test cases (llm_judge)
└── quality-rubric.md    ← scoring rubric for the LLM judge
```

## Flags
| Flag | Description | Default |
|---|---|---|
| `--description <text>` | Skip the interactive description prompt | — |
| `--no-probe` | Skip live agent probing | false |
| `--count <n>` | Number of generated test cases | 15 |
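For example, a fully non-interactive run (flag names from the table above; the description text is made up):

```shell
# Generate 20 test cases without prompting or probing the live agent
agentura generate \
  --description "A customer-support agent for a billing SaaS" \
  --no-probe \
  --count 20
```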
## Groq API key

`agentura generate` uses Groq's free Llama API to generate test cases. You need a free Groq API key:

1. Go to console.groq.com
2. Create an account (free, no credit card required)
3. Generate an API key
4. Set it as an environment variable:
```shell
export GROQ_API_KEY=your_key_here
```

Or just run `agentura generate`: it will prompt you for the key and save it to `~/.agentura/config.json`.
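If you script `agentura generate` in CI, you may want to fail fast when the key is missing. A minimal guard (this helper is not part of the CLI, just a convenience sketch):

```shell
# Return non-zero if GROQ_API_KEY is unset or empty
require_groq_key() {
  if [ -z "${GROQ_API_KEY:-}" ]; then
    echo "GROQ_API_KEY is not set" >&2
    return 1
  fi
}
```

Call `require_groq_key` before `agentura generate` so the pipeline stops with a clear message instead of an interactive prompt.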
## Requirements

- `agentura.yaml` must exist (run `agentura init` first)
- `GROQ_API_KEY` (free at console.groq.com)