# agentura generate

Generate eval test cases using AI: the fastest way to get started.
## Why this exists
Writing eval test cases by hand is time-consuming. `agentura generate` uses an LLM to create a realistic eval dataset tailored to your agent in seconds.

Most eval tools require you to write test cases from scratch, before you've even seen the product work. With `agentura generate`, you go from zero to a full eval suite in under two minutes.
## Usage

```shell
agentura generate
```

## What happens
- You describe your agent in one sentence
- CLI optionally probes your live agent endpoint with 3 test messages
- An LLM generates 15 realistic test cases as JSONL
- An LLM generates a quality rubric tailored to your agent
- Files are written to ./evals/
- agentura.yaml is updated with all 3 eval strategies
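A generated JSONL test case might look something like the line below. The field names (`input`, `expected_output`) are an assumption for illustration only; inspect your generated `evals/*.jsonl` files for the actual schema.

```json
{"input": "How do I reset my password?", "expected_output": "Walks the user through the password-reset flow"}
```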
## What is probing?
When you answer "y" to probe your agent, Agentura sends 3 generic messages to your live endpoint and observes real responses. Those responses are included in the LLM prompt, resulting in test cases that match your agent's behavior instead of generic examples.
## Generated files

```
evals/
├── accuracy.jsonl       ← 15 test cases (golden_dataset)
├── quality.jsonl        ← 15 test cases (llm_judge)
└── quality-rubric.md    ← scoring rubric for the LLM judge
```

## Flags
| Flag | Description | Default |
|---|---|---|
| `--description <text>` | Skip the interactive description prompt | — |
| `--no-probe` | Skip live agent probing | false |
| `--count <n>` | Number of generated test cases | 15 |
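For example, a fully non-interactive run (flag names from the table above; the description text is made up):

```shell
# Generate 20 test cases without prompting or probing the live agent
agentura generate \
  --description "A customer-support agent for a billing SaaS" \
  --no-probe \
  --count 20
```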
## Groq API key

`agentura generate` uses Groq's free Llama API to generate test cases. You need a free Groq API key:

1. Go to console.groq.com
2. Create an account (free, no credit card required)
3. Generate an API key
4. Set it as an environment variable:
```shell
export GROQ_API_KEY=your_key_here
```

Or just run `agentura generate`: it will prompt you for the key and save it to `~/.agentura/config.json`.
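If you script `agentura generate` in CI, you may want to fail fast when the key is missing. A minimal guard (this helper is not part of the CLI, just a convenience sketch):

```shell
# Return non-zero if GROQ_API_KEY is unset or empty
require_groq_key() {
  if [ -z "${GROQ_API_KEY:-}" ]; then
    echo "GROQ_API_KEY is not set" >&2
    return 1
  fi
}
```

Call `require_groq_key` before `agentura generate` so the pipeline stops with a clear message instead of an interactive prompt.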
## Requirements

- `agentura.yaml` must exist (run `agentura init` first)
- `GROQ_API_KEY` (free at console.groq.com)