Setup: Eval Data and Grading
Before you run fastskill optimize, you need two data files:
- Suite CSV — the eval cases the optimizer learns from.
- Checks TOML — the grading rules that score each response. (Optional but strongly recommended.)
1. Create the suite CSV
The suite is a CSV file with one row per eval case. Columns:

| Column | Required | Description |
|---|---|---|
| id | Yes | Unique stable identifier for the case. Used in progress reporting and step artifacts. |
| prompt | Yes | The user message sent to the target agent. |
| should_trigger | Yes | true if the skill should activate on this prompt, false if not. |
| split | No | train or test. Defaults to train if absent. |
| tags | No | Comma-separated tags. You can encode the split here as split:train instead of using the split column. |
suite.csv:
Train vs test split
The optimizer trains only on train rows. The test split is held out and used for gating final epoch updates. This prevents the optimizer from overfitting to the exact prompts it learned from.
Rule: You must have at least one train case (the selection set). If the suite has zero training cases, fastskill optimize run will exit with error SKILLOPT_NO_SELECTION_CASES.
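The column and split rules above can be checked before starting a run. The following is a minimal sketch, not part of fastskill: the sample rows, the load_suite and resolve_split helpers, and the choice to let an explicit split column take precedence over a split: tag are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample suite; the ids and prompts are illustrative only.
SAMPLE_SUITE = """\
id,prompt,should_trigger,split,tags
case-001,Convert this PDF to Markdown,true,train,
case-002,What's the weather today?,false,,
case-003,Extract the tables from report.pdf,true,,split:test
"""

REQUIRED_COLUMNS = {"id", "prompt", "should_trigger"}


def resolve_split(row):
    """Return the split for one row, defaulting to 'train'.

    Assumption: an explicit split column wins over a split:<name> tag.
    """
    split = (row.get("split") or "").strip()
    if split:
        return split
    for tag in (row.get("tags") or "").split(","):
        tag = tag.strip()
        if tag.startswith("split:"):
            return tag[len("split:"):]
    return "train"


def load_suite(text):
    """Parse suite CSV text and enforce the documented rules."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

    cases, seen_ids = [], set()
    for row in reader:
        if row["id"] in seen_ids:
            raise ValueError(f"duplicate id: {row['id']}")
        seen_ids.add(row["id"])
        cases.append({**row, "split": resolve_split(row)})

    # Mirrors the rule above: a suite needs at least one train case.
    if not any(case["split"] == "train" for case in cases):
        raise ValueError("suite has no train cases")
    return cases


cases = load_suite(SAMPLE_SUITE)
for case in cases:
    print(case["id"], case["split"])
```

With the sample rows, the first two cases resolve to train (one explicitly, one by default) and the third resolves to test through its tag.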
Tips for writing cases
- Keep prompts realistic — use the same phrasing a real user would.
- Include both positive cases (should_trigger: true) and negative cases (should_trigger: false). A mix prevents the optimizer from making the skill trigger on everything.
- Aim for 20–50 training cases for a focused skill, more for broader skills.
- Assign 10–20% of cases to test for a meaningful hold-out gate.
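One way to follow the hold-out tip is to assign splits programmatically. This is a sketch under stated assumptions: assign_splits is a hypothetical helper, not a fastskill feature, and holding out the same fraction of positive and negative cases is a design choice made here so that the test split keeps a mix of both.

```python
import random


def assign_splits(cases, test_fraction=0.15, seed=0):
    """Return copies of the cases with a 'split' key added.

    Roughly test_fraction of each should_trigger group goes to 'test';
    the rest stays in 'train'. Output order differs from input order.
    """
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible

    # Group by should_trigger so positives and negatives are both held out.
    groups = {}
    for case in cases:
        groups.setdefault(case["should_trigger"], []).append(case)

    result = []
    for group in groups.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        for index, case in enumerate(shuffled):
            split = "test" if index < n_test else "train"
            result.append({**case, "split": split})
    return result


# Hypothetical suite: 20 positive and 20 negative cases.
suite = [
    {"id": f"case-{i:03d}", "should_trigger": "true" if i < 20 else "false"}
    for i in range(40)
]
assigned = assign_splits(suite)
n_test = sum(1 for case in assigned if case["split"] == "test")
print(n_test, len(assigned) - n_test)
```

Very small groups round down to zero test cases, so every group keeps at least one train case.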
2. Create the checks TOML (grading)
Checks define what a “pass” looks like for each eval response. Without checks, the optimizer uses only the should_trigger match as its signal, which is a weak basis for grading.
Example checks.toml:
| Type | Description |
|---|---|
| skill_triggered | Passes if the skill triggered on a should_trigger: true case (or correctly didn’t trigger on a false case). This is the primary signal. |
| llm_rubric | Asks a judge model the prompt question and parses a yes/no answer. Use for quality dimensions beyond trigger accuracy. |
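The pass conditions in the table can be illustrated in a few lines. This is only a sketch of the logic as described, not fastskill's implementation: both function names are hypothetical, and the yes/no parsing rule (look at the first word of the judge's answer) is an assumption.

```python
def skill_triggered_passes(should_trigger: bool, triggered: bool) -> bool:
    """Pass when the observed trigger behaviour matches the expected one."""
    return triggered == should_trigger


def parse_yes_no(answer: str):
    """Return True for 'yes', False for 'no', None if the answer is unclear.

    Assumption: only the first word of the judge's answer is inspected.
    """
    words = answer.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,:;!\"'")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


print(skill_triggered_passes(should_trigger=True, triggered=True))
print(skill_triggered_passes(should_trigger=False, triggered=True))
print(parse_yes_no("Yes, the response cites the source file."))
print(parse_yes_no("Unclear"))
```

Returning None for an unparseable answer keeps an ambiguous judge reply distinct from a definite "no".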
3. Directory layout
We recommend this layout to keep things organized:

Next: configure the run
Once you have your suite and checks files ready, write the optimize.toml config and start the run.