Skill

dspy-advanced-workflow

Orchestrates full DSPy 3.2.x pipeline: spec → program → metric → baseline → GEPA optimize → export → deploy. Delegates to companion skills for each step.

Python

OpenAI

ai-ml

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/dspy-agent-skills:dspy-advanced-workflow

User invocable

Model invocable

Inline context

Default effort

When to use

User wants to build, optimize, and ship a new DSPy pipeline; says "full workflow" / "end to end" / "from scratch"; or needs the standard loop applied to a greenfield task.

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

This skill runs the seven-step loop that turns a natural-language task description into an optimized, saved, deployable DSPy program. Every step delegates to a specific skill — invoke them in order.

Supporting Files

example_pipeline.py, reference.md

SKILL.md

150 lines · ~1.6k tokens

Stats

Language: Python

Stars: 239

Forks: 22

Maintenance: Excellent

Last Commit: May 25, 2026

DSPy Advanced Workflow (2026)

The seven steps

1. Spec

Rephrase the user's task in one sentence. Identify the inputs, the outputs, the quality axis that matters, and any constraints (latency, cost, tool access, context size). Then pick the predictor shape:

| Task shape | Predictor |
| --- | --- |
| Single-step structured I/O | `dspy.Predict` / `dspy.ChainOfThought` |
| Tool use / multi-step | `dspy.ReAct` |
| Code execution | `dspy.ProgramOfThought` |
| Long context / codebase | `dspy.RLM` (see the `dspy-rlm-module` skill) |
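
As a rough illustration of the table above, the choice can be written as a small lookup. This is a minimal sketch, not part of DSPy: `TaskSpec`, its flags, and `pick_predictor` are hypothetical names, and the priority order (long context first, then tools, then code execution) is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical container for the facts gathered in the spec step."""
    summary: str                      # the one-sentence task description
    needs_tools: bool = False         # tool use or multi-step interaction
    needs_code_execution: bool = False
    long_context: bool = False        # long documents or a whole codebase
    needs_reasoning: bool = True      # prefer ChainOfThought over plain Predict


def pick_predictor(spec: TaskSpec) -> str:
    """Return the name of the predictor suggested by the table (assumed priority order)."""
    if spec.long_context:
        return "dspy.RLM"
    if spec.needs_tools:
        return "dspy.ReAct"
    if spec.needs_code_execution:
        return "dspy.ProgramOfThought"
    return "dspy.ChainOfThought" if spec.needs_reasoning else "dspy.Predict"


spec = TaskSpec(summary="Answer support tickets using the order-lookup tool.",
                needs_tools=True)
print(pick_predictor(spec))  # dspy.ReAct
```

The function returns a name rather than a class so the sketch runs without DSPy installed; in a real pipeline the result only guides which predictor you instantiate in step 2.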

2. Program

Write a typed dspy.Signature and a dspy.Module subclass, following dspy-fundamentals. Do not hard-code prompts. Assign each predictor to a named attribute so GEPA can target it.

3. Data

Build a trainset and a separate valset as lists of dspy.Example(...).with_inputs(...). For GEPA, maximize the trainset size and keep the valset just large enough to represent downstream behavior; report on the held-out testset only once, at the end. See dspy-evaluation-harness.
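
One way to get the three sets is a single seeded shuffle followed by fixed-size cuts, so the validation and test sets stay small and everything else goes to training. The sketch below uses only the standard library; `split_examples`, the row fields `question` and `answer`, and the sizes are hypothetical choices for illustration.

```python
import random


def split_examples(rows, val_size, test_size, seed=0):
    """Shuffle once, carve off fixed-size test/val sets, give the rest to train."""
    if val_size + test_size >= len(rows):
        raise ValueError("val_size + test_size must leave at least one training row")
    shuffled = list(rows)                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for reproducible splits
    testset = shuffled[:test_size]
    valset = shuffled[test_size:test_size + val_size]
    trainset = shuffled[test_size + val_size:]
    return trainset, valset, testset


# Synthetic placeholder rows, only to show the shapes involved.
rows = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(100)]
train_rows, val_rows, test_rows = split_examples(rows, val_size=15, test_size=15)
print(len(train_rows), len(val_rows), len(test_rows))  # 70 15 15
```

Each row would then be wrapped as `dspy.Example(**row).with_inputs("question")` before being passed to the evaluator or optimizer.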

4. Rich metric

Write rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None) returning dspy.Prediction(score=0..1, feedback="natural-language critique"). The feedback is load-bearing — it's what GEPA's reflection LM learns from. A dict with the same fields crashes dspy.Evaluate; only dspy.Prediction aggregates correctly. See dspy-evaluation-harness.
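
To make the score-plus-feedback idea concrete, here is a minimal sketch of the scoring logic on plain strings. It assumes a token-overlap F1 as the quality measure, which is only one possible choice; `score_and_feedback` is a hypothetical helper, not a DSPy function.

```python
def score_and_feedback(expected: str, actual: str) -> tuple[float, str]:
    """Token-overlap F1 plus a critique naming what was missing or extra."""
    gold = set(expected.lower().split())
    got = set(actual.lower().split())
    if not gold or not got:
        return 0.0, "Expected or predicted answer is empty; produce a non-empty answer."
    overlap = gold & got
    precision = len(overlap) / len(got)
    recall = len(overlap) / len(gold)
    score = 0.0 if not overlap else 2 * precision * recall / (precision + recall)
    notes = []
    missing = sorted(gold - got)
    extra = sorted(got - gold)
    if missing:
        notes.append("Missing expected terms: " + ", ".join(missing) + ".")
    if extra:
        notes.append("Unsupported extra terms: " + ", ".join(extra) + ".")
    if not notes:
        notes.append("All expected terms are present with nothing extra.")
    return score, " ".join(notes)


score, feedback = score_and_feedback("paris france", "paris")
print(round(score, 2), feedback)  # 0.67 Missing expected terms: france.
```

Inside `rich_metric`, the two values would be computed from the relevant fields of `gold` and `pred` (field names depend on your signature) and returned as `dspy.Prediction(score=score, feedback=feedback)`.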

5. Baseline

evaluator = dspy.Evaluate(devset=valset, metric=rich_metric,
                          num_threads=8, display_progress=True,
                          provide_traceback=True,
                          save_as_json="runs/baseline.json")
baseline = evaluator(program)
print("Baseline:", baseline.score)

6. GEPA optimize

reflection_lm = dspy.LM("openai/gpt-5", temperature=1.0, max_tokens=32000)
optimizer = dspy.GEPA(
    metric=rich_metric,
    auto="medium",
    reflection_lm=reflection_lm,
    candidate_selection_strategy="pareto",
    track_stats=True,
    track_best_outputs=True,
    log_dir="./gepa_logs",
    num_threads=8,
    seed=0,
)
optimized = optimizer.compile(student=program, trainset=trainset, valset=valset)
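# Write to a separate file so the baseline results are not overwritten.
print("Optimized:", evaluator(optimized, save_as_json="runs/optimized.json").score)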

Run auto="light" first as a sanity check; move to auto="medium"/"heavy" for the final run. See dspy-gepa-optimizer.

If you need a deliberate multi-stage compile loop, DSPy 3.2.x also exposes dspy.BetterTogether(metric=..., bootstrap=..., gepa=...) for chaining named optimizers after you have a clean baseline GEPA setup.

7. Export & deploy

optimized.save("artifacts/program.json", save_program=False)     # state, portable
# or for full deployment artifact:
optimized.save("artifacts/program_dir/", save_program=True)

Deploy:

Load with dspy.load("artifacts/program_dir/"), or reconstruct the module class and call .load("artifacts/program.json") on the instance.
Wrap in FastAPI/CLI.
Enable track_usage=True in dspy.configure(...) for cost/latency observability.
Log with MLflow (mlflow.dspy.autolog()) or W&B in CI.
Keep an offline regression test that runs the evaluator against the saved program and fails CI below a threshold.
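
The regression test in the last item can be as small as a threshold check. The sketch below is an assumption about how such a gate might look: `assert_no_regression` and the test function are hypothetical, and the score is a hard-coded placeholder standing in for the value you would get by running the evaluator on the loaded program.

```python
def assert_no_regression(score: float, threshold: float,
                         label: str = "saved program") -> None:
    """Raise AssertionError (failing the test run) when the score is below the threshold."""
    if score < threshold:
        raise AssertionError(
            f"{label} scored {score:.2f}, below the required {threshold:.2f}"
        )


def test_saved_program_meets_threshold():
    # Placeholder value: in CI this would come from running the evaluator
    # against the program loaded from the saved artifact.
    score = 71.5
    assert_no_regression(score, threshold=65.0)


test_saved_program_meets_threshold()  # passes silently when the score is high enough
```

Keep the threshold in the same units as the evaluator's reported score, and set it from the recorded baseline rather than from a guess.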

Full orchestration template

"""DSPy end-to-end pipeline — spec → optimize → deploy."""

import dspy
from pathlib import Path

# ----- 1–2. Spec & program (dspy-fundamentals) -----
class MyTask(dspy.Signature):
    """<one-line instruction from the spec>."""
    input_field: str = dspy.InputField()
    output_field: str = dspy.OutputField()

class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.step = dspy.ChainOfThought(MyTask)
    def forward(self, **kw):
        return self.step(**kw)

# ----- 3. Data (dspy-evaluation-harness) -----
trainset = [...]   # list[dspy.Example(...).with_inputs(...)]
valset   = [...]

# ----- 4. Rich metric (dspy-evaluation-harness) -----
def rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = ...          # compute 0..1
    feedback = ...       # detailed critique
    return dspy.Prediction(score=score, feedback=feedback)  # NOT a dict

# ----- 5. Baseline -----
Path("runs").mkdir(exist_ok=True)  # make sure the save_as_json target directory exists
dspy.configure(lm=dspy.LM("openai/gpt-4o"), track_usage=True)
evaluator = dspy.Evaluate(devset=valset, metric=rich_metric, num_threads=8,
                          display_progress=True, provide_traceback=True,
                          save_as_json="runs/baseline.json")
program = MyProgram()
print("Baseline:", evaluator(program).score)

# ----- 6. GEPA optimize (dspy-gepa-optimizer) -----
optimizer = dspy.GEPA(
    metric=rich_metric,
    auto="medium",
    reflection_lm=dspy.LM("openai/gpt-5", temperature=1.0, max_tokens=32000),
    candidate_selection_strategy="pareto",
    track_stats=True, track_best_outputs=True,
    log_dir="./gepa_logs", num_threads=8, seed=0,
)
optimized = optimizer.compile(student=program, trainset=trainset, valset=valset)
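# Write to a separate file so the baseline results are not overwritten.
print("Optimized:", evaluator(optimized, save_as_json="runs/optimized.json").score)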

# ----- 7. Export (dspy-fundamentals) -----
Path("artifacts").mkdir(exist_ok=True)
optimized.save("artifacts/program.json", save_program=False)

Guardrails

Never skip step 4 (rich metric). GEPA without feedback ≈ random search.
Always baseline before optimizing — no baseline, no claim.
Save both pre- and post-optimization metrics to JSON for auditability.
If held-out test score drops post-optimization, your valset is too narrow. Expand valset and re-run.
Freeze optimized program with module._compiled = True before multi-stage re-compilation.
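
The auditability and held-out guardrails above can be combined into one small record written next to the run logs. This is a sketch under stated assumptions: `write_audit`, the JSON layout, and the `held_out_dropped` flag are hypothetical, and the numbers in the usage lines are illustrative placeholders, not measured results.

```python
import json
import tempfile
from pathlib import Path


def write_audit(path, val_before, val_after, test_before, test_after):
    """Persist pre/post scores and flag a held-out drop after optimization."""
    record = {
        "val": {"baseline": val_before, "optimized": val_after},
        "test": {"baseline": test_before, "optimized": test_after},
        # True means the held-out score fell: expand the valset and re-run.
        "held_out_dropped": test_after < test_before,
    }
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(record, indent=2))
    return record


# Illustrative placeholder scores written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    record = write_audit(Path(tmp) / "runs" / "audit.json", 61.0, 74.0, 60.0, 57.5)
    print(record["held_out_dropped"])  # True
```

Committing this file alongside the saved program gives reviewers the before and after numbers in one place.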

Runnable scaffold → example_pipeline.py
