From dspy-agent-skills
Orchestrates the full DSPy 3.2.x pipeline: spec → program → metric → baseline → GEPA optimize → export → deploy. Delegates each step to a companion skill.
Slash command: /dspy-agent-skills:dspy-advanced-workflow
When to use: the user wants to build, optimize, and ship a new DSPy pipeline; says "full workflow", "end to end", or "from scratch"; or needs the standard loop applied to a greenfield task.
This skill runs the seven-step loop that turns a natural-language task description into an optimized, saved, deployable DSPy program. Every step delegates to a specific skill — invoke them in order.
Step 1 (spec): Rephrase the user's task in one sentence. Identify the inputs, the outputs, the quality axis that matters, and any constraints (latency, cost, tool access, context size). Then pick the predictor shape:
| Task shape | Predictor |
|---|---|
| Single-step structured I/O | dspy.Predict / dspy.ChainOfThought |
| Tool use / multi-step | dspy.ReAct |
| Code execution | dspy.ProgramOfThought |
| Long context / codebase | dspy.RLM → dspy-rlm-module |
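The spec and the table above can be written down in a few lines before any DSPy code exists. The sketch below is illustrative only: `TaskSpec`, its field names, and `pick_predictor` are hypothetical helpers, not part of DSPy, and the precedence order when several flags are set is an assumption of this example.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical container for the step-1 spec; not a DSPy type."""
    summary: str                      # the task in one sentence
    inputs: list[str]
    outputs: list[str]
    quality_axis: str                 # e.g. "factual accuracy"
    constraints: dict[str, str] = field(default_factory=dict)
    needs_tools: bool = False
    needs_code_execution: bool = False
    long_context: bool = False


def pick_predictor(spec: TaskSpec) -> str:
    """Map a spec to a predictor name from the table.

    The precedence used when several flags are set is an assumption.
    """
    if spec.long_context:
        return "dspy.RLM"
    if spec.needs_code_execution:
        return "dspy.ProgramOfThought"
    if spec.needs_tools:
        return "dspy.ReAct"
    # dspy.Predict is the alternative when no reasoning trace is wanted.
    return "dspy.ChainOfThought"


spec = TaskSpec(
    summary="Answer support tickets using the internal knowledge base.",
    inputs=["ticket"],
    outputs=["reply"],
    quality_axis="factual accuracy",
    constraints={"latency": "under 5 s"},
    needs_tools=True,
)
print(pick_predictor(spec))
```

Writing the spec as data keeps the constraints visible when the metric and the optimizer budget are chosen later.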
Step 2 (program): Write the typed dspy.Signature and the dspy.Module subclass as described in dspy-fundamentals. Do not hard-code prompts. Assign each predictor to a named attribute so GEPA can target it.
Step 3 (data): Build a trainset and a separate valset as lists of dspy.Example(...).with_inputs(...). For GEPA, maximize the trainset size and keep the valset just large enough to represent downstream behavior. Hold out a testset and report on it only once, at the end. See dspy-evaluation-harness.
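One way to get the three splits is to shuffle once, carve off fixed-size validation and test sets, and leave everything else for training. This is a minimal sketch under assumptions: `split_examples` and `to_examples` are hypothetical helpers, the `question`/`answer` field names are placeholders, and the sizes are arbitrary.

```python
import random


def split_examples(rows, val_size, test_size, seed=0):
    """Shuffle once, carve off fixed-size test and val sets, keep the rest for training."""
    if val_size + test_size >= len(rows):
        raise ValueError("val_size + test_size must leave at least one training row")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    testset = shuffled[:test_size]
    valset = shuffled[test_size:test_size + val_size]
    trainset = shuffled[test_size + val_size:]
    return trainset, valset, testset


def to_examples(rows, input_keys=("question",)):
    """Convert dict rows to dspy.Example objects and mark which keys are inputs."""
    import dspy  # imported lazily so split_examples works without DSPy installed
    return [dspy.Example(**row).with_inputs(*input_keys) for row in rows]


# Placeholder rows; real data would come from the task's own dataset.
rows = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(100)]
train_rows, val_rows, test_rows = split_examples(rows, val_size=20, test_size=20)
print(len(train_rows), len(val_rows), len(test_rows))
```

Passing `to_examples(train_rows)` and `to_examples(val_rows)` to the optimizer keeps the test rows untouched until the final report.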
Step 4 (metric): Write rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None) so that it returns dspy.Prediction(score=..., feedback=...), where score is a float in [0, 1] and feedback is a natural-language critique. The feedback is load-bearing: it is what GEPA's reflection LM learns from. A dict with the same fields crashes dspy.Evaluate; only dspy.Prediction aggregates correctly. See dspy-evaluation-harness.
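To make the score-plus-feedback contract concrete, here is a small sketch of a metric whose critique names exactly what went wrong. The scoring rule (Jaccard token overlap), the `score_with_feedback` helper, and the `answer` field are assumptions chosen for illustration; a real task would substitute its own quality axis.

```python
def score_with_feedback(gold_answer: str, pred_answer: str) -> tuple[float, str]:
    """Jaccard token overlap plus a critique naming missing and unexpected terms."""
    gold_tokens = set(gold_answer.lower().split())
    pred_tokens = set(pred_answer.lower().split())
    if not gold_tokens:
        return 0.0, "Reference answer is empty; this example cannot be scored."
    score = len(gold_tokens & pred_tokens) / len(gold_tokens | pred_tokens)
    missing = sorted(gold_tokens - pred_tokens)
    extra = sorted(pred_tokens - gold_tokens)
    if not missing and not extra:
        return score, "Answer matches the reference."
    notes = []
    if missing:
        notes.append("missing expected terms: " + ", ".join(missing))
    if extra:
        notes.append("contains unexpected terms: " + ", ".join(extra))
    return score, "Answer differs from the reference; " + "; ".join(notes) + "."


def rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    """GEPA-style metric: wraps the score and critique in a dspy.Prediction."""
    import dspy  # imported lazily so the scoring helper can be tested on its own
    score, feedback = score_with_feedback(gold.answer, pred.answer)
    return dspy.Prediction(score=score, feedback=feedback)  # not a dict


score, feedback = score_with_feedback("Paris France", "paris")
print(score, feedback)
```

Keeping the scoring logic in a plain function makes it easy to unit-test the critique text, which matters because the reflection LM only sees what the feedback says.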
# Step 5: baseline score on the validation set, before any optimization.
evaluator = dspy.Evaluate(devset=valset, metric=rich_metric,
                          num_threads=8, display_progress=True,
                          provide_traceback=True,
                          save_as_json="runs/baseline.json")
baseline = evaluator(program)
print("Baseline:", baseline.score)
# Step 6: GEPA optimization. The reflection LM reads the metric's feedback.
reflection_lm = dspy.LM("openai/gpt-5", temperature=1.0, max_tokens=32000)
optimizer = dspy.GEPA(
    metric=rich_metric,
    auto="medium",
    reflection_lm=reflection_lm,
    candidate_selection_strategy="pareto",
    track_stats=True,
    track_best_outputs=True,
    log_dir="./gepa_logs",
    num_threads=8,
    seed=0,
)
optimized = optimizer.compile(student=program, trainset=trainset, valset=valset)
# Re-run the same evaluator so the number is comparable to the baseline.
print("Optimized:", evaluator(optimized).score)
Run auto="light" first as a sanity check; move to auto="medium"/"heavy" for the final run. See dspy-gepa-optimizer.
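The light-then-heavier progression can be sketched as a small loop. Everything here is an assumption for illustration: `run_staged` is a hypothetical helper, the stop-when-no-gain rule is one possible reading of "sanity check", and the scores are made-up stand-ins rather than measured results.

```python
from typing import Callable, Sequence


def run_staged(
    compile_and_score: Callable[[str], float],
    baseline_score: float,
    budgets: Sequence[str] = ("light", "medium"),
) -> dict[str, float]:
    """Try budgets in order; stop early if a stage fails to beat the baseline."""
    scores: dict[str, float] = {}
    for auto in budgets:
        scores[auto] = compile_and_score(auto)
        if scores[auto] <= baseline_score:
            # No gain: more likely a metric or feedback problem than a budget
            # problem, so do not spend a larger budget yet.
            break
    return scores


# Stand-in for a real run. With DSPy, the callable would build
# dspy.GEPA(metric=..., auto=auto, reflection_lm=...), call .compile(...),
# and return evaluator(optimized).score.
fake_scores = {"light": 71.0, "medium": 78.5}  # made-up numbers
results = run_staged(lambda auto: fake_scores[auto], baseline_score=64.0)
print(results)
```

Injecting the compile step as a callable keeps the control flow testable without spending any LM calls.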
If you need a deliberate multi-stage compile loop, DSPy 3.2.x also exposes dspy.BetterTogether(metric=..., bootstrap=..., gepa=...) for chaining named optimizers after you have a clean baseline GEPA setup.
# Step 7: export. State-only JSON is portable across environments.
optimized.save("artifacts/program.json", save_program=False)
# Or, for a full deployment artifact:
optimized.save("artifacts/program_dir/", save_program=True)
Deploy:
- Load with dspy.load("artifacts/program_dir/"), or reconstruct the module and call .load("program.json").
- Set track_usage=True for cost/latency observability.
- Log runs to MLflow (mlflow.dspy.autolog()) or W&B in CI.
- Add a regression test that runs evaluator against the saved program and fails CI below a threshold.

"""DSPy end-to-end pipeline: spec → optimize → deploy."""
import dspy
from pathlib import Path

# ----- 1–2. Spec & program (dspy-fundamentals) -----
class MyTask(dspy.Signature):
    """<one-line instruction from the spec>."""
    input_field: str = dspy.InputField()
    output_field: str = dspy.OutputField()


class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.step = dspy.ChainOfThought(MyTask)

    def forward(self, **kw):
        return self.step(**kw)


# ----- 3. Data (dspy-evaluation-harness) -----
trainset = [...]  # list[dspy.Example(...).with_inputs(...)]
valset = [...]

# ----- 4. Rich metric (dspy-evaluation-harness) -----
def rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = ...     # compute 0..1
    feedback = ...  # detailed critique
    return dspy.Prediction(score=score, feedback=feedback)  # NOT a dict


# ----- 5. Baseline -----
dspy.configure(lm=dspy.LM("openai/gpt-4o"), track_usage=True)
evaluator = dspy.Evaluate(devset=valset, metric=rich_metric, num_threads=8,
                          display_progress=True, provide_traceback=True,
                          save_as_json="runs/baseline.json")
program = MyProgram()
print("Baseline:", evaluator(program).score)

# ----- 6. GEPA optimize (dspy-gepa-optimizer) -----
optimizer = dspy.GEPA(
    metric=rich_metric,
    auto="medium",
    reflection_lm=dspy.LM("openai/gpt-5", temperature=1.0, max_tokens=32000),
    candidate_selection_strategy="pareto",
    track_stats=True, track_best_outputs=True,
    log_dir="./gepa_logs", num_threads=8, seed=0,
)
optimized = optimizer.compile(student=program, trainset=trainset, valset=valset)
print("Optimized:", evaluator(optimized).score)

# ----- 7. Export (dspy-fundamentals) -----
Path("artifacts").mkdir(exist_ok=True)
optimized.save("artifacts/program.json", save_program=False)
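The regression check described under Deploy (re-run the evaluator against the saved program and fail CI below a threshold) could look roughly like the sketch below. It is not part of the skill: `gate`, `main`, the `my_pipeline` module, and the threshold value are all hypothetical, and the threshold is assumed to be on the same scale as the evaluator's score.

```python
import sys


def gate(score: float, threshold: float) -> int:
    """Return a process exit code: 0 if the score meets the threshold, 1 otherwise."""
    if score < threshold:
        print(f"FAIL: score {score:.1f} is below threshold {threshold:.1f}", file=sys.stderr)
        return 1
    print(f"OK: score {score:.1f} meets threshold {threshold:.1f}")
    return 0


def main(threshold: float = 70.0) -> int:
    """Load the saved state, re-run the evaluator, and gate on the result."""
    import dspy  # imported lazily so gate() can be unit-tested without DSPy
    # Hypothetical module holding the names defined in the template above.
    from my_pipeline import MyProgram, rich_metric, valset

    dspy.configure(lm=dspy.LM("openai/gpt-4o"))
    program = MyProgram()
    program.load("artifacts/program.json")
    evaluator = dspy.Evaluate(devset=valset, metric=rich_metric, num_threads=8)
    return gate(evaluator(program).score, threshold)


# A CI job would call sys.exit(main()); here only the gate is exercised.
exit_code = gate(82.5, 70.0)
```

Returning an exit code instead of raising keeps the check usable from both a test runner and a plain shell step.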
module._compiled = True before multi-stage re-compilation.
npx claudepluginhub intertwine/dspy-agent-skills --plugin dspy-agent-skills