From Sail
Use when an instrumented Voyage calls Sail inference (the OpenAI-compatible Responses / Chat Completions API) and you need each model call — the LLM span — attributed to the active agent/span in the dashboard. Covers automatic Voyage/span/agent header propagation, background vs synchronous mode, wrapping a raw OpenAI client, and the immutable first-association-wins response_id contract. Not for authoring the Voyage itself (see sail-voyage) — only the inference-call leg.
How this skill is triggered — by the user, by Claude, or both
Slash command
/sail:sail-inference-with-voyageThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Use this pattern when a Voyage-instrumented process calls Sail inference
Use this pattern when a Voyage-instrumented process calls Sail inference
(Responses API or Chat Completions). Calls routed through the SDK's
sail.inference.* helpers automatically attach Voyage/span/agent headers
so model-call rows appear in the dashboard's Native Model Calls panel,
scoped to the right agent.
sail.inference.responses.create() /
sail.inference.chat.completions.create() from inside a
sail.voyage.run(...) (or create(...)) context.If your Voyage doesn't make inference calls, you don't need this skill.
Every inference call inside an active Voyage gets these headers attached:
| Header | Purpose |
|---|---|
X-Sail-Voyage-Id | Required. Identifies which Voyage owns the call. |
X-Sail-Voyage-Span-Id | Optional. Scopes the call to a specific span. |
X-Sail-Voyage-Agent-Id | Optional. Scopes the call to a specific named agent. |
Sail records the model-call association with the call's response_id as
the key.
response_id is immutable, first-association-wins. If you retry an
inference with the same response_id, the existing association stays
attached to whichever Voyage was active at first successful association —
even if the retry happens in a different Voyage.
import sail
@sail.agent("Reviewer")
@sail.span("draft-response")
def draft():
# Headers attached automatically because a Voyage is active; the call
# auto-attributes to this agent/span.
response = sail.inference.responses.create(
model="zai-org/GLM-5",
input="Summarize this diff in one paragraph: ...",
background=False,
timeout=120,
)
sail.voyage.event("response.drafted", payload={"response_id": response["id"]})
with sail.voyage.run(name="agent-with-inference", version=1):
draft()
Nothing special needed. The Voyage/agent/span context propagates through
sail.voyage.headers() → request headers → backend attribution.
No active span? The wrapper synthesizes one automatically (an
"auto-span", marked _auto and chipped in the cockpit), named after your
calling function — so wrapper calls never land as Missing span. Explicit
spans always win; SAIL_VOYAGE_AUTO_SPANS=0 disables synthesis. Raw
clients don't get synthesis (headers are request-time only) — wrap those
calls in a span yourself when scoping matters.
For a recurring production workflow, use the current series/version shape:
with sail.voyage.run(
name="deep-research",
version=3,
metadata={"topic": "Voyages public launch"},
):
...
sail.inference.responses.create(background=True) returns the response
before the model finishes. Useful for long-running calls when you want
to poll or for parallel fan-out. Caveats:
background=False; a background call's
terminal state can lag the Voyage's, which the dashboard reconciles after the
fact.If you can't switch to sail.inference.* (e.g., a third-party library uses
the OpenAI client directly), wrap the client once with
sail.voyage.wrap_openai() — every call then computes the attribution
headers at call time and un-spanned calls get the same synthesized
auto-spans as sail.inference.*:
from openai import OpenAI
import sail
cfg = sail.Config.from_env()
client = sail.voyage.wrap_openai(
OpenAI(api_key=cfg.api_key, base_url=f"{cfg.api_url.rstrip('/')}/v1")
)
with sail.voyage.run(name="raw-client", version=1) as voyage:
with voyage.agent("Reviewer"):
with voyage.span("call"):
response = client.responses.create(model="zai-org/GLM-5", input="...")
wrap_openai wraps responses.create, responses.retrieve, and
chat.completions.create in place — every holder of that client object
sees attributed calls, so wrap a dedicated client if some callers must stay
unattributed. It is idempotent, supports AsyncOpenAI (the auto-span
covers the awaited request), and follows the process-global current Voyage
per call (pass voyage= to pin one).
For any non-OpenAI-style HTTP client, attach
sail.voyage.headers() yourself — it carries the full attribution context
(voyage id plus the span and agent active at the moment you call it).
Compute it per call, never once at client construction:
OpenAI(default_headers=sail.voyage.headers()) freezes whatever context
was active at construction onto every later call — constructed inside a
span, that pins the wrong span to your whole run. (wrap_openai makes this
mistake unrepresentable; prefer it whenever the client is OpenAI-style.)
sail.inference.responses.create() outside a Voyage. It
still works, but it does not create any Voyage dashboard model-call
association. If you intended Voyage attribution, you forgot to open a
Voyage (sail.voyage.run(...)).sail.voyage.headers() into default_headers at client
construction. That snapshots one span/agent context onto every later
call. Pass extra_headers=sail.voyage.headers() per call instead.response_id across Voyages. The first attribution wins
forever; subsequent calls are effectively orphaned from a Voyage
perspective. Don't try to "fix" an attribution mistake by re-calling
with the same id — create a new response.level="warn" event — instead of letting one malformed reply crash the
run, and only report a successful parse when real values were found.Dashboard checklist:
Scoped > 0, Unscoped = 0,
Missing span = 0.responses · completed, not in_progress). If they're stuck
in_progress, see
sail-voyage-debugging section 4.@sail.agent(...) and @sail.span(...), or
use the SDK helper / wrap_openai() path so headers are computed at request
time.npx claudepluginhub sailresearchco/sail-skills --plugin sailCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.