From langsmith
Use when creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets. Covers dataset types (final_response, single_step, trajectory, RAG), CLI management commands, SDK-based creation, and example management.
Slash command: /langsmith:langsmith-dataset
Create, manage, and upload evaluation datasets to LangSmith for testing and validation.
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here # Required
LANGSMITH_PROJECT=your-project-name # Check this for relevant traces
Python: pip install langsmith
JavaScript: npm install langsmith
CLI: curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
langsmith dataset list
langsmith dataset get <name-or-id>
langsmith dataset create --name <name>
langsmith dataset delete <name-or-id>
langsmith dataset export <name-or-id> <output-file>
langsmith dataset upload <file> --name <name>
langsmith example list --dataset <name>
langsmith example create --dataset <name> --inputs <json>
langsmith example delete <example-id>
langsmith experiment list --dataset <name>
langsmith experiment get <name>
# 1. Export traces
langsmith trace export ./traces --project my-project --limit 20 --full
# 2. Convert the exported traces into dataset examples
import json
from pathlib import Path

examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    # Each line of an exported file is one run; skip blank lines
    lines = jsonl_file.read_text().splitlines()
    runs = [json.loads(line) for line in lines if line.strip()]
    # The root run is the one without a parent
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root and root.get("inputs") and root.get("outputs"):
        examples.append({
            "trace_id": root.get("trace_id"),
            "inputs": root["inputs"],
            "outputs": root["outputs"],
        })

with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)

# 3. Upload the dataset
langsmith dataset upload /tmp/dataset.json --name "My Evaluation Dataset"
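Before uploading, it can help to sanity-check the generated file. The sketch below is one possible check, not part of the LangSmith CLI or SDK: `validate_examples` and `validate_file` are hypothetical helpers, and the rule they enforce (every example is an object with non-empty `inputs` and `outputs`) is an assumption that mirrors what the conversion script above writes, not a documented upload schema.

```python
import json
from pathlib import Path


def validate_examples(examples):
    """Return a list of problems found; an empty list means the data looks OK.

    Assumption: each example should be a JSON object with non-empty
    "inputs" and "outputs" objects, matching the conversion script's output.
    """
    if not isinstance(examples, list):
        return ["top-level JSON value must be a list of examples"]
    problems = []
    for i, example in enumerate(examples):
        if not isinstance(example, dict):
            problems.append(f"example {i}: not a JSON object")
            continue
        for key in ("inputs", "outputs"):
            value = example.get(key)
            if not isinstance(value, dict) or not value:
                problems.append(f"example {i}: '{key}' must be a non-empty object")
    return problems


def validate_file(path):
    """Load a dataset JSON file and validate its contents."""
    return validate_examples(json.loads(Path(path).read_text()))
```

Calling `validate_file("/tmp/dataset.json")` and uploading only when the returned list is empty keeps malformed examples out of the dataset.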
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"response": "..."}}
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a", "tool_b"]}}
{"trace_id": "...", "inputs": {"question": "..."}, "outputs": {"answer": "...", "retrieved_chunks": ["..."], "cited_chunks": ["..."]}}
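The trajectory shape above (`expected_trajectory` as an ordered list of tool names) could be derived from exported runs roughly as follows. This is a sketch under assumptions: it supposes each exported run carries `run_type`, `name`, and an ISO-8601 `start_time` field, and `extract_trajectory` and `build_trajectory_example` are hypothetical helper names, not LangSmith functions.

```python
def extract_trajectory(runs):
    """Return tool names in call order.

    Assumption: tool calls are runs with run_type == "tool", and start_time
    is an ISO-8601 string, so lexicographic order equals chronological order.
    """
    tool_runs = [r for r in runs if r.get("run_type") == "tool"]
    tool_runs.sort(key=lambda r: r.get("start_time") or "")
    return [r.get("name") for r in tool_runs]


def build_trajectory_example(runs):
    """Build one trajectory-style example from the runs of a single trace."""
    # The root run (no parent) supplies the example inputs
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root is None or not root.get("inputs"):
        return None
    return {
        "trace_id": root.get("trace_id"),
        "inputs": root["inputs"],
        "outputs": {"expected_trajectory": extract_trajectory(runs)},
    }
```

Sorting by `start_time` rather than relying on file order is a deliberate choice here, since the order of runs within an exported file is not something this document specifies.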
from langsmith import Client

client = Client()

# Create the dataset, then add examples to it by name
dataset = client.create_dataset("My Dataset", description="Evaluation dataset")
client.create_examples(
    inputs=[{"query": "What is AI?"}],
    outputs=[{"answer": "AI is..."}],
    dataset_name="My Dataset",
)
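`create_examples` as used above takes parallel `inputs` and `outputs` lists, while the conversion script earlier produces a single list of `{"inputs": ..., "outputs": ...}` objects. A minimal sketch of bridging the two, in batches, is shown below. `iter_batches` is a hypothetical helper, and the batch size of 100 is an arbitrary illustrative value, not a documented limit.

```python
def iter_batches(examples, batch_size=100):
    """Yield (inputs, outputs) pairs of parallel lists, batch_size examples at a time."""
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        inputs = [ex["inputs"] for ex in chunk]
        outputs = [ex["outputs"] for ex in chunk]
        yield inputs, outputs


# Usage (requires LANGSMITH_API_KEY and an existing dataset):
# from langsmith import Client
# client = Client()
# for inputs, outputs in iter_batches(examples):
#     client.create_examples(inputs=inputs, outputs=outputs, dataset_name="My Dataset")
```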
import { Client } from "langsmith";

const client = new Client();

// Create the dataset, then add examples to it by name
const dataset = await client.createDataset("My Dataset", { description: "Evaluation dataset" });
await client.createExamples({
  inputs: [{ query: "What is AI?" }],
  outputs: [{ answer: "AI is..." }],
  datasetName: "My Dataset",
});
npx claudepluginhub xamuavila/golden-skills