From langfuse
Use when the user wants to create a Langfuse LLM-as-a-Judge evaluator. Trigger phrases: "create evaluator", "add evaluator", "new evaluation", "set up evaluation criteria", "create judge". Handles prompt composition, schema validation, ID generation, SQL insertion into eval_templates and job_configurations, and post-creation verification.
Slash command
/langfuse:create-evaluator
Insert a Langfuse LLM-as-a-Judge evaluator directly into PostgreSQL by creating
an eval_template and matching job_configuration.
Ensure the following are available before proceeding:
cuid2 and psycopg2-binary installed. If missing, install via uv add cuid2 psycopg2-binary.
Consult references/evaluator-schema-reference.md for complete schema details, format rules, and common patterns.
Ask the user what they want to evaluate. Common intents:
Work with the user to compose or accept an evaluation prompt. The prompt must:
Use {{variable}} syntax for template variables (e.g., {{input}}, {{output}}).
Ask the model for reasoning and score fields.
Extract the list of {{variables}} from the prompt for the vars array.
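The variable extraction step can be sketched with a regular expression. This is a minimal sketch, assuming variable names consist of letters, digits, and underscores; the helper name extract_variables is hypothetical and not part of the skill.

```python
import re

# Assumption: variable names are word characters, optionally padded with spaces.
_VAR_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def extract_variables(prompt: str) -> list:
    """Return the unique {{variable}} names in order of first appearance."""
    seen = {}
    for name in _VAR_PATTERN.findall(prompt):
        seen.setdefault(name, None)  # dicts preserve insertion order
    return list(seen)


vars_array = extract_variables(
    "Question: {{input}}\nAnswer: {{output}}\nRe-read {{input}} before scoring."
)
```

Deduplicating keeps the vars array stable when a prompt mentions the same variable more than once.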
Configure the output_schema with descriptive values for reasoning and score:
{"reasoning": "Step-by-step analysis explaining the score", "score": "Score between 0 and 1"}
The values are LLM instructions, not type descriptors. Richer descriptions
produce better evaluations. See references/evaluator-schema-reference.md for
correct vs incorrect examples.
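A quick pre-insert check can catch the most common mistake, using a bare type name where an instruction is expected. The sketch below is illustrative only: the helper name validate_output_schema and the list of type words are assumptions, and the reference file remains the authority on what is valid.

```python
def validate_output_schema(schema: dict) -> list:
    """Return a list of problems; an empty list means the schema looks usable."""
    problems = []
    # Assumption: these bare type names are the typical mistake to catch.
    type_words = {"string", "number", "float", "int", "integer", "boolean"}
    for key in ("reasoning", "score"):
        value = schema.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{key}' must be a non-empty instruction string")
        elif value.strip().lower() in type_words:
            problems.append(f"'{key}' is a type descriptor, not an instruction")
    return problems


output_schema = {
    "reasoning": "Step-by-step analysis explaining the score",
    "score": "Score between 0 and 1",
}
schema_problems = validate_output_schema(output_schema)
```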
For each {{variable}} extracted from the prompt, configure a variable mapping
entry:
Source object: trace (default), generation, span, or other observation type.
Field: input, output, or metadata.
Use discover-traces to help the user identify available trace names and observation names if needed.
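One way to assemble a mapping entry is sketched below. The key names (templateVariable, langfuseObject, objectName, selectedColumnId) are assumptions about the stored JSON shape and should be confirmed against references/evaluator-schema-reference.md; the helper build_mapping_entry is hypothetical.

```python
import json

ALLOWED_COLUMNS = {"input", "output", "metadata"}


def build_mapping_entry(variable, column, langfuse_object="trace", object_name=None):
    """Build one variable mapping entry (key names are assumed, not confirmed)."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column must be one of {sorted(ALLOWED_COLUMNS)}, got {column!r}")
    return {
        "templateVariable": variable,       # name extracted from the prompt
        "langfuseObject": langfuse_object,  # trace, generation, span, ...
        "objectName": object_name,          # observation name, if not a trace
        "selectedColumnId": column,         # input, output, or metadata
    }


variable_mapping = [
    build_mapping_entry("input", "input"),
    build_mapping_entry("output", "output"),
]
variable_mapping_json = json.dumps(variable_mapping)
```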
Set the model and provider. Defaults:
Model: gpt-4o
Provider: azure
Model parameters: {"temperature": 0, "max_tokens": 500}
Verify the model is configured in the project:
SELECT provider, adapter, custom_models FROM llm_api_keys
WHERE project_id = %s;
Ask whether to restrict which traces this evaluator runs on. If yes, delegate to
discover-filter-options to identify filter dimensions (trace names, tags,
environments) and construct the filter JSON. If not, use [].
Set the remaining job configuration fields: sampling (1.0 = 100%), delay (0), and target object ('trace').
Present the complete proposed configuration for user approval:
## Proposed Evaluator: <name>
**Template**:
- Name: <name>
- Model: <model> (<provider>)
- Variables: <var1>, <var2>
- Output Schema: {"reasoning": "...", "score": "..."}
**Prompt**:
<prompt text>
**Job Configuration**:
- Score Name: <score_name>
- Sampling: <sampling>
- Delay: <delay>ms
- Status: INACTIVE (safety default)
- Time Scope: ['NEW']
- Filters: <filter summary or "none">
- Variable Mapping: <mapping summary>
Proceed with creation? (yes/no)
Wait for explicit user approval before writing to the database.
Before inserting, check if an evaluator with the same name already exists:
SELECT id, name, version FROM eval_templates
WHERE project_id = %s AND name = %s
ORDER BY version DESC LIMIT 1
If a template with that name exists, use the next version (max_version + 1).
If not, use version 1.
Generate two CUIDs, one for the template and one for the job configuration:
from cuid2 import cuid_wrapper
cuid_generator = cuid_wrapper()
template_id = cuid_generator()
job_config_id = cuid_generator()
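The version rule from the duplicate check can be sketched as a small helper. This is a minimal sketch, assuming the row comes from cur.fetchone() on the duplicate-check query and therefore has the shape (id, name, version), or is None when no template with that name exists; the name next_version is hypothetical.

```python
from typing import Optional, Tuple


def next_version(latest_row: Optional[Tuple[str, str, int]]) -> int:
    """Return 1 for a new template name, otherwise the latest version plus one."""
    if latest_row is None:
        return 1
    return latest_row[2] + 1  # index 2 is the version column


version = next_version(None)
```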
import json

import psycopg2

conn = psycopg2.connect("CONNECTION_STRING_HERE")
conn.autocommit = False  # run both inserts in one transaction

try:
    with conn.cursor() as cur:
        # Insert the evaluation template.
        cur.execute("""
            INSERT INTO eval_templates (
                id, project_id, name, version, prompt, model, provider,
                model_params, vars, output_schema
            ) VALUES (
                %s, %s, %s, %s, %s, %s, %s,
                %s::jsonb, %s, %s::jsonb
            )
        """, (
            template_id, PROJECT_ID, name, version, prompt, model, provider,
            json.dumps(model_params), vars_array, json.dumps(output_schema)
        ))

        # Insert the job configuration that references the template.
        cur.execute("""
            INSERT INTO job_configurations (
                id, project_id, job_type, eval_template_id, score_name,
                filter, target_object, variable_mapping, sampling, delay,
                status, time_scope
            ) VALUES (
                %s, %s, 'EVAL', %s, %s,
                %s::jsonb, 'trace', %s::jsonb, %s, %s,
                'INACTIVE', ARRAY['NEW']
            )
        """, (
            job_config_id, PROJECT_ID, template_id, score_name,
            json.dumps(filter_config), json.dumps(variable_mapping),
            sampling, delay
        ))
    # Commit only after both inserts succeed.
    conn.commit()
except Exception:
    # Roll back so a failed job configuration insert leaves no orphaned template.
    conn.rollback()
    raise
finally:
    conn.close()

The template insert can also be run with psql through docker exec:
docker exec -i CONTAINER_NAME psql -U USER -d DBNAME -c "
INSERT INTO eval_templates (
  id, project_id, name, version, prompt, model, provider,
  model_params, vars, output_schema
) VALUES (
  'TEMPLATE_ID', 'PROJECT_ID', 'NAME', VERSION,
  'PROMPT_TEXT',
  'MODEL', 'PROVIDER',
  '{\"temperature\": 0, \"max_tokens\": 500}'::jsonb,
  ARRAY['var1', 'var2'],
  '{\"reasoning\": \"...\", \"score\": \"...\"}'::jsonb
);
"
Run verification queries to confirm both records were created:
SELECT id, name, version, model FROM eval_templates
WHERE id = %s AND project_id = %s;
SELECT id, score_name, status, eval_template_id FROM job_configurations
WHERE id = %s AND project_id = %s;
Both queries must return exactly one row.
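The exactly-one-row check can be wrapped in a helper that raises when either record is missing. This is a sketch under stated assumptions: cur is any DB-API cursor (such as one from psycopg2) connected to the Langfuse database, and the name verify_created is hypothetical.

```python
VERIFY_QUERIES = {
    "eval_templates": (
        "SELECT id, name, version, model FROM eval_templates "
        "WHERE id = %s AND project_id = %s"
    ),
    "job_configurations": (
        "SELECT id, score_name, status, eval_template_id FROM job_configurations "
        "WHERE id = %s AND project_id = %s"
    ),
}


def verify_created(cur, template_id, job_config_id, project_id):
    """Raise RuntimeError unless each verification query returns exactly one row."""
    record_ids = {
        "eval_templates": template_id,
        "job_configurations": job_config_id,
    }
    for table, sql in VERIFY_QUERIES.items():
        cur.execute(sql, (record_ids[table], project_id))
        rows = cur.fetchall()
        if len(rows) != 1:
            raise RuntimeError(
                f"expected exactly one row in {table}, found {len(rows)}"
            )
```

Raising instead of returning a flag keeps the success summary from being shown when a record is missing.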
Present a success summary:
## Evaluator Created Successfully
**Template**: <name> v<version> (id: <template_id>)
**Job Config**: <score_name> (id: <job_config_id>)
**Status**: INACTIVE
View in Langfuse: {LANGFUSE_HOST}/project/{PROJECT_ID}/settings/llm-as-a-judge
Would you like to activate this evaluator now?
If the user wants to activate, delegate to the toggle-evaluator-status skill.
npx claudepluginhub alex-kopylov/zweihander --plugin langfuse