How this skill is triggered — by the user, by Claude, or both
Slash command
/lcars:calibrate
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Analyze scoring data and propose threshold adjustments. Changes require explicit user approval — never automatic.
When the user runs /lcars:calibrate:
Load data from ~/.claude/lcars/:
scores.jsonl: full score history
thresholds.json: current active thresholds
memory/correction-outcomes.jsonl: correction effectiveness
memory/patterns.json: consolidated patterns
Analyze for threshold misalignment:
Look for query types where drift is frequently detected but corrections are ineffective (fitness < 0.50 for that query type). This suggests the threshold is too aggressive.
Propose: relax the threshold for that query type.
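The "too aggressive" check above can be sketched roughly as follows. This is a minimal sketch, not the plugin's implementation: the record fields "query_type" and "fitness", the load_jsonl helper, and the min_corrections cutoff standing in for "frequently detected" are all assumptions made for illustration.

```python
import json
from collections import defaultdict
from pathlib import Path
from statistics import mean


def load_jsonl(path):
    """Read one JSON object per non-empty line; a missing file yields []."""
    path = Path(path).expanduser()
    if not path.exists():
        return []
    with path.open() as fh:
        return [json.loads(line) for line in fh if line.strip()]


def ineffective_query_types(outcomes, fitness_floor=0.50, min_corrections=5):
    """Return {query_type: mean fitness} for types whose corrections rarely help.

    Assumes each record carries hypothetical "query_type" and "fitness" fields.
    min_corrections is an assumed stand-in for "drift is frequently detected".
    """
    by_type = defaultdict(list)
    for record in outcomes:
        by_type[record["query_type"]].append(record["fitness"])
    return {
        qtype: mean(values)
        for qtype, values in by_type.items()
        if len(values) >= min_corrections and mean(values) < fitness_floor
    }
```

Each query type returned here would be a candidate for a "relax" proposal, for example by passing in load_jsonl("~/.claude/lcars/memory/correction-outcomes.jsonl").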
Look for query types where scores consistently approach but don't cross thresholds (e.g., density 0.61 against 0.60 threshold repeatedly). If patterns.json shows validated drift patterns for that type, the threshold may be too lenient.
Propose: tighten the threshold for that query type.
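A near-miss scan along these lines might look like the sketch below. Several things are assumed rather than taken from the plugin: the "query_type" field, the margin and min_fraction cutoffs, the flag_below switch (whether drift is flagged when the metric falls below or rises above the threshold), and validated_types as a pre-computed set of query types that patterns.json marks as having validated drift patterns.

```python
from collections import defaultdict


def near_miss_types(scores, metric, threshold, validated_types,
                    margin=0.02, min_fraction=0.5, min_scores=5,
                    flag_below=True):
    """Return {query_type: near-miss fraction} for tighten candidates.

    A near miss is a score on the passing side of the threshold but within
    `margin` of it. Only types in `validated_types` are reported.
    """
    totals = defaultdict(int)
    near = defaultdict(int)
    for record in scores:
        qtype = record["query_type"]
        value = record[metric]
        # Distance from the threshold, positive on the passing side.
        gap = value - threshold if flag_below else threshold - value
        totals[qtype] += 1
        if 0 <= gap <= margin:
            near[qtype] += 1
    return {
        qtype: near[qtype] / totals[qtype]
        for qtype in totals
        if qtype in validated_types
        and totals[qtype] >= min_scores
        and near[qtype] / totals[qtype] >= min_fraction
    }
```

With the example from the text, a density of 0.61 against a 0.60 threshold counts as a near miss under the default margin of 0.02.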
If a query type has enough data (10+ scores) and its average metrics differ significantly from global defaults (> 1 standard deviation), propose adding a query-type-specific override.
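One possible reading of the override rule, sketched under assumptions: the text does not say which standard deviation is meant, so this version uses the population standard deviation of the metric across all scores, and it takes the global default as a plain number passed in by the caller. The "query_type" field and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def propose_overrides(scores, metric, global_default, min_scores=10):
    """Return {query_type: mean metric} for types that warrant an override.

    Assumption: "> 1 standard deviation" is measured with the spread of the
    metric over all scores, regardless of query type.
    """
    all_values = [record[metric] for record in scores]
    if len(all_values) < 2:
        return {}
    spread = pstdev(all_values)
    if spread == 0:
        return {}  # every score is identical, nothing stands out

    by_type = defaultdict(list)
    for record in scores:
        by_type[record["query_type"]].append(record[metric])

    proposals = {}
    for qtype, values in by_type.items():
        if len(values) < min_scores:
            continue  # not enough data for this query type
        type_mean = mean(values)
        if abs(type_mean - global_default) > spread:
            proposals[qtype] = type_mean
    return proposals
```

The returned mean is only a starting point for the proposed override value; the user still decides whether to accept it.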
For each proposed change, show:
Ask: "Apply these changes? (yes/no/select specific proposals)"
~/.claude/lcars/thresholds.json:
Run: python3 -c "import sys; sys.path.insert(0, '${CLAUDE_PLUGIN_ROOT}/lib'); from thresholds import load, save; data = load(); [apply changes]; save(data)"
Increment the version field in the thresholds file.
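The "[apply changes]" step and the version bump could be expressed as a small pure function, sketched here. The thresholds layout (a "version" integer, a "global" mapping, and an "overrides" mapping keyed by query type) and the shape of each approved change are assumptions for illustration; the real file and the plugin's thresholds module may differ.

```python
import copy


def apply_approved(thresholds, approved):
    """Return a new thresholds dict with approved changes and a bumped version.

    Each change is a dict with hypothetical keys "query_type" (None targets
    the global value), "metric", and "new_value". The input is not mutated.
    """
    updated = copy.deepcopy(thresholds)
    if not approved:
        return updated  # nothing approved: no change and no version bump

    for change in approved:
        if change["query_type"] is None:
            target = updated.setdefault("global", {})
        else:
            overrides = updated.setdefault("overrides", {})
            target = overrides.setdefault(change["query_type"], {})
        target[change["metric"]] = change["new_value"]

    updated["version"] = updated.get("version", 0) + 1
    return updated
```

Returning a copy keeps the loaded data untouched until the user has said yes, so a declined proposal leaves nothing to roll back.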
npx claudepluginhub melek/lcars --plugin lcars