By 3shn
A Julia-first Bayesian-workflow copilot: a Socratic super-REPL for building probabilistic models incrementally with fail-closed calibration gates. Tool-agnostic methodology in prose + math (no baked code); consult-the-docs discipline; drives a live Julia session via Revise.jl hot-reload and an MCP REPL.
Use this agent to interpret the output of MCMC and calibration diagnostics — R-hat, ESS, divergences, E-BFMI, trace/rank plots, SBC rank histograms, LOO-PIT, Pareto k — in plain language and recommend the next action. Invoke when someone pastes or describes diagnostic output and needs to know what it means and what to do. <example>Context: The user ran a sampler and got warnings. user: "I got 47 divergences and a bulk-ESS of 80, what does that mean?" assistant: "Let me hand this to the diagnostic-reader agent to interpret and recommend the fix." <commentary>Translating raw diagnostics into a verdict and a concrete next action is this agent's specialty.</commentary></example> <example>Context: The user shows an SBC rank histogram. user: "The SBC ranks pile up at both ends — is that bad?" assistant: "Let me use the diagnostic-reader agent to read that histogram." <commentary>A U-shaped SBC histogram is overconfidence; the agent names it and prescribes the honest fix.</commentary></example>
Use this agent to adversarially criticize a Bayesian model — to find how it could be wrong and what would falsify it — by proposing posterior-predictive test quantities and identifiability / prior-sensitivity probes. Invoke during the criticism stage of the baywright workflow, or whenever someone is ready to trust or ship a model. <example>Context: A model has converged and the user is satisfied. user: "The model fits great, I think we're done." assistant: "Before we call it done, let me run the model-critic agent to try to break it." <commentary>The critic hunts for the misfit the user is not looking for — exactly when someone is ready to stop is when it is most useful.</commentary></example> <example>Context: The user reports a clean posterior. user: "Everything looks good, R-hat is 1.0." assistant: "Clean sampling isn't adequacy — let me use the model-critic agent to design posterior-predictive checks." <commentary>The agent distinguishes 'the sampler worked' from 'the model is right' and proposes the checks that tell them apart.</commentary></example>
Use this agent to elicit priors for a Bayesian model through a Socratic interview — turning a domain expert's vague beliefs into prior distributions on the right scale and pressure-testing them against the prior predictive. Invoke during the priors stage of the baywright workflow, or whenever someone needs help turning "I'm not sure what prior to use" into a defensible choice. <example>Context: The user is choosing priors for a hierarchical model. user: "I have no idea what prior to put on the group-level scale." assistant: "Let me bring in the prior-interviewer agent to elicit it with you." <commentary>Prior elicitation is precisely this agent's job: interview about plausible magnitudes, propose a prior on the right scale, and name the prior-predictive check that confirms it.</commentary></example> <example>Context: A user has written flat priors everywhere. user: "I just used uniform priors so it's objective." assistant: "Let me use the prior-interviewer agent to check what those priors actually imply on the outcome scale." <commentary>The agent's value here is exposing that 'uninformative' priors are often strongly informative on the outcome scale.</commentary></example>
The entry point and orchestrator for Bayesian modeling: load this for any end-to-end modeling effort, any task that spans more than one stage, or the meta-question "is my model any good / good enough / done?". Runs an interactive "super-REPL" — the agent drives a live, hot-reloading session, interprets what it prints, and guides the human at every turn — and enforces a fixed sequence (formulate -> priors -> fit + diagnose -> calibrate -> criticize -> compare -> report) plus non-negotiable honesty gates (trust calibration first; never tune-to-pass; never call a model "good" without evidence). Methodology is prose + math only, tool-agnostic (Julia-first via Revise.jl; also Stan/PyMC/Turing/NumPyro/brms/R); no baked code — consult current docs and write live code in the session. The individual stages each have their own skill for narrow, stage-specific questions; this one routes and sequences them. Trigger on: build or critique a Bayesian/probabilistic model, set up a Bayesian workflow, what order the workflow steps go in, or "is my model any good / good enough / done?".
Baywright workflow verb (invoke as /baywright:bw-criticize): run a posterior-predictive model criticism pass — try to break the model with test quantities, check identifiability and prior sensitivity — and record what you found in the Model Ledger. Trigger on: /baywright:bw-criticize, "criticize my model", "run posterior predictive checks", "can my model reproduce the data".
Baywright teaching verb (invoke as /baywright:bw-explain <concept>): explain a Bayesian-workflow concept in plain language — what it is, why it matters, how to read it — then point to the current docs for the chosen tool instead of pasting code. For learners. Trigger on: /baywright:bw-explain, "explain <concept>", "what is SBC / R-hat / ELPD / a funnel", "help me understand this diagnostic".
Baywright workflow verb (invoke as /baywright:bw-gate): run the calibration gate — prior predictive sanity, simulation-based calibration (SBC), and parameter recovery — fail-closed, and record the verdict in the Model Ledger. Trigger on: /baywright:bw-gate, "run the calibration gate", "check calibration", "run SBC", "is my model calibrated".
Baywright workflow verb (invoke as /baywright:bw-next): advance to the next workflow stage, but gate-check the current one first — it refuses to advance past a stage whose evidence isn't recorded. Trigger on: /baywright:bw-next, "next step in the workflow", "advance the model", "what do I do next with baywright".
A Julia-first Bayesian-workflow copilot for Claude Code — a Socratic super-REPL that builds probabilistic models with you, one honest step at a time.
A REPL is Read → Eval → Print → Loop. baywright makes the agent sit inside that loop,
between Print and Read: it watches what your live session prints (a divergence count, a rank
histogram, a posterior interval), interprets it in plain language, and shapes what you do next.
The model is not a script you run once and report on — it is a living session where edits
hot-reload by AST (Revise.jl) with state preserved, and you stay in the loop at every turn.
baywright owns the process, not the syntax.
v0.2 — the MCMC lane is complete. This release has the full methodology (9 doctrine skills)
plus the interactive loop: the /baywright:bw-* workflow verbs, the persistent Model Ledger, the
on-demand specialist agents (prior-interviewer, model-critic, diagnostic-reader), and the
fail-closed done-gate hook. Amortized / simulation-based inference (SBI) and causal inference are
later phases.
| Skill | Covers |
|---|---|
| bayesian-workflow | The spine: the workflow sequence, the operating contract, the honesty gates. Start here. |
| model-formulation | The generative story; choosing an observation model; decision-relevance. |
| priors-and-prior-predictive | Prior elicitation and prior predictive checks. |
| computation-and-diagnostics | Sampler choice; convergence (R-hat, ESS, divergences, E-BFMI, tree depth). |
| reparameterization | Funnels, non-centering, standardization, identifiability geometry. |
| calibration | Simulation-based calibration (SBC), LOO-PIT, coverage: the honesty core. |
| model-criticism | Posterior predictive checks, test quantities, identifiability. |
| model-comparison | LOO-CV, ELPD, stacking, WAIC. |
| reporting | The audit-trail report: assumptions first, evidence attached. |
| model-ledger | The format of the living per-model record the loop reads and writes. |
Drive the workflow with the /baywright:bw-* verbs (Claude Code namespaces plugin commands by
plugin name, so your bw shorthand reads as /baywright:bw-…):
| Verb | Does |
|---|---|
| /baywright:bw-start | Run the formulation intake and create the Model Ledger. |
| /baywright:bw-status | Show where the model stands and which gates are pending. |
| /baywright:bw-next | Advance one stage, gate-checked; it won't pass a stage with no evidence. |
| /baywright:bw-gate | Run the calibration gate (prior-predictive, SBC, recovery), fail-closed. |
| /baywright:bw-criticize | Run a posterior-predictive criticism pass. |
| /baywright:bw-explain <concept> | Teaching mode: explain a concept, then point to current docs. |
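To make the SBC step of the calibration gate concrete, here is a minimal, tool-agnostic sketch in plain Python. It is not part of baywright, which ships no code. The conjugate normal-normal model, the function names `sbc_ranks` and `rank_histogram`, and the `sd_scale` knob are all assumptions chosen for illustration. Because the posterior is known exactly here, a calibrated fit should give roughly uniform ranks, while an artificially narrowed posterior should pile ranks up at both ends.

```python
import random


def sbc_ranks(n_sims=1000, n_obs=5, n_draws=19, sd_scale=1.0, seed=0):
    """Simulation-based calibration ranks for a toy conjugate model.

    Assumed model (illustration only): mu ~ Normal(0, 1), y_i ~ Normal(mu, 1).
    The exact posterior is Normal(sum(y) / (n_obs + 1), 1 / (n_obs + 1)).
    sd_scale < 1 deliberately narrows the posterior to mimic overconfidence.
    """
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        mu_true = rng.gauss(0.0, 1.0)  # 1. draw a parameter from the prior
        y = [rng.gauss(mu_true, 1.0) for _ in range(n_obs)]  # 2. simulate data
        post_mean = sum(y) / (n_obs + 1)
        post_sd = sd_scale * (1.0 / (n_obs + 1)) ** 0.5
        # 3. draw from the (possibly distorted) posterior
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]
        # 4. rank of the true value among the draws, in 0..n_draws
        ranks.append(sum(d < mu_true for d in draws))
    return ranks


def rank_histogram(ranks, n_draws):
    """Count how often each rank 0..n_draws occurs."""
    counts = [0] * (n_draws + 1)
    for r in ranks:
        counts[r] += 1
    return counts
```

Calling `rank_histogram(sbc_ranks(), 19)` gives the calibrated case, and `rank_histogram(sbc_ranks(sd_scale=0.5), 19)` gives the overconfident one. In a real workflow the exact posterior draws would be replaced by draws from whatever sampler you are checking.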
The Model Ledger (baywright-ledger.md) is the state: a machine-checkable YAML status block
over human-readable prose — the generative story, priors and their justification, each stage's
status and evidence, the honesty log, and the verdict.
On-demand specialist agents: prior-interviewer (Socratic prior elicitation), model-critic
(adversarial "how is this wrong?"), diagnostic-reader (reads R-hat / SBC / LOO output in plain
language).
The done-gate hook is a fail-closed Stop hook: it refuses to let a session end on a "model
is good/done" claim unless the ledger records the required gates (formulation, priors, computation,
calibration, criticism), or you have explicitly set verdict: pending-accepted. It is the firewall
made mechanical — honest workflow even when the agent is the one doing the modeling.
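The fail-closed rule described above can be sketched as a small predicate. This is a hypothetical illustration in Python, not the hook's actual implementation. The dict layout, the key names `verdict` and `gates`, and the value `"passed"` are assumptions; only the five gate names and `verdict: pending-accepted` come from the text.

```python
# Gates the text says must be recorded before a "done" claim is allowed.
REQUIRED_GATES = ("formulation", "priors", "computation", "calibration", "criticism")


def may_claim_done(status):
    """Return True only if the ledger status justifies a 'model is done' claim.

    `status` stands in for the parsed YAML status block of the ledger.
    Anything missing, malformed, or unrecognized counts as a refusal,
    which is what makes the check fail-closed.
    """
    if not isinstance(status, dict):
        return False  # unreadable ledger: refuse
    if status.get("verdict") == "pending-accepted":
        return True  # explicit, human-set override
    gates = status.get("gates")
    if not isinstance(gates, dict):
        return False  # no gate record at all: refuse
    # Every required gate must be explicitly marked as passed (assumed value).
    return all(gates.get(name) == "passed" for name in REQUIRED_GATES)
```

The design point is that the default answer is "no": a gate that is absent from the record is treated the same as a gate that failed.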
The recommended Julia backend is a persistent REPL driven over MCP, with
Revise.jl for hot-reload. baywright documents this
posture but does not bundle an MCP config — bring your own REPL server (e.g. a kaimon-style
Julia MCP) and the workflow uses whatever live-session and documentation tools you have. A user
without a live REPL still gets the full doctrine and drives their tool however they like.
npx claudepluginhub 3shn/baywright