From ai-infra-auto-driven-skills
Reviews SGLang PRs in the style of human maintainers using a curated corpus of non-agent review episodes. Useful for correctness, tests, GPU/runtime risks, API compatibility, and maintainability.
Slash command: /ai-infra-auto-driven-skills:sglang-humanize-review
Use this skill when the user asks for a human-style SGLang code review or wants review feedback that resembles SGLang maintainers instead of generic linting.
Every review opens with a PR comprehension pass: a short change summary plus a Mermaid execution flowchart (with the PR's added/modified steps marked) so the reviewer can see how the diff actually runs before reading any findings. See PR Comprehension Diagram.
The bundled corpus is collected from sgl-project/sglang PRs from the first
public PR through the latest refresh (June 2026), excluding PRs authored by bots
or obvious coding-agent accounts. The collector paginates every PR's full
conversation and review history, so long multi-round discussions are captured in
their entirety rather than truncated at the first 100 events. It is organized as
review episodes, not just individual comments:
- inline_review_thread: file/path-specific GitHub pull-review comments with diff_hunk context and replies grouped by thread.
- pr_conversation: top-level PR conversation comments, including design discussion, requested repros, benchmark negotiation, and author follow-ups.
- review_submission: review summary bodies such as COMMENT and REQUEST_CHANGES, preserving the review state.

Every episode preserves PR metadata, reviewer identity, original comment text, original comment language, timestamps, categories, and multi-round replies when GitHub exposes them. Read references/corpus-summary.md first for coverage, counts, top paths, and category distribution. Do not paste the raw gzip corpus into context; go through the helper scripts, which read it in memory-bounded segments and return a digest.
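The helper scripts remain the intended way to read the corpus. Purely to illustrate what reading a gzip JSONL file in memory-bounded segments can look like, here is a minimal sketch. The `kind` key, the segment size, and the toy file are assumptions made for this example and are not taken from the real scripts or schema.

```python
import gzip
import json
import os
import tempfile
from collections import Counter
from itertools import islice


def iter_segments(path, segment_size=1000):
    """Yield lists of at most `segment_size` parsed episodes from a gzip JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        while True:
            lines = list(islice(handle, segment_size))
            if not lines:
                break  # end of file reached
            segment = [json.loads(line) for line in lines if line.strip()]
            if segment:
                yield segment


def count_kinds(path, segment_size=1000):
    """Return a small digest: how many episodes of each kind the file holds."""
    counts = Counter()
    for segment in iter_segments(path, segment_size):
        # "kind" is a hypothetical key name; check the real schema before relying on it.
        counts.update(episode.get("kind", "unknown") for episode in segment)
    return counts


def write_toy_corpus(path, episodes):
    """Write a tiny stand-in corpus so the sketch runs without the real file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for episode in episodes:
            handle.write(json.dumps(episode) + "\n")


if __name__ == "__main__":
    toy = [
        {"kind": "inline_review_thread"},
        {"kind": "pr_conversation"},
        {"kind": "inline_review_thread"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        toy_path = os.path.join(tmp, "toy-corpus.jsonl.gz")
        write_toy_corpus(toy_path, toy)
        print(count_kinds(toy_path, segment_size=2))
```

Only one segment is held in memory at a time, and only the digest (the counter) is returned, which is the property that keeps the raw corpus out of context.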
There are two tools. For an actual PR review, the exhaustive sweep below is mandatory (see workflow step 3); the first-N query tool is only for follow-up drill-downs.
summarize_sglang_review_corpus.py scans the whole corpus in
memory-bounded segments, collects every thread relevant to the PR (not just
the first N), and prints an aggregate over all matches plus the top
relevance-ranked review opinions. Pass every touched path and the PR's risk keywords;
--path and --query are repeatable and OR-combined.
python3 skills/sglang-humanize-review/scripts/summarize_sglang_review_corpus.py \
--path python/sglang/srt/speculative --path python/sglang/srt/managers \
--query eagle --query "cuda graph" --query verify --query logprob \
--top 30
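One possible way to assemble that command from a PR diff is sketched below. It collects the parent directories named in `diff --git` headers and turns them into repeated `--path` flags. The helper names, the toy file names, and the choice to pass directories rather than individual files are assumptions for illustration; only the script path and the `--path`, `--query`, and `--top` flags come from the command above.

```python
import posixpath
import re

SCRIPT = "skills/sglang-humanize-review/scripts/summarize_sglang_review_corpus.py"

# Matches unquoted headers only; git quotes paths with spaces, which this sketch skips.
_DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$")


def touched_dirs(diff_text):
    """Return the sorted parent directories of files named in `diff --git` headers."""
    dirs = set()
    for line in diff_text.splitlines():
        match = _DIFF_HEADER.match(line)
        if match:
            new_path = match.group(2)
            # Fall back to the file path itself for files at the repository root.
            dirs.add(posixpath.dirname(new_path) or new_path)
    return sorted(dirs)


def build_sweep_argv(paths, queries, top=30):
    """Build an argv list with one --path per path and one --query per keyword."""
    argv = ["python3", SCRIPT]
    for path in paths:
        argv += ["--path", path]
    for query in queries:
        argv += ["--query", query]
    argv += ["--top", str(top)]
    return argv


if __name__ == "__main__":
    # Hypothetical file names under directories taken from the example command.
    toy_diff = (
        "diff --git a/python/sglang/srt/speculative/example_a.py "
        "b/python/sglang/srt/speculative/example_a.py\n"
        "diff --git a/python/sglang/srt/managers/example_b.py "
        "b/python/sglang/srt/managers/example_b.py\n"
    )
    print(build_sweep_argv(touched_dirs(toy_diff), ["eagle", "cuda graph"]))
```

Returning an argv list instead of a shell string keeps multi-word keywords such as "cuda graph" intact when the list is handed to `subprocess.run`.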
It reports Scanned N threads ... matched M threads across K PRs so coverage is
explicit. Read the aggregate and the top-ranked threads, then write a short
synthesis of the recurring historical review opinions before reviewing. Use
--format jsonl to stream all matched threads when you need to read every one.
Search the corpus by topic, path, category, or reviewer:
python3 skills/sglang-humanize-review/scripts/query_sglang_review_corpus.py \
--query cuda --limit 5
python3 skills/sglang-humanize-review/scripts/query_sglang_review_corpus.py \
--path python/sglang/srt --category correctness --limit 8
python3 skills/sglang-humanize-review/scripts/query_sglang_review_corpus.py \
--query server_args --format jsonl --limit 3
python3 skills/sglang-humanize-review/scripts/query_sglang_review_corpus.py \
--kind pr_conversation --query benchmark --limit 5
python3 skills/sglang-humanize-review/scripts/query_sglang_review_corpus.py \
--kind review_submission --query "request changes" --limit 5
The full corpus is:
references/sglang-review-corpus.jsonl.gz
Regenerate it only when the user asks to refresh the evidence (bump --end-year
to the current year; the collector caps the event window at "now" and paginates
each PR's full conversation/review history):
python3 skills/sglang-humanize-review/scripts/collect_sglang_review_corpus.py \
--repo sgl-project/sglang \
--from-beginning \
--end-year 2026 \
--out-dir skills/sglang-humanize-review/references
- Start from git diff, gh pr diff, or the patch supplied by the user.
- Read references/corpus-summary.md.
- Run summarize_sglang_review_corpus.py with every touched path (--path, repeatable) and the PR's risk keywords (--query, repeatable: for example cuda, kv cache, server_args, openai, logprob, tp, dp, eagle, fp8, benchmark, pytest). It scans all threads in memory-bounded segments and aggregates every relevant match, not the first N.
- Note the Scanned N ... matched M across K PRs line. If matched is 0, widen paths/keywords and rerun; if it is very large, read the aggregate plus the top-ranked threads and, when needed, stream the full set with --format jsonl.
- pr_conversation or review_submission) when the PR changes behavior, tests, docs, benchmarking, deployment defaults, or model support, and write a short synthesis: the recurring concerns, what reviewers blocked vs. nitpicked, repros/benchmarks they demanded, and the prevailing resolution for this subsystem. Prefer same-subsystem evidence over broad keyword matches. This synthesis is what the findings must be grounded in.
- Use query_sglang_review_corpus.py only afterward, to drill into a specific thread or reviewer surfaced by the sweep.
- model-pr-optimization-history for the model slug before judging whether the change repeats or conflicts with prior PRs.
- llm-torch-profiler-analysis, llm-pipeline-analysis, or model-compute-simulation evidence rather than asking for generic "benchmarks".
- llm-serving-capacity-planner expectations for startup logs and capacity accounting.
- sglang-prod-incident-triage style replay requirements.
- EagleVerifyInput for a method-collision that does not exist on the PR branch). Confirm with gh pr diff, gh api .../contents/<path>?ref=<pr-sha>, or git show <pr-sha>:<path>. If only a mismatched checkout is available, label the finding "needs branch verification" rather than asserting it.

Prioritize these risks because they recur heavily across the human review threads in the corpus:

- server_args, CLI defaults, endpoint behavior, streaming, and backward compatibility.

Before findings, emit a comprehension block so the reviewer understands the PR's principle at a glance. It has two parts:
Diagram rules:

- Use a ```mermaid block with flowchart TD (or LR for short linear flows). This renders on GitHub PR comments and most markdown viewers.
- Mark added or modified nodes with the changed class and keep untouched context nodes plain, so old vs. new behavior is obvious. Always include the legend node.
- Label edges (-->|fp8 path|, -->|cache miss|) when a branch is where the behavior changes.
- Separate ```mermaid blocks, stacked vertically (one after another, never two side by side); do not pack two subgraphs into one block, which lays them out horizontally and shrinks each to an unreadable size. For a pure refactor with no control-flow change, show old-vs-new as two short branches and say so.
- Prefer flowchart TD (top-down) so the graph grows vertically and stays legible; reserve LR for a genuinely short linear chain.
- Use real symbols (function, ClassName.method, file:line) in node labels so the diagram is verifiable against the diff. Do not fabricate edges.
- Wrap labels containing (), ,, :, =, >, &, or / in double quotes: C["down_proj(x, skip_all_reduce=rs)"], B -->|"id < 0 or id >= vocab"| N. Use <br/> for line breaks inside a label, never a literal \n. Put classDef changed stroke-dasharray:5 5,stroke-width:2px; once at the end. Quoting unconditionally is the safe default.

Skeleton to adapt (replace labels with the PR's real symbols and paths):
flowchart TD
A["Entry: forward / handler / scheduler step"] --> B{"Branch the PR changes"}
B -->|"new condition"| C["New/changed call or transform"]:::changed
B -->|"existing path"| D["Unchanged path"]
C --> E["Downstream effect: KV write / output / metric"]:::changed
D --> E
E --> F["Return / response / side effect"]
L["Legend: dashed border = added or modified by this PR"]:::changed
classDef changed stroke-dasharray:5 5,stroke-width:2px;
Place this comprehension block first in the response, then the findings. Keep it tight; it orients the reader, it is not the review itself.
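When the diagram is generated by a script rather than written by hand, the label rules (quote unconditionally, use <br/> for line breaks, mark changed nodes with the changed class) can be applied mechanically. The sketch below is one way to do that. The helper names are hypothetical, and escaping embedded double quotes as `#quot;` follows Mermaid's entity-code convention rather than anything stated in this skill.

```python
def mermaid_label(text):
    """Return `text` as a double-quoted Mermaid label with <br/> line breaks."""
    escaped = text.replace('"', "#quot;")  # Mermaid entity code for a double quote
    # Convert both a typed backslash-n and a real newline into <br/>.
    escaped = escaped.replace("\\n", "<br/>").replace("\n", "<br/>")
    return f'"{escaped}"'


def node(node_id, text, changed=False):
    """Render a rectangular node, tagging it with the changed class when asked."""
    suffix = ":::changed" if changed else ""
    return f"{node_id}[{mermaid_label(text)}]{suffix}"


def edge(src, dst, label=None):
    """Render an arrow between two node ids, with an optional quoted label."""
    if label is None:
        return f"{src} --> {dst}"
    return f"{src} -->|{mermaid_label(label)}| {dst}"


if __name__ == "__main__":
    lines = [
        "flowchart TD",
        node("C", "down_proj(x, skip_all_reduce=rs)", changed=True),
        edge("B", "N", "id < 0 or id >= vocab"),
        "classDef changed stroke-dasharray:5 5,stroke-width:2px;",
    ]
    print("\n".join(lines))
```

Quoting every label, even ones that do not need it, avoids having to track which characters trip the Mermaid parser.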
Mirror human SGLang review habits:
For a normal review, return:
- scanned / matched / PRs) and summarize the recurring review opinions for this subsystem from the exhaustive corpus sweep (workflow step 3) that the findings build on.

For a review-prep pass before the user opens a PR, return:
For a corpus-backed explanation, include the query terms and summarize the matched review behavior without dumping long comment bodies.
npx claudepluginhub bbuf/ai-infra-auto-driven-skills --plugin ai-infra-auto-driven-skills