
AI-Infra-Auto-Driven-SKILLS

Agent-ready playbooks for LLM serving benchmarks, capacity planning, torch-profiler triage, pipeline analysis, compute simulation, SGLang/vLLM optimization, human code review, production incidents, and model PR intelligence.

This repository is built for AI infrastructure engineers who want agents to do real work, not recite generic prompts.

It gives an agent the operational memory needed to benchmark SGLang, vLLM, and TensorRT-LLM fairly; explain serving capacity from startup logs; split prefill and decode profiler evidence; inspect traces at layer and kernel level; estimate operator FLOPs and MFU; review SGLang patches against real maintainer discussion patterns; run Humanize-governed SGLang and vLLM SOTA loops; triage SGLang production incidents from a replay; and keep model-family optimization history close to the code that actually changed.

For standalone kernel campaigns and kernel evidence tools, see the sibling project KDA-Pilot.

If this saves you one stale model-support assumption, one misleading profiler trace, or one late-night benchmark loop, a star helps more AI-infra engineers find it.

Core Skills

| Skill | Use it when |
| --- | --- |
| `llm-serving-auto-benchmark` | You need a fair, bounded serving benchmark search for SGLang, vLLM, TensorRT-LLM, or another OpenAI-compatible stack. |
| `llm-serving-capacity-planner` | You need to explain SGLang or vLLM startup memory, KV cache budget, request capacity, or OOM pressure from logs. |
| `llm-torch-profiler-analysis` | You need a three-table profiler report that keeps `extend/prefill` and `decode` evidence separate. |
| `llm-pipeline-analysis` | You need forward-pass, layer, and kernel-level timing from a torch profiler trace, including anchor boundaries and Perfetto ranges. |
| `model-compute-simulation` | You need operator shapes, FLOPs, MFU estimates, kernel-to-op mapping, or parallelism what-if analysis for an LLM serving shape. |
| `sglang-humanize-review` | You need SGLang code-review findings grounded in full human PR review episodes, from project start through the latest refresh (June 2026), including inline code context, top-level discussion, review summaries, and multi-round replies. Every review opens with a PR comprehension pass (a change summary plus a Mermaid execution flowchart with the diff's modified steps marked), so the reviewer sees how the PR runs before reading the findings. |
| `sglang-sota-humanize-loop` | You want one model-level Humanize RLCR loop that owns gap decisions, profiler triage, required layer-pipeline deep dives, SGLang patches, optional `ncu-report-skill` evidence, and real-model revalidation after the fixed fair benchmark. |
| `vllm-sota-humanize-loop` | You want one model-level Humanize RLCR loop that owns gap decisions, profiler triage, required layer-pipeline deep dives, vLLM patches, optional `ncu-report-skill` evidence, and real-model revalidation after the fixed fair benchmark. |
| `sglang-prod-incident-triage` | You need to turn queue growth, timeouts, wrong outputs, crashes, or distributed stalls into a replay and a next debug step. |
| `model-architecture-diagram` | You need original public architecture diagrams for popular LLM, VLM, MoE, OCR, and diffusion model families. |
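Two of these skills rest on back-of-envelope arithmetic that is easy to sanity-check by hand: how many tokens fit in the KV cache (`llm-serving-capacity-planner`) and what fraction of peak FLOPs a decode run achieves (`model-compute-simulation`). The sketch below is not code from this repository; `ModelShape` and the helper names are hypothetical. It assumes a dense decoder with standard MHA/GQA attention (MLA, sliding-window, or quantized-KV models need different formulas) and uses the common approximation of about 2 FLOPs per parameter per generated token, which ignores attention FLOPs.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical description of a dense decoder-only model with GQA attention."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    num_params: float  # total parameter count


def kv_bytes_per_token(shape: ModelShape, dtype_bytes: int = 2) -> int:
    # Each cached token stores one K and one V vector per KV head in every layer.
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * dtype_bytes


def max_cached_tokens(kv_budget_bytes: int, shape: ModelShape, dtype_bytes: int = 2) -> int:
    # Tokens that fit in the KV pool left over after weights and activation workspace.
    return kv_budget_bytes // kv_bytes_per_token(shape, dtype_bytes)


def decode_mfu(shape: ModelShape, tokens_per_second: float, peak_flops: float) -> float:
    # Approximation: ~2 FLOPs per parameter per token; attention FLOPs are ignored.
    achieved_flops = 2.0 * shape.num_params * tokens_per_second
    return achieved_flops / peak_flops


if __name__ == "__main__":
    # Illustrative 8B-parameter GQA shape; the budget, throughput, and peak are assumed values.
    shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128, num_params=8e9)
    print(kv_bytes_per_token(shape))               # 131072 bytes (128 KiB) in fp16/bf16
    print(max_cached_tokens(40 * 2**30, shape))    # 327680 tokens in a 40 GiB pool
    print(f"{decode_mfu(shape, 10_000, 989e12):.3f}")  # 0.162
```

Comparing `max_cached_tokens` with the KV pool size a server reports at startup is a quick way to tell whether a capacity surprise comes from the model shape or from the memory-fraction settings.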


Similar Plugins

vllm-skills

langfuse-pack

nemo-evaluator-skills

superml

skill-optimizer

model-evaluator

More by BBuf

humanize
