From ultralytics
Use when authoring or modifying a ModelBox graph (.toml), configuring a ModelBox flowunit, choosing the right flowunit for a step (video decode, inference, YOLO postprocess, drawing boxes, encoding output), or resolving "which flowunit handles X". Provides a curated index of flowunits used in YOLO + video pipelines, with TOML config schema, ports, device options, and known-good recipes.
Slash command: /ultralytics:modelbox
Invoke this skill when you need to:
- Author or modify a ModelBox graph (.toml file in graphviz DSL)
- Configure a flowunit ([base], [config], [input.*], [output.*])
- Choose virtual_type / device for a target chip

A ModelBox pipeline is a directed graph of flowunits wired together in a .toml graph file using graphviz DSL. Each flowunit declares a device (cuda, cpu, apple_silicon, …), consumes typed input ports, and emits typed output ports. The graph file lists nodes (flowunit instances) and edges (port-to-port wires). Each flowunit node names a flowunit directory that holds its own <name>.toml config. See modelbox-ai.com/modelbox-book for the authoring guide.
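The node-and-edge structure described above lends itself to a quick sanity check before a graph is launched. The following is a minimal sketch, not part of ModelBox: it assumes node declarations look like `name[type=flowunit, ...]` and edges look like `a:port -> b:port`, as in the recipes in this document. The helper names `parse_graph` and `undeclared_endpoints` and the two-node sample graph are hypothetical.

```python
import re

# Hypothetical two-node graph in the same shape as the recipes in this document.
GRAPH_TEXT = '''
[graph]
format = "graphviz"
graphconf = """digraph demo {
    node [shape=Mrecord]
    video_demuxer[type=flowunit, flowunit=video_demuxer, device=cpu]
    video_decoder[type=flowunit, flowunit=video_decoder, device=cpu,
                  pix_fmt="rgb"]
    video_demuxer:out_video_packet -> video_decoder:in_video_packet
}"""
'''

# A node is a line that starts with an identifier followed by "[type=flowunit".
NODE_RE = re.compile(r"^\s*(\w+)\s*\[type=flowunit", re.MULTILINE)
# An edge is "src_node:src_port -> dst_node:dst_port".
EDGE_RE = re.compile(r"(\w+):(\w+)\s*->\s*(\w+):(\w+)")


def parse_graph(text):
    """Return (sorted node names, list of edge 4-tuples) found in the text."""
    nodes = sorted(set(NODE_RE.findall(text)))
    edges = EDGE_RE.findall(text)
    return nodes, edges


def undeclared_endpoints(nodes, edges):
    """Return node names that appear in an edge but were never declared."""
    declared = set(nodes)
    used = {src for src, _, _, _ in edges} | {dst for _, _, dst, _ in edges}
    return sorted(used - declared)


nodes, edges = parse_graph(GRAPH_TEXT)
print(nodes)
print(edges)
print(undeclared_endpoints(nodes, edges))
```

The regular expressions only cover the subset of graphviz syntax used in this document, so a real validator would likely need a proper DOT parser.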
| Flowunit | Devices | Group | Use for | Reference |
|---|---|---|---|---|
| inference | cuda, cpu | Inference | Run any model; virtual_type picks engine | references/inference.md |
| coreml_inference | apple_silicon | Inference | Run CoreML models on Apple Silicon (M-series) | references/coreml_inference.md |
| video_input | cpu | Input | Open a video file and stream decoded frames | references/video_input.md |
| data_source_generator | cpu | Input | File/dir source (replays paths into a stream) | references/data_source_generator.md |
| httpserver_async | cpu | Input | HTTP request/reply edges | references/httpserver_async.md |
| video_demuxer | cpu | Video | Split mp4/mkv container into elementary streams | references/video_demuxer.md |
| video_decoder | cuda, cpu | Video | Decode H.264/H.265 to frames (NV12/RGB) | references/video_decoder.md |
| video_encoder | cpu | Video | Frames → H.264 mp4 | references/video_encoder.md |
| resize | cpu | Preprocess | Resize frames to a fixed width x height | references/resize.md |
| image_process | cuda, cpu | Preprocess | Resize/letterbox/colorspace/layout convert | references/image_process.md |
| normalize | cuda, cpu | Preprocess | Per-channel normalize (mean/std) | references/normalize.md |
| mean | cuda, cpu | Preprocess | Per-channel mean subtract | references/mean.md |
| yolo26_post | cpu | Postprocess | YOLO detect head -> boxes (anchors, NMS) | references/yolo26_post.md |
| yolo_seg_post | cpu | Postprocess | YOLO segment head -> masks | references/yolo_seg_post.md |
| yolo_pose_post | cpu | Postprocess | YOLO pose head -> keypoints | references/yolo_pose_post.md |
| yolo_track_post | cpu | Postprocess | YOLO + tracker -> tracked boxes with IDs | references/yolo_track_post.md |
| draw_bbox | cpu | Output | Draw boxes/labels on frames | references/draw_bbox.md |
| output_broker | cpu | Output | Send results to Kafka/RocketMQ/HTTP/file | references/output_broker.md |
| meta_mapping | cpu | Utility | Rename/project/drop fields between units | references/meta_mapping.md |
Use virtual_type inside the [base] block of an inference flowunit to select the engine:
| Chip | virtual_type | device key | Inference flowunit |
|---|---|---|---|
| NVIDIA GPU | tensorrt | cuda | inference |
| NVIDIA GPU | torch | cuda | inference |
| Apple Silicon (M-series) | coreml | apple_silicon | coreml_inference |
| Huawei Ascend | acl | ascend | inference |
| Huawei Ascend | mindspore | ascend | inference |
| Intel Arc / iGPU | openvino | intel_gpu | inference |
| CPU fallback | onnxruntime | cpu | inference |
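When graph files are generated by a script, the table above can be carried as plain data. This is a small sketch under that assumption; the rows are copied from the table, while the name `engines_for_device` is hypothetical and nothing here calls into ModelBox.

```python
# Rows copied from the table above: (chip, virtual_type, device key, inference flowunit).
ENGINE_TABLE = [
    ("NVIDIA GPU", "tensorrt", "cuda", "inference"),
    ("NVIDIA GPU", "torch", "cuda", "inference"),
    ("Apple Silicon (M-series)", "coreml", "apple_silicon", "coreml_inference"),
    ("Huawei Ascend", "acl", "ascend", "inference"),
    ("Huawei Ascend", "mindspore", "ascend", "inference"),
    ("Intel Arc / iGPU", "openvino", "intel_gpu", "inference"),
    ("CPU fallback", "onnxruntime", "cpu", "inference"),
]


def engines_for_device(device):
    """Return the (virtual_type, flowunit) pairs the table lists for a device key."""
    return [(vt, flowunit) for _, vt, dev, flowunit in ENGINE_TABLE if dev == device]


print(engines_for_device("cuda"))
print(engines_for_device("apple_silicon"))
```

An unknown device key yields an empty list, which a generator script can treat as "no supported engine".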
Four canned recipes. Each shows a complete graph skeleton; fill in flowunit configs from the linked references.
Recipe 1 | Chip: NVIDIA GPU | Engine: TensorRT | References: inference, video_input, video_demuxer, video_decoder, image_process, yolo26_post, draw_bbox, video_encoder
# graph/nvidia_file_detect.toml
[driver]
skip-default = false
dir = ["${MODELBOX_SOLUTION_PATH}/flowunit"]
[graph]
format = "graphviz"
graphconf = """digraph nvidia_file_detect {
node [shape=Mrecord]
video_input[type=flowunit, flowunit=video_input, device=cpu,
source_url="${input_video}"]
video_demuxer[type=flowunit, flowunit=video_demuxer, device=cpu]
video_decoder[type=flowunit, flowunit=video_decoder, device=cuda,
pix_fmt="rgb"]
image_process[type=flowunit, flowunit=image_process, device=cuda,
width=640, height=640, interpolation="inter_linear",
color_mode="rgb", data_type="float"]
detect[type=flowunit, flowunit=detect, device=cuda,
virtual_type="tensorrt"]
yolo26_post[type=flowunit, flowunit=yolo26_post, device=cpu,
classes=80, conf_threshold=0.25, nms_threshold=0.45,
input_width=640, input_height=640]
draw_bbox[type=flowunit, flowunit=draw_bbox, device=cpu,
thickness=2, font_size=0.6]
video_encoder[type=flowunit, flowunit=video_encoder, device=cpu,
dest_url="${output_video}", encoder="libx264"]
video_input:out_video_url -> video_demuxer:in_video_url
video_demuxer:out_video_packet -> video_decoder:in_video_packet
video_decoder:out_image -> image_process:in_image
image_process:out_image -> detect:input
detect:output -> yolo26_post:in_data
video_decoder:out_image -> draw_bbox:in_frame
yolo26_post:out_data -> draw_bbox:in_boxes
draw_bbox:out_image -> video_encoder:in_image
}"""
The detect flowunit directory must contain a .toml with entry=./model.engine and virtual_type=tensorrt. See references/inference.md.
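Recipe 2 | Chip: Apple M-series | Engine: CoreML | References: coreml_inference, video_input, video_demuxer, video_decoder, resize, yolo26_post, draw_bbox, video_encoder
This is Recipe 1 with the preprocess, inference, and encoder nodes changed (marked # ← changed). In addition, video_decoder runs on device=cpu instead of cuda, and the edges reference resize instead of image_process.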
# graph/apple_file_detect.toml
[driver]
skip-default = false
dir = ["${MODELBOX_SOLUTION_PATH}/flowunit"]
[graph]
format = "graphviz"
graphconf = """digraph apple_file_detect {
node [shape=Mrecord]
video_input[type=flowunit, flowunit=video_input, device=cpu,
source_url="${input_video}"]
video_demuxer[type=flowunit, flowunit=video_demuxer, device=cpu]
video_decoder[type=flowunit, flowunit=video_decoder, device=cpu,
pix_fmt="rgb"]
resize[type=flowunit, flowunit=resize, device=cpu, # ← changed
width=640, height=640]
detect[type=flowunit, flowunit=detect, device=apple_silicon, # ← changed
virtual_type="coreml"]
yolo26_post[type=flowunit, flowunit=yolo26_post, device=cpu,
classes=80, conf_threshold=0.25, nms_threshold=0.45,
input_width=640, input_height=640]
draw_bbox[type=flowunit, flowunit=draw_bbox, device=cpu,
thickness=2, font_size=0.6]
video_encoder[type=flowunit, flowunit=video_encoder, device=cpu,
dest_url="${output_video}",
encoder="h264_videotoolbox"] # ← changed
video_input:out_video_url -> video_demuxer:in_video_url
video_demuxer:out_video_packet -> video_decoder:in_video_packet
video_decoder:out_image -> resize:in_image
resize:out_image -> detect:input
detect:output -> yolo26_post:in_data
video_decoder:out_image -> draw_bbox:in_frame
yolo26_post:out_data -> draw_bbox:in_boxes
draw_bbox:out_image -> video_encoder:in_image
}"""
The detect flowunit directory must contain a .toml with entry=./model.mlmodelc, device=apple_silicon, and virtual_type=coreml. See references/coreml_inference.md.
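Both file-based recipes feed draw_bbox from two places: the decoder (in_frame) and the post flowunit (in_boxes). A missing edge on either port is easy to overlook, so a check along the following lines may help. It is a sketch that assumes edges are written as `a:port -> b:port`; the edge list is copied from the Apple Silicon recipe above, and `missing_inputs` is a hypothetical helper.

```python
import re

# An edge is "src_node:src_port -> dst_node:dst_port".
EDGE_RE = re.compile(r"(\w+):(\w+)\s*->\s*(\w+):(\w+)")

# Edges copied from the Apple Silicon recipe above.
EDGES = """
video_input:out_video_url -> video_demuxer:in_video_url
video_demuxer:out_video_packet -> video_decoder:in_video_packet
video_decoder:out_image -> resize:in_image
resize:out_image -> detect:input
detect:output -> yolo26_post:in_data
video_decoder:out_image -> draw_bbox:in_frame
yolo26_post:out_data -> draw_bbox:in_boxes
draw_bbox:out_image -> video_encoder:in_image
"""


def missing_inputs(edge_text, node, required_ports):
    """Return the required input ports of `node` that no edge points at."""
    wired = {
        dst_port
        for _, _, dst, dst_port in EDGE_RE.findall(edge_text)
        if dst == node
    }
    return sorted(set(required_ports) - wired)


print(missing_inputs(EDGES, "draw_bbox", ["in_frame", "in_boxes"]))
```

The required port names are passed in by the caller because they come from each flowunit's own reference page, not from the graph file.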
Recipe 3 | References: httpserver_async, image_process, inference/coreml_inference, yolo26_post
# graph/http_detect.toml
[graph]
format = "graphviz"
graphconf = """digraph http_detect {
node [shape=Mrecord]
http_in[type=flowunit, flowunit=httpserver_async, device=cpu,
endpoint="/api/detect", port=8080]
image_process[type=flowunit, flowunit=image_process, device=cuda,
width=640, height=640, color_mode="rgb", data_type="float"]
detect[type=flowunit, flowunit=detect, device=cuda,
virtual_type="tensorrt"]
yolo26_post[type=flowunit, flowunit=yolo26_post, device=cpu,
classes=80, conf_threshold=0.25, nms_threshold=0.45,
input_width=640, input_height=640]
http_in:out_request -> image_process:in_image
image_process:out_image -> detect:input
detect:output -> yolo26_post:in_data
yolo26_post:out_data -> http_in:in_reply
}"""
For Apple Silicon, set the detect node to device=apple_silicon, virtual_type=coreml and replace image_process with resize.
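A client for the endpoint configured above might be sketched as follows. The path /api/detect and port 8080 come from the recipe; everything else is an assumption: the host, the use of POST, and sending the raw image bytes as application/octet-stream. The actual request and reply contract is defined in references/httpserver_async.md, and `build_detect_request` is a hypothetical helper.

```python
import urllib.request


def build_detect_request(image_bytes, host="localhost", port=8080, endpoint="/api/detect"):
    """Build (but do not send) a POST request carrying one encoded image."""
    url = f"http://{host}:{port}{endpoint}"
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        # Assumption: the server accepts raw image bytes in the request body.
        headers={"Content-Type": "application/octet-stream"},
    )


request = build_detect_request(b"\xff\xd8 placeholder image bytes")
print(request.get_method(), request.full_url)

# Sending is left commented out because it needs a running ModelBox graph:
# with urllib.request.urlopen(request, timeout=10) as reply:
#     print(reply.read())
```

Keeping request construction separate from sending makes the URL and payload easy to inspect without a server.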
Recipe 4 | References: video_demuxer, video_decoder, image_process, inference, yolo26_post, output_broker
# graph/rtsp_detect_broker.toml
[graph]
format = "graphviz"
graphconf = """digraph rtsp_detect_broker {
node [shape=Mrecord]
video_demuxer[type=flowunit, flowunit=video_demuxer, device=cpu,
source_url="${rtsp_url}"]
video_decoder[type=flowunit, flowunit=video_decoder, device=cuda,
pix_fmt="rgb"]
image_process[type=flowunit, flowunit=image_process, device=cuda,
width=640, height=640, color_mode="rgb", data_type="float"]
detect[type=flowunit, flowunit=detect, device=cuda,
virtual_type="tensorrt"]
yolo26_post[type=flowunit, flowunit=yolo26_post, device=cpu,
classes=80, conf_threshold=0.25, nms_threshold=0.45,
input_width=640, input_height=640]
sink[type=flowunit, flowunit=output_broker, device=cpu,
broker_type="http", broker_url="http://localhost:9090/results"]
video_demuxer:out_video_packet -> video_decoder:in_video_packet
video_decoder:out_image -> image_process:in_image
image_process:out_image -> detect:input
detect:output -> yolo26_post:in_data
yolo26_post:out_data -> sink:in_data
}"""
Wire the RTSP URL into video_demuxer via source_url or a preceding data_source_generator node. For Kafka output, change to broker_type="kafka" and set broker_url to the bootstrap server.
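One way to supply the RTSP URL is to render the `${rtsp_url}` placeholder before handing the graph to ModelBox. This document does not say whether ModelBox expands such placeholders itself, so the sketch below assumes a pre-rendering step; `render_graph` and the camera URL are hypothetical.

```python
from string import Template

# Node declaration copied from the recipe above.
DEMUXER_NODE = '''video_demuxer[type=flowunit, flowunit=video_demuxer, device=cpu,
              source_url="${rtsp_url}"]'''


def render_graph(text, **values):
    """Fill ${name} placeholders; names without a supplied value stay as written."""
    return Template(text).safe_substitute(values)


print(render_graph(DEMUXER_NODE, rtsp_url="rtsp://camera.example/stream1"))
```

`safe_substitute` is used instead of `substitute` so that other placeholders in the same file, such as `${MODELBOX_SOLUTION_PATH}`, pass through unchanged when no value is given for them.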
- inference's entry path is relative to the flowunit's own directory, not the graph .toml.
- […] video_decoder (cuda) and image_process.
- name= in [base] must match the flowunit directory name exactly; ModelBox uses this to locate the directory.
- […] :port_name syntax; types must match across the edge (image, tensor, json, …).
- coreml_inference requires device=apple_silicon; using device=cpu silently routes to a different engine.
- draw_bbox needs two input ports wired simultaneously: the original frame (from the decoder, bypassing the inference chain) and the detection JSON (from the post flowunit).
- […] yolo skill for the Ultralytics side of the pipeline: training, export to TensorRT/CoreML, dataset validation.

Generated from modelbox@125d1cd6b746cbfd410ac288a1e1f2e2664fb77e (2026-05-06). Re-run python3 tools/build-modelbox-skill.py to update.
npx claudepluginhub bovey0809/claude-code-ultralytics --plugin ultralytics