Comprehensive reference for the XLA (Accelerated Linear Algebra) compiler, covering architecture, operation semantics, HLO IR, the compilation pipeline, GPU/CPU/TPU backends, the PJRT API, MLIR integration, custom calls, autotuning, SPMD partitioning, debugging tools, and the build system.
Slash command: /xla:xla
XLA is an open-source machine learning (ML) compiler for GPUs, CPUs, and ML accelerators. It takes models from popular ML frameworks such as PyTorch, TensorFlow, and JAX, and optimizes them for high-performance execution across different hardware platforms.
references/01-overview-and-architecture.md
references/02-shapes-and-layout.md
references/03-broadcasting.md
references/04-operation-semantics-elementwise.md
references/05-operation-semantics-binary.md
references/06-operation-semantics-collective.md
references/07-operation-semantics-control-flow.md
references/08-operation-semantics-convolution.md
references/09-operation-semantics-data-manipulation.md
references/10-operation-semantics-linear-algebra.md
references/11-operation-semantics-io-and-other.md
references/12-hlo-ir.md
references/13-compilation-pipeline.md
references/14-hlo-passes.md
references/15-gpu-backend.md
references/16-gpu-emitters.md
references/17-cpu-backend.md
references/18-tpu-backend.md
references/19-developing-new-backend.md
references/20-pjrt-api.md
ML Framework → StableHLO → HLO → Target-Independent Optimizations → Target-Specific Optimizations → Code Generation
#include "xla/client/xla_builder.h"
#include "xla/shape_util.h"

xla::XlaBuilder builder("add_vectors");

// Create two f32[1024] parameters
xla::XlaOp x = xla::Parameter(&builder, 0,
    xla::ShapeUtil::MakeShape(xla::F32, {1024}), "x");
xla::XlaOp y = xla::Parameter(&builder, 1,
    xla::ShapeUtil::MakeShape(xla::F32, {1024}), "y");

// Add the two vectors element-wise
xla::XlaOp result = xla::Add(x, y);

// Build the computation; Build() returns a StatusOr, and compiling the
// resulting XlaComputation for a device is a separate step
auto computation = builder.Build().value();
HloModule matmul_example

ENTRY main {
  %p0 = f32[1024,512]{1,0} parameter(0)
  %p1 = f32[512,2048]{1,0} parameter(1)
  ROOT %dot = f32[1024,2048]{1,0} dot(%p0, %p1),
    lhs_contracting_dims={1}, rhs_contracting_dims={0}
}
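To make the shapes in the HLO module above concrete, here is a minimal plain C++ sketch of how the result shape of a dot follows from the operand shapes and the contracting dimensions. It does not use XLA itself: `DotResultShape` is a hypothetical helper, not an XLA API, and it assumes exactly one contracting dimension per operand and no batch dimensions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of XLA): result shape of a dot with one
// contracting dimension per operand and no batch dimensions.
std::vector<int64_t> DotResultShape(const std::vector<int64_t>& lhs,
                                    const std::vector<int64_t>& rhs,
                                    std::size_t lhs_contracting,
                                    std::size_t rhs_contracting) {
  if (lhs_contracting >= lhs.size() || rhs_contracting >= rhs.size()) {
    throw std::invalid_argument("contracting dimension out of range");
  }
  // The contracted dimensions must have the same size on both sides.
  if (lhs[lhs_contracting] != rhs[rhs_contracting]) {
    throw std::invalid_argument("contracting dimension sizes differ");
  }
  std::vector<int64_t> result;
  // Non-contracting lhs dimensions come first, then non-contracting rhs ones.
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    if (i != lhs_contracting) result.push_back(lhs[i]);
  }
  for (std::size_t i = 0; i < rhs.size(); ++i) {
    if (i != rhs_contracting) result.push_back(rhs[i]);
  }
  return result;
}
```

Calling `DotResultShape({1024, 512}, {512, 2048}, 1, 0)` mirrors the module above: dimension 1 of `%p0` and dimension 0 of `%p1` (both of size 512) are contracted away, leaving a `[1024, 2048]` result.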
// Element-wise operations
XlaOp Add(XlaOp lhs, XlaOp rhs);
XlaOp Mul(XlaOp lhs, XlaOp rhs);
XlaOp Sub(XlaOp lhs, XlaOp rhs);
XlaOp Div(XlaOp lhs, XlaOp rhs);
// Data manipulation
XlaOp Reshape(XlaOp operand, ArraySlice<int64> dimensions);
XlaOp Broadcast(XlaOp operand, ArraySlice<int64> broadcast_sizes);
XlaOp Slice(XlaOp operand, ArraySlice<int64> start, ArraySlice<int64> limit, ArraySlice<int64> strides);
XlaOp Transpose(XlaOp operand, ArraySlice<int64> permutation);
XlaOp ConcatInDim(ArraySlice<XlaOp> operands, int64_t dimension);
// Linear algebra
XlaOp Dot(XlaOp lhs, XlaOp rhs);
XlaOp DotGeneral(XlaOp lhs, XlaOp rhs, DotDimensionNumbers dnums);
XlaOp Conv(XlaOp lhs, XlaOp rhs, ArraySlice<int64> strides, Padding padding);
// Collective operations
XlaOp AllReduce(XlaOp operand, XlaComputation computation, ReplicaGroupVector groups);
XlaOp AllGather(XlaOp operand, int64_t dim, int64_t count, ReplicaGroupVector groups);
// Control flow
XlaOp While(XlaComputation condition, XlaComputation body, XlaOp init);
XlaOp Conditional(XlaOp pred, XlaOp true_val, XlaComputation true_comp,
                  XlaOp false_val, XlaComputation false_comp);
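As a rough illustration of the `While` signature above, the loop semantics can be sketched in plain C++ without XLA: start from `init`, and keep replacing the state with `body(state)` for as long as `condition(state)` is true. The names `RunWhile`, `LoopState`, and `SumBelow` are hypothetical and exist only for this sketch. A struct stands in for the loop state here.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical reference semantics for a While loop (not an XLA function):
// apply `body` repeatedly while `condition` holds, then return the state.
template <typename T, typename Cond, typename Body>
T RunWhile(Cond condition, Body body, T init) {
  T state = std::move(init);
  while (condition(state)) {
    state = body(state);
  }
  return state;
}

// Loop-carried state: a counter and a running sum.
struct LoopState {
  int64_t i;
  int64_t sum;
};

// Sums the integers 0, 1, ..., n - 1 using the While-style loop above.
int64_t SumBelow(int64_t n) {
  LoopState result = RunWhile(
      [n](const LoopState& s) { return s.i < n; },                 // condition
      [](const LoopState& s) { return LoopState{s.i + 1, s.sum + s.i}; },  // body
      LoopState{0, 0});                                            // init
  return result.sum;
}
```

The body returns a value of the same type as `init`, so the state type stays fixed across iterations, and the condition only reads the state.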
# Dump HLO from JAX
XLA_FLAGS=--xla_dump_to=/tmp/hlo_dump python my_program.py
# Run HLO module
run_hlo_module --platform=CUDA --reference_platform=Interpreter computation.hlo
# Optimize and inspect HLO
hlo-opt --platform=CUDA --stage=hlo input.hlo
hlo-opt --passes=algebraic-simplifier input.hlo
# Deviceless GPU compilation
hlo-opt --platform=CUDA --stage=llvm \
  --xla_gpu_target_config_filename=gpu_specs/a100_pcie_80.txtpb input.hlo
npx claudepluginhub jstzwj/ai-infra-plugins --plugin xla