From ra-skills
Python patterns for system reliability — background jobs and task queues (NATS JetStream via nats-py), durable multi-step workflows (Dapr Workflow via dapr-ext-workflow), resilience and recovery (retries, backoff, timeouts, circuit breakers via tenacity), caching (Redis), and observability (OpenTelemetry traces, metrics, logs via OTLP). USE WHEN building async workers, queueing tasks, designing fault-tolerant multi-step workflows that must survive crashes, handling transient network/IO failures, instrumenting Python services for production, designing retry policies, configuring tracing/metrics, or caching with Redis. NOT FOR language idioms or type hygiene (use `writing-python`), HTTP routing (use `fastapi`), or deep OTel reference (use `otel`).
Slash command
/ra-skills:python-infrastructure
System-reliability concerns for Python services in this project, grouped because real code uses them together: a task you queue (background-jobs) needs retries (resilience), instrumentation (observability), and often touches the cache (Redis) on the same call path.
| Concern | Tool | Notes |
|---|---|---|
| Message bus / task queue | NATS JetStream (via nats-py) | Durable streams, consumer groups, replay. Replaces Celery/RabbitMQ here. |
| Durable multi-step workflows | Dapr Workflow (via dapr-ext-workflow) | Activity-level checkpointing via the Dapr sidecar. Use only when a workflow has multiple non-idempotent steps and JetStream's "redeliver the whole message" model isn't enough. Requires a Dapr sidecar per pod (see fastapi/references/microservices.md § Dapr + Kubernetes). |
| HTTP | FastAPI | See sibling fastapi skill. |
| Cache | Redis | redis.asyncio for async workers. |
| Retries / backoff | tenacity | Exponential + jitter, by default. |
| Observability | OpenTelemetry (OTLP) | Traces + metrics + logs. See sibling otel. |
| Logging | stdlib logging → OTel handler | Don't pull in structlog; OTel forwards stdlib records. |
| HTTP client | httpx (async) | Replaces requests. |
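The "Exponential + jitter" default in the table can be illustrated with a small standard-library sketch. This is not tenacity's implementation, and `backoff_delays`, `base`, and `cap` are hypothetical names chosen for illustration. It only shows the shape of the delays such a policy produces: the ceiling doubles on each attempt up to a cap, and the actual sleep is drawn at random below that ceiling so that many workers do not retry in lockstep.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one sleep duration per retry: exponential backoff with full jitter.

    Hypothetical helper for illustration only; in project code the retry
    policy would come from tenacity rather than hand-rolled arithmetic.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))  # doubles each attempt, capped
        delays.append(rng.uniform(0.0, ceiling))   # full jitter: anywhere in [0, ceiling]
    return delays


# Ceilings for base=0.5, cap=4.0 are 0.5, 1, 2, 4, 4, 4 seconds.
example = backoff_delays(6, base=0.5, cap=4.0, rng=random.Random(0))
```

In tenacity, a policy of this shape would typically be expressed with one of its wait strategies (for example `wait_random_exponential`) combined with a stop condition; see references/resilience.md for the project's actual settings.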
| If you need to… | Read |
|---|---|
| Queue a task, design a worker, persist job state, retry/DLQ patterns (NATS JetStream + nats-py) | references/background-jobs.md |
| Survive crashes mid-workflow with activity-level recovery (Dapr Workflow — workflows, activities, retry policies, scheduling) | references/dapr-workflows.md |
| Decide what to retry, with what backoff, when to stop, circuit-breakers | references/resilience.md |
| Instrument a service with OTel traces/metrics/logs, four golden signals | references/observability.md |
| Use Redis as a cache (TTL, invalidation, async client patterns) | references/caching.md |
Operation can fail transiently (network/IO/3rd-party API)?
  -> resilience.md (retry policy)
Operation runs out-of-request (email, image processing, batch)?
  Is the work a single idempotent action? Or fanout/event distribution?
    -> background-jobs.md (NATS JetStream)
  Is it a multi-step workflow where re-running step 1 on crash is bad
  (charge -> reserve -> ship -> notify)?
    -> dapr-workflows.md (Dapr Workflow — sidecar required)
Need to know what's happening in production?
  -> observability.md (OTel)
Need to avoid repeated expensive lookups?
  -> caching.md (Redis)
All five at once for one feature?
  -> instrument first, then queue / workflow + retry + cache.
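The difference between "redeliver the whole message" and activity-level recovery in the decision tree above can be sketched with a toy runner. This is not the Dapr Workflow API. `run_workflow` and the in-memory `completed` set are hypothetical stand-ins for the durable state that, per the table above, the Dapr sidecar checkpoints. The point is only that a step finished before a crash is skipped when the workflow resumes.

```python
from typing import Callable, List, Set, Tuple

Step = Tuple[str, Callable[[], None]]


def run_workflow(steps: List[Step], completed: Set[str]) -> None:
    """Run steps in order, skipping any step already recorded as completed."""
    for name, action in steps:
        if name in completed:
            continue             # finished before the crash; do not repeat it
        action()
        completed.add(name)      # checkpoint only after the step succeeds


calls: List[str] = []
ship_attempts = {"count": 0}


def ship() -> None:
    # Simulate a crash the first time the workflow reaches this step.
    ship_attempts["count"] += 1
    if ship_attempts["count"] == 1:
        raise RuntimeError("simulated crash before shipping")
    calls.append("ship")


steps: List[Step] = [
    ("charge", lambda: calls.append("charge")),
    ("reserve", lambda: calls.append("reserve")),
    ("ship", ship),
    ("notify", lambda: calls.append("notify")),
]

completed: Set[str] = set()      # stand-in for durable workflow state
try:
    run_workflow(steps, completed)   # first run crashes at "ship"
except RuntimeError:
    pass
run_workflow(steps, completed)       # resume: "charge" and "reserve" are skipped
```

With whole-message redelivery and no checkpoint, the second run would call "charge" again, which is why the tree sends non-idempotent multi-step work to dapr-workflows.md.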
- writing-python — how to write the function. This skill — how it survives in production.
- writing-python → references/error-handling.md — what exception to raise. This skill — what to do when it's raised across a network boundary.
- writing-python → references/resource-management.md — how to clean up resources (context managers). This skill — how to keep retrying when resources fail to acquire.
- fastapi — request handlers and DI. This skill — what runs around them.
- otel — full OTel reference (Python SDK, signals, attributes, Collector). This skill's observability.md pins project conventions on top.
- … requests call in an asyncio worker kills throughput. Use httpx.AsyncClient or run sync code in an executor.
- … inject/extract) is the supported path.
- … SETNX-based mutexes for hot keys.
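The note about blocking calls in an asyncio worker mentions running sync code in an executor. A minimal sketch of that option, using only the standard library: `blocking_fetch` is a hypothetical stand-in for any synchronous client call, and `asyncio.to_thread` (Python 3.9+) hands it to the default thread pool so the event loop stays free for other tasks.

```python
import asyncio
import time
from typing import List


def blocking_fetch(delay: float) -> str:
    """Hypothetical stand-in for a synchronous call (e.g. a sync HTTP client)."""
    time.sleep(delay)  # blocks only the thread it runs on
    return "done"


async def handle_many(n: int, delay: float) -> List[str]:
    # Each blocking call runs in the default thread pool executor, so the
    # event loop can keep scheduling other coroutines while they wait.
    tasks = [asyncio.to_thread(blocking_fetch, delay) for _ in range(n)]
    return await asyncio.gather(*tasks)


results = asyncio.run(handle_many(4, 0.05))
```

Where an async client is available, as the note says for httpx.AsyncClient, that is usually the simpler choice. The executor route is mainly for sync-only libraries.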
npx claudepluginhub ai-riksarkivet/ra-skills