Convert Agent to Production-Grade Temporal Workflow
This skill combines AgentLens agent assessment with Temporal SDK expertise to generate production-grade Temporal workflows from agent code in a single pass.
Why This Skill Exists
Converting agent code to Temporal requires two domains of expertise:
- Agent analysis — understanding what the agent actually does, classifying nodes, identifying what's genuinely agentic vs. workflow-in-disguise
- Temporal SDK mastery — determinism rules, replay mechanics, idiomatic patterns, error handling, testing, observability
This skill fuses both so the generated code is correct on the first pass.
Phase 1: Assessment (AgentLens)
Before generating any Temporal code, you MUST assess the agent. This determines the conversion strategy.
Step 1.1: Identify the Framework
Read framework-parsers.md for framework-specific parsing patterns.
Step 1.2: Extract the Execution Graph
For the agent code provided, extract:
- Every node/step/agent in the system
- What each node does (function body, prompt, tools)
- How control flows between nodes (edges, conditionals, routers)
- Where LLM calls happen and what they do
- What tools/APIs are called and by whom
- Loop structures — bounded or unbounded?
- State/context passing patterns
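One way to hold the extracted graph before classification is a small in-memory structure. The sketch below is illustrative only: `Node`, `ExecutionGraph`, and their fields are assumed names, not part of AgentLens or any framework parser. It records nodes and edges and flags nodes that sit on a cycle, which is where the bounded-or-unbounded question has to be answered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in the agent (hypothetical shape, not an AgentLens type)."""
    name: str
    calls_llm: bool = False
    tools: list[str] = field(default_factory=list)


@dataclass
class ExecutionGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: dict[str, list[str]] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node
        self.edges.setdefault(node.name, [])

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.setdefault(src, []).append(dst)

    def nodes_in_loops(self) -> set[str]:
        """Return every node that can reach itself by following edges."""
        looping = set()
        for start in self.nodes:
            stack = list(self.edges.get(start, []))
            seen = set()
            while stack:
                current = stack.pop()
                if current == start:
                    looping.add(start)
                    break
                if current in seen:
                    continue
                seen.add(current)
                stack.extend(self.edges.get(current, []))
        return looping


graph = ExecutionGraph()
for n in (Node("fetch_ticket"), Node("plan", calls_llm=True),
          Node("act", tools=["search"]), Node("reply", calls_llm=True)):
    graph.add_node(n)
graph.add_edge("fetch_ticket", "plan")
graph.add_edge("plan", "act")
graph.add_edge("act", "plan")   # plan and act form a loop
graph.add_edge("plan", "reply")
```

Calling `graph.nodes_in_loops()` on this example reports `plan` and `act`, the two nodes whose loop bound needs to be checked by hand.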
Step 1.3: Classify Every Node
Read assessment-rubric.md and classify each node:
| Classification | Definition |
|---|---|
| Deterministic | No LLM. Pure code logic, API calls, data transforms. |
| LLM-as-Function | LLM called with fixed purpose. Position in flow is predetermined. LLM does not decide what happens next. |
| LLM-as-Router | LLM classifies input to select a branch. Set of possible branches known at design time. |
| Genuinely Agentic | LLM decides next action from open-ended action space. Number of steps not predetermined. |
Step 1.4: Determine System Verdict
- Pure Workflow: All nodes are Deterministic, LLM-as-Function, or LLM-as-Router → full Temporal replacement
- Hybrid: Some nodes genuinely agentic, but the spine is deterministic → Temporal workflow with agentic activities
- Genuinely Agentic: Top-level control flow is LLM-driven → wrap in Temporal for durability, keep agency
Present the assessment summary to the user before proceeding to code generation.
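The verdict rules can be sketched as a small function. This is a minimal illustration, assuming the per-node classifications are already known and that the assessor supplies a `top_level_llm_driven` flag (a hypothetical parameter standing in for the judgment about who controls the top-level flow).

```python
from enum import Enum


class NodeClass(Enum):
    DETERMINISTIC = "Deterministic"
    LLM_AS_FUNCTION = "LLM-as-Function"
    LLM_AS_ROUTER = "LLM-as-Router"
    GENUINELY_AGENTIC = "Genuinely Agentic"


def system_verdict(classes: list[NodeClass], top_level_llm_driven: bool) -> str:
    """Map per-node classifications to one of the three system verdicts."""
    if top_level_llm_driven:
        # The LLM controls the overall flow: wrap for durability, keep agency.
        return "Genuinely Agentic"
    if NodeClass.GENUINELY_AGENTIC in classes:
        # Deterministic spine with some agentic nodes.
        return "Hybrid"
    return "Pure Workflow"
```

For example, a system with one router and two deterministic nodes, and no LLM-driven top level, comes out as a Pure Workflow.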
Phase 2: Temporal Foundation Knowledge
Before generating code, read and internalise these Temporal references. This knowledge MUST inform every line of generated code.
Mandatory Reading (read ALL before generating code)
- core/determinism.md — Replay mechanics, Command/Event model. This is the #1 source of bugs in Temporal code.
- python/python.md — Python SDK quick start, file organization, sandbox, common pitfalls.
- python/determinism.md — Python sandbox behavior, safe alternatives, pass-through imports.
- core/patterns.md — Signals, queries, updates, child workflows, continue-as-new, saga, parallel execution, heartbeating, local activities.
- python/patterns.md — Python-specific code examples for all patterns.
- core/gotchas.md — Anti-patterns and common mistakes.
- python/gotchas.md — Python-specific mistakes.
- python/error-handling.md — ApplicationError, retry policies, non-retryable error classification, idempotency.
- core/ai-patterns.md — Why Temporal for AI/LLM, core patterns for wrapping LLM calls.
- python/ai-patterns.md — Python-specific AI patterns, Pydantic data converters, OpenAI/LiteLLM config, retry for LLMs, cost tracking.
Reference as Needed
- python/testing.md — WorkflowEnvironment, time-skipping, activity mocking, replay testing
- python/observability.md — Logging, metrics, tracing, Search Attributes
- python/sync-vs-async.md — Sync vs async activities, ThreadPoolExecutor
- python/data-handling.md — Data converters, Pydantic integration, payload encryption
- python/versioning.md — Patching API for safe workflow evolution
- python/advanced-features.md — Schedules, worker tuning, local activities
- core/versioning.md — Versioning strategies (patching, type versioning, worker versioning)
- core/error-reference.md — Error types (TMPRL codes), workflow status reference
- core/troubleshooting.md — Decision trees for stuck workflows, non-determinism, timeouts
- python/determinism-protection.md — Python sandbox specifics, forbidden operations
Phase 3: Code Generation
Generate code based on the assessment verdict and Temporal knowledge.
Output Structure
Always produce these files:
generated_workflow/
├── workflows.py # Workflow definitions
├── activities.py # Activity implementations
├── models.py # Data models (dataclasses for workflow state)
├── worker.py # Worker bootstrap
├── llm_utils.py # LLM call helpers (if any LLM activities)
├── tests/
│ ├── conftest.py # Test fixtures
│ ├── test_workflows.py # Workflow integration tests
│ └── test_activities.py # Unit tests for activities
├── requirements.txt # Dependencies (temporalio>=1.7.0)
└── migration-notes.md # What changed, how to run, testing checklist
Generation Rules by Verdict
For Pure Workflow
- Each original node becomes a Temporal Activity
- LLM-as-Function nodes → activities that call the LLM with a fixed prompt
- LLM-as-Router nodes → activities whose return value drives workflow branching
- Deterministic nodes → simple activities with no LLM
- Apply model downgrades from the cost analysis
For Hybrid
- Deterministic spine → standard Temporal Activities
- Agentic nodes → Activities with extended timeouts and heartbeating
- The activity wraps the original agent logic
- Mark which activities are deterministic vs. agentic in comments
For Genuinely Agentic
- Wrap entire agent as a Temporal Activity within a thin workflow
- Generate guardrails: maximum iteration limits, cost budget caps, and timeout circuit breakers
- Identify optimization opportunities (caching, model downgrades, parallelization)
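The guardrails can be sketched as a wrapper around the agent's step function. Everything here is an assumption made for illustration: the `Guardrails` fields, their default values, and the `step` callable that returns `(output, cost_usd, done)`. The loop is meant to run inside the activity, where reading the clock is allowed.

```python
import time
from dataclasses import dataclass


class GuardrailExceeded(Exception):
    """Raised when the agent loop hits one of its configured limits."""


@dataclass
class Guardrails:
    max_iterations: int = 20
    max_cost_usd: float = 1.00
    max_seconds: float = 600.0


def run_agent_with_guardrails(step, limits: Guardrails) -> list:
    """Call step() until it reports done or a limit trips.

    `step` is a hypothetical callable returning (output, cost_usd, done).
    """
    outputs = []
    spent = 0.0
    started = time.monotonic()  # fine here: this runs in an activity
    for _ in range(limits.max_iterations):
        if time.monotonic() - started > limits.max_seconds:
            raise GuardrailExceeded("time budget exhausted")
        output, cost, done = step()
        spent += cost
        outputs.append(output)
        if spent > limits.max_cost_usd:
            raise GuardrailExceeded(f"cost budget exceeded: ${spent:.2f}")
        if done:
            return outputs
    raise GuardrailExceeded("iteration limit reached")
```

Raising a dedicated exception keeps the failure reason visible to the workflow, which can then decide whether to surface it or fall back.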
Critical Temporal Rules (from references — MUST follow)
Determinism
- NEVER use datetime.now(), time.time(), random, or uuid.uuid4() in workflow code
- NEVER make network calls, file I/O, or access env vars in workflow code
- ALWAYS use workflow.unsafe.imports_passed_through() for non-workflow imports in the workflow file
- ALWAYS place non-deterministic operations inside activities
- Workflow code must produce identical Commands on replay
File Organization
- NEVER put workflow definitions and activity implementations in the same file
- Workflows in workflows.py, activities in activities.py, models in models.py
- This prevents sandbox issues and import conflicts
Data Models
- Use @dataclass for all activity inputs and outputs
- Every activity must have typed input and output dataclasses
- Make all fields JSON-serializable (no complex objects, datetimes as ISO strings)
- Avoid passing large blobs — use references (IDs, URLs) for big data
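As a sketch of these rules, the pair of models below passes a document by reference and carries its timestamp as an ISO 8601 string. The class and field names are hypothetical examples, not a required schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class SummarizeInput:
    document_id: str    # reference to the document, not its contents
    document_url: str
    requested_at: str   # ISO 8601 string rather than a datetime object


@dataclass
class SummarizeOutput:
    document_id: str
    summary: str
    model: str


request = SummarizeInput(
    document_id="doc-123",
    document_url="https://example.com/docs/doc-123",
    requested_at=datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat(),
)
# Round-trips through JSON without a custom encoder.
restored = SummarizeInput(**json.loads(json.dumps(asdict(request))))
```

Keeping every field a plain string or number means the default JSON encoding works unchanged and payloads stay small.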
Timeout Configuration
| Node Type | start_to_close_timeout | heartbeat_timeout |
|---|---|---|
| Deterministic (fast) | 10-30s | Not needed |
| Deterministic (API call) | 30-60s | Not needed |
| LLM-as-Function | 15-60s | Not needed |
| LLM-as-Router | 10-30s | Not needed |
| Agentic node | 5-15 minutes | 30-60s |
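The table can be carried into code as a lookup that yields keyword arguments for `workflow.execute_activity()`. The node-type keys and the choice to use the upper bound of each range are assumptions for illustration; tune the values per activity.

```python
from datetime import timedelta

# (start_to_close_timeout, heartbeat_timeout), upper bound of each table range.
TIMEOUTS = {
    "deterministic_fast": (timedelta(seconds=30), None),
    "deterministic_api": (timedelta(seconds=60), None),
    "llm_as_function": (timedelta(seconds=60), None),
    "llm_as_router": (timedelta(seconds=30), None),
    "agentic": (timedelta(minutes=15), timedelta(seconds=60)),
}


def activity_timeouts(node_type: str) -> dict:
    """Return timeout keyword arguments for the given node type."""
    start_to_close, heartbeat = TIMEOUTS[node_type]
    options = {"start_to_close_timeout": start_to_close}
    if heartbeat is not None:
        options["heartbeat_timeout"] = heartbeat
    return options
```

A workflow could then call `workflow.execute_activity(fn, arg, **activity_timeouts("agentic"))` so the timeout policy lives in one place.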
Retry Policies
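- Deterministic activities: standard retry (3 attempts, 1s initial interval, 2x backoff coefficient)
- LLM calls: retry transient API errors, NOT parse errors. Use non_retryable_error_types=["ValueError", "KeyError", "JSONDecodeError"] (the Python SDK reports a raised exception by its class name, not its module path, and matches the name exactly, so subclasses must be listed explicitly)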
- Agentic activities: limited retry (2 attempts), the agent handles its own errors
- Side-effect activities (email, DB writes): ensure idempotency before enabling retries
Error Handling
- Use ApplicationError for business logic errors
- Classify errors as retryable vs non-retryable:
  - Retryable: network errors, timeouts, rate limits, temporary unavailability
  - Non-retryable: invalid input, auth failures, business rule violations, parse errors
- For LLM activities, always handle rate limit errors (HTTP 429) with backoff
Activities with Multiple Side Effects
If an activity performs multiple side effects (e.g., send email + log to CRM):
- Split into separate activities for independent retry/compensation
- Or use the Saga pattern if rollback is needed on partial failure
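The Saga option can be sketched in plain asyncio, independent of the SDK. Here each step is a pair of zero-argument async callables; in a workflow, each callable would be an activity call. The email and CRM functions are hypothetical stand-ins.

```python
import asyncio


async def run_saga(steps):
    """Run (action, compensation) pairs; undo completed steps on failure."""
    compensations = []
    try:
        for action, compensate in steps:
            await action()
            compensations.append(compensate)
    except Exception:
        # Roll back in reverse order of completion, then re-raise.
        for compensate in reversed(compensations):
            await compensate()
        raise


log = []


async def send_email():
    log.append("email sent")


async def retract_email():
    log.append("email retracted")


async def write_crm():
    raise RuntimeError("CRM unavailable")


async def delete_crm():
    log.append("crm entry deleted")
```

Running `run_saga([(send_email, retract_email), (write_crm, delete_crm)])` sends the email, fails on the CRM write, retracts the email, and re-raises the original error.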
Testing (from temporal/python/testing.md)
- Integration tests: use WorkflowEnvironment.start_local(); no running server needed
- Activity unit tests: use ActivityEnvironment with mocked dependencies
- Replay tests: record history, replay to catch non-determinism on code changes
- Mock activities in workflow tests: test workflow logic without real LLM calls
- Always test both happy path and error/retry paths
Observability (from temporal/python/observability.md)
- Use workflow.logger for logging inside workflows (not print or logging)
- Add Search Attributes for queryable workflow metadata (e.g., ticket_id, category)
- Consider custom metrics for LLM cost tracking per workflow
Sync vs Async (from temporal/python/sync-vs-async.md)
- Prefer async activities for I/O-bound work (LLM calls, HTTP requests)
- Use sync activities with ThreadPoolExecutor for CPU-bound or blocking library calls
- Never make blocking calls inside async activities without offloading them to an executor
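The executor rule can be illustrated with the standard library alone. `blocking_call` is a hypothetical stand-in for a blocking SDK or library call; the async function offloads it to a thread pool so the event loop stays free.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(seconds: float) -> str:
    """Stand-in for a blocking library call (hypothetical)."""
    time.sleep(seconds)
    return "done"


async def async_activity_body(executor: ThreadPoolExecutor) -> str:
    loop = asyncio.get_running_loop()
    # Offload the blocking call so the event loop stays responsive.
    return await loop.run_in_executor(executor, blocking_call, 0.01)


async def main() -> list:
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Both calls overlap instead of blocking the loop one after another.
        return await asyncio.gather(
            async_activity_body(executor), async_activity_body(executor)
        )
```

Calling `time.sleep` directly inside the async function would instead stall every other activity sharing the worker's event loop.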
Phase 4: Quality Checklist
Before presenting the generated code, verify against this checklist:
Determinism Safety
Activity Design
Testing
Production Readiness
Important Guidelines
- Never assume a system is over-agentified just because it uses an agent framework. Assess the actual code behavior.
- Be precise about what makes something agentic. "It uses tools" is not sufficient. "The LLM picks from an open set of tools AND decides when to stop" is genuinely agentic.
- When converting LLM activities, apply model downgrades from the assessment cost analysis. Classification/extraction tasks rarely need the most expensive model.
- Preserve the original system's behavior exactly. The conversion should be a refactoring, not a redesign.
- If the agent code includes streaming to users, see the agentlens temporal-patterns reference for streaming guidance (query polling, side-channel, or skip Temporal for that step).