From n8n-mcp-skills
Wires n8n error handling so failures are loud, structured, and recoverable. Covers per-node error outputs, retry/self-healing, error-trigger workflows, and HTTP response shaping for webhooks.
Slash command: /n8n-mcp-skills:n8n-error-handling
By default, when an n8n node throws, the whole workflow halts. For an interactive run you're watching, that's fine — you see the red node and fix it. For anything unattended (a webhook API, a cron job, a queue worker, an agent tool), it's the wrong default: the caller gets a timeout or an empty 500, the operator gets no alert, and the symptom is "the integration just stopped working" with no log and no clue.
This skill is about making failures loud, structured, and recoverable — and, best case, self-healing so transient blips never reach a human at all.
The two ideas that prevent most silent failures:
| Workflow shape | Error handling posture |
|---|---|
| Webhook / API (anything with Respond to Webhook) | Required. Every fallible node's error output wired; status code matches cause. |
| Scheduled / cron / queue worker / agent tool (unattended) | Required. A workflow-level error workflow, plus retryOnFail on network nodes. |
| Internal one-off you run and watch yourself | Optional. Default onError: "stopWorkflow" is fine — you'll see the red node and re-run. |
The dividing line: if anyone other than you sees the output — a downstream system, an end user, an on-call engineer — the failure has to be handled, not swallowed. If you're the only watcher and the cost of failure is "I notice and re-run", looser is fine.
This is the single most common way an n8n workflow "handles" errors while actually swallowing them. Routing a node's failure to a handler takes two changes, and doing only one looks complete but misbehaves:
1. Set onError: "continueErrorOutput" on the node. This is what creates the second output. Without it, main[1] doesn't exist no matter what you wire.
2. Wire the error output (connections.<node>.main[1], i.e. sourceIndex: 1) to a real handler. Without a target, the error data is emitted into the void.

Get one without the other and you hit a failure mode:
| What you did | What happens at runtime |
|---|---|
| onError set, error output not wired | Error data is silently discarded. Downstream doesn't fire. The dashboard shows the run as succeeded. Worst case: no error is logged anywhere. |
| Error output wired, onError not set | The slot never fires; the handler is unreachable. On failure the workflow just halts (default stopWorkflow). |
| Both done | Failure routes down main[1] to your handler. ✅ |
With n8n_update_partial_workflow:
// 1) Turn on the error output (creates main[1])
{ type: "updateNode", nodeName: "HTTP Request",
changes: { onError: "continueErrorOutput" } }
// 2) Wire the error output to a handler. sourceIndex: 1 = the error output.
{ type: "addConnection",
source: "HTTP Request",
target: "Handle Error",
sourceIndex: 1 }
sourceIndex: 0 is the success path, sourceIndex: 1 is the error path. (For IF nodes the aliases branch: "true"/"false" map to index 0/1; for a generic fallible node, use the explicit sourceIndex: 1.)
Then verify. This trap doesn't surface in validate_workflow — a half-wired error output validates clean. Pull the workflow with n8n_get_workflow and confirm both halves:
- The node's onError is "continueErrorOutput".
- connections["HTTP Request"].main[1] contains your handler.

Valid onError values:
| Value | Effect |
|---|---|
| "stopWorkflow" (default) | Error halts the whole workflow. |
| "continueRegularOutput" | Error item flows out the normal output. Rare and usually wrong: downstream gets error-shaped data and keeps going. |
| "continueErrorOutput" | Error item flows out the separate error output (main[1]). The one you wire. |
Full failure-mode catalog, fan-in/fan-out shapes, and verification: NODE_ERROR_OUTPUTS.md.
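Because a half-wired error output validates clean, the "confirm both halves" check can be scripted against the JSON that n8n_get_workflow returns. The following is a minimal sketch, not part of n8n or the MCP server: `checkErrorWiring` is a hypothetical helper, and it assumes the workflow JSON exposes `nodes` (each with `name` and an optional `onError`) plus `connections[nodeName].main` as an array of outputs. It takes an explicit list of node names because on IF-style nodes index 1 is a regular branch, not an error output.

```javascript
// Hypothetical helper, not part of n8n or the MCP server.
// Assumed shape: workflow.nodes = [{ name, onError? }, ...]
//                workflow.connections[name].main = [ [targets for output 0], [targets for output 1] ]
function checkErrorWiring(workflow, nodeNames) {
  const byName = new Map((workflow.nodes ?? []).map((n) => [n.name, n]));

  return nodeNames.map((name) => {
    const node = byName.get(name);
    if (!node) return { node: name, ok: false, problem: "node not found" };

    const outputs = workflow.connections?.[name]?.main ?? [];
    const errorTargets = (outputs[1] ?? []).filter(Boolean);
    const hasErrorOutput = node.onError === "continueErrorOutput";
    const isWired = errorTargets.length > 0;

    if (hasErrorOutput && isWired) return { node: name, ok: true, problem: null };
    if (hasErrorOutput) return { node: name, ok: false, problem: "onError set, error output not wired" };
    if (isWired) return { node: name, ok: false, problem: "error output wired, onError not set" };
    return { node: name, ok: false, problem: "no error output configured" };
  });
}
```

Pass only the nodes you consider fallible (HTTP Request, database, AI nodes, and so on); any entry with `ok: false` names which half of the wiring is missing.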
retryOnFail before you wire error paths

Before you build error branches, absorb the transient failures so they never reach those branches. On any node that calls a network service (HTTP Request, comms such as Gmail/Slack/Discord, databases, AI nodes, third-party integrations), set node-level retry:
{ type: "updateNode", nodeName: "HTTP Request",
changes: {
retryOnFail: true,
maxTries: 3,
waitBetweenTries: 5000 // ms
} }
Why this comes first: a 429 or a brief upstream hiccup will retry and usually succeed on its own. The error output then fires only on real, persistent failures — so your 5xx responses and on-call alerts reflect actual problems instead of noise.
Engine limits to know: retry fires on any error (there's no per-status-code filter), maxTries caps at 5, and waitBetweenTries caps at 5000ms — so 5000 is both the max and a sensible default. See n8n-node-configuration (NODE_FAMILY_GOTCHAS.md) for node-specific notes.
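When several network nodes need the same retry settings, the updateNode operations can be generated rather than typed by hand. This is a sketch under stated assumptions: `buildRetryOperations` is a hypothetical helper, the operation shape is copied from the example above, and the caps (5 tries, 5000 ms) are the limits quoted in this document rather than values checked against the engine.

```javascript
// Limits as stated in the text above (assumed, not verified against n8n here).
const MAX_TRIES_CAP = 5;
const WAIT_CAP_MS = 5000;

// Hypothetical helper: builds one updateNode operation per network node,
// clamping the requested values to the stated limits.
function buildRetryOperations(nodeNames, { maxTries = 3, waitBetweenTries = 5000 } = {}) {
  const tries = Math.min(Math.max(1, Math.trunc(maxTries)), MAX_TRIES_CAP);
  const waitMs = Math.min(Math.max(0, Math.trunc(waitBetweenTries)), WAIT_CAP_MS);

  return nodeNames.map((nodeName) => ({
    type: "updateNode",
    nodeName,
    changes: { retryOnFail: true, maxTries: tries, waitBetweenTries: waitMs },
  }));
}
```

The resulting array is meant to be passed as the operations list of a single n8n_update_partial_workflow call, so all network nodes end up with identical retry behavior.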
A webhook-triggered workflow that responds to its caller has one rule that overrides everything else: no hanging branches. Every path — success and every error — must end at a Respond to Webhook, or the caller sits there until it times out.
Webhook (responseMode: "responseNode")
├── validate input → process → Respond (200, body)
└── (any fallible node's error output → sourceIndex 1)
→ Respond (4xx/5xx, structured error body)
→ optional: log full error privately / notify
What makes this work:
- Route every fallible node's main[1] to a single Respond node. Keeps the graph readable.
- The Respond node's responseCode defaults to 200, even on error branches. This is its own silent trap (see RESPONSE_SHAPES.md and n8n-node-configuration NODE_FAMILY_GOTCHAS.md): an error branch that returns 200 with an error body looks like success to the caller's HTTP client, so their error handling never fires. Set responseCode explicitly on every Respond node.

For any endpoint doing structured input validation, run the check as an IIFE inside a single Set node rather than a chain of IF/Switch nodes per field. One node validates the whole payload, returns { valid, validationError, details, requiredSchema }, and an IF branches on valid → your logic (200) or a 400 Respond that echoes the schema back so the caller can self-correct. It's also dramatically faster than a recursive validator in a Code node + sub-workflow. The full pattern, the constraint cookbook, and the expression-escaping gotchas live in API_WORKFLOWS.md.
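To make the single-node validator concrete, here is a rough sketch of the logic such an IIFE might contain, written as a plain function so it can be run outside n8n. Everything specific is an assumption for illustration: the `orderId` and `quantity` fields, their constraints, and the `validatePayload` name are hypothetical. Only the returned shape { valid, validationError, details, requiredSchema } comes from the text above.

```javascript
// Hypothetical schema for illustration; the field names are not from the source.
const requiredSchema = {
  orderId: "string, non-empty",
  quantity: "integer, 1 to 100",
};

// In n8n this body would sit inside an expression IIFE in a Set node and read
// the incoming payload from the webhook item; here it takes the payload directly.
function validatePayload(body) {
  const input = body ?? {};
  const details = [];

  if (typeof input.orderId !== "string" || input.orderId.trim() === "") {
    details.push("orderId must be a non-empty string");
  }
  if (!Number.isInteger(input.quantity) || input.quantity < 1 || input.quantity > 100) {
    details.push("quantity must be an integer between 1 and 100");
  }

  const valid = details.length === 0;
  return {
    valid,
    validationError: valid ? null : `${details.length} field(s) failed validation`,
    details,
    requiredSchema, // echoed back so the caller can self-correct
  };
}
```

An IF node would then branch on `valid`, sending the false branch to a 400 Respond whose body carries `details` and `requiredSchema`.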
A 5xx with text/plain "Internal Server Error" is technically an error response and practically useless. And not every failure is a 5xx. Match the status code to why the request failed, because the caller branches on it: their monitoring alerts on 5xx (your fault) but not 4xx (their fault), and 5xx suggests "retry" while 4xx suggests "don't".
The common mistake: wiring everything — including bad input — to one Respond that returns 500 internal_error. Now the caller can't tell their bug from your outage, and your error rates can't separate real incidents from client noise.
| Cause | Status | error code | Where it's handled |
|---|---|---|---|
| Required field missing / wrong type | 400 | validation_error | Upstream check (schema validator / IF), not error output |
| Auth missing or invalid | 401 | unauthorized | Upstream check |
| Authenticated but not allowed | 403 | forbidden | Upstream check |
| Resource ID valid in request, absent in your data | 404 | not_found | Branch on the lookup result, not its error |
| Conflicts with current state (duplicate, race) | 409 | conflict | Detect with logic |
| Caller exceeded rate limit | 429 | rate_limit_exceeded | Set Retry-After header |
| Node threw, cause unknown | 500 | internal_error | Error output path |
| Third-party API returned an error | 502 | upstream_error | Error output of the HTTP node |
| Can't process right now (downstream down) | 503 | service_unavailable | Detect specific error, hint retry |
| Third-party API timed out | 504 | upstream_timeout | Error output filtered by message |
So there are two distinct flows: 4xx is decided before the work (IF/Switch + dedicated Respond), 5xx comes out of error outputs ("we tried, it broke").
One Respond, expression-driven code. When error paths differ only by number and message (same body shape, same headers), don't fan out to N Respond nodes through a Switch. The Respond node accepts expressions in both Response Code and body — compute the code inline:
// Response Code field on a single Respond to Webhook:
{{ (() => {
const msg = $json.error?.message || $json.message || '';
if (msg.includes('INVALID_ID')) return 400;
if (/429|too many/i.test(msg)) return 429;
if (/timeout/i.test(msg)) return 504;
if (/upstream|llm|api/i.test(msg)) return 502;
return 500;
})() }}
Reserve Switch + multiple Responds for paths that diverge structurally (different headers, different body shapes, redirects). Same shape with a different number is one expression-driven Respond.
The default envelope is { "error": "<code>", "message": "<human text>" } — the HTTP status already says success-vs-failure, so no ok: false flag. Never leak internals (stack traces, SQL, upstream bodies, tokens) into the response — log those privately, return a sanitized message. Correlation IDs, retry_after, validation details, and the full do-not-leak list are in RESPONSE_SHAPES.md.
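The status expression and the envelope rule can be combined into one mapping from a raw error item to a sanitized response. The sketch below reuses the matching order from the Response Code expression above and the error codes from the status table; the function names and the public message strings are hypothetical, and message matching of this kind is a heuristic rather than anything n8n guarantees.

```javascript
// Same checks, in the same order, as the Response Code expression above.
function classifyError(message) {
  if (message.includes("INVALID_ID")) return { status: 400, error: "validation_error" };
  if (/429|too many/i.test(message)) return { status: 429, error: "rate_limit_exceeded" };
  if (/timeout/i.test(message)) return { status: 504, error: "upstream_timeout" };
  if (/upstream|llm|api/i.test(message)) return { status: 502, error: "upstream_error" };
  return { status: 500, error: "internal_error" };
}

// Hypothetical caller-facing messages; the raw message never leaves the workflow.
const PUBLIC_MESSAGES = {
  validation_error: "The request contained an invalid identifier.",
  rate_limit_exceeded: "Too many requests. Please retry later.",
  upstream_timeout: "An upstream service timed out.",
  upstream_error: "An upstream service returned an error.",
  internal_error: "An unexpected error occurred.",
};

// Builds { status, body } where body is the { error, message } envelope.
function toErrorResponse(item) {
  const raw = String(item?.error?.message || item?.message || "");
  const { status, error } = classifyError(raw);
  return { status, body: { error, message: PUBLIC_MESSAGES[error] } };
}
```

The raw message and any stack trace stay available on the error item for private logging; only the code and the fixed public message reach the caller.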
Per-node outputs handle the failures you anticipated on the nodes you remembered to wire. An error workflow catches everything else: a node you forgot to wire, a crash between nodes, a whole-workflow timeout, a trigger failure. For unattended workflows this is the safety net that turns "it silently stopped" into "an alert arrived".
Build it as a separate workflow starting with an Error Trigger node. n8n invokes it with the failure context:
{
"execution": { "id": "...", "url": "...", "lastNodeExecuted": "Fetch order",
"error": { "name": "NodeApiError", "message": "...", "timestamp": 1715000000000 } },
"workflow": { "id": "...", "name": "Sync Stripe customers" }
}
Minimal version — capture → notify:
Error Trigger → Set (build alert from execution + error) → Slack/email (post to #incidents)
A good alert includes the workflow name, a link to the editor and a link to the failed execution, the failed node name, and the real error message (not "Workflow failed"). Field expressions and the optional "fetch the failing input via the n8n node" upgrade are in ERROR_WORKFLOWS.md.
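As an illustration of the "build alert" step, the function below turns an Error Trigger payload of the shape shown above into alert text. It is a sketch, not the expressions from ERROR_WORKFLOWS.md: `buildAlert` and its `baseUrl` parameter are hypothetical, and the `/workflow/<id>` editor path is an assumption about the instance's URL layout that should be checked against your own n8n.

```javascript
// Hypothetical helper: formats an Error Trigger payload into alert text.
// baseUrl is the n8n instance URL; the /workflow/<id> editor path is an assumption.
function buildAlert(payload, baseUrl) {
  const execution = payload?.execution ?? {};
  const workflow = payload?.workflow ?? {};

  const lines = [
    `Workflow failed: ${workflow.name ?? "(unknown workflow)"}`,
    `Node: ${execution.lastNodeExecuted ?? "(unknown node)"}`,
    `Error: ${execution.error?.message ?? "(no message)"}`,
  ];
  if (execution.url) lines.push(`Execution: ${execution.url}`);
  if (baseUrl && workflow.id) {
    lines.push(`Editor: ${baseUrl.replace(/\/+$/, "")}/workflow/${workflow.id}`);
  }
  return lines.join("\n");
}
```

In the workflow itself the same fields would be read with expressions in the Set node; the point of the sketch is that every field of a useful alert is already present in the trigger payload.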
Two traps worth flagging up front:
What the community MCP can't do: assigning the error workflow (instance default or per-workflow override) is an n8n UI setting — Workflow Settings → Error Workflow. There is no MCP tool to set it. Build the error workflow with the MCP, then tell the user the exact UI step to wire it up, and to repeat it (or set the instance default) for every unattended workflow.
| Want to do | Reality |
|---|---|
| Set a workflow's Error Workflow setting | UI only (Workflow Settings → Error Workflow). No MCP tool. Build the workflow, then hand the user the UI step. |
| Toggle other workflow settings (Save Execution Data, timezone, timeout, caller policy) | UI only. n8n_update_partial_workflow has updateSettings, but the error-workflow assignment is not reliably exposed — confirm in the UI. |
| Enable instance-wide error logging (Sentry, server logs) | Instance config, outside n8n workflows entirely. |
What the MCP can do: build the error workflow, set onError/retryOnFail on nodes (updateNode/patchNodeField), wire error outputs (addConnection with sourceIndex: 1), validate (validate_workflow, n8n_validate_workflow), auto-fix common issues (n8n_autofix_workflow), test (n8n_test_workflow), and inspect failures (n8n_executions).
| Anti-pattern | What goes wrong | Fix |
|---|---|---|
| onError set but error output unwired | Error silently discarded; run shows as succeeded | Wire sourceIndex: 1 to a real handler, or revert onError to stopWorkflow so it's loud |
| Error output wired but onError not set | Slot never fires; handler unreachable; workflow halts on failure | Set onError: "continueErrorOutput" |
| Webhook → process → respond, no error branch | Caller gets a timeout or n8n's generic 500 | Wire every fallible node's error output to a Respond |
| Error branch returns 200 with an {error} body | Caller's client reads success; their error handling never fires | Set responseCode to 4xx/5xx explicitly on error Responds |
| One 500 internal_error for everything | Caller can't tell their bad input from your outage | Map cause → status (4xx caller, 5xx you) |
| Catching errors in a Code node and returning them as data | Downstream processes error-shaped data and continues | Let it throw; use onError: "continueErrorOutput" + wired path |
| Network node with no retryOnFail | Every transient 429/blip surfaces as a 5xx; alerts fire on noise | retryOnFail: true, maxTries: 3, waitBetweenTries: 5000 |
| Switch → N Responds differing only by status code | 5 nodes for what's one Respond | Compute the code inline in one expression-driven Respond |
| Unattended workflow with no error workflow | A genuine failure goes nowhere | Build an Error Trigger workflow + assign it in the UI |
| Error workflow notifies the same channel the workflows monitor | Channel down → error workflow also fails → error vanishes | Use a different channel + a Data Table fallback |
| Leaking $json.error (stack/SQL/tokens) into the response | Exposes internals to callers/attackers | Log privately, return a sanitized message |
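The "catching errors in a Code node" row is easier to see side by side. The two functions below are plain JavaScript stand-ins for a Code node body (the names are hypothetical): the first converts a failure into ordinary data, the second lets it throw so the node's onError setting and wired error output can take over.

```javascript
// Anti-pattern: the failure becomes a normal item, so downstream nodes keep
// running on error-shaped data and the run looks successful.
function parseSwallowing(raw) {
  try {
    return { json: JSON.parse(raw) };
  } catch (err) {
    return { json: { error: err.message } };
  }
}

// Preferred: no try/catch. A bad payload throws, and the node's
// onError: "continueErrorOutput" plus a wired main[1] route the failure.
function parseThrowing(raw) {
  return { json: JSON.parse(raw) };
}
```

With the first version nothing downstream can distinguish a parsed payload from a failure without inspecting the data; with the second, the routing decision lives in the workflow graph where it is visible.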
| File | Read when |
|---|---|
| NODE_ERROR_OUTPUTS.md | Wiring a per-node error output on individual fallible nodes |
| API_WORKFLOWS.md | Building/reviewing a webhook → Respond workflow, including the schema validator |
| RESPONSE_SHAPES.md | Defining response body conventions, status codes, and what not to leak |
| ERROR_WORKFLOWS.md | Setting up the workflow-level catch-all for unattended workflows |
- n8n-node-configuration: onError/retryOnFail are node config; NODE_FAMILY_GOTCHAS.md covers the Webhook/Respond response-code traps in depth.
- Response Code and the alert-message expressions rely on correct {{ }} syntax and $json.error access.

For an API / webhook workflow:
- Webhook uses responseMode: "responseNode"
- Every fallible node has onError: "continueErrorOutput" and main[1] wired
- Network nodes have retryOnFail: true, maxTries: 3, waitBetweenTries: 5000
- Every Respond node sets an explicit responseCode
- Error bodies are { error, message }: no stack traces, SQL, or tokens
- Verified with n8n_get_workflow: both onError and main[1] present on each fallible node

For an unattended (scheduled/cron/queue) workflow:
- Network nodes have retryOnFail configured

Remember: the default is silence. Error handling is two moves: make the failure route (per-node onError + wired output, or a catch-all error workflow) and make it speak (a status code and body that tell the truth). Half a move is worse than none, because it looks done.