Skill

GitHub Workflow Doctor

From ai-tools

Track GitHub workflows, analyze failures, and automatically fix issues

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ai-tools:github-workflow-doctor

User invocable

Model invocable

Inline context

Default effort

Uses dynamic context injection — preprocesses shell commands at runtime

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

This skill tracks a GitHub workflow run, analyzes failures, and attempts to fix issues automatically.

Supporting Files

README.md
preview.mjs
scripts/get-workflow-info.mjs
scripts/get-workflow-logs.mjs
scripts/list-running-workflows.mjs
scripts/rerun-workflow.mjs
scripts/wait-for-workflow.mjs
tests/get-workflow-info.test.mjs
tests/get-workflow-logs.test.mjs
tests/list-running-workflows.test.mjs
tests/rerun-workflow.test.mjs
tests/wait-for-workflow.test.mjs

SKILL.md

340 lines · ~3.2k tokens

Stats

Language: JavaScript

Parent stars: 1

Maintenance: Excellent

Last Commit: Mar 4, 2026

GitHub Workflow Doctor

This skill tracks a GitHub workflow run, analyzes failures, and attempts to fix issues automatically.

How to Use

This skill can be invoked in two ways:

From another skill: Pass the workflow run ID as an argument
Standalone: The skill will ask you for the workflow run ID

Workflow

Step 1: Get Workflow Run ID

If a run_id argument was provided, use it and proceed to Step 2.

Otherwise, query the repository for running workflows:

node ${CLAUDE_PLUGIN_ROOT}/skills/github-workflow-doctor/scripts/list-running-workflows.mjs

This returns a JSON array of running/queued workflows with details like:

Workflow name and run number
Display title (commit message or PR title)
Event type (push, pull_request, etc.) and who triggered it
Status (in_progress or queued)
How long it's been running

If running workflows are found:

Present the list to the user and ask them to choose. Show each workflow in a two-line format:

⏳ Update UI; Move scripts to miniter-utility
   container build #5475: Pull request by waynebrantley at 2:34 PM (2m 30s)

The format is:

Line 1: Status icon + display title
Line 2: workflow name #runNumber: event type by actor at trigger time (elapsed time)
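
The two-line format above can be sketched as a small formatter. This is only an illustration: the field names (displayTitle, workflowName, runNumber, event, actor, createdAt, elapsedSeconds, conclusion) are hypothetical and may not match the keys that list-running-workflows.mjs actually emits, and the clock label is rendered in UTC purely to keep the example deterministic.

```javascript
// Hypothetical field names; the real JSON keys from list-running-workflows.mjs may differ.

// Format elapsed seconds as "2m 30s" (or "45s" when under a minute).
function formatElapsed(totalSeconds) {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

// Format a Date as a 12-hour clock label such as "2:34 PM" (UTC, an assumption).
function formatClock(date) {
  const hours24 = date.getUTCHours();
  const hours12 = hours24 % 12 || 12;
  const minutes = String(date.getUTCMinutes()).padStart(2, "0");
  return `${hours12}:${minutes} ${hours24 < 12 ? "AM" : "PM"}`;
}

// Turn an event name such as "pull_request" into "Pull request".
function formatEvent(event) {
  const words = event.replace(/_/g, " ");
  return words.charAt(0).toUpperCase() + words.slice(1);
}

// Build the two-line entry: status icon + title, then the indented detail line.
function formatWorkflow(run) {
  // Assumption: failed runs get the cross mark, everything else the hourglass.
  const icon = run.conclusion === "failure" ? "❌" : "⏳";
  const when = formatClock(new Date(run.createdAt));
  const line1 = `${icon} ${run.displayTitle}`;
  const line2 =
    `   ${run.workflowName} #${run.runNumber}: ${formatEvent(run.event)} ` +
    `by ${run.actor} at ${when} (${formatElapsed(run.elapsedSeconds)})`;
  return `${line1}\n${line2}`;
}

// Sample data mirroring the example entry above (the timestamp is made up).
const example = formatWorkflow({
  displayTitle: "Update UI; Move scripts to miniter-utility",
  workflowName: "container build",
  runNumber: 5475,
  event: "pull_request",
  actor: "waynebrantley",
  createdAt: "2024-01-01T14:34:00Z",
  elapsedSeconds: 150,
  conclusion: null,
});
console.log(example);
```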

Ask the user: "Which workflow would you like to track?"

Options: one option per running workflow, created dynamically and showing the formatted workflow info, plus these additional options:
- "Show all recent workflows" - Include completed workflows from recent runs
- "Enter a specific run ID" - User will provide a run ID manually

If the user selects "Show all recent workflows", run:

node ${CLAUDE_PLUGIN_ROOT}/skills/github-workflow-doctor/scripts/list-running-workflows.mjs --all

Then present the expanded list and ask them to choose again.

If the user selects "Enter a specific run ID", ask them to provide the run ID.

If no running workflows are found:

Query for recent workflows including failed ones:

node ${CLAUDE_PLUGIN_ROOT}/skills/github-workflow-doctor/scripts/list-running-workflows.mjs --include-failed

If failed workflows are found, inform the user: "No running workflows, but found failed workflows you can fix."

Present the list (including failed workflows marked with ❌) and ask them to choose.

If there are still no workflows, ask the user: "What would you like to do?" Options:

"Show all recent workflows" - Show the last 20 workflow runs (including successful)
"Enter a specific run ID" - Provide a run ID manually
"Exit" - Cancel the skill

Store the selected workflow run ID for use throughout the skill.

Step 1.5: Check Workflow Status

Before tracking, check if the selected workflow is already completed:

node ${CLAUDE_PLUGIN_ROOT}/skills/github-workflow-doctor/scripts/get-workflow-info.mjs <run-id>

Check the status field in the JSON output:

If status === "completed":

Check the conclusion field
If conclusion === "success":
- Report: "✅ This workflow already completed successfully!"
- Show duration and URL
- End the skill
If conclusion === "failure":
- Report: "❌ This workflow has already failed. Skipping to failure analysis..."
- Skip to Step 4 (Analyze Failure)
If other conclusion (cancelled, skipped, etc.):
- Report the conclusion and ask user if they still want to analyze it
- If yes, go to Step 4; if no, end the skill

If status === "in_progress" or status === "queued":

Proceed to Step 2 (Track Workflow)
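
The branching in this step can be summarized as a small routing function. This is a sketch of the decision table only: the status and conclusion fields come from the description above, while the returned labels are made-up names for the next action, and the final fallback assumes that an unrecognized status is reported to the user as described under Error Handling.

```javascript
// Map the status/conclusion pair from get-workflow-info.mjs to the next action.
// The returned strings are illustrative labels, not part of the skill's scripts.
function nextStepForRun(info) {
  if (info.status === "completed") {
    if (info.conclusion === "success") return "report-success-and-end";
    if (info.conclusion === "failure") return "analyze-failure"; // Step 4
    // cancelled, skipped, etc.: ask before analyzing
    return "ask-user-whether-to-analyze";
  }
  if (info.status === "in_progress" || info.status === "queued") {
    return "track"; // Step 2
  }
  // Assumption: anything else counts as an unexpected result and is reported.
  return "report-unexpected-status";
}

console.log(nextStepForRun({ status: "completed", conclusion: "failure" }));
console.log(nextStepForRun({ status: "queued", conclusion: null }));
```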

Step 2: Track Workflow to Completion (Skip if already completed)

Note: This step is only executed if the workflow is in_progress or queued (determined in Step 1.5)

Run the wait-for-workflow script to poll the workflow status:

node ${CLAUDE_PLUGIN_ROOT}/skills/github-workflow-doctor/scripts/wait-for-workflow.mjs <run-id>

This will output progress updates to stderr and final status as JSON to stdout.

Inform the user:

Workflow name
Current status
Elapsed time updates every 30 seconds

Step 3: Handle Workflow Result (Only if we tracked it)

Note: This step is only executed if we tracked the workflow in Step 2

When the workflow completes, check the success field in the JSON output.

If Successful ✅

Report to the user:

✅ Workflow completed successfully
Duration: X seconds/minutes
Workflow name and URL

End the skill here.

If Failed ❌

Proceed to Step 4 for failure analysis and fixing.

Step 4: Analyze Failure

Run the get-workflow-logs script to fetch failure details:

node ${CLAUDE_PLUGIN_ROOT}/skills/github-workflow-doctor/scripts/get-workflow-logs.mjs <run-id>

This returns a JSON object with the following structure:

runId — the workflow run ID
failedJobs — array of failed jobs, each with:
- jobName — name of the failed job
- jobId — numeric job ID
- conclusion — "failure", "timed_out", or "cancelled"
- url — link to the job on GitHub
- failedSteps — array of failed steps with name, conclusion, number
logs — full workflow log output as a string
summary — e.g. "2 job(s) failed"
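
For readers who want to see the shape concretely, here is a made-up sample object with the fields listed above, plus a tiny helper that walks failedJobs[].failedSteps. All values (IDs, names, the placeholder URL, the log text) are invented for illustration, and the helper is meant for documentation only, not as something to run while the skill is executing.

```javascript
// Invented sample data following the field list above; not real output.
const sampleResult = {
  runId: 123456789,
  failedJobs: [
    {
      jobName: "build",
      jobId: 1,
      conclusion: "failure",
      url: "https://github.com/OWNER/REPO/actions/runs/123456789/job/1", // placeholder
      failedSteps: [{ name: "Run tests", conclusion: "failure", number: 4 }],
    },
  ],
  logs: "(full workflow log output)",
  summary: "1 job(s) failed",
};

// Flatten the nested jobs/steps into one readable line per failed step.
function listFailedSteps(result) {
  return result.failedJobs.flatMap((job) =>
    job.failedSteps.map(
      (step) => `${job.jobName} > step ${step.number}: ${step.name} (${step.conclusion})`
    )
  );
}

console.log(listFailedSteps(sampleResult));
```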

CRITICAL: Analyze this JSON directly. Do NOT generate inline node -e or bash -c commands to parse the logs. The data is already structured JSON — read and interpret it in context. Inline scripts fail due to escaping issues (e.g. \! bash history expansion) and are unnecessary.

Analyze the JSON to identify:

What tests/steps failed (from failedJobs[].failedSteps)
Error messages (from logs)
Root cause of the failure
Files that likely need changes

Step 4.5: Transient Error Detection

Before attempting code fixes, check the logs for transient infrastructure errors — failures caused by temporary platform issues, not by code bugs. These are resolved by re-running the workflow, not by changing code.

Known transient error patterns (check logs string for these):

blob unknown to registry — Docker registry consistency error
TLS handshake timeout — network timeout during TLS negotiation
rate limit or API rate limit exceeded — GitHub/registry rate limiting
Could not resolve host or Temporary failure in name resolution — DNS failures
502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout — upstream server errors
unexpected EOF or connection reset by peer — connection dropped mid-transfer
i/o timeout — generic network I/O timeout
error pulling image combined with timeout/network errors — container image pull failures
Resource not accessible by integration — transient GitHub App permission errors
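
The pattern list above could be expressed as a lookup table, roughly as follows. This is a minimal sketch, not the skill's implementation: case-insensitive matching, the labels, and the reading of "combined with timeout/network errors" as "the log also mentions timeout or network" are all assumptions.

```javascript
// Patterns taken from the list above; labels and case-insensitivity are assumptions.
const TRANSIENT_PATTERNS = [
  { label: "blob unknown to registry", regex: /blob unknown to registry/i },
  { label: "TLS handshake timeout", regex: /TLS handshake timeout/i },
  { label: "rate limit", regex: /rate limit/i }, // also covers "API rate limit exceeded"
  { label: "DNS failure", regex: /Could not resolve host|Temporary failure in name resolution/i },
  { label: "upstream server error", regex: /502 Bad Gateway|503 Service Unavailable|504 Gateway Timeout/i },
  { label: "connection dropped", regex: /unexpected EOF|connection reset by peer/i },
  { label: "i/o timeout", regex: /i\/o timeout/i },
  { label: "Resource not accessible by integration", regex: /Resource not accessible by integration/i },
];

// Assumed interpretation of "timeout/network errors" for the image pull case.
const NETWORK_HINT = /timeout|network/i;

// Return the label of the first matching transient pattern, or null if none match.
function detectTransientError(logs) {
  for (const { label, regex } of TRANSIENT_PATTERNS) {
    if (regex.test(logs)) return label;
  }
  if (/error pulling image/i.test(logs) && NETWORK_HINT.test(logs)) {
    return "image pull failure";
  }
  return null;
}

console.log(detectTransientError("dial tcp 10.0.0.1:443: i/o timeout"));
console.log(detectTransientError("AssertionError: expected 2 to equal 3"));
```

A null result corresponds to the "no transient error is detected" branch, so the failure is treated as a code problem.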

If a transient error is detected:

Inform the user: "This looks like a transient infrastructure error (e.g. <matched pattern>), not a code bug. Re-running the failed jobs."

Run the rerun script:

node ${CLAUDE_PLUGIN_ROOT}/skills/github-workflow-doctor/scripts/rerun-workflow.mjs <run-id>

If rerunTriggered is true:
- Go to Step 2 to track the rerun to completion
- Only one automatic rerun is allowed per skill invocation — set a flag to prevent further auto-reruns
- If the rerun fails with the same transient error: report to the user that the infrastructure issue persists and manual intervention is needed. End the skill.
- If the rerun fails with a different error: proceed to Step 5 (code fix) with the new failure
- If the rerun succeeds: proceed to Step 6 (report success)
If rerunTriggered is false: report the rerun failure message to the user and proceed to Step 5 (attempt code fix anyway)

If no transient error is detected:

Proceed to Step 5 as normal.
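
The rerun rules in this step can be sketched as two small decision functions. Only rerunTriggered comes from the description above; the other parameter names (autoRerunUsed, succeeded, previousPattern, newPattern) and the returned labels are hypothetical.

```javascript
// Decide whether an automatic rerun is still allowed for this skill invocation.
function planTransientHandling(state) {
  // Only one automatic rerun per invocation.
  return state.autoRerunUsed ? "report-persistent-infrastructure-issue" : "rerun";
}

// Decide the next step once the rerun script has been called and,
// if it was triggered, the rerun has been tracked to completion.
function afterRerun({ rerunTriggered, succeeded, previousPattern, newPattern }) {
  if (!rerunTriggered) return "step-5-code-fix"; // report the message, then try a fix
  if (succeeded) return "step-6-report-success";
  if (newPattern !== null && newPattern === previousPattern) {
    return "end-manual-intervention"; // same transient error persists
  }
  return "step-5-code-fix"; // different failure: treat it as a code problem
}

console.log(planTransientHandling({ autoRerunUsed: false }));
console.log(
  afterRerun({
    rerunTriggered: true,
    succeeded: false,
    previousPattern: "i/o timeout",
    newPattern: "i/o timeout",
  })
);
```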

Step 5: Attempt to Fix (Max 3 Attempts)

Initialize attempt counter: Set attempt_count = 1 and max_attempts = 3

Fix Loop

For each attempt (while attempt_count <= max_attempts):

Analyze the failure using the logs from Step 4
Identify the fix - determine what code changes are needed
Ask user if auto-fix seems uncertain:
- If you're confident about the fix, proceed
- If uncertain or complex, ask the user: "I've analyzed the failure. Should I attempt an automatic fix?" Options:
  - "Yes, attempt the fix" - Proceed with the fix
  - "No, show me the analysis first" - Display analysis and wait for user guidance
  - "Let me fix it manually" - End the skill
Make the fix: Edit the necessary files

Commit the changes:

git add <changed-files>
git commit -m "fix: address workflow failure - <brief description>"

Check workflow trigger type:
- Run node ${CLAUDE_PLUGIN_ROOT}/skills/github-workflow-doctor/scripts/get-workflow-info.mjs <original-run-id>
- Check the event field in the JSON output
Handle based on trigger type:

If event === "push":
- Push the commit:
```
git push
```
- Wait a few seconds for GitHub to register the push
- Get the new workflow run ID:
```
node ${CLAUDE_PLUGIN_ROOT}/skills/github-workflow-doctor/scripts/get-workflow-info.mjs --latest "<workflow-name>"
```
- Go back to Step 2 with the new run ID
- Increment attempt_count
If event === "workflow_dispatch" or other:
- Push the commit (the fix was already committed above):
```
git push
```
- Inform the user: "✅ Fix has been committed and pushed. Since this workflow was triggered manually (workflow_dispatch), please re-run the workflow manually to test the fix."
- Provide the workflow URL
- Ask user: "Would you like to continue tracking after you re-run the workflow?" Options:
  - "Yes, I'll trigger it now" - Wait for user to confirm they've triggered it, then get the new run ID and go to Step 2
  - "No, I'll check it myself" - End the skill
If the workflow fails again after the fix:
- If attempt_count >= max_attempts:
  - Report: "❌ Maximum fix attempts (3) reached. Here's what I found:"
  - Show detailed analysis of the latest failure
  - Suggest next steps for the user
  - End the skill
- Otherwise:
  - Ask user for guidance: "The workflow failed again after my fix attempt. Would you like me to:" Options:
    - "Try another fix" - Get user's hint/guidance, then continue the loop
    - "Show detailed analysis" - Display detailed log analysis and ask for direction
    - "Stop, I'll fix it manually" - End the skill
  - If user provides guidance or asks to try again, increment attempt_count and continue the loop
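
The control flow of the fix loop can be condensed into a sketch like the one below. It encodes one reading of the steps above, in which attempt_count goes up once per failed attempt that the user chooses to retry; the function names and returned labels are made up for illustration.

```javascript
const MAX_ATTEMPTS = 3;

// After pushing a fix: push-triggered workflows start a new run on their own,
// anything else (for example workflow_dispatch) needs the user to re-run it.
function afterPush(event) {
  return event === "push" ? "track-new-run" : "ask-user-to-rerun";
}

// After a fix attempt whose workflow run failed again.
function afterFailedFix(attemptCount, maxAttempts = MAX_ATTEMPTS) {
  if (attemptCount >= maxAttempts) {
    return { action: "stop", nextAttemptCount: attemptCount };
  }
  // Ask the user for guidance; if they continue, the next attempt is numbered +1.
  return { action: "ask-user", nextAttemptCount: attemptCount + 1 };
}

// Walk through three failing attempts in a row.
let attempt = 1;
let outcome = afterFailedFix(attempt);
while (outcome.action !== "stop") {
  attempt = outcome.nextAttemptCount;
  outcome = afterFailedFix(attempt);
}
console.log(`Stopped after attempt ${attempt}`);
```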

Step 6: Report Success

If the workflow passes after a fix:

✅ Workflow fixed and passing!
Show what was changed
Total attempts: X
Final duration
Workflow URL

Error Handling

CRITICAL: Do NOT work around script failures.

If any script in this skill produces no output, fails, or returns unexpected results:

Report the problem to the user immediately
Do NOT invent workarounds like running raw gh commands directly
Do NOT silently continue with alternative approaches

Example of what NOT to do:

❌ "The scripts didn't produce output. Let me check the runs directly with gh."

Instead:

✅ "The get-workflow-info.mjs script produced no output. This may indicate a bug in the skill. Would you like me to investigate or should we try a different approach?"

If a script fails, ask the user how to proceed: "A skill script failed to produce output. What would you like to do?" Options:

"Investigate the script failure" - Check if the script has errors
"Try running the script again" - Retry the same command
"Exit the skill" - Stop and report the issue
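
One way to picture the "no output or unexpected results" check is a small validator applied to a script's stdout before anything else happens. This is a hypothetical helper, not part of the skill's scripts; it only assumes, as described above, that the scripts print JSON.

```javascript
// Classify a script's stdout so that failures are surfaced instead of worked around.
function checkScriptOutput(scriptName, stdout) {
  const text = stdout.trim();
  if (text === "") {
    return {
      ok: false,
      message: `The ${scriptName} script produced no output. This may indicate a bug in the skill.`,
    };
  }
  try {
    return { ok: true, data: JSON.parse(text) };
  } catch {
    return {
      ok: false,
      message: `The ${scriptName} script returned output that is not valid JSON.`,
    };
  }
}

console.log(checkScriptOutput("get-workflow-info.mjs", ""));
console.log(checkScriptOutput("get-workflow-info.mjs", '{"status":"completed"}'));
```

When ok is false, the message is what gets reported to the user, followed by the options listed above.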

Notes

The skill uses the gh CLI tool, which must be installed and authenticated
When launched standalone, the skill queries and displays running workflows to choose from
Polling interval is 15 seconds by default
Maximum of 3 automatic fix attempts to prevent infinite loops
For push-triggered workflows, fixes automatically trigger new runs
For manually triggered workflows, user must re-run after fixes
