From ai-agent-skills
Triage Microsoft Dynamics AX 2012 / R3 AOS (Ax32Serv.exe) crashes and the RPC session-allocation cascade. The bundled script collects, server-side, AOS process crashes (Application 1000 / Ax32Serv.exe), unexpected service terminations (System 7031 / "Object Server"), and the kernel session precursors (Dynamics Server 110 "Session Allocation Failed: already allocated" + 180 "invalid session ID"), then correlates AOS terminations with AX rich-client (Ax32.exe) crashes on the RDS/Citrix farm to expose the "AOS dies -> every client drops at once -> reconnect storm" cascade. Use when an AX AOS service crashed/restarted, users got disconnected en masse, or you see Event 7031/110/180 — e.g. "why did the AOS crash", "AX kicked everyone out", "Ax32Serv keeps restarting". Do NOT use for AX SQL/database slowness (that is ax2012-sql-performance) or generic non-AX event-log triage (that is win-eventlog-triage). Requires PowerShell 7+ and WinRM + an admin credential (always prompted).
Slash command: /ai-agent-skills:ax2012-aos-crash-triage
Targets Dynamics AX 2012 / R3 AOS servers (and, optionally, the RDS/Citrix session hosts running the AX rich client) over WinRM. The bundled script scripts/Invoke-AosCrashTriage.ps1 collects the crash and session signals as JSON and correlates them; the agent writes the root-cause narrative. No LLM is called; nothing is changed (read-only).
SCRIPT = this skill's scripts/Invoke-AosCrashTriage.ps1. Requires PowerShell 7+
(pwsh) and WinRM on the targets.
A win-eventlog-triage sweep tells you only that there are 7031/1000 errors. This skill encodes the AX-specific crash taxonomy and gathers the evidence needed to root-cause correctly instead of guessing:

- it separates 0xc0000005 access violations from Event 110 "Session Allocation Failed" forced terminations, and labels Event 180 as a by-design symptom, so they are never conflated into a false "session caused the crash" story;
- it checks dump readiness for Ax32Serv.exe (WER LocalDumps) plus recent change signals (hotfixes, boot), because the honest root cause comes from a dump or a change-correlation, not from event timing; and
- it correlates AOS terminations with AX rich-client (Ax32.exe) crashes on the RDS/Citrix hosts to expose the cascade.

The script always prompts via Get-Credential (the credential is held in memory for the run, never stored). The -Credential parameter is a test/automation seam only; don't put a password on the command line.
Agent / non-interactive runners: Get-Credential needs an interactive console. Launch in a visible window and read the -OutFile, e.g. Start-Process pwsh -ArgumentList '-NoExit','-File','SCRIPT','-AosComputerName','AOS01.contoso.local,AOS02.contoso.local','-OutFile','C:\ops\aos.json'. From a non-domain client, use the FQDN (it matches a *.domain TrustedHosts entry; a short name won't) plus -Authentication Negotiate.
| Want | Pass |
|---|---|
| AOS servers | -AosComputerName AOS01,AOS02 or -AosListFile C:\ops\aos.txt |
| + client/RDS hosts for cascade correlation | -ClientComputerName RDS01,RDS02 or -ClientListFile C:\ops\rds.txt |
| Look-back window | -Hours 24 (default) — computed on each target (TZ-safe) |
| Cascade match window | -CascadeWindowSeconds 120 (how close a client crash must be to an AOS termination to count) |
| Save full report | -OutFile C:\ops\aos-triage.json (stdout becomes the compact view) |
| Transport/auth | -UseSSL · -Authentication Negotiate, Kerberos, or CredSSP |
| Tuning | -MaxEvents 2000 (cap per query per host) · -MaxMessageLength 600 · -ThrottleLimit 8 |
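
To make the -CascadeWindowSeconds semantics concrete, here is a minimal sketch of window matching. It is not the bundled script's code: the function name Get-CascadeMatch, the sample timestamps, and the choice of a symmetric window (a client crash within N seconds before or after the AOS termination) are all assumptions for illustration.

```powershell
# Hypothetical helper: count client crashes that fall within a time window
# around each AOS termination. Assumes a symmetric window (before or after).
function Get-CascadeMatch {
    param(
        [datetime[]] $AosDownUtc,       # AOS termination times (UTC)
        [datetime[]] $ClientCrashUtc,   # Ax32.exe crash times (UTC)
        [int] $WindowSeconds = 120
    )
    foreach ($down in $AosDownUtc) {
        # Keep only the client crashes close enough to this AOS-down event.
        $near = @($ClientCrashUtc | Where-Object {
            [math]::Abs(($_ - $down).TotalSeconds) -le $WindowSeconds
        })
        [pscustomobject]@{
            AosDownUtc    = $down
            ClientCrashes = $near.Count
        }
    }
}

# Made-up sample data, not taken from a real event log.
$t0      = [datetime]::new(2024, 1, 1, 10, 0, 0, [System.DateTimeKind]::Utc)
$clients = @($t0.AddSeconds(15), $t0.AddSeconds(90), $t0.AddSeconds(600))

$result = @(Get-CascadeMatch -AosDownUtc $t0 -ClientCrashUtc $clients -WindowSeconds 120)
$result[0].ClientCrashes   # 2 with this sample data (the crash at +600s is outside the window)
```

A wider window catches slower client drops but also raises the chance of counting unrelated Ax32.exe crashes, which is one reason the result is reported as correlation only.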
# Client-AOS pair, last 24h
pwsh -File SCRIPT -AosComputerName AOS01,AOS02
# Full picture: AOS cluster + RDS farm, last 6h, detail to a file
pwsh -File SCRIPT -AosListFile C:\ops\aos.txt -ClientListFile C:\ops\rds.txt -Hours 6 -OutFile C:\ops\aos-triage.json
Without -OutFile → full JSON on stdout. With -OutFile → full detail to the file and a compact JSON (status, query, summary, including crash_timeline, cascade_correlation, failures) on stdout. Prefer -OutFile for big sweeps.

Key fields: summary.crash_timeline (AOS-down events tagged by class: access_violation / forced_termination / scm_7031, time-sorted), summary.cascade_correlation (per AOS event: client crashes within CascadeWindowSeconds), the per-class totals (access_violation_total, forced_termination_total, session_symptom_total, client_crash_total), summary.dump_capture_ready_hosts / dump_capture_missing, and summary.caveats (the correlation-not-causation guardrails). Per-host aos[] carries access_violations (with exception_meaning/offset), forced_terminations, scm_terminations, session_symptoms (labelled by-design), dump_readiness/wer_config, and recent_hotfixes. All times are UTC (Z). See REFERENCE.md for the full schema and the AX-interpretation guide.
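
As a rough illustration of consuming the report, the sketch below loads a JSON document and groups the crash timeline by class. The inline sample stands in for real output: the top-level names (summary, crash_timeline, caveats, the totals) follow the field list above, but the per-entry fields (host, time, class) are assumptions, so check REFERENCE.md for the actual schema.

```powershell
# Stand-in for the script's JSON report. Top-level names follow the text above;
# the per-entry fields (host, time, class) are assumed for illustration only.
$json = @'
{
  "summary": {
    "access_violation_total": 1,
    "forced_termination_total": 1,
    "crash_timeline": [
      { "host": "AOS01", "time": "2024-01-01T10:00:00Z", "class": "access_violation" },
      { "host": "AOS02", "time": "2024-01-01T10:02:10Z", "class": "forced_termination" }
    ],
    "caveats": [ "correlation is not causation" ]
  }
}
'@

# With a real run, replace the here-string with something like:
#   $report = Get-Content -Raw C:\ops\aos-triage.json | ConvertFrom-Json
$report = $json | ConvertFrom-Json

# Read the caveats first, then summarise the timeline per crash class.
$report.summary.caveats
$byClass = $report.summary.crash_timeline | Group-Object -Property class
$byClass | Select-Object Name, Count
```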
Golden rule: report correlation, not causation. The event log can tell you the crash class and signature, not which code path faulted. Never output "root cause: a client presented a session ID that already existed." Read summary.caveats before writing anything.
1. Classify each AOS-down event (summary.crash_timeline is tagged by class):
   - access_violation (Event 1000 / 0xc0000005): report the fault signature (module, exception, exception_meaning, offset) and whether the offset is identical across crashes (deterministic code path). The event has no call stack, so you cannot name the faulting AX code from it.
   - forced_termination (Event 110 / "Session Allocation Failed: already allocated"): the AOS kernel deliberately self-terminated on a session-id collision. This is the proximate cause of that exit, but it is downstream of orphaned SysClientSessions rows from a prior AOS↔DB interruption / dead cluster node; investigate that, not the collision message. It is a different crash class from an access violation.
2. Report session_symptoms (Event 180) as by-design, never as a cause: they don't crash the AOS.
3. If cascade_correlation shows client crashes clustered at AOS-down times, report it as "clients dropped when AOS X went down" (effect), and note the reconnect-storm can re-trigger a forced_termination; the first AOS-down event's cause still needs evidence.
4. Check aos[].dump_readiness: if WER LocalDumps isn't configured for Ax32Serv.exe, give the remediation so the next crash is captured (this is the durable fix).
5. Check aos[].recent_hotfixes / lastboot: many 0xc0000005 crashes are a config/deployment/Windows-update regression, found by change correlation, not a dump.
6. Recommend the next step by class: forced_termination → fix orphaned sessions / the AOS↔DB link first; access_violation with a correlatable change → validate that; signature-less access_violation → capture a dump and analyse the symbolized stack (WinDbg !analyze -v / DebugDiag), or open a Microsoft case with the dump.
7. Surface summary.failures (unreachable/auth_failed) and any truncated: true host; never imply full coverage.

See REFERENCE.md for the AX-specific traps (cause/effect direction, the deterministic-offset tell, batch vs client AOS, the orphaned-session root cause, and the operational remoting foot-guns). Environment-specific traps (real host names, local quirks) go in gotchas.local.md in this folder: read it at the start of a run if present, and append new local pitfalls there (it is gitignored and survives skill updates), never to the committed docs.
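
The "identical offset across crashes" check mentioned above can be sketched as follows. This is an illustrative helper, not part of the bundled script: the name Test-DeterministicOffset and the sample records are hypothetical, and the module/offset property names are taken from the fault-signature fields listed in the text.

```powershell
# Hypothetical helper: do all access violations share one module+offset signature?
function Test-DeterministicOffset {
    param([object[]] $AccessViolations)

    # A single crash cannot show repetition; require at least two to compare.
    if ($AccessViolations.Count -lt 2) { return $false }

    $signatures = @($AccessViolations |
        ForEach-Object { '{0}+{1}' -f $_.module, $_.offset } |
        Sort-Object -Unique)

    return ($signatures.Count -eq 1)
}

# Made-up sample records; real ones would come from aos[].access_violations.
$crashes = @(
    [pscustomobject]@{ module = 'Ax32Serv.exe'; exception = '0xc0000005'; offset = '0x00000000001a2b3c' }
    [pscustomobject]@{ module = 'Ax32Serv.exe'; exception = '0xc0000005'; offset = '0x00000000001a2b3c' }
)

Test-DeterministicOffset -AccessViolations $crashes
```

A true result only suggests the same code path faulted each time. It still does not name the faulting AX code, which needs a dump or a change correlation.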
See REFERENCE.md: confirm coverage (no failed/truncated hosts) and ground-truth a finding against the raw event before reporting; after any remediation (coordinated AOS restart, session cleanup, network fix), re-run the same window and confirm no new 7031/1000.
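
A post-remediation re-run could be checked with something like the sketch below. It assumes summary.failures and summary.crash_timeline are arrays and that each aos[] entry may carry a truncated flag, as the text describes; the function name Test-CleanRerun, the output property names, and the sample report are hypothetical, so verify the field shapes against REFERENCE.md.

```powershell
# Hypothetical check over a re-run report: full coverage and no new AOS-down events.
function Test-CleanRerun {
    param([Parameter(Mandatory)] $Report)

    # Filter out $null so a missing field counts as zero entries, not one.
    $failed    = @($Report.summary.failures       | Where-Object { $null -ne $_ }).Count
    $crashes   = @($Report.summary.crash_timeline | Where-Object { $null -ne $_ }).Count
    $truncated = @($Report.aos | Where-Object { $_.truncated }).Count

    [pscustomobject]@{
        FullCoverage = ($failed -eq 0 -and $truncated -eq 0)
        NewCrashes   = $crashes
        Clean        = ($failed -eq 0 -and $truncated -eq 0 -and $crashes -eq 0)
    }
}

# Made-up sample standing in for a re-run of the same window.
$sample = @'
{ "summary": { "failures": [], "crash_timeline": [] },
  "aos": [ { "host": "AOS01", "truncated": false } ] }
'@ | ConvertFrom-Json

Test-CleanRerun -Report $sample
```

Keeping FullCoverage separate from NewCrashes avoids reporting "no new crashes" when a host was unreachable or its results were truncated.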
- Get-Credential cancelled → re-run and supply the admin credential.
- auth_failed / unreachable → see the hint field (TrustedHosts/FQDN, Kerberos no-authority, or WinRM/DNS reachability). Per-host failures never abort the sweep.
- pwsh not found → install PowerShell 7 (winget install Microsoft.PowerShell).
npx claudepluginhub whobat/ai-agent-skills --plugin ai-agent-skills