Audits SQL Server Always On Availability Group configuration correctness across all layers — prerequisites, replica design, listener architecture, backup strategy, endpoint security, distributed AG topology, Basic and Contained AG constraints, and application integration readiness. Use this skill when setting up a new AG, reviewing an existing AG design before a DR test, preparing for a failover, or investigating connection failures, listener misconfigurations, backup failures on secondaries, or endpoint certificate expiry. Applies 35 checks (F1–F35) across 7 categories. Trigger for questions about AG prerequisites, session timeout, failure condition level, read-only routing configuration, MultiSubnetFailover, backup preferred replica, distributed AG setup, Basic AG limits, Contained AG, or endpoint encryption. Companion to /sqlhadr-review (runtime health) and /sqlclusterlog-review (WSFC events).
How this skill is triggered — by the user, by Claude, or both
Slash command
/mssql-performance-skills:sqlag-reviewThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Audit the configuration and design of one or more SQL Server Always On Availability Groups.
Audit the configuration and design of one or more SQL Server Always On Availability Groups. Applies 35 checks (F1–F35) across seven categories:
Scope distinction: This skill audits configuration correctness ("is the AG designed right?").
Use /sqlhadr-review (H1–H27) for runtime health ("is the AG healthy right now?") and
/sqlclusterlog-review (L1–L30) for WSFC cluster log events.
Accept any of:
Run the following on the primary replica to collect the required data.
Query 1 — Instance and AG overview
SELECT
SERVERPROPERTY('IsHadrEnabled') AS hadr_enabled,
SERVERPROPERTY('ProductVersion') AS product_version,
SERVERPROPERTY('Edition') AS edition,
ag.name AS ag_name,
ag.failure_condition_level,
ag.health_check_timeout,
ag.automated_backup_preference_desc,
ag.db_failover,
ag.basic_features,
ag.is_contained,
ag.required_synchronized_secondaries_to_commit
FROM sys.availability_groups ag;
Query 2 — Replica configuration
SELECT
ag.name AS ag_name,
ar.replica_server_name,
ar.availability_mode_desc,
ar.failover_mode_desc,
ar.session_timeout,
ar.primary_role_allow_connections_desc,
ar.secondary_role_allow_connections_desc,
ar.backup_priority,
ar.seeding_mode_desc,
ar.endpoint_url,
ar.read_only_routing_url
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
ORDER BY ag.name, ar.replica_server_name;
Query 2b — Replica join state (F11 — from DMV, not catalog view)
SELECT
replica_server_name,
join_state_desc -- NOT_JOINED | JOINED_STANDALONE | JOINED_FCI
FROM sys.dm_hadr_availability_replica_cluster_states
ORDER BY replica_server_name;
Query 3 — Listener and IP configuration
SELECT
ag.name AS ag_name,
agl.dns_name,
agl.port,
agl.is_conformant, -- F35: 0 = mismatch with cluster resource
aglip.ip_address,
aglip.ip_subnet_mask,
aglip.state_desc AS ip_state -- ONLINE | OFFLINE | ONLINE_PENDING | FAILED
FROM sys.availability_groups ag
JOIN sys.availability_group_listeners agl ON ag.group_id = agl.group_id
JOIN sys.availability_group_listener_ip_addresses aglip ON agl.listener_id = aglip.listener_id;
Query 4 — Mirroring endpoint
SELECT
name,
state_desc,
role_desc,
connection_auth_desc,
encryption_algorithm_desc,
encryption_desc,
port
FROM sys.database_mirroring_endpoints;
Query 5 — AG database recovery models
SELECT
adc.ag_database_id,
db.name AS database_name,
db.recovery_model_desc,
db.is_read_committed_snapshot_on,
db.state_desc
FROM sys.availability_databases_cluster adc
JOIN sys.databases db ON adc.database_id = db.database_id
ORDER BY db.name;
Query 6 — Endpoint certificates (certificate auth only)
SELECT
name,
subject,
expiry_date,
pvt_key_encryption_type_desc,
thumbprint
FROM sys.certificates
WHERE pvt_key_encryption_type_desc IS NOT NULL
ORDER BY expiry_date;
| Threshold | Value | Used by |
|---|---|---|
| Failure condition level — too permissive | = 1 → Warning | F5 |
| Failure condition level — too aggressive | = 5 → Warning | F5 |
| Health check timeout — too aggressive | < 15,000 ms → Warning | F9 |
| Synchronous-commit replicas | ≥ 4 total (incl. primary) → Warning | F7 |
| WAN session timeout | < 30 sec on ASYNC replica → Warning | F8 |
| Certificate expiry — near | < 90 days → Warning | F25 |
| Certificate expiry — imminent | < 30 days → Critical | F25 |
| Backup priority tie | All eligible secondaries at 50 with SECONDARY preference → Warning | F10 |
Evaluate these first. Missing prerequisites prevent AG operation entirely.
SERVERPROPERTY('IsHadrEnabled') = 0Enable-SqlAlwaysOn -ServerInstance '<instance>' -Restart (requires SqlServer PowerShell module).sys.availability_databases_cluster shows
recovery_model_desc != 'FULL' in sys.databasesALTER DATABASE [db] SET RECOVERY FULL;
BACKUP DATABASE [db] TO DISK = N'\\backup\db.bak';
Log backups must follow to maintain the log chain required by AG log shipping.sys.database_mirroring_endpoints OR state_desc != 'STARTED'CREATE ENDPOINT [Hadr_endpoint] STATE = STARTED AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES,
AUTHENTICATION = WINDOWS NEGOTIATE);
If the endpoint exists but is stopped: ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;sys.database_mirroring_endpoints.encryption_desc = 'DISABLED' (Critical) or
encryption_desc = 'SUPPORTED' (Warning — allows plaintext if the peer does not enforce
encryption)ALTER ENDPOINT [Hadr_endpoint] FOR DATABASE_MIRRORING
(ENCRYPTION = REQUIRED ALGORITHM AES);
Both endpoints must be changed to REQUIRED before either can fully enforce AES.sys.availability_groups.failure_condition_level = 1 (only a complete SQL Server
service failure or lease expiry triggers automatic failover — resource pressure, spinlocks, and
write-access violations are ignored) or = 5 (any qualified failure condition, including
exhaustion of worker threads and unsolvable deadlocks, triggers failover)ALTER AVAILABILITY GROUP [ag] SET (FAILURE_CONDITION_LEVEL = 3);These checks surface design choices that increase commit latency, risk false disconnections, or leave databases unjoinable.
availability_mode_desc = 'SYNCHRONOUS_COMMIT'
(including the primary) ≥ 4ALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON N'server'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);sys.availability_replicas.session_timeout < 30 on any replica with
availability_mode_desc = 'ASYNCHRONOUS_COMMIT'10 ms round-trip time can trigger spurious DISCONNECTED state and health alerts. Increase the timeout:
ALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON N'server'WITH (SESSION_TIMEOUT = 30);— minimum 5 seconds; 30–60 seconds for WAN.
sys.availability_groups.health_check_timeout < 15000 (ms)sp_server_diagnostics is called every health_check_timeout / 3 seconds. A value
below 15,000 ms calls it more than every 5 seconds, adding unnecessary overhead and risking
false-positive failovers. The default is 30,000 ms. Increase for WAN topologies:
ALTER AVAILABILITY GROUP [ag] SET (HEALTH_CHECK_TIMEOUT = 30000);sys.availability_groups.automated_backup_preference_desc IN ('SECONDARY', 'SECONDARY_ONLY') AND all eligible secondary replicas have
backup_priority = 50 (the default)sys.fn_hadr_backup_is_preferred_replica() may select different replicas
across runs, causing log chain fragmentation. Set distinct priorities:
ALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON N'server'
WITH (BACKUP_PRIORITY = 70); — higher value = more preferred (0–100).sys.dm_hadr_availability_replica_cluster_states has join_state_desc = 'NOT_JOINED' (valid joined values are JOINED_STANDALONE for standalone instances and
JOINED_FCI for failover cluster instances)ALTER AVAILABILITY GROUP [ag] JOIN;
Then restore databases with NORECOVERY and join each:
ALTER DATABASE [db] SET HADR AVAILABILITY GROUP = [ag];join_state_desc = 'JOINED' but fewer database rows appear in
sys.availability_databases_cluster on the secondary than on the primary for this AGRESTORE DATABASE [db] FROM DISK = '...' WITH NORECOVERY, REPLACE;
then join: ALTER DATABASE [db] SET HADR AVAILABILITY GROUP = [ag];
Or use automatic seeding if seeding_mode_desc = 'AUTOMATIC' is set.secondary_role_allow_connections_desc = 'NO'ALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON N'server'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
Enable RCSI on the primary to prevent redo blocking by readers:
ALTER DATABASE [db] SET READ_COMMITTED_SNAPSHOT ON;endpoint_url values contain IP addresses on distinct subnets (different
network prefixes) AND sys.availability_group_listener_ip_addresses has fewer IP rows than
distinct subnets represented by the replicasALTER AVAILABILITY GROUP [ag] MODIFY LISTENER N'listener'
(ADD IP (N'10.1.2.10', N'255.255.255.0'));
Also ensure application connection strings include MultiSubnetFailover=True.secondary_role_allow_connections_desc IN ('READ_ONLY', 'ALL') AND
read_only_routing_url IS NULL on that replicaALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON N'secondary'
WITH (SECONDARY_ROLE
(READ_ONLY_ROUTING_URL = N'TCP://secondary.domain.com:1433'));
Use the FQDN and the SQL Server port (1433 or custom), not the endpoint port (5022).READ_ONLY_ROUTING_LIST configured (detectable when
read_only_routing_url is set on secondaries but no routing list row appears)ApplicationIntent=ReadOnly connections via the listener will land on the primary
unless the primary's routing list redirects them. Configure on each replica (when acting as
primary):
ALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON N'primary'
WITH (PRIMARY_ROLE
(READ_ONLY_ROUTING_LIST = ('secondary1', 'secondary2')));sys.availability_group_listeners.port != 1433Server=listener-name,<port>; — verify that all application configurations and ODBC DSNs
include the port. Confirm firewalls and load balancers allow the custom port from all
application server subnets.sys.availability_group_listener_ip_addresses.state_desc = 'OFFLINE' on one
or more listener IP rows (indicating a standby-subnet IP in a multi-subnet VNN listener)MultiSubnetFailover=True so that the driver attempts all IPs simultaneously during
failover, reducing failover detection time from minutes to seconds.sys.availability_groups.automated_backup_preference_desc = 'NONE'sys.fn_hadr_backup_is_preferred_replica() always returns 1
on every replica, making every replica a backup candidate simultaneously. This causes
duplicate backups and log chain conflicts unless backup jobs explicitly coordinate target
replicas. Set to SECONDARY_ONLY or SECONDARY to enable automatic preferred-replica
arbitration: ALTER AVAILABILITY GROUP [ag]
SET (AUTOMATED_BACKUP_PREFERENCE = SECONDARY_ONLY);automated_backup_preference_desc IN ('SECONDARY', 'SECONDARY_ONLY') AND
backup job step content (from description or msdb.dbo.sysjobsteps) contains no reference
to sys.fn_hadr_backup_is_preferred_replicaIF sys.fn_hadr_backup_is_preferred_replica(DB_NAME()) = 1
BEGIN BACKUP DATABASE [db] TO DISK = N'...' WITH COMPRESSION, STATS = 10; ENDmsdb.dbo.sysjobsteps or confirmed via descriptionsys.fn_hadr_backup_is_preferred_replica()
selects. Without log backups, the transaction log grows unbounded on the primary. Schedule
log backups every 15–60 minutes depending on RPO requirements. The log chain is maintained
regardless of which replica runs the backup.sp_configure 'backup compression default' = 0
on that instanceEXEC sp_configure 'backup compression default', 1; RECONFIGURE;
CPU impact on a secondary does not affect primary commit latency.automated_backup_preference_desc = 'PRIMARY' AND 3 or more replicas
are configuredALTER AVAILABILITY GROUP [ag]
SET (AUTOMATED_BACKUP_PREFERENCE = SECONDARY_ONLY);
Set backup_priority values to control which secondary is preferred.sys.database_mirroring_endpoints.connection_auth_desc contains WINDOWS
AND replica endpoint_url values suggest replicas in different DNS domains or workgroup
(no common domain suffix)ALTER ENDPOINT [Hadr_endpoint] FOR DATABASE_MIRRORING
(AUTHENTICATION = CERTIFICATE [hadr_cert]);
Create and exchange certificates between replicas before altering the endpoint.expiry_date < DATEADD(day, 90, GETDATE()) in sys.certificatessys.database_mirroring_endpoints.encryption_algorithm_desc = 'RC4'ALTER ENDPOINT [Hadr_endpoint] FOR DATABASE_MIRRORING
(ENCRYPTION = REQUIRED ALGORITHM AES);
This must be changed on all replicas. A brief endpoint restart may be required.sys.database_mirroring_endpoints) is blocked by a firewall, network ACL, or
security groupAVAILABILITY GROUP ON clause in the creation script or
description references a SQL Server instance name (e.g., TCP://SERVER01:5022) rather
than a listener DNS name (e.g., TCP://ag-listener.domain.com:5022)CREATE AVAILABILITY GROUP [DistributedAG] WITH (DISTRIBUTED)
AVAILABILITY GROUP ON 'LocalAG' WITH
(LISTENER_URL = N'TCP://local-listener.domain.com:5022', ...);sys.availability_groups.basic_features = 1 AND COUNT of databases in
sys.availability_databases_cluster for this AG > 1sys.availability_groups.basic_features = 1 AND any replica shows
secondary_role_allow_connections_desc != 'NO'secondary_role_allow_connections setting.
Do not route ApplicationIntent=ReadOnly connections to a Basic AG listener.sys.availability_groups.is_contained = 1 AND
sys.database_mirroring_endpoints.connection_auth_desc contains WINDOWS — SQL Server
2022+ only; skip if SQL version < 2022AVAILABILITY_MODE = SYNCHRONOUS_COMMIT on the
inter-AG link AND no recent failover or migration activity is evident — SQL Server 2016+
only; skip if SQL version < 2016ALTER AVAILABILITY GROUP [DistributedAG]
MODIFY AVAILABILITY GROUP ON 'SecondaryAG'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
Use SYNCHRONOUS_COMMIT only during the planned failover procedure window.USE [other_db] — and those databases are not
members of the same AGsys.dm_xe_sessions contains no session with events targeting
alwayson_* event channels or AG-specific events such as
availability_group_lease_expired or availability_replica_state_changeCREATE EVENT SESSION [AG_Diagnostics] ON SERVER
ADD EVENT sqlserver.availability_group_lease_expired,
ADD EVENT sqlserver.availability_replica_state_change,
ADD EVENT sqlserver.hadr_log_block_send_complete
ADD TARGET package0.ring_buffer (SET max_memory = 51200)
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
ALTER EVENT SESSION [AG_Diagnostics] ON SERVER STATE = START;sys.availability_group_listeners.is_conformant = 0 for any listener rowIf the SQL Server version is provided, read VERSION_COMPATIBILITY.md. For checks that
require a minimum version above the instance version: verbose mode → log as
SKIP (version: requires SQL 20XX+, instance is SQL 20YY); standard report → omit
entirely. Version-gated checks: F31 (SQL 2022+), F32 (SQL 2016+).
## AG Configuration Analysis
### Summary
- X Critical, Y Warnings, Z Info
- Availability group: [ag_name] | Replicas: N | Version: [product_version] | Edition: [edition]
- Highest-risk finding: [check name and ID]
### Critical Issues
### [C1 — F3] Mirroring Endpoint Not Started — PROD-SQL02
- **Observed:** sys.database_mirroring_endpoints.state_desc = 'STOPPED' on PROD-SQL02
- **Impact:** Replica cannot receive or send AG log stream; AG is broken until endpoint restarts.
- **Fix:** ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
### Warnings
### [W1 — F8] WAN Async Replica Session Timeout Too Low — DR-SQL01
- **Observed:** session_timeout = 10 on ASYNC replica DR-SQL01
- **Impact:** 10-second timeout causes false DISCONNECTED state on WAN links > 10ms RTT.
- **Fix:** ALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON N'DR-SQL01' WITH (SESSION_TIMEOUT = 30);
### Info
### [I1 — F13] No Readable Secondary Configured
- **Observed:** All secondary replicas have secondary_role_allow_connections_desc = 'NO'
- **Impact:** Reporting and read-scale workloads cannot be offloaded to secondaries.
- **Fix:** Configure ALLOW_CONNECTIONS = READ_ONLY and enable RCSI on the primary database.
### Passed Checks
| Check | Result |
|-------|--------|
| F1 — AlwaysOn Feature Disabled | PASS — IsHadrEnabled = 1 |
| F2 — AG Database Not in FULL Recovery | PASS — all AG databases in FULL recovery |
Include a Prioritized Action Order table after all findings:
### Prioritized Action Order
| Priority | Action | Resolves | Effort |
|----------|--------|----------|--------|
| 1 — Immediately | Start mirroring endpoint on PROD-SQL02 | C1 | 2 min |
| 2 — Today | Increase session_timeout on DR-SQL01 to 30 seconds | W1 | 5 min |
| 3 — This sprint | Configure readable secondary and RCSI | I1 | 30 min |
---
*Analyzed by: [AI model and version] · [date and time, local timezone or UTC]*
--brief — Omit the Passed Checks table and attribution footer. Output Summary, Findings,
and Prioritized Action Order only.
--critical-only — Suppress Warning and Info findings. Omit Passed Checks table.
Use during incidents when only blocking issues matter.
Both flags can be combined: --brief --critical-only produces Summary and Critical findings only.
When the request includes --verbose, --trace, or the word verbose:
Append a ## Check Evaluation Log section after the Passed Checks table:
| Check | Evidence | Threshold | Result |
|---|---|---|---|
| [ID — Name] | [key attribute(s) or "absent"] | [threshold or condition] | PASS / FIRE → severity / NOT ASSESSED |
Save both files to the working directory:
output/sqlag-review/<YYYY-MM-DD-HHmmss>-<input-prefix>/analysis.md
output/sqlag-review/<YYYY-MM-DD-HHmmss>-<input-prefix>/trace.md
Derive <input-prefix> from the AG name, filename stem, or run as fallback.
msdb.dbo.sysjobsteps; if not provided, note as
NOT ASSESSED and recommend manual review./sqlhadr-review — Runtime health of the AG (disconnected replicas, lag, queue sizes,
sync state). Run alongside this skill to get both configuration correctness and current
health state in one analysis session.
/sqlclusterlog-review — WSFC cluster log analysis (lease timeouts, quorum loss, node
eviction, AG resource transitions). Essential after a failover event.
/sqlerrorlog-review — SQL Server ERRORLOG analysis including AG failover events, lease
expiry messages, and secondary lag signals.
/sqlspn-review — SPN and Kerberos delegation analysis for AG listeners. Run when
Kerberos authentication fails through the listener (double-hop, constrained delegation).
/sqlencryption-review — Full encryption posture audit including endpoint certificates,
TDE on AG databases, and backup encryption.
mssql-performance-review — Orchestrator that routes mixed artifacts to multiple specialised skills (this one included). Use when you have several artifact types together or describe a symptom without knowing which skill to run.
Provides CDSS development patterns for drug interaction checking, dose validation, clinical scoring (NEWS2, qSOFA), and alert classification integrated into EMR workflows.
npx claudepluginhub vanterx/mssql-performance-skills --plugin mssql-performance-skills