Skill

harshjudge

E2E testing orchestration for Claude Code. Use when creating, running, or managing end-to-end test scenarios — frontend (browser), backend (API), or CLI. Activates for tasks involving E2E tests, test scenario creation, test execution with evidence capture, or checking test status.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/harshjudge:harshjudge

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

AI-native E2E testing with CLI commands and evidence capture.

Supporting Files

assets/prd.mdreferences/create.mdreferences/iterate.mdreferences/run-step-agent.mdreferences/run-tools.mdreferences/run.mdreferences/setup.mdreferences/status.md

SKILL.md

171 lines · ~1.6k tokens

Stats

LanguageTypeScript

Stars0

MaintenanceExcellent

Last CommitMar 26, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

HarshJudge E2E Testing

AI-native E2E testing with CLI commands and evidence capture.

CLI Setup

The CLI is bundled with this plugin. At the start of each session, locate the plugin and set an alias:

HJ_PATH=$(python3 -c "import json; d=json.load(open('$HOME/.claude/plugins/installed_plugins.json')); print([v[0]['installPath'] for k,v in d['plugins'].items() if 'harshjudge' in k][0])")
alias harshjudge="node $HJ_PATH/dist/cli.js"

Then all commands below work as harshjudge <command>. No npm or npx needed.

Core Principles

Evidence First: Capture evidence appropriate to the step type — screenshots for frontend, response bodies for API, stdout for CLI
Fail Fast: Stop on error, report with context
Complete Runs: Always call harshjudge complete-run, even on failure
Step Isolation: Each step executes in its own spawned agent for token efficiency
Knowledge Accumulation: Learnings go to prd.md, not scenarios

Step-Based Execution

HarshJudge uses a step-based agent pattern for token-efficient test execution:

Main Agent                    Step Agents (spawned per step)
    │
    ├─ harshjudge start <scenarioSlug>
    │      ↓
    │  Returns: runId, steps[]
    │
    ├─► Spawn Agent: Step 01 ──────────────────────► Execute actions
    │      │                                              │
    │      │ ◄─────────────────────────────────── Return: { status, evidencePaths }
    │      │
    │   harshjudge complete-step <runId> --step 01 --status pass
    │      │
    ├─► Spawn Agent: Step 02 ──────────────────────► Execute actions
    │      │                                              │
    │      │ ◄─────────────────────────────────── Return: { status, evidencePaths }
    │      │
    │   harshjudge complete-step <runId> --step 02 --status pass
    │      │
    │   ... (repeat for each step)
    │
    └─ harshjudge complete-run <runId> --status pass

Benefits:

Each step agent has isolated context (no token accumulation)
Large outputs (screenshots, logs) saved to files, not returned
Main agent only receives concise summaries
Automatic token optimization without manual management

Workflows

Intent	Reference	Key Commands
Initialize project	references/setup.md	`harshjudge init`
Create scenario	references/create.md	`harshjudge create`
Run scenario	references/run.md	`harshjudge start`, `harshjudge complete-step`, `harshjudge complete-run`
Fix failed test	references/iterate.md	`harshjudge status`, `harshjudge create`
Check status	references/status.md	`harshjudge status`

Project Structure

.harshJudge/
  config.yaml              # Project configuration
  prd.md                   # Product requirements (from assets/prd.md template)
  scenarios/{slug}/
    meta.yaml              # Scenario definition + run statistics
    steps/                 # Individual step files
      01-step-slug.md      # Step 01 details
      02-step-slug.md      # Step 02 details
      ...
    runs/{runId}/          # Run history
      result.json          # Run result with per-step data
      step-01/evidence/    # Step 01 evidence
      step-02/evidence/    # Step 02 evidence
      ...
  snapshots/               # Inspection tool outputs (token-saving pattern)

Quick Reference

HarshJudge CLI Commands

Command	Purpose
`harshjudge init <name>`	Initialize project (creates .harshJudge/)
`harshjudge create <slug>`	Create/update scenario with step files
`harshjudge star <slug>`	Toggle/set scenario starred status
`harshjudge start <slug>`	Start test run, returns step list
`harshjudge evidence <runId>`	Capture evidence for a step
`harshjudge complete-step <runId>`	Complete a step, get next step ID
`harshjudge complete-run <runId>`	Finalize run with status
`harshjudge status [slug]`	Check project or scenario status
`harshjudge discover tree [path]`	Browse .harshJudge/ structure
`harshjudge discover search <pattern>`	Search file content
`harshjudge dashboard open/close/status`	Manage dashboard server

Step Types

Each step declares its execution mode via type in the step file frontmatter:

Type	Tools	Evidence Captured
`frontend`	Browser tool (auto-detected)	screenshot, console_log, network_log, html_snapshot
`backend`	Bash (curl/httpie)	api_response, api_headers, db_snapshot
`cli`	Bash	stdout, stderr, exit_code

If type is omitted, the step agent infers from the step content.

See run-tools.md for tool-specific guidance per type.

Step Agent Prompt Template

When spawning an agent for each step:

Execute step {stepId} of scenario {scenarioSlug}:

## Step Content
{content from steps/{stepId}-{slug}.md}

## Step Type
{type from step frontmatter, or infer from content: frontend|backend|cli}

## Project Context
Base URL: {from config.yaml}
Services: {from prd.md — list of services under test}

## Previous Step
Status: {pass|fail|first step}

## Your Task
1. Read the step type from frontmatter (frontend/backend/cli)
2. Execute the actions using the appropriate tool:
   - frontend: use available browser tool
   - backend: use curl/httpie via Bash
   - cli: run commands via Bash
3. Capture evidence appropriate to the step type
4. Record evidence: harshjudge evidence <runId> --step {stepNumber} --type <evidence_type> --name <name> --data <path_or_data>

Return ONLY a JSON object:
{
  "status": "pass" | "fail",
  "evidencePaths": ["path1", "path2"],
  "error": null | "error message",
  "summary": "Brief description of what happened and result (1-2 sentences)"
}

DO NOT return full evidence content. DO NOT explain your work.

Error Handling

On ANY error:

STOP - Do not proceed
Report - Command, params, error, resolution
Check prd.md - Is this a known pattern?
Do NOT retry - Unless user instructs

harshjudge

Invocation

Context Preview

Supporting Files

SKILL.md

harshjudge

Invocation

Context Preview

Supporting Files

SKILL.md

HarshJudge E2E Testing

CLI Setup

Core Principles

Step-Based Execution

Workflows

Project Structure

Quick Reference

HarshJudge CLI Commands

Step Types

Step Agent Prompt Template

Error Handling

Similar Skills

HarshJudge E2E Testing

CLI Setup

Core Principles

Step-Based Execution

Workflows

Project Structure

Quick Reference

HarshJudge CLI Commands

Step Types

Step Agent Prompt Template

Error Handling

Similar Skills