By harshitAgr
Drive Codabench ML-benchmark workflows from Claude Code: search competitions, download datasets, submit, poll, read leaderboards.
MCP server for the Codabench REST API. Lets an AI agent drive a full participant ML-benchmark workflow: discover competitions, read rules, download data, submit, poll, read leaderboards.
Curated tools for the participant path:
- `search_competitions`, `get_competition`, `list_competition_phases`, `get_phase`, `get_competition_rules`, `list_competition_tasks`, `get_task`
- `download_dataset` (streaming + SHA-256)
- `list_my_submissions`, `get_submission`, `get_submission_logs`
- `submit_to_phase` (handles Codabench's 3-step upload flow)
- `poll_submission` (backoff + timeout + non-error "still_running")
- `get_leaderboard`, `get_my_profile`
- `codabench_request`: generic REST escape hatch (GET-only by default)

You need a Codabench API token. Two ways to get one:
1. Use the api-token-auth endpoint → Try it out, then copy the token value from the response.
2. Request one with curl:
curl -X POST https://www.codabench.org/api/api-token-auth/ \
-H "Content-Type: application/json" \
-d '{"username":"YOUR_USERNAME","password":"YOUR_PASSWORD"}'
The response is {"token": "..."}.
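The same token request can be made from Python. The following is a minimal sketch using only the standard library, assuming the endpoint and JSON body shown in the curl command above. The function names `build_token_request`, `parse_token_response`, and `fetch_token` are illustrative and not part of codabench-mcp.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
TOKEN_URL = "https://www.codabench.org/api/api-token-auth/"


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST request that the curl command sends (hypothetical helper)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_token_response(raw: bytes) -> str:
    """Extract the token from a {"token": "..."} response body."""
    return json.loads(raw)["token"]


def fetch_token(username: str, password: str) -> str:
    """Send the request and return the token. This performs a network call."""
    request = build_token_request(username, password)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_token_response(response.read())
```

Splitting the request construction from the network call keeps the sketch easy to check offline; only `fetch_token` contacts the server.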
uvx codabench-mcp
To run the bleeding-edge main branch instead of the last release:
uvx --from git+https://github.com/harshitAgr/codabench-mcp codabench-mcp
Or for development:
git clone https://github.com/harshitAgr/codabench-mcp.git
cd codabench-mcp
uv sync
uv run codabench-mcp # requires CODABENCH_API_TOKEN
Claude Desktop (claude_desktop_config.json):
{
"mcpServers": {
"codabench": {
"command": "uvx",
"args": ["codabench-mcp"],
"env": {
"CODABENCH_API_TOKEN": "paste-your-token-here"
}
}
}
}
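If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. A rough sketch of that merge is below, assuming the file is either missing or valid JSON. The helper name `add_codabench_server` is hypothetical, and the config file's location depends on your platform, so it is passed in as a parameter.

```python
import json
from pathlib import Path


def add_codabench_server(config_path: Path, token: str) -> dict:
    """Add the codabench entry to an MCP config file, keeping other servers.

    Hypothetical helper: assumes the file is missing or contains valid JSON.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as the JSON snippet above.
    servers["codabench"] = {
        "command": "uvx",
        "args": ["codabench-mcp"],
        "env": {"CODABENCH_API_TOKEN": token},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```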
/plugin marketplace add harshitAgr/codabench-mcp
/plugin install codabench-mcp@codabench-mcp
Then export your token in the shell where you launch Claude Code:
export CODABENCH_API_TOKEN=paste-your-token-here
If you'd rather skip the plugin layer:
claude mcp add codabench \
--env CODABENCH_API_TOKEN=paste-your-token-here \
-- uvx codabench-mcp
| Variable | Required | Default | Purpose |
|---|---|---|---|
| `CODABENCH_API_TOKEN` | yes | none | DRF token, sent as `Authorization: Token <token>` |
| `CODABENCH_BASE_URL` | no | `https://www.codabench.org` | Override for tests |
| `CODABENCH_ALLOW_WRITE_RAW` | no | `0` | Set to `1` to allow non-GET methods through `codabench_request` |
| `CODABENCH_MAX_DOWNLOAD_BYTES` | no | `5368709120` (5 GB) | Cap for `download_dataset` |
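To make the defaults concrete, here is one way these variables could be read in Python. It is a sketch under the assumptions in the table, not the server's actual configuration code; the `Settings` class and `load_settings` function are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables in the table above."""

    api_token: str
    base_url: str
    allow_write_raw: bool
    max_download_bytes: int

    @property
    def auth_header(self) -> dict:
        # DRF token scheme: "Authorization: Token <token>".
        return {"Authorization": f"Token {self.api_token}"}


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the table's defaults."""
    env = os.environ if env is None else env
    token = env.get("CODABENCH_API_TOKEN")
    if not token:
        raise RuntimeError("CODABENCH_API_TOKEN is required")
    return Settings(
        api_token=token,
        base_url=env.get("CODABENCH_BASE_URL", "https://www.codabench.org"),
        # Only the exact value "1" enables non-GET methods in this sketch.
        allow_write_raw=env.get("CODABENCH_ALLOW_WRITE_RAW", "0") == "1",
        max_download_bytes=int(env.get("CODABENCH_MAX_DOWNLOAD_BYTES", "5368709120")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to exercise in tests without touching the real environment.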
uv sync
uv run pytest # unit tests, no network
uv run ruff check . # lint
uv run ruff format --check . # format check
MIT — see LICENSE.
mcp-name: io.github.harshitAgr/codabench-mcp