Marketplace

file-format-token-accuracy-benchmark

Benchmarking framework to analyse tokenization per file format.

npx claudepluginhub thoeltig/file-format-token-accuracy-benchmark

README

View full README on GitHub

1 Plugin

file-format-benchmark

1·

Benchmarking framework to analyse tokenization per file format.

2mo

v1.5.4.0

thoeltig

Stats

Plugins1

Stars1

UpdatedMay 20, 2026

Links

View on GitHub View Marketplace JSON

File Format Token Accuracy Benchmark

Comprehensive benchmarking suite for measuring token usage and retrieval accuracy across file formats for LLM consumption to find the most efficient format.

Overview

This framework validates which file formats deliver the most reliable information to LLMs with optimal token efficiency. The benchmark focuses on structural and retrieval accuracy understanding data organization and extracting specific values which are fundamental to avoiding context confusion.

Note on Question Categories: Complex filtering and mathematical aggregation are interesting but test the model's intelligence rather than format effectiveness. Also the accuracy for filtering and aggregation would be irrelevant if the data is prefiltered or the data contains fields with the calculated values. Structural questions (understanding data shape/organization) and retrieval questions (extracting specific values) directly validate format clarity.

Initial Test Run & Methodology

The initial benchmark run (Dec 19, 2025) with Claude 4.5 Haiku tested 8 formats across 120 questions with weighted accuracy prioritizing structural understanding and retrieval (60% combined weight vs filtering/aggregation at 40%). 4 flat array data sets of variants 40 and 80 records, random optional fields and all mandatory fields.

[!NOTE] All benchmark results have been moved to a separate repository: file-format-token-accuracy-benchmark-results

Key Findings (see Initial BenchmarkReport.md):

CSV: Unbeatable for dense, mandatory data (70.98% weighted @ 9,008 tokens). Accuracy drops by ~15% with sparse data.
JSON Compact: Recommended baseline. 1.46x tokens compared to CSV but 70.12% weighted accuracy with consistency across all variants.
JSON Pretty: Not recommended. 1.88x tokens compared to JSON Compact with similiar accuracy. Formatting only adds tokens but does not increase accuracy.
YAML: Highest accuracy (71.96% weighted) but 2.62x token compared to CSV.
TOON: Close to CSV for dense, mandatory data. Accuracy stays consistent with sparse data but tokens increased by 2.17x.
Markdown: Requires x0.5 of CSV tokens but has a catastrophic weighted accuracy of 24.67%. ~75% of tokens wasted on wrong answers.
Linear Scaling Validated: All formats and variants scale linearly with data volume (48-52% reduction from 80 to 40 records).

Next Test Setup (Current Configuration)

Active Formats (8 total):

CSV (baseline efficiency)
JSON Compact (recommended baseline)
JSON Pretty (formatting overhead reference)
TOON Default (custom binary format)
TOON Keyfold (custom binary format)
XML Compact (minified, lowest XML token usage)
XML Pretty (indented, readability reference)
YAML (highest accuracy, premium cost)

Removed after Initial Test:

❌ JSONL (streaming variant—less relevant for benchmark)
❌ Markdown (24.67% accuracy—unreliable)
❌ Apache Logs (irrelevant for general data benchmarking)

Record Count: 31 records (single standardized count, replaces 40/80 records)

Structures: Flat and Nested per format (extends previous flat array structure)

Field Variants: Mandatory and Optional per format (same object but for optional random fields are set to null)

Benchmark Workflow Overview

A complete benchmark run consists of 5 steps:

Prepare Output Folder - Create benchmark directory with specified parameters
Generate Test Data - Create data files, questions, and templates for all formats/variants
Execute Tests - Run format-sequential tests with format-specific subagents
Run Analytics - Validate results and extract metrics from agent transcripts
Display Results - Show rankings and key insights

Output Directory Structure

When you run a benchmark, it generates a folder like benchmark_format_all_structure_all_variant_all_haiku/ containing:

file-format-token-accuracy-benchmark

README

1 Plugin

file-format-benchmark

file-format-token-accuracy-benchmark

README

File Format Token Accuracy Benchmark

Overview

Initial Test Run & Methodology

Next Test Setup (Current Configuration)

Benchmark Workflow Overview

Output Directory Structure

1 Plugin

file-format-benchmark

Related Marketplaces

antigravity-awesome-skills

claude-code-workflows

claude-plugins-official

File Format Token Accuracy Benchmark

Overview

Initial Test Run & Methodology

Next Test Setup (Current Configuration)

Benchmark Workflow Overview

Output Directory Structure

Related Marketplaces

antigravity-awesome-skills

claude-code-workflows

claude-plugins-official