claude_encoding_guard

Preserve non-UTF-8 file encodings and line endings when Claude Code edits your files.

The Problem

Claude Code's Edit/Write tools always output UTF-8 with LF line endings. When editing files in GBK, Big5, Shift_JIS, or other legacy encodings, the original encoding is silently destroyed. CRLF line endings are also lost on Windows. The official response is "not planned".

Related issues: #6485, #7134, #28523, #38887.

How It Works

PreToolUse (Read)                       PostToolUse (Edit/Write)
    │                                        │
    ├─ binary check (binaryornot)            ├─ read cached encoding + line ending
    ├─ detect encoding (chardet 5.x)         ├─ convert UTF-8 → original encoding
    ├─ detect line ending (CRLF/LF)          ├─ normalize line endings to original
    ├─ convert original → UTF-8              ├─ delete session cache
    ├─ save session cache                    └─ done
    └─ Claude reads correct content

Conversion happens at Read time, before Claude Code loads the file into memory. This is critical: Claude Code decodes every file as UTF-8 and replaces invalid byte sequences with U+FFFD, which is irreversible. Converting to UTF-8 first means Claude sees the correct content and edits it cleanly.
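The two hook steps in the diagram can be sketched as pure byte transformations. This is a minimal illustration, not the plugin's code: the function names `to_utf8` and `restore` and the `meta` dict layout are hypothetical, and the encoding is passed in rather than detected (the plugin uses chardet 5.x plus binaryornot for that step).

```python
from __future__ import annotations


def detect_line_ending(data: bytes) -> str:
    # Simplified check for this sketch: any CRLF marks the file as CRLF
    return "CRLF" if b"\r\n" in data else "LF"


def to_utf8(data: bytes, encoding: str) -> tuple[bytes, dict]:
    """Read-time step: transcode legacy bytes to UTF-8 and record what to restore.

    `encoding` is assumed to come from a detector (chardet in the plugin).
    """
    text = data.decode(encoding)  # strict decoding: raises instead of emitting U+FFFD
    meta = {"encoding": encoding, "line_ending": detect_line_ending(data)}
    return text.encode("utf-8"), meta


def restore(data: bytes, meta: dict) -> bytes:
    """Edit/Write-time step: turn Claude's UTF-8 + LF output back into the original form."""
    text = data.decode("utf-8")
    text = text.replace("\r\n", "\n")  # unify first so CRLF is never doubled
    if meta["line_ending"] == "CRLF":
        text = text.replace("\n", "\r\n")
    return text.encode(meta["encoding"])
```

For example, a GBK file containing `中文\r\n` becomes UTF-8 for Claude, and an LF-only UTF-8 edit is written back as GBK with CRLF. Skipping the Read-time step is what loses data: `"中文".encode("gbk").decode("utf-8", errors="replace")` yields nothing but U+FFFD characters, and the original bytes cannot be recovered from them.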

Features

Encoding preservation: GBK, GB2312, GB18030, Big5, Big5-HKSCS, EUC-TW, Shift_JIS, EUC-JP, ISO-2022-JP, EUC-KR, Windows-1252, Windows-1251 (Cyrillic), ISO-8859-1
Line ending preservation: CRLF restored after Claude Code converts to LF
Binary file protection: Prevents chardet from misidentifying binary files (always on, not configurable)
Session isolation: Multiple Claude Code sessions won't interfere with each other
Zero configuration: Works out of the box

Install

Prerequisites

uv — required. Handles Python dependencies automatically via PEP 723 inline metadata. No manual pip install needed.
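For illustration, a hook script that relies on PEP 723 might start like this. The dependency pins and the `read_hook_event` helper are assumptions for this sketch, not the plugin's actual source; it also assumes the hook receives a JSON object with `session_id`, `tool_name`, and `tool_input.file_path`. The shebang is uv's documented way to run a file with `uv run --script`.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "chardet>=5,<6",  # assumption: illustrative pin for the 5.x line
#     "binaryornot",
# ]
# ///
import json


def read_hook_event(raw: str) -> dict:
    """Parse the JSON object a hook receives on stdin (field names assumed)."""
    event = json.loads(raw)
    return {
        "session_id": event.get("session_id"),
        "tool_name": event.get("tool_name"),
        # Read/Edit/Write are assumed to carry the target under tool_input.file_path
        "file_path": event.get("tool_input", {}).get("file_path"),
    }
```

When uv runs the file, it reads the `# /// script` block, builds an isolated environment with those dependencies, and then executes the script, so nothing has to be installed globally.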

As Plugin

/plugin marketplace add ymonster/claude_encoding_guard
/plugin install encoding-guard

Verify

After installation, Read any non-UTF-8 file — Chinese characters should display correctly instead of garbled text.
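To confirm that an edit was written back correctly, check the bytes on disk rather than what an editor displays. In this sketch (the helper name `inspect_bytes` is hypothetical), a GBK file with CRLF endings that survived an edit should still decode as GBK, should normally fail to decode as UTF-8 when it contains Chinese text, and should contain no bare LF.

```python
def inspect_bytes(data: bytes, encoding: str) -> dict:
    """Summarize raw file bytes: valid UTF-8?, valid in `encoding`?, line-ending counts."""

    def decodes(codec: str) -> bool:
        try:
            data.decode(codec)
            return True
        except UnicodeDecodeError:
            return False

    crlf = data.count(b"\r\n")
    return {
        "valid_utf8": decodes("utf-8"),
        "valid_legacy": decodes(encoding),
        "crlf": crlf,
        "bare_lf": data.count(b"\n") - crlf,
    }
```

Usage: `inspect_bytes(open("notes.txt", "rb").read(), "gbk")`. Pure-ASCII files are valid in both encodings, so this check is only informative for files with non-ASCII text.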

Experimental: chardet 7.x branch

A migration to chardet 7.x is being prototyped on the chardet7-preview branch. It drops the binaryornot dependency (chardet 7.x has built-in binary detection) and uses 0BSD-licensed chardet (vs LGPL on 5.x). Not recommended for daily use yet — see the branch's README for status and install instructions.

Design Decisions

Read-time conversion: Claude Code v2.1.90+ silently accepts files modified by hooks without re-reading their content, so converting at Edit time is too late and results in U+FFFD corruption. Converting at Read time ensures Claude's first in-memory view is correct.
uv run --script: Plain uv run triggers a project sync, which on Windows closes the stdin pipe the hook reads its input from. --script skips project discovery.
chardet 5.x: Version 7.x reduced CJK detection confidence from 0.99 to 0.40, below the safety threshold. The 5.x pin is declared via PEP 723, so it is installed in an isolated environment.
Binary detection: chardet misidentifies some binary files as Windows-1252 (confidence 0.73). binaryornot filters these out. Always on, not configurable.
Encoding aliases: GB2312 → GBK (GBK is a byte-compatible superset), ISO-8859-1 → Windows-1252 (standard practice; web browsers also decode ISO-8859-1-labeled content as Windows-1252).
Session-isolated cache: <tmpdir>/.cc_encoding_cache/<session_id>/ — no cross-session interference. Stale sessions (>24h) auto-cleaned.
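The cache layout and alias rules above can be sketched as follows. Only the `<tmpdir>/.cc_encoding_cache/<session_id>/` layout, the two aliases, and the 24-hour cutoff come from this README; the per-file key (a SHA-256 of the resolved path), judging staleness by directory mtime, and all function names are assumptions.

```python
from __future__ import annotations

import hashlib
import json
import shutil
import tempfile
import time
from pathlib import Path

# Aliases from the design notes (keys are lower-cased detector names)
ALIASES = {"gb2312": "gbk", "iso-8859-1": "windows-1252"}
STALE_AFTER = 24 * 60 * 60  # 24 hours, in seconds


def normalize_encoding(name: str) -> str:
    key = name.lower()
    return ALIASES.get(key, key)


def cache_root(root: Path | None = None) -> Path:
    return (root or Path(tempfile.gettempdir())) / ".cc_encoding_cache"


def cache_file(session_id: str, file_path: str, root: Path | None = None) -> Path:
    """One JSON file per (session, edited file); sessions never share a directory."""
    session_dir = cache_root(root) / session_id
    session_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(str(Path(file_path).resolve()).encode("utf-8")).hexdigest()
    return session_dir / f"{key}.json"


def save_meta(path: Path, meta: dict) -> None:
    path.write_text(json.dumps(meta), encoding="utf-8")


def clean_stale_sessions(root: Path | None = None, now: float | None = None) -> list[str]:
    """Delete session directories not modified for longer than STALE_AFTER."""
    base = cache_root(root)
    if not base.is_dir():
        return []
    now = time.time() if now is None else now
    removed = []
    for session_dir in base.iterdir():
        if session_dir.is_dir() and now - session_dir.stat().st_mtime > STALE_AFTER:
            shutil.rmtree(session_dir, ignore_errors=True)
            removed.append(session_dir.name)
    return removed
```

Keying directories by session ID is what keeps two concurrent Claude Code sessions from reading each other's metadata; the cleanup pass only touches directories that have sat idle past the cutoff.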

Known Limitations

Recommended: use under version control as a fallback for binary misedits. Several defensive layers keep the plugin from converting binary files, and in practice Claude Code rarely tries to edit one, but stray cases remain possible. We considered an automatic local git stash as a fallback but rejected it to avoid interfering with the user's own git workflow (polluting the stash list, reflog, etc.). We strongly recommend using this plugin under git or another version control system so that any misidentification can be rolled back safely.
Windows-1252 stale cache edge case. If cache deletion fails (e.g., antivirus lock) and the file's original Windows-1252 bytes happen to be valid UTF-8, the stale cache won't self-heal. This requires two unlikely conditions to coincide and doesn't affect CJK encodings (GBK/Big5/Shift_JIS bytes are not valid UTF-8).
Mixed line endings. Files with both CRLF and LF are normalized to the dominant style.
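Collapsing mixed endings to the dominant style can be sketched like this. Counting CRLF against bare LF and breaking ties toward LF are assumptions of the sketch, not necessarily the plugin's exact rule; the point is that a mostly-CRLF file with a few stray LF lines comes back uniformly CRLF.

```python
def dominant_line_ending(data: bytes) -> str:
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf
    # Ties (including files with no newlines) go to LF in this sketch
    return "CRLF" if crlf > bare_lf else "LF"


def normalize_line_endings(text: str, style: str) -> str:
    unified = text.replace("\r\n", "\n")
    return unified.replace("\n", "\r\n") if style == "CRLF" else unified
```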
