This skill should be used when the user asks to "hwpx 만들어", "한글 문서 작성", "공문 만들어", "보고서 생성", "회의록 만들어", "제안서 작성", "hwpx 편집", "한글 파일 수정", "create hwpx", "make a hancom document", "edit hwp file", "generate hwpx", or describes creating, editing, or reading a Korean government or business document. Also trigger when the user attaches a .hwpx file, asks to extract text from hwpx, mentions OWPML, or mentions Hancom document creation, editing, or conversion — even without saying "hwpx" explicitly. NOT when: discussing Hancom as a product/company without a document task (e.g. "Hancom is slow", "Hancom pricing").
Slash command: `/productivity:hwpx`
Skill to create, edit, and read Hancom Office HWPX files, centered on **writing XML directly**. HWPX is a ZIP-based XML container (OWPML standard). Writing the XML directly bypasses python-hwpx API formatting bugs and allows fine-grained format control.
When the user attaches a `.hwpx` file, do not restore it automatically. First judge the intent of the request, then pick a mode. Restore is one mode, not the default.
| Intent | Mode | Workflow |
|---|---|---|
| Reproduce attached doc near-exactly, swap only values/field names | Reference restore | Workflow 5 |
| Explicit request to add/delete/restructure content | Content edit | Workflow 2 |
| Only text/table content needed | Read/extract | Workflow 3 |
| Attachment is style reference only, content written fresh | Reference-based generation | Workflow 5 |
| No attachment | New creation | Workflow 1 |
Intent unclear → ask user, do not assume restore.
| Mode | Page count | Completion gate |
|---|---|---|
| Reference restore | Must match reference | `validate.py --baseline` + `page_guard.py` |
| Content edit | Expected to change | `validate.py --baseline` + actually opening the file in Hancom |
| New / reference-based generation | No constraint | `validate.py` |

The "same page count" rule, `page_guard.py`, and the "compress/summarize text" rules apply only to reference-restore mode. In content-edit mode, do not revert work because the page count changed. For restore-mode steps and the checklist, see Workflow 5.
**`validate.py --baseline` scope:** real-world HWPX originals often contain duplicate `hp:p` IDs, which HWP tolerates. Validating without `--baseline` flags these pre-existing duplicates as `INVALID` (a false positive). `--baseline` is required when validating against an attached original (Workflows 2, 5); omit it for new documents (Workflow 1).
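The baseline rule boils down to a set difference: an `hp:p` ID counts as an error only if it is duplicated in the result but was not already duplicated in the original. A minimal sketch of that idea, with hypothetical helper names (this is not `validate.py`'s actual code), assuming `id` attributes on `<hp:p>` start tags are double-quoted digits:

```python
from __future__ import annotations

import re
from collections import Counter

# id attribute on an <hp:p ...> start tag (assumes double-quoted numeric IDs)
P_ID_RE = re.compile(r'<hp:p\b[^>]*?\sid="(\d+)"')


def duplicate_ids(section_xml: str) -> set[str]:
    """IDs that appear on more than one <hp:p>."""
    counts = Counter(P_ID_RE.findall(section_xml))
    return {pid for pid, n in counts.items() if n > 1}


def new_duplicates(result_xml: str, baseline_xml: str) -> set[str]:
    """Duplicates in the result that were not already duplicated in the baseline."""
    return duplicate_ids(result_xml) - duplicate_ids(baseline_xml)
```

Running `duplicate_ids` alone on an untouched original is exactly the false positive described above; subtracting the baseline's set leaves only the duplicates the edit introduced.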
Only required package: `lxml`. Install it with `pip install lxml` if `python -c "import lxml"` fails.
`SKILL_DIR` = absolute path of the directory holding this SKILL.md (`.../skills/hwpx`). Environment details: `$SKILL_DIR/references/environment.md`.

```
.../skills/hwpx/
├── SKILL.md # this file
├── scripts/
│ ├── office/
│ │ ├── unpack.py # HWPX → directory (raw bytes + entry-order manifest)
│ │ └── pack.py # directory → HWPX (restores original entry order and compression)
│ ├── build_hwpx.py # template + XML → .hwpx assembly (core)
│ ├── analyze_template.py # deep HWPX analysis (for reference-based generation)
│ ├── validate.py # HWPX structure validation
│ ├── page_guard.py # page-drift risk check vs. reference
│ ├── dump_table.py # table cell map dump (lists rowAddr/colAddr/span/text)
│ ├── locate.py # finds spans of text-containing elements (hp:tbl/tr/p)
│ ├── insert_table_row.py # inserts a table row + fixes rowAddr/rowCnt/rowSpan
│ ├── replace_cell.py # replaces table cell content + removes linesegarray
│ ├── strip_linesegarray.py # removes LineSeg arrays from table cells (fixes rendering errors)
│ ├── patch_section.py # atomically patches a given part of the section XML
│ ├── calc_col_widths.py # content-based table column width calculation
│ ├── next_id.py # generates the next available unique element ID in the document
│ ├── delete_table_rows.py # deletes given rows from a table element
│ └── text_extract.py # text extraction
├── templates/
│ ├── base/ # base template (Skeleton-based)
│ │ ├── mimetype, META-INF/*, version.xml, settings.xml, Preview/*
│ │ └── Contents/ (header.xml, section0.xml, content.hpf)
│ ├── gonmun/ # official letter (공문) overlay (header.xml, section0.xml)
│ ├── report/ # report overlay
│ ├── minutes/ # meeting minutes overlay
│ └── proposal/ # proposal / project overview overlay (colored header bars, number badges)
└── references/
  ├── hwpx-format.md # OWPML XML element reference
  ├── editing-gotchas.md # Editing traps (FORMULA, substring collision, count, deletion)
  ├── xml-integrity.md # XML serialization safe patterns (lxml rules + code examples)
  ├── style-maps.md # Per-template charPrIDRef/paraPrIDRef/borderFillIDRef
  ├── section-writing.md # section0.xml XML templates (paragraph, table, structure)
  ├── scripts-guide.md # Utility script CLI usage details
  └── environment.md # OS-specific Python invocation, encoding gotchas, temp-file rules
```
Template selection matrix:
| Template | Use for |
|---|---|
| `gonmun` | Official correspondence (공문) |
| `report` | Multi-section reports with figures |
| `minutes` | Meeting records |
| `proposal` | Proposals with approval signatures |
| `base` | Everything else |
Per-template style IDs: `$SKILL_DIR/references/style-maps.md`. Adding styles to header.xml: `$SKILL_DIR/references/hwpx-format.md` § "header.xml Editing Guide".

**Workflow 1: New creation.** If an attached reference exists and the intent is restore or reference-based generation, use Workflow 5 instead (for content edits, Workflow 2).

```bash
source "$VENV"
# Empty document (base template)
python3 "$SKILL_DIR/scripts/build_hwpx.py" --output result.hwpx
# Use a template
python3 "$SKILL_DIR/scripts/build_hwpx.py" --template gonmun --output result.hwpx
# Override section0.xml with a custom file
python3 "$SKILL_DIR/scripts/build_hwpx.py" --template gonmun --section my_section0.xml --output result.hwpx
# Override header.xml as well
python3 "$SKILL_DIR/scripts/build_hwpx.py" --header my_header.xml --section my_section0.xml --output result.hwpx
# Set metadata
python3 "$SKILL_DIR/scripts/build_hwpx.py" --template report --section my.xml \
  --title "제목" --creator "작성자" --output result.hwpx
```
Writing a custom section0.xml through a temp file:

```bash
# 1. Write section0.xml to a temp file
mkdir -p ./_work
SECTION=$(mktemp ./_work/section0_XXXX.xml)
cat > "$SECTION" << 'XMLEOF'
<?xml version='1.0' encoding='UTF-8'?>
<hs:sec xmlns:hp="http://www.hancom.co.kr/hwpml/2011/paragraph"
xmlns:hs="http://www.hancom.co.kr/hwpml/2011/section">
<!-- first paragraph containing secPr (copy it from base/section0.xml) -->
<!-- ... -->
<hp:p id="1000000002" paraPrIDRef="0" styleIDRef="0" pageBreak="0" columnBreak="0" merged="0">
<hp:run charPrIDRef="0">
<hp:t>본문 내용</hp:t>
</hp:run>
</hp:p>
</hs:sec>
XMLEOF
# 2. Build
python3 "$SKILL_DIR/scripts/build_hwpx.py" --section "$SECTION" --output result.hwpx
# 3. Clean up
rm -f "$SECTION"
```
Full XML templates (paragraph, empty line, mixed runs, table, ID rules): read `$SKILL_DIR/references/section-writing.md`.

Key rules for section0.xml:
- First paragraph: copy it from `templates/base/Contents/section0.xml` (`secPr` + `colPr` are required in the first run)
- Empty text: `<hp:t/>` (self-closing, not `<hp:t></hp:t>`)
- Table column widths: use `calc_col_widths.py` for ratios
- Paragraph IDs (e.g. `1000000001`): use `next_id.py` to avoid collisions

Full header.xml guide (adding charPr/paraPr/borderFill, font reference system, paraPr caveats): read `$SKILL_DIR/references/hwpx-format.md` § "header.xml Editing Guide".
Key rules for header.xml:
- Start from `templates/base/Contents/header.xml`, add the needed `charPr`/`paraPr`/`borderFill` entries, and update `itemCnt`
- paraPr: keep the `hp:switch` structure (`hp:case` + `hp:default`); keep `borderFillIDRef="2"`

Full style ID tables for all templates: read `$SKILL_DIR/references/style-maps.md`.
Pick template → look up charPrIDRef/paraPrIDRef/borderFillIDRef in style-maps.md before writing section0.xml.
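Updating `itemCnt` by hand is easy to forget. The hypothetical helper below (not one of this skill's scripts) appends one entry to a header.xml container such as `hh:charProperties` and increments its `itemCnt`. It uses plain string operations so the rest of the file stays byte-identical. It assumes the container start tag begins with `<hh:NAME itemCnt="N"`, and that the caller already gave the new entry a correct `id`.

```python
import re


def append_item(header_xml: str, container: str, item_xml: str) -> str:
    """Insert item_xml just before </hh:{container}> and increment itemCnt by one."""
    open_re = re.compile(rf'<hh:{container} itemCnt="(\d+)"')
    counts = open_re.findall(header_xml)
    if len(counts) != 1:
        raise ValueError(f"expected one <hh:{container} itemCnt=...>, found {len(counts)}")
    close = f"</hh:{container}>"
    if header_xml.count(close) != 1:
        raise ValueError(f"expected exactly one {close}")
    old = int(counts[0])
    header_xml = header_xml.replace(close, item_xml + close)
    # Bump the counter on the (single) container start tag.
    return header_xml.replace(
        f'<hh:{container} itemCnt="{old}"', f'<hh:{container} itemCnt="{old + 1}"', 1
    )
```

The same call should work for other containers (e.g. `"paraProperties"` or `"borderFills"`), assuming those container names appear in your header.xml.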
**Workflow 2: Content edit.** Prerequisite: read `$SKILL_DIR/references/editing-gotchas.md` before any edits. It covers FORMULA fields, substring collisions, count verification, paragraph deletion, and other silent-failure traps.

```bash
source "$VENV"
# 1. HWPX → directory (extracts raw bytes, writes the .hwpx_pack_order manifest)
python3 "$SKILL_DIR/scripts/office/unpack.py" document.hwpx ./unpacked/
# 2. Edit the XML; choose the tool by edit type:
#    - Table cell content → replace_cell.py is required (handles lineseg + ID collisions automatically)
#      Never edit cells directly with str.replace(): the linesegarray is left behind and
#      Hancom shows the "document modified" (문서 변경됨) warning
#    - Plain text (outside tables) → patch_section.py (safe str.replace + lineseg strip)
#    - Row insert/delete → insert_table_row.py / delete_table_rows.py
#    Body:   ./unpacked/Contents/section0.xml
#    Styles: ./unpacked/Contents/header.xml
# 3. Repack into HWPX
python3 "$SKILL_DIR/scripts/office/pack.py" ./unpacked/ edited.hwpx
# 4. Validate (against the original)
python3 "$SKILL_DIR/scripts/validate.py" edited.hwpx --baseline document.hwpx
```
**Validate timing when overwriting original:** if you plan to overwrite the original filename, run `validate --baseline` first. Order: pack to a temp file → `validate --baseline` against the original → copy to the final name. Overwriting the original first destroys the baseline, which forces validation without `--baseline` and risks false-positive duplicate-ID reports.
With many items, split the work into stages to catch silent failures early, and verify each stage in Hancom:
- Edit target: `unpacked/Contents/section0.xml`.
- In the edit `.py` script, put `assert s.count(old) == expected` on every `str.replace()`. If a count is off, the script aborts before a corrupted file is produced (`references/editing-gotchas.md` §3).
- Write each stage's output to `_work_stepN.hwpx` to avoid file-lock conflicts.

**Multi-cell dir mode:** when replacing many cells in one file, run `replace_cell.py` directly on the unpacked directory. It reads and writes section0.xml in place, with no zip overhead per call:

```bash
python3 "$SKILL_DIR/scripts/replace_cell.py" ./unpacked/ --table-id TABLE_ID --cell 2,1 --para 0 0 "값1"
python3 "$SKILL_DIR/scripts/replace_cell.py" ./unpacked/ --table-id TABLE_ID --cell 3,1 --para 0 0 "값2"
python3 "$SKILL_DIR/scripts/office/pack.py" ./unpacked/ result.hwpx
```
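The count-assert rule can be wrapped in a small helper so that every replacement in a staged edit script is checked. This is a hypothetical sketch (`safe_replace` is not one of the skill's scripts, and `patch_section.py` may behave differently). It raises explicitly instead of using a bare `assert`, so the check still runs under `python -O`.

```python
def safe_replace(s: str, old: str, new: str, expected: int = 1) -> str:
    """str.replace() that refuses to run unless `old` occurs exactly `expected` times.

    0 matches usually means the text is split across runs; extra matches
    mean a substring collision (e.g. "이름" also occurs inside "이름표").
    """
    found = s.count(old)
    if found != expected:
        raise AssertionError(f"{old!r}: expected {expected} match(es), found {found}")
    return s.replace(old, new)


section = "<hp:t>이름</hp:t><hp:t>이름표</hp:t>"
# Matching the whole <hp:t> element avoids the substring collision with 이름표.
section = safe_replace(section, "<hp:t>이름</hp:t>", "<hp:t>홍길동</hp:t>")
```

Calling `safe_replace(section, "이름", ...)` on the original string would raise, because the bare word also matches inside `이름표`.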
Pattern for editing N template-based files in one batch:

```python
import shutil
import subprocess
import sys
from pathlib import Path
from xml.sax.saxutils import escape

sys.stdout.reconfigure(encoding="utf-8")  # Windows cp949 guard

SKILL_DIR = Path("/path/to/skills/hwpx")  # set to absolute path of this skill
PY = sys.executable  # same interpreter as this script ("python3" may not exist on Windows)
UNPACK_PY = str(SKILL_DIR / "scripts/office/unpack.py")
PACK_PY = str(SKILL_DIR / "scripts/office/pack.py")
VALIDATE_PY = str(SKILL_DIR / "scripts/validate.py")

# Per-file data kept in a config list
FILES = [
    {"src": "template_A.hwpx", "out": "result_A.hwpx", "name": "홍길동", "dept": "총무과"},
    {"src": "template_B.hwpx", "out": "result_B.hwpx", "name": "이순신", "dept": "인사과"},
]

WORK = Path("./_work")
WORK.mkdir(exist_ok=True)

for cfg in FILES:
    slug = Path(cfg["out"]).stem
    unpack_dir = WORK / f"unpack_{slug}"  # the slug is required: it prevents collisions
    subprocess.run([PY, UNPACK_PY, cfg["src"], str(unpack_dir)], check=True)

    section_path = unpack_dir / "Contents/section0.xml"
    # Byte I/O: no newline translation on Windows, so untouched bytes stay identical
    s = section_path.read_bytes().decode("utf-8")

    # Per-file value substitution; escape() keeps &, <, > in values from breaking the XML
    old = "<hp:t>이름</hp:t>"
    assert s.count(old) == 1, f"{cfg['src']}: expected 1 match, found {s.count(old)}"
    s = s.replace(old, f"<hp:t>{escape(cfg['name'])}</hp:t>")
    # Repeat with its own placeholder for other fields such as dept
    section_path.write_bytes(s.encode("utf-8"))

    tmp_out = WORK / f"{slug}_tmp.hwpx"
    subprocess.run([PY, PACK_PY, str(unpack_dir), str(tmp_out)], check=True)
    # Validate against the original BEFORE writing the final file
    subprocess.run([PY, VALIDATE_PY, str(tmp_out), "--baseline", cfg["src"]], check=True)
    shutil.copy(tmp_out, cfg["out"])
    print(f"[done] {cfg['out']}")
```
Notes:
- Include the slug in `unpack_dir`: it prevents directory collisions when N files are processed in parallel
- `validate --baseline` first, overwrite second: keep this order (see "Validate timing when overwriting original")
- `validate.py` checks structure only. The completion gate for a content edit is confirming that the file actually opens in Hancom.
- Open check: after opening the file in Hancom (e.g. with PowerShell `Start-Process`), confirm the Hancom process (`Hwp`) is alive. On a crash, the process does not appear or exits immediately.
- Close check: `CloseMainWindow` closes only one main window, and any remaining window keeps the file locked. Confirm Hancom is fully closed (use `Stop-Process` if it did not close) before proceeding.

**Workflow 3: Read / extract.** When the skill receives a filepath argument (e.g. `read path/to/file.hwpx`, a bare `path/to/file.hwpx`, or a request like "read file.hwpx" / "file.hwpx 읽어줘"), treat any filepath as a text-extraction request, with or without the `read` keyword, and run `text_extract.py`. The skill does not auto-execute on invoke; run the commands below explicitly.

```bash
python3 "$SKILL_DIR/scripts/text_extract.py" path/to/file.hwpx --format markdown
```
```bash
source "$VENV"
# Plain text
python3 "$SKILL_DIR/scripts/text_extract.py" document.hwpx
# Include tables
python3 "$SKILL_DIR/scripts/text_extract.py" document.hwpx --include-tables
# Markdown format
python3 "$SKILL_DIR/scripts/text_extract.py" document.hwpx --format markdown
# Batch extraction from a list of folders (bash)
for f in ./folder1/*.hwpx ./folder2/*.hwpx ./folder3/*.hwpx; do
  echo "=== $f ==="
  python3 "$SKILL_DIR/scripts/text_extract.py" "$f" --format markdown
done
# Recursive search + save results to files
find . -name "*.hwpx" | while IFS= read -r f; do
  out="${f%.hwpx}.txt"
  python3 "$SKILL_DIR/scripts/text_extract.py" "$f" > "$out"
  echo "→ $out"
done
```
Windows PowerShell (`$SKILL_DIR` is a bash variable and cannot be used in PowerShell directly; replace it with an absolute or relative path):

```powershell
$skillScripts = "C:\path\to\skills\hwpx\scripts"  # Replace with the absolute path to SKILL_DIR\scripts
Get-ChildItem -Recurse -Filter "*.hwpx" | ForEach-Object {
    Write-Host "=== $($_.FullName) ==="
    python "$skillScripts\text_extract.py" $_.FullName --format markdown
}
```
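For a quick look without the skill's scripts, plain paragraph text can be read with the standard library alone. This is a hypothetical minimal sketch, not `text_extract.py`. It assumes body text lives in `Contents/section*.xml` under the `http://www.hancom.co.kr/hwpml/2011/paragraph` namespace (as in the section0.xml example earlier), reads only top-level paragraphs, and skips table contents. Parsing here is read-only, so the no-round-trip rule for writing XML does not apply.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET
import zipfile

HP = "{http://www.hancom.co.kr/hwpml/2011/paragraph}"
SECTION_RE = re.compile(r"Contents/section(\d+)\.xml")


def extract_paragraphs(src) -> list[str]:
    """One string per top-level <hp:p>, sections in numeric order; tables are skipped.

    `src` is a path or a binary file-like object, as accepted by zipfile.ZipFile.
    """
    lines = []
    with zipfile.ZipFile(src) as zf:
        sections = [(int(m.group(1)), n) for n in zf.namelist() if (m := SECTION_RE.fullmatch(n))]
        for _, name in sorted(sections):
            root = ET.fromstring(zf.read(name))
            for p in root.findall(f"{HP}p"):  # direct children of <hs:sec> only
                # itertext() also picks up text that follows inline children such as <hp:tab/>
                runs = p.findall(f"{HP}run/{HP}t")
                lines.append("".join("".join(t.itertext()) for t in runs))
    return lines
```

Because only `hp:run/hp:t` directly under each top-level paragraph is read, cell text nested inside `hp:tbl` is left out, roughly matching the plain-text (no `--include-tables`) output.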
**Validation:**

```bash
source "$VENV"
# Standalone new document
python3 "$SKILL_DIR/scripts/validate.py" document.hwpx
# Result of editing/restoring an attached original: avoids false positives from pre-existing duplicate IDs
python3 "$SKILL_DIR/scripts/validate.py" result.hwpx --baseline original.hwpx
```

Validation items: ZIP validity, required files present, `mimetype` content/position/compression method, XML well-formedness, `secCnt`/`itemCnt`/IDRef, and `hp:p` ID duplicates (with `--baseline`, only new duplicates are errors).
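The three `mimetype` checks in that list can be illustrated with `zipfile`. A hypothetical sketch (not `validate.py`'s code), assuming the standard HWPX mimetype value `application/hwp+zip`: the entry must come first in the archive, be stored without compression, and contain exactly that string.

```python
import zipfile

HWPX_MIMETYPE = b"application/hwp+zip"  # assumed standard value


def check_mimetype(src) -> list[str]:
    """Return the problems found with the mimetype entry (an empty list means OK)."""
    problems = []
    with zipfile.ZipFile(src) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype is not the first ZIP entry")
        try:
            info = zf.getinfo("mimetype")
        except KeyError:
            return problems + ["mimetype entry missing"]
        if info.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype must be stored uncompressed")
        if zf.read("mimetype") != HWPX_MIMETYPE:
            problems.append("unexpected mimetype content")
    return problems
```

This is also why `pack.py` writes `mimetype` first: a generic zip tool that reorders or deflates entries would fail the first two checks.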
**Workflow 5: Reference restore / reference-based generation.** Analyze the attached HWPX and either (a) make a restored copy with only values/field names swapped, or (b) fill the same layout with new content. Use this when the intent is classified as "reference restore" or "reference-based generation".

Restore-mode checklist:
- Keep the `charPrIDRef`, `paraPrIDRef`, `borderFillIDRef` reference system
- Keep the table structure: `rowCnt`, `colCnt`, `colSpan`, `rowSpan`, `cellSz`, `cellMargin`
- Do not add, remove, or change `hp:p`, `hp:tbl`, `rowCnt`, `colCnt`, `pageBreak`, or `secPr` without an explicit user request
- Do not mark the work complete on a `validate.py` pass alone; `page_guard.py` must also pass
- On a `page_guard.py` failure, do not submit as complete: fix the cause (excess length / structure change) and rebuild

For reference-based generation (style reference only, content written fresh), the page-count criteria above do not apply; as with new creation, `validate.py` is the only gate.
Steps: `analyze_template.py` → write the new section0.xml → build → `validate.py` → `page_guard.py` (fix and rerun on failure).

```bash
source "$VENV"
# 1. Deep analysis (prints a structural blueprint)
python3 "$SKILL_DIR/scripts/analyze_template.py" reference.hwpx
# 2. Extract header.xml and section0.xml and keep them for reference
mkdir -p ./_work
python3 "$SKILL_DIR/scripts/analyze_template.py" reference.hwpx \
  --extract-header ./_work/ref_header.xml \
  --extract-section ./_work/ref_section.xml
# 3. Write a new section0.xml based on the analysis
#    - same charPrIDRef, paraPrIDRef
#    - same table structure (column count, column widths, row count, rowSpan/colSpan)
#    - same borderFillIDRef, cellMargin
# 4. Build with the extracted header.xml + the new section0.xml
python3 "$SKILL_DIR/scripts/build_hwpx.py" \
  --header ./_work/ref_header.xml \
  --section ./_work/new_section0.xml \
  --output result.hwpx
# 5. Validate against the original (avoids false positives from pre-existing duplicate IDs)
python3 "$SKILL_DIR/scripts/validate.py" result.hwpx --baseline reference.hwpx
# 6. Page-drift guard (required in restore mode)
python3 "$SKILL_DIR/scripts/page_guard.py" \
  --reference reference.hwpx \
  --output result.hwpx
```

`analyze_template.py` reports:
| Item | Description |
|---|---|
| Font definitions | hangul/latin font mapping |
| borderFill | border type/thickness + background color (detail per side) |
| charPr | font size (pt), font name, color, bold/italic/underline/strikeout, fontRef |
| paraPr | align, line spacing, margin (left/right/prev/next/intent), heading, borderFillIDRef |
| Document structure | page size, margin, page border, body width |
| Body detail | every paragraph's id/paraPr/charPr + text content |
| Table detail | rows×cols, column-width array, per-cell span/margin/borderFill/vertAlign + content |
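One quick spot check for the restore-mode rules on table shape and page breaks is to compare the raw section XML of the reference and the result. The sketch below is hypothetical: `table_shapes` and `structure_diff` are not part of this skill's scripts, and this is not how `page_guard.py` works. It assumes each `<hp:tbl>` start tag carries `rowCnt`/`colCnt` attributes, and it uses plain regex reads so nothing is re-serialized.

```python
from __future__ import annotations

import re

# Start tag of each table; its attributes are captured for parsing below.
TBL_RE = re.compile(r"<hp:tbl\b([^>]*)>")
SHAPE_ATTR_RE = re.compile(r'\b(rowCnt|colCnt)="(\d+)"')


def table_shapes(section_xml: str) -> list[tuple[int, int]]:
    """(rowCnt, colCnt) for every <hp:tbl>, in document order."""
    shapes = []
    for m in TBL_RE.finditer(section_xml):
        attrs = dict(SHAPE_ATTR_RE.findall(m.group(1)))
        shapes.append((int(attrs["rowCnt"]), int(attrs["colCnt"])))
    return shapes


def structure_diff(ref_xml: str, new_xml: str) -> list[str]:
    """Human-readable differences; an empty list means no drift was detected."""
    problems = []
    ref_shapes, new_shapes = table_shapes(ref_xml), table_shapes(new_xml)
    if ref_shapes != new_shapes:
        problems.append(f"table shapes differ: {ref_shapes} -> {new_shapes}")
    ref_breaks = ref_xml.count('pageBreak="1"')
    new_breaks = new_xml.count('pageBreak="1"')
    if ref_breaks != new_breaks:
        problems.append(f"pageBreak count differs: {ref_breaks} -> {new_breaks}")
    return problems
```

An empty result says nothing about text length, which is the other main cause of page drift, so `page_guard.py` remains the required gate.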
| Script | Purpose |
|---|---|
| `scripts/build_hwpx.py` | Core: template + XML → HWPX assembly (includes `--update-preview`) |
| `scripts/analyze_template.py` | Deep HWPX analysis (blueprint for reference-based generation) |
| `scripts/office/unpack.py` | HWPX → directory (raw bytes + `.hwpx_pack_order` manifest) |
| `scripts/office/pack.py` | Directory → HWPX (restores entry order/compression from the manifest, `mimetype` first) |
| `scripts/validate.py` | HWPX structure validation: ZIP/mimetype/XML + `secCnt`/`itemCnt`/IDRef/duplicate ID. With `--baseline ref.hwpx`, only new duplicate IDs vs. the original are errors |
| `scripts/page_guard.py` | Page-drift risk check vs. reference (gate in restore mode, informational in edit mode) |
| `scripts/text_extract.py` | HWPX text extraction; self-implemented, no external hwpx package needed |
| `scripts/dump_table.py` | Table cell map dump: list all table IDs, or dump (`rowAddr`, `colAddr`, `colSpan`, `rowSpan`, text) for a specific table; `--cell col,row` for a verbose cell inspector (paraPr/charPr/runs/linesegarray). Use before `replace_cell.py` to identify target cell addresses |
| `scripts/locate.py` | Byte-span search for text-containing elements (`hp:tbl`/`hp:tr`/`hp:p`/`hp:tc`): finds table/paragraph positions in a single-line section0.xml (extract with `--extract-dir`); accepts `.hwpx` or an unpacked directory |
| `scripts/delete_table_rows.py` | Delete table rows: removes `<hp:tr>` + auto-fixes `rowCnt`/`rowSpan`/`rowAddr` (`--list` to view rows) |
| `scripts/insert_table_row.py` | Insert a table row: inserts `<hp:tr>` + auto-fixes `rowCnt`/`rowAddr`/`rowSpan` (`--grow` to extend a group-end `rowSpan`) |
| `scripts/replace_cell.py` | Replace table cell content: replaces the paragraphs of the target `<hp:tc>`'s direct `<hp:subList>` + lineseg strip + ID collision check; accepts `.hwpx` or an unpacked directory (in place); `--run` for multi-charPr runs within a paragraph |
| `scripts/strip_linesegarray.py` | Remove `<hp:linesegarray>`: prevents the "document corrupted" warning after text edits |
| `scripts/patch_section.py` | Safe text replacement: `str.replace` + lineseg strip + ID verification. `--after` anchor for context-limited replacement |
| `scripts/calc_col_widths.py` | Table column-width calculation: ratio → HWPUNIT (guarantees sum = body width) |
| `scripts/next_id.py` | Look up the next `hp:p` ID for collision-free new paragraph insertion |
Full CLI examples for all utility scripts: read `$SKILL_DIR/references/scripts-guide.md`. It covers `patch_section.py` (safe text replace) · `strip_linesegarray.py` · `calc_col_widths.py` · `next_id.py` · `locate.py` / `insert_table_row.py` / `replace_cell.py` / `delete_table_rows.py` (table editing helpers).
| Value | HWPUNIT | Meaning |
|---|---|---|
| 1pt | 100 | Base unit |
| 10pt | 1000 | Default font size |
| 1mm | ≈283.46 | Millimeter (1 inch = 25.4mm = 7200 HWPUNIT) |
| 1cm | 2835 | Centimeter |
| A4 width | 59528 | 210mm |
| A4 height | 84186 | 297mm |
| Left/right margin | 8504 | 30mm |
| Body width | 42520 | 150mm (A4 - left/right margins) |
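The table values follow from HWPUNIT = 1/7200 inch (1pt = 1/72 inch = 100 HWPUNIT), so millimeters convert as mm × 7200 / 25.4. The sketch below reproduces the A4 width, margin, and body width. It then splits the body width into integer column widths that sum exactly to it. This illustrates the idea behind `calc_col_widths.py` ("guarantees sum = body width"), not its actual algorithm; largest-remainder rounding is an assumed choice. Note that 297mm rounds to 84189 with this formula, not the table's 84186, so copy page sizes from the table or the template rather than recomputing them.

```python
from __future__ import annotations

HWPUNIT_PER_INCH = 7200  # 1pt = 1/72 inch = 100 HWPUNIT
MM_PER_INCH = 25.4


def mm_to_hwpunit(mm: float) -> int:
    return round(mm * HWPUNIT_PER_INCH / MM_PER_INCH)


def column_widths(ratios: list[float], total: int = 42520) -> list[int]:
    """Split `total` (default: A4 body width) into integer widths proportional to ratios."""
    scale = sum(ratios)
    exact = [total * r / scale for r in ratios]
    widths = [int(x) for x in exact]  # floor (ratios assumed positive)
    # Hand the leftover units to the columns with the largest fractional parts,
    # so the widths always sum to exactly `total`.
    leftover = total - sum(widths)
    by_remainder = sorted(range(len(exact)), key=lambda i: exact[i] - widths[i], reverse=True)
    for i in by_remainder[:leftover]:
        widths[i] += 1
    return widths


body = mm_to_hwpunit(210) - 2 * mm_to_hwpunit(30)  # A4 width minus two 30mm margins
```

Naive rounding of each column independently can make the widths sum to one or two units more or less than the body width; the leftover step avoids that.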
Severity: 🔴 crash/data corruption · 🟡 silent failure/bad output · 🔵 style/consistency
- `.hwp` (binary) files are not supported. If the user provides a `.hwp` file, guide them to re-save it as `.hwpx` from Hancom Office (File → Save As → File type: HWPX).
- Keep the `hp:`, `hs:`, `hh:`, `hc:` namespace prefixes when editing XML.
- Python: use `.venv/bin/python3` if it exists; any Python that can import `lxml` works (see `environment.md`).
- Where to look: OWPML elements → `hwpx-format.md`; editing traps → `editing-gotchas.md`; XML serialization rules → `xml-integrity.md`; style IDs → `style-maps.md`; XML templates → `section-writing.md`; script CLI → `scripts-guide.md`; environment/encoding → `environment.md`.
- Attached reference documents: use `analyze_template.py` plus restore/rewrite based on the extracted XML.
- In restore mode, `page_guard.py` must also pass (separately from `validate.py`) before the work is marked complete. In content-edit mode, `page_guard.py` is informational; the completion gate is `validate.py --baseline` plus actually opening the file in Hancom.
- Never round-trip an existing section0.xml/header.xml through `etree.fromstring()` → `etree.tostring()`: pretty-printing, dropping `standalone`, or adding `xmlns` declarations crashes the HWP parser. The same applies to `content.hpf` (it contains 14 Hancom namespace declarations).
- For text changes, use `str.replace()` directly on the raw XML string (no lxml needed), except for table cell text: use `replace_cell.py` instead (see the table-cell rule below).
- Compact new XML with `re.sub(r'>[ \t\r\n]+<', '><', xml)` before string insertion.
- Compute `insert_pos` after all `str.replace()` calls are done (computing it before modification gives a wrong offset).
- After text edits, remove `<hp:linesegarray>`: its stale line-break cache makes HWP show a "document corrupted/modified" warning (HWP recalculates line breaks on open).
- `unpack.py` extracts raw bytes with no lxml re-serialization. Keep this invariant when modifying the script.

Code examples and safe patterns for the XML-integrity rules above: `$SKILL_DIR/references/xml-integrity.md`.

- For a `type="FORMULA"` field, editing the cached `<hp:t>` value is a no-op: Hancom recalculates and overwrites it on open. Replace the whole `fieldBegin` through `fieldEnd` span with static text, or fix the formula's input cell (`references/editing-gotchas.md` §1).
- Put `assert s.count(old) == expected` before every `str.replace()`. It catches run splitting (0 matches) and substring collisions (too many matches) before they become silent failures.
- Even after `validate.py --baseline` passes, confirm that the file actually opens in Hancom. Fully close Hancom before repackaging (with multiple windows, `CloseMainWindow` closes only the main window). After applying the result to the real file, verify that the copy succeeded with md5 or similar (see Workflow 2).
- For table cell text (`<hp:tc>`), use `replace_cell.py`, not `str.replace()` or `patch_section.py`. Table cells have their own per-subList `<hp:linesegarray>`; `replace_cell.py` strips it and checks ID collisions automatically. Raw `str.replace()` on cell content leaves a stale lineseg, which triggers the "문서가 변경됨" (document modified) warning in Hancom.
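Three of the string-level rules above (compacting inserted XML, computing the insert position last, and dropping stale `<hp:linesegarray>` caches) can be sketched with `re` alone. These helpers are hypothetical illustrations, not the code of `strip_linesegarray.py` or `replace_cell.py`, and they assume `hp:linesegarray` elements are never nested. The compaction regex also empties whitespace-only text such as `<hp:t> </hp:t>`, so apply it only to fragments you wrote yourself.

```python
import re

# Self-closing form first, so it is never mistaken for an opening tag.
LINESEG_RE = re.compile(
    r"<hp:linesegarray\b[^>]*/>|<hp:linesegarray\b[^>]*>.*?</hp:linesegarray>",
    re.DOTALL,
)


def strip_linesegarray(xml: str) -> str:
    """Remove every cached line-layout block; Hancom recomputes line breaks on open."""
    return LINESEG_RE.sub("", xml)


def compact(fragment: str) -> str:
    """Drop whitespace between tags in a hand-written fragment before inserting it."""
    return re.sub(r">[ \t\r\n]+<", "><", fragment)


def insert_before(xml: str, anchor: str, fragment: str) -> str:
    """Insert a compacted fragment before a unique anchor.

    Call this only after every str.replace() on `xml` is done, so the offset is current.
    """
    if xml.count(anchor) != 1:
        raise ValueError(f"anchor must occur exactly once: {anchor!r}")
    insert_pos = xml.index(anchor)
    return xml[:insert_pos] + compact(fragment) + xml[insert_pos:]
```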
npx claudepluginhub kadragon/agent-toolkit --plugin productivity