Guides through adding language support to CocoSearch across up to 6 paths: handler, symbol extraction, grammar, context expansion, dependency extraction, and documentation.
Slash command: /cocosearch:cocosearch-add-language
A structured workflow for adding language support to CocoSearch. Navigates up to 6 independent paths (handler, symbol extraction, grammar, context expansion, dependency extraction, documentation) and ensures every registration point is covered.
Philosophy: The most common failure when adding language support is missing a registration step. This skill makes that impossible by tracking every step explicitly.
Reference: docs/adding-languages.md is the authoritative technical guide. This skill wraps it in an interactive workflow.
Parse the user's request to determine what's being added.
Extract from the request:
- The language name
- Its file extensions (e.g., .kt, .kts)
Confirm with user: "I'll add support for [language] with extensions [list]. Let me determine which paths apply."
Check two things to decide which of the 6 paths (A-F) apply:
CocoIndex's SplitRecursively has built-in Tree-sitter chunking for ~28 languages. Search for the language mapping:
search_code(
query="LANGUAGE_EXTENSIONS supported languages",
use_hybrid_search=True,
smart_context=True
)
Also check the CocoIndex docs: if the language is in the built-in list, chunking works automatically -- no handler (Path A) needed.
If the user wants symbol extraction (Path B) or context expansion (Path E), the language must be in tree-sitter-language-pack:
uv run python -c "from tree_sitter_language_pack import SupportedLanguage; print(sorted(SupportedLanguage.__args__))"
Verify a specific language:
uv run python -c "from tree_sitter_language_pack import get_parser; p = get_parser('<language>'); print(p)"
Present the applicable paths:
| Path | When to Use | Applies? |
|---|---|---|
| A: Language Handler | Language NOT in CocoIndex's built-in list -- needs custom chunking | ? |
| B: Symbol Extraction | Language IS in tree-sitter-language-pack -- enables --symbol-type/--symbol-name filtering | ? |
| C: Both A + B | Not built-in for chunking but has tree-sitter support | ? |
| D: Grammar Handler | Domain-specific schema sharing a base language (e.g., Ansible = YAML) | ? |
| E: Context Expansion | Language IS in tree-sitter-language-pack -- enables smart_context=True boundary expansion | ? |
| F: Dependency Extractor | Language has import/require/reference patterns -- enables deps tree, deps impact, and get_file_dependencies/get_file_impact MCP tools. Use /cocosearch:cocosearch-add-extractor for dedicated guidance. | ? |
Present to user: "Based on my checks, here are the paths that apply: [list]. Ready to proceed?"
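The decision table above can be sketched as a small function. This is only an illustration of the table's logic; the function name and its parameters are hypothetical and are not part of CocoSearch.

```python
def applicable_paths(
    builtin_chunking: bool,
    in_tree_sitter_pack: bool,
    is_grammar: bool = False,
    has_imports: bool = False,
) -> list[str]:
    """Return the path letters from the table that apply, in table order."""
    paths = []
    if not builtin_chunking:
        paths.append("A")  # custom chunking handler needed
    if in_tree_sitter_pack:
        paths.append("B")  # symbol extraction is possible
    if not builtin_chunking and in_tree_sitter_pack:
        paths.append("C")  # shorthand for "A and B together"
    if is_grammar:
        paths.append("D")  # domain-specific schema on a base language
    if in_tree_sitter_pack:
        paths.append("E")  # eligible for context expansion, if desired
    if has_imports:
        paths.append("F")  # dependency extractor
    return paths


# A language with built-in chunking and tree-sitter support
print(applicable_paths(builtin_chunking=True, in_tree_sitter_pack=True))
```

Path E is reported as eligible here; whether to implement it still depends on whether the user wants context expansion.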
Skip this step if the language is in CocoIndex's built-in list (no custom chunking needed).
Choose the closest existing handler based on language type:
| Language Type | Analog Handler | Why |
|---|---|---|
| Config format (key-value, blocks) | hcl.py | Block-based structure with labels |
| Template language | gotmpl.py | Template directives + content |
| Script / shell language | bash.py | Function definitions + commands |
| Containerization / CI | dockerfile.py | Directive-based, sequential |
| JVM / compiled language | scala.py or groovy.py | OOP with classes, methods, imports |
Search for the analog:
search_code(
query="<analog-language> handler EXTENSIONS SEPARATOR_SPEC",
symbol_type="class",
use_hybrid_search=True,
smart_context=True
)
Read the analog handler fully before proceeding.
Copy from the template and implement:
- Create src/cocosearch/handlers/<language>.py (copy _template.py)
- Set EXTENSIONS to all file extensions (with leading dot)
- Define SEPARATOR_SPEC with CustomLanguageConfig -- hierarchical regex separators from coarsest to finest
- Implement extract_metadata() returning block_type, hierarchy, and language_id
The handler is autodiscovered at import time; no registration code needed.
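To make the three pieces concrete, here is a minimal stdlib-only sketch for an imaginary block-based config language. Everything in it is an assumption for illustration: the language id `exampleconf`, the `.exconf` extension, the separator regexes, and the `kind.label` hierarchy format. The real class layout and the `CustomLanguageConfig` wrapper come from `_template.py` and the analog handler, not from this sketch.

```python
import re

LANGUAGE_ID = "exampleconf"   # hypothetical language id
EXTENSIONS = [".exconf"]      # hypothetical extension, leading dot included

# Hypothetical separators, ordered from coarsest to finest.
SEPARATORS = [
    r'\n(?=\w+ "[^"]+" \{)',  # newline before a labelled block opener
    r"\n\n+",                 # blank lines
    r"\n",                    # single newlines
]

_BLOCK_RE = re.compile(
    r'^\s*(?P<kind>\w+)\s+"(?P<label>[^"]+)"\s*\{', re.MULTILINE
)


def extract_metadata(text: str) -> dict[str, str]:
    """Return block_type, hierarchy, and language_id for one chunk."""
    match = _BLOCK_RE.search(text)
    if match is None:
        # No recognizable block opener in this chunk
        return {"block_type": "", "hierarchy": "", "language_id": LANGUAGE_ID}
    kind, label = match.group("kind"), match.group("label")
    return {
        "block_type": kind,
        "hierarchy": f"{kind}.{label}",
        "language_id": LANGUAGE_ID,
    }


print(extract_metadata('service "api" {\n  port = 80\n}'))
```

Ordering the separators coarsest-first means a splitter tries block boundaries before falling back to blank lines and single newlines.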
If the language has no tree-sitter grammar, add its language_id to _SKIP_PARSE_EXTENSIONS in src/cocosearch/indexer/parse_tracking.py. This prevents false no_grammar reports in parse tracking stats.
Extensions are auto-derived from the handler's EXTENSIONS attribute via _default_include_patterns() in src/cocosearch/indexer/config.py. No manual config.py edits are needed.
- The EXTENSIONS list you set in step 3b (e.g., [".hcl", ".tf"]) is automatically converted to glob patterns (e.g., "*.hcl", "*.tf") and merged into include_patterns
- For files matched by name rather than by extension (e.g., Dockerfile, Containerfile), define an INCLUDE_PATTERNS class attribute on the handler (e.g., INCLUDE_PATTERNS = ["Dockerfile", "Dockerfile.*", "Containerfile"]); these are also picked up automatically
Check if the language name needs a display override in cli.py:
search_code(
query="display_names languages_command",
symbol_name="languages_command",
use_hybrid_search=True,
smart_context=True
)
Add to the display_names dict only if .title() casing is wrong (e.g., "hcl": "HCL", "gotmpl": "Go Template").
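The two conventions in this part of the workflow, glob derivation from EXTENSIONS and the .title() fallback for display names, can be sketched as follows. The function names are hypothetical stand-ins; the actual logic lives in _default_include_patterns() and languages_command, which may differ in detail.

```python
def derive_include_patterns(extensions, extra_patterns=()):
    """Turn handler extensions into glob patterns, then append name-based patterns."""
    patterns = [f"*{ext}" for ext in extensions]  # ".hcl" -> "*.hcl"
    patterns.extend(extra_patterns)               # e.g. "Dockerfile"
    return patterns


# Overrides are only needed where str.title() gives the wrong casing.
DISPLAY_NAMES = {"hcl": "HCL", "gotmpl": "Go Template"}


def display_name(language_id):
    """Use the override if present, otherwise fall back to title casing."""
    return DISPLAY_NAMES.get(language_id, language_id.title())


print(derive_include_patterns([".hcl", ".tf"]))
print(display_name("hcl"), "|", display_name("bash"))
```

Without the override, "hcl".title() would yield "Hcl", which is why only such cases need an entry.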
Find the analog's test file for the pattern:
search_code(
query="test <analog-language> handler EXTENSIONS SEPARATOR_SPEC",
symbol_type="class",
use_hybrid_search=True,
smart_context=True
)
Create tests/unit/handlers/test_<language>.py, modeled on the analog's test file.
Checkpoint with user: "Handler created at src/cocosearch/handlers/<language>.py with [N] extensions and [N] separator levels. Tests pass. Ready for the next path?"
Skip this step if the language is NOT in tree-sitter-language-pack.
Choose based on language similarity:
| Language Type | Analog Query | Why |
|---|---|---|
| Python-like (indent-based) | python.scm | function/class definitions |
| C-like (braces) | go.scm or java.scm | declaration patterns |
| Config (blocks with labels) | hcl.scm | block-based structures |
| Functional | rust.scm | items, traits, impls |
Search for the analog:
search_code(
query="<analog-language> tree-sitter query definition function class",
use_hybrid_search=True,
smart_context=True
)
Read the analog .scm file to understand the capture patterns.
Before writing the query, explore the language's tree-sitter AST to find the correct node types:
uv run python -c "
from tree_sitter_language_pack import get_parser
parser = get_parser('<language>')
tree = parser.parse(b'''<sample-code>''')
def show(node, indent=0):
    print(' ' * indent + f'{node.type} [{node.start_point[0]}:{node.start_point[1]}-{node.end_point[0]}:{node.end_point[1]}]')
    for child in node.children:
        show(child, indent + 2)
show(tree.root_node)
"
Identify the node types for functions, classes, methods, interfaces, etc.
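When the printed tree is large, a tally of node types can make the definition nodes easier to spot. The sketch below is duck-typed: it only assumes each node has `.type` and `.children`, which the exploration script above already relies on. `StubNode` is a hypothetical stand-in so the example runs without tree-sitter installed.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StubNode:
    """Stand-in for a tree-sitter node: just a type and children."""
    type: str
    children: list["StubNode"] = field(default_factory=list)


def count_node_types(node) -> Counter:
    """Recursively count how often each node type occurs in the tree."""
    counts = Counter([node.type])
    for child in node.children:
        counts.update(count_node_types(child))
    return counts


tree = StubNode("source_file", [
    StubNode("function_declaration", [StubNode("identifier")]),
    StubNode("function_declaration", [StubNode("identifier")]),
    StubNode("class_declaration", [StubNode("identifier")]),
])
for node_type, n in count_node_types(tree).most_common():
    print(node_type, n)
```

With a real parser, passing `tree.root_node` in place of the stub should work the same way.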
Create src/cocosearch/indexer/queries/<language>.scm with S-expression patterns:
- @definition.<type> captures for symbol types (function, class, method, interface)
- @name for symbol name captures
Add extension-to-language mappings:
search_code(
query="LANGUAGE_MAP extension mapping",
symbol_name="LANGUAGE_MAP",
use_hybrid_search=True,
smart_context=True
)
Add entries to LANGUAGE_MAP in src/cocosearch/indexer/symbols.py:
"ext": "language_name",
Add the language to the symbol-aware set:
search_code(
query="SYMBOL_AWARE_LANGUAGES",
use_hybrid_search=True,
smart_context=True
)
Add the language name to SYMBOL_AWARE_LANGUAGES in src/cocosearch/search/query.py.
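Because these two registrations live in different files, it is easy to update one and forget the other. A minimal sketch of the relationship, assuming Kotlin as the new language and assuming LANGUAGE_MAP keys are extensions without the leading dot (as the "ext" entry above suggests); the helper functions are hypothetical and not part of CocoSearch:

```python
from pathlib import Path
from typing import Optional

# Stand-ins for the real structures in symbols.py and query.py
LANGUAGE_MAP = {"kt": "kotlin", "kts": "kotlin"}
SYMBOL_AWARE_LANGUAGES = {"kotlin"}


def language_for(path: str) -> Optional[str]:
    """Look up a file's language by its final extension, without the dot."""
    ext = Path(path).suffix.lstrip(".").lower()
    return LANGUAGE_MAP.get(ext)


def unregistered_languages() -> set[str]:
    """Languages mapped from an extension but missing from the symbol-aware set."""
    return set(LANGUAGE_MAP.values()) - SYMBOL_AWARE_LANGUAGES


print(language_for("src/Main.kt"))
print(unregistered_languages())
```

An empty result from the second function indicates both registration points agree.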
Check if the language introduces new AST node types that need mapping to standard types:
search_code(
query="_map_symbol_type node type mapping",
symbol_name="_map_symbol_type",
use_hybrid_search=True,
smart_context=True
)
Add mappings in _map_symbol_type if the language uses non-standard node names for standard concepts (e.g., HCL uses "block" for what maps to "class").
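As a rough picture of what such a mapping does, the sketch below normalizes raw names to the standard symbol types. The table contents other than the HCL "block" case are assumed examples, and the real _map_symbol_type may be structured differently.

```python
# Hypothetical mapping from raw names to standard symbol types
_RAW_TO_SYMBOL_TYPE = {
    "function_declaration": "function",  # assumed example
    "class_declaration": "class",        # assumed example
    "block": "class",                    # HCL case mentioned above
}


def map_symbol_type(raw: str) -> str:
    """Return the standard symbol type, leaving already-standard names unchanged."""
    return _RAW_TO_SYMBOL_TYPE.get(raw, raw)


print(map_symbol_type("block"))
print(map_symbol_type("function"))
```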
Check if the language needs special qualified name logic:
search_code(
query="_build_qualified_name qualified name",
symbol_name="_build_qualified_name",
use_hybrid_search=True,
smart_context=True
)
Add language-specific logic to _build_qualified_name in symbols.py if the language has special naming patterns (e.g., Go receiver methods, HCL block labels).
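A qualified name typically prefixes the symbol with its enclosing scope. The sketch below shows the idea with a dot-joined format; the function signature, the output format, and the sample names are all assumptions, not the behavior of the real _build_qualified_name.

```python
def build_qualified_name(name: str, parents: tuple[str, ...] = ()) -> str:
    """Join enclosing scopes and the symbol name with dots."""
    return ".".join([*parents, name])


# Go receiver method: the receiver type acts as the parent scope
print(build_qualified_name("Start", ("Server",)))
# HCL block labels: block type and first label act as parents
print(build_qualified_name("web", ("resource", "aws_instance")))
```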
Find the analog's test file:
search_code(
query="test <analog-language> symbol extraction",
symbol_type="class",
use_hybrid_search=True,
smart_context=True
)
Create tests/unit/indexer/symbols/test_<language>.py, modeled on the analog's test file.
Checkpoint with user: "Symbol extraction configured for [language] with [N] query patterns. Tests pass. Ready for the next path?"
Skip this step unless the language is a domain-specific schema sharing a base language extension.
For grammar handler implementation, use the dedicated skill:
Invoke: /cocosearch:cocosearch-add-grammar
This skill provides in-depth guidance for matches() design, separator spec, metadata extraction, conflict avoidance, and grammar-specific testing.
After completing the grammar skill, return here for Step 7 (count assertions) and Step 8 (documentation).
Skip this step unless the language is in tree-sitter-language-pack AND context expansion is desired.
Explore the AST (same technique as Step 4b) to find which node types represent function/class definitions.
search_code(
query="DEFINITION_NODE_TYPES context expansion node types",
use_hybrid_search=True,
smart_context=True
)
Add the language entry to DEFINITION_NODE_TYPES in src/cocosearch/search/context_expander.py:
"<language>": {"<function_node_type>", "<class_node_type>"},
Add file extension mappings to EXTENSION_TO_LANGUAGE in the same file:
".<ext>": "<language>",
CONTEXT_EXPANSION_LANGUAGES updates automatically -- it's derived from DEFINITION_NODE_TYPES.keys().
The CONTEXT_EXPANSION_LANGUAGES set is exported and referenced in search docs. Update any docs listing supported context expansion languages.
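The relationship between the three names can be sketched as below. The Kotlin node types are assumptions that should be confirmed with the AST exploration step, and the consistency helper is hypothetical.

```python
# Stand-ins for the structures in context_expander.py
DEFINITION_NODE_TYPES = {
    "kotlin": {"function_declaration", "class_declaration"},  # assumed node types
}
EXTENSION_TO_LANGUAGE = {".kt": "kotlin", ".kts": "kotlin"}

# Derived, so there is nothing to update by hand
CONTEXT_EXPANSION_LANGUAGES = set(DEFINITION_NODE_TYPES.keys())


def languages_missing_node_types() -> set[str]:
    """Languages reachable by extension that have no definition node types."""
    return {
        lang
        for lang in EXTENSION_TO_LANGUAGE.values()
        if lang not in DEFINITION_NODE_TYPES
    }


print(sorted(CONTEXT_EXPANSION_LANGUAGES))
print(languages_missing_node_types())
```

An extension that maps to a language without node types would have nothing to expand to, so the helper should return an empty set.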
Checkpoint with user: "Context expansion added for [language]. smart_context=True will now expand to [node types] boundaries."
Skip this step unless the language has import/require/reference patterns that can be extracted for dependency analysis.
For dependency extractor implementation, use the dedicated skill:
Invoke: /cocosearch:cocosearch-add-extractor
This skill provides in-depth guidance for pre-checks, analog selection, extractor implementation, optional module resolver, tests, and registration.
After completing the extractor skill, return here for Step 7 (count assertions) and Step 8 (documentation).
Checkpoint with user: "Dependency extractor added for [language] with [N] import patterns. Tests pass. Ready for count assertions?"
This is the most commonly missed step. Do not skip.
search_code(
query="test registry handler count _HANDLER_REGISTRY",
use_hybrid_search=True,
smart_context=True
)
Update in tests/unit/handlers/test_registry.py:
- len(_HANDLER_REGISTRY) >= N -- increment by number of new extensions
- len(specs) == N -- increment by 1 (one CustomLanguageConfig per handler)
search_code(
query="test grammar registry count _GRAMMAR_REGISTRY",
use_hybrid_search=True,
smart_context=True
)
Update in tests/unit/handlers/test_grammar_registry.py:
- len(_GRAMMAR_REGISTRY) == N -- increment by 1
- len(grammars) == N -- increment by 1
Both test_registry.py and test_grammar_registry.py assert len(specs) == N from get_all_custom_language_specs(). This is the combined total of all language handler specs + grammar handler specs. Increment by 1 for each new handler or grammar added.
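The arithmetic behind these assertions can be sketched as follows. It assumes the handler registry holds one entry per extension (which is why it grows by the number of new extensions) while specs hold one entry per handler or grammar. The function and the sample data are hypothetical; the real values of N come from the test files.

```python
def expected_counts(handlers: dict[str, list[str]], grammars: list[str]) -> dict[str, int]:
    """Compute the counts the registry tests assert on."""
    return {
        # one registry entry per extension
        "handler_registry": sum(len(exts) for exts in handlers.values()),
        # one entry per grammar
        "grammar_registry": len(grammars),
        # one spec per handler plus one per grammar
        "specs": len(handlers) + len(grammars),
    }


before = expected_counts({"hcl": [".hcl", ".tf"]}, ["ansible"])
after = expected_counts(
    {"hcl": [".hcl", ".tf"], "exampleconf": [".exconf", ".exc"]},  # hypothetical new handler
    ["ansible"],
)
print(before)
print(after)
```

Adding one handler with two extensions raises the handler registry count by 2 and the spec count by 1, which matches the increments listed above.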
Update module descriptions and counts in CLAUDE.md:
- search/ module description
search_code(
query="Supported Languages README badges",
use_hybrid_search=True,
smart_context=True
)
Update the supported languages section of README.md.
If the new language introduces a new pattern worth documenting, add it to docs/adding-languages.md as a worked example (like the HCL example in Path C).
# Handler tests (if Path A)
uv run pytest tests/unit/handlers/test_<language>.py -v
# Symbol extraction tests (if Path B)
uv run pytest tests/unit/indexer/symbols/test_<language>.py -v
# Grammar tests (if Path D)
uv run pytest tests/unit/handlers/grammars/test_<grammar>.py -v
# Dependency extractor tests (if Path F)
uv run pytest tests/unit/deps/extractors/test_<language>.py -v
uv run pytest tests/unit/deps/test_resolver.py -v
# Registry count assertions
uv run pytest tests/unit/handlers/test_registry.py -v
uv run pytest tests/unit/handlers/test_grammar_registry.py -v
# Full handler test suite
uv run pytest tests/unit/handlers/ -v
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
Language support added for [language]!
Paths completed:
[x] Path A: Language Handler -- src/cocosearch/handlers/<language>.py
[x] Path B: Symbol Extraction -- src/cocosearch/indexer/queries/<language>.scm
[ ] Path D: Grammar Handler -- not applicable
[x] Path E: Context Expansion -- added to context_expander.py
[x] Path F: Dependency Extractor -- src/cocosearch/deps/extractors/<language>.py
Registration points:
[x] Handler file created (autodiscovered)
[x] EXTENSIONS auto-derived into include patterns
[x] LANGUAGE_MAP entries (symbols.py)
[x] Query file created (queries/<language>.scm)
[x] SYMBOL_AWARE_LANGUAGES updated (query.py)
[x] DEFINITION_NODE_TYPES updated (context_expander.py)
[x] EXTENSION_TO_LANGUAGE updated (context_expander.py)
[x] Test count assertions updated
[x] Documentation updated
Tests: PASS
Lint: PASS
To try it out:
uv run cocosearch languages # Verify language appears
uv run cocosearch index . # Reindex with new language support
uv run cocosearch search "query" --language <language>
Complete checklist of all registration points. Check off each one as you complete it:
Language Handler (Path A):
[ ] src/cocosearch/handlers/<language>.py created
[ ] EXTENSIONS attribute defined (auto-derived into include patterns)
[ ] INCLUDE_PATTERNS attribute defined (if non-extension patterns needed, e.g., Dockerfile)
[ ] _SKIP_PARSE_EXTENSIONS updated in src/cocosearch/indexer/parse_tracking.py (if no tree-sitter grammar)
[ ] Display name override added in cli.py languages_command (if .title() casing is wrong)
[ ] tests/unit/handlers/test_<language>.py created
Symbol Extraction (Path B):
[ ] src/cocosearch/indexer/queries/<language>.scm created
[ ] Extensions added to LANGUAGE_MAP in src/cocosearch/indexer/symbols.py
[ ] Language added to SYMBOL_AWARE_LANGUAGES in src/cocosearch/search/query.py
[ ] _map_symbol_type updated (if new AST node types need mapping)
[ ] _build_qualified_name updated (if special naming logic needed)
[ ] tests/unit/indexer/symbols/test_<language>.py created
Grammar Handler (Path D):
[ ] src/cocosearch/handlers/grammars/<grammar>.py created
[ ] tests/unit/handlers/grammars/test_<grammar>.py created
Context Expansion (Path E):
[ ] DEFINITION_NODE_TYPES updated in src/cocosearch/search/context_expander.py
[ ] EXTENSION_TO_LANGUAGE updated in src/cocosearch/search/context_expander.py
Dependency Extractor (Path F):
[ ] src/cocosearch/deps/extractors/<language>.py created (autodiscovered)
[ ] LANGUAGES set matches the language IDs from handler/grammar
[ ] Module resolver added to src/cocosearch/deps/resolver.py (if import resolution needed)
[ ] Resolver registered in _RESOLVERS dict (if added)
[ ] tests/unit/deps/extractors/test_<language>.py created
[ ] Resolver tests added to tests/unit/deps/test_resolver.py (if resolver added)
Count Assertions:
[ ] tests/unit/handlers/test_registry.py -- handler count and spec count updated
[ ] tests/unit/handlers/test_grammar_registry.py -- grammar count and spec count updated
Documentation:
[ ] CLAUDE.md -- module descriptions and counts updated
[ ] README.md -- supported languages section updated
[ ] docs/adding-languages.md -- new example added (if novel pattern)
For common search tips (hybrid search, smart_context, symbol filtering), see skills/README.md.
For installation instructions, see skills/README.md.
npx claudepluginhub violetcranberry/coco-search --plugin cocosearch