Other commands
This page covers the top-level indexed commands, meaning everything that does not live exclusively under indexed index *, indexed config *, or indexed mcp *, plus supported file formats and the parsing architecture. The full flag tables for index and MCP live in Index commands and MCP commands, so this page does not repeat them.
Global options (top-level)
From indexed --help, options before the COMMAND name:
| Group | Flag | Description |
|---|---|---|
| Options | --install-completion | Install shell completion for the current shell |
| Options | --show-completion | Print completion script to copy or customize |
| Options | --help | Show help and exit. Note: the top-level CLI does not accept -h (use --help) |
| Usage | --local | Use .indexed/ in the current working directory instead of ~/.indexed/ |
| Usage | --simple-output | Machine-readable JSON for programmatic use |
| Debug | --verbose | Enable INFO-level logging |
| Debug | --log-level | One of DEBUG, INFO, WARNING, ERROR |
| Debug | --json-logs | Emit logs as JSON |
Subcommands add their own option groups (for example Logging and Storage on indexed index create files). See the --help for the command you are running.
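Because global options go before the command name, a script that shells out to the CLI has to assemble its arguments in that order. The following is a minimal sketch of that ordering. build_argv is a hypothetical helper written for this page, not part of Indexed, and it only uses flags listed in the table above.

```python
from typing import List, Optional


def build_argv(command: List[str], *, local: bool = False,
               simple_output: bool = False,
               log_level: Optional[str] = None) -> List[str]:
    """Assemble an argument list with global options before the command.

    Hypothetical helper for illustration; only flags from the table are used.
    """
    argv = ["indexed"]
    if local:
        argv.append("--local")          # use ./.indexed/ instead of ~/.indexed/
    if simple_output:
        argv.append("--simple-output")  # machine-readable JSON
    if log_level is not None:
        if log_level not in {"DEBUG", "INFO", "WARNING", "ERROR"}:
            raise ValueError(f"unsupported log level: {log_level}")
        argv.extend(["--log-level", log_level])
    return argv + command  # the command and its own options come last


argv = build_argv(["license"], local=True, simple_output=True)
print(" ".join(argv))  # prints: indexed --local --simple-output license
```

With the CLI installed, the resulting list can be handed to subprocess.run(argv, check=True). Building a list rather than a single string avoids shell quoting issues.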
indexed init
indexed init [OPTIONS]
Download the embedding model and create directories. From indexed init --help:
| Option | Default | Description |
|---|---|---|
| --model / -m | all-MiniLM-L6-v2 | Model to download |
| --force / -f | off | Re-download even if already cached |
| --skip-model | off | Skip download; only create dirs and validate config |
indexed migrate
indexed migrate [OPTIONS]
Migrate legacy ./data/ into global ~/.indexed/data/.
| Option | Description |
|---|---|
| --dry-run | List what would be migrated without copying |
indexed docs
indexed docs [TOPIC]
Open documentation in the browser. The optional TOPIC narrows the page (e.g. index, config, mcp, confluence, files, jira; see indexed docs --help).
Nested “docs” resources (same idea, different doc sets):
- indexed index docs: index documentation in the browser
- indexed config docs: configuration documentation
- indexed mcp docs: MCP documentation
indexed license
indexed license
Show the license and terms (see indexed license --help).
Connectors (flags and environment)
CLI flags and defaults for indexed index create files|jira|confluence are documented only in Index commands. Environment variables and credentials for Atlassian products are in Config commands — Connector credentials.
Guides: Local files & code, Jira, Confluence.
Supported file formats
Code files (.py, .ts, .go, etc.) are indexed as plain text with AST-aware chunking at function and class boundaries. Full code-aware chunking with semantic analysis is planned for a future release.
The Files connector parses documents with the pipeline described under Parsing architecture. Files are routed by extension: Docling for rich documents, tree-sitter for code, and structure-aware plaintext parsing for everything else.
Document formats (Docling)
Rich document parsing with layout analysis, table extraction, and optional OCR.
| Format | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Text extraction with layout analysis; OCR for scanned pages |
| Microsoft Word | .docx | Modern Word format |
| Microsoft PowerPoint | .pptx | Slide content extracted as text |
| Microsoft Excel | .xlsx | Spreadsheet content with table structure |
| HTML | .html, .htm | Structural parsing with tag semantics |
| Images | .png, .jpg, .jpeg, .tiff, .bmp | OCR-based text extraction (requires ocr_enabled) |
| LaTeX | .tex | Scientific document parsing |
Code formats (tree-sitter AST)
AST-aware chunking at semantic boundaries (functions, classes, methods). Falls back to line-based chunking for unsupported languages.
| Language | Extensions | AST boundaries |
|---|---|---|
| Python | .py | Functions, classes, methods, decorators |
| TypeScript | .ts, .tsx | Functions, classes, interfaces, type aliases |
| JavaScript | .js, .jsx | Functions, classes, methods |
| Java | .java | Classes, methods, interfaces, enums |
| Rust | .rs | Functions, impl blocks, structs, enums, traits |
| Go | .go | Functions, methods, structs, interfaces |
| C | .c, .h | Functions, structs, enums |
| C++ | .cpp, .cc, .cxx, .hpp | Functions, classes, structs, namespaces |
Other code files (e.g. .rb, .php, .sh) use line-based splitting.
Plaintext formats
| Format | Extensions | Parsing strategy |
|---|---|---|
| Markdown | .md | Structure-aware chunking via Docling (headings, lists, code blocks) |
| reStructuredText | .rst | Structure-aware chunking via Docling |
| Plain Text | .txt | Paragraph-based splitting |
| JSON | .json | Paragraph-based splitting |
| YAML | .yaml, .yml | Paragraph-based splitting |
| CSV | .csv | Paragraph-based splitting |
| TOML | .toml | Paragraph-based splitting |
| XML | .xml | Paragraph-based splitting |
Fallback behavior
Files that do not match any specific parser are routed to Docling as a fallback. If Docling cannot parse a file, it is skipped with a warning (unless --fail-fast is set).
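The skip-or-abort behavior described above can be sketched as a small loop. This is an illustration under stated assumptions, not Indexed's implementation: parse_all, toy_parse, and the logger name are hypothetical, and the fail_fast parameter only mirrors what the --fail-fast flag is described as doing.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable, List

logger = logging.getLogger("indexed.sketch")  # hypothetical logger name


def parse_all(paths: Iterable[Path], parse: Callable[[Path], str],
              fail_fast: bool = False) -> List[str]:
    """Parse every path; skip failures with a warning unless fail_fast is set."""
    results = []
    for path in paths:
        try:
            results.append(parse(path))
        except Exception as exc:
            if fail_fast:
                raise  # abort the whole run on the first failure
            logger.warning("skipping %s: %s", path, exc)
    return results


def toy_parse(path: Path) -> str:
    # Stand-in parser: rejects one extension to simulate an unparseable file.
    if path.suffix == ".bin":
        raise ValueError("unsupported content")
    return f"parsed:{path.name}"


print(parse_all([Path("a.txt"), Path("b.bin"), Path("c.md")], toy_parse))
```

With fail_fast left off, the run continues past b.bin and returns results for the other two files. With fail_fast=True, the same call raises on b.bin instead.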
Tips
- Unsupported formats are skipped: Indexed logs a warning and continues (unless --fail-fast is set).
- Binary files are ignored: videos, compiled binaries, and other non-text files are skipped.
- Use excluded_extensions in config or include/exclude patterns on the CLI.
- OCR is off by default: enable it with ocr_enabled = true in configuration (see Config commands) for scanned PDFs and images.
Parsing architecture
The indexed-parsing package is the parsing and chunking engine behind the Files connector. It handles rich documents, source code, and plaintext through specialized parsers.
Overview
┌──────────────┐     ┌─────────────┐     ┌──────────────────────┐
│  Input File  │────▶│ FileRouter  │────▶│   Selected Parser    │
│              │     │             │     │                      │
│ .py, .pdf,   │     │ Extension → │     │ DoclingParser or     │
│ .md, .json   │     │ Strategy    │     │ CodeChunker or       │
│              │     │             │     │ PlaintextParser or   │
│              │     │             │     │ Docling fallback     │
└──────────────┘     └─────────────┘     └──────────┬───────────┘
                                                    │
                                                    ▼
                                          ┌──────────────────┐
                                          │  ParsedDocument  │
                                          │                  │
                                          │  chunks[]        │
                                          │  metadata        │
                                          │  content_hash    │
                                          └──────────────────┘
The pipeline exposes:
- parse(path): parse a file from disk
- parse_bytes(data, filename): parse in-memory bytes
Both return a ParsedDocument with ParsedChunk entries and xxhash content hashing.
FileRouter
The FileRouter maps file extensions to parsing strategies. Routing takes ~2.4 µs per file.
| Extension pattern | Strategy | Parser |
|---|---|---|
| .pdf, .docx, .pptx, .xlsx, .html, .htm, .tex | Document | DoclingParser |
| .png, .jpg, .jpeg, .tiff, .bmp | Image/OCR | DoclingParser (with OCR) |
| .py, .ts, .tsx, .js, .jsx, .java, .rs, .go, .c, .h, .cpp, .cc, .cxx, .hpp | Code AST | CodeChunker |
| .md, .rst | Structured text | PlaintextParser (Docling-backed) |
| .txt, .json, .yaml, .yml, .csv, .toml, .xml | Plaintext | PlaintextParser |
| Everything else | Fallback | DoclingParser |
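The routing table above amounts to a dictionary lookup with a default. The sketch below shows one way to express it. The extension sets come from the table, but the strategy labels, the route function, and the case-insensitive matching are assumptions made for this example rather than the actual FileRouter API.

```python
from pathlib import Path

# Extension sets copied from the routing table; the strategy names are labels
# chosen for this sketch, not identifiers from indexed-parsing.
ROUTES = {
    "document": {".pdf", ".docx", ".pptx", ".xlsx", ".html", ".htm", ".tex"},
    "image_ocr": {".png", ".jpg", ".jpeg", ".tiff", ".bmp"},
    "code_ast": {".py", ".ts", ".tsx", ".js", ".jsx", ".java", ".rs", ".go",
                 ".c", ".h", ".cpp", ".cc", ".cxx", ".hpp"},
    "structured_text": {".md", ".rst"},
    "plaintext": {".txt", ".json", ".yaml", ".yml", ".csv", ".toml", ".xml"},
}

# Invert once so each lookup is a single dictionary access.
EXTENSION_TO_STRATEGY = {
    ext: strategy for strategy, exts in ROUTES.items() for ext in exts
}


def route(path: str) -> str:
    """Return the strategy label for a path; unknown extensions fall back."""
    suffix = Path(path).suffix.lower()  # assumption: matching ignores case
    return EXTENSION_TO_STRATEGY.get(suffix, "fallback")


for name in ["report.PDF", "main.rs", "notes.md", "data.yaml", "script.rb"]:
    print(name, "->", route(name))
```

Inverting the table up front keeps each dispatch to one hash lookup, which fits the microsecond-scale routing cost quoted above.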
DoclingParser
Layout-aware parsing for PDF, DOCX, PPTX, XLSX, HTML, and LaTeX; optional OCR for scans and images.
OCR backends:
| Backend | Size | Notes |
|---|---|---|
| RapidOCR (default) | ~15.5 MB | Bundled ONNX models; no runtime network |
| EasyOCR (optional) | ~88 MB | pip install docling[easyocr]; models cached under ~/.EasyOCR/model/ |
Enable OCR in your workspace config.toml under [sources.files] (see Configuration guide), for example:
[sources.files]
ocr_enabled = true
Then create the collection:
indexed index create files -c scanned-docs -p ./scans
CodeChunker
AST-aware chunking via tree-sitter at semantic boundaries.
| Language | Semantic boundaries |
|---|---|
| Python | function_definition, class_definition, decorated_definition |
| TypeScript / JavaScript | function_declaration, class_declaration, interface_declaration, type_alias_declaration |
| Java | class_declaration, method_declaration, interface_declaration, enum_declaration |
| Rust | function_item, impl_item, struct_item, enum_item, trait_item |
| Go | function_declaration, method_declaration, type_declaration |
| C / C++ | function_definition, struct_specifier, class_specifier, namespace_definition |
Python AST chunking runs at ~950 µs per file in benchmarks.
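To make the idea of chunking at semantic boundaries concrete, here is a minimal sketch for Python source. It deliberately swaps in the standard-library ast module for tree-sitter so that it runs without extra dependencies. chunk_python is a hypothetical function, and it only handles top-level definitions.

```python
import ast
from typing import List


def chunk_python(source: str) -> List[str]:
    """Split source into one chunk per top-level function or class.

    Decorators are kept with the definition they decorate. Statements outside
    any definition are ignored in this sketch.
    """
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # A decorated definition starts at its first decorator line.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append("\n".join(lines[start - 1:node.end_lineno]))
    return chunks


SAMPLE = '''import functools

@functools.lru_cache
def square(x):
    return x * x

class Greeter:
    def hello(self):
        return "hi"
'''

for chunk in chunk_python(SAMPLE):
    print(chunk)
    print("---")
```

The sample yields two chunks: the decorated square function and the Greeter class with its method. A tree-sitter based chunker would select nodes by type name (such as function_definition or class_definition from the table above) in much the same way.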
PlaintextParser
- Markdown and RST — Docling-backed structure-aware chunking
- JSON, YAML, CSV, TOML, XML, TXT — paragraph-based splitting at blank lines
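Paragraph-based splitting at blank lines, as described in the list above, can be sketched in a few lines. split_paragraphs is a hypothetical name, and treating whitespace-only lines as blank is an assumption of this example.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split text at runs of blank lines and drop empty pieces."""
    # A newline, optional whitespace (including more newlines), then a newline.
    pieces = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in pieces if p.strip()]


SAMPLE = "name: demo\nversion: 1\n\nsources:\n  files: true\n\n\nnotes: end\n"

for paragraph in split_paragraphs(SAMPLE):
    print(repr(paragraph))
```

Note that this approach is format-agnostic: a JSON or XML file without blank lines comes back as a single chunk.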
Output format
All parsers produce ParsedChunk (content, metadata, content_hash) and ParsedDocument (path, chunks, metadata, content_hash). A v1 adapter maps these into the indexing engine format.
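The two output types can be pictured as plain data classes. The field names below follow the description above, but the types, the defaults, and the way the document hash is derived are assumptions of this sketch. It also uses hashlib.blake2b from the standard library as a stand-in for xxhash, which is a third-party package.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List


def hash_text(text: str) -> str:
    # Stand-in for xxhash: blake2b from the standard library, 8-byte digest.
    return hashlib.blake2b(text.encode("utf-8"), digest_size=8).hexdigest()


@dataclass
class ParsedChunk:
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    content_hash: str = ""

    def __post_init__(self) -> None:
        if not self.content_hash:
            self.content_hash = hash_text(self.content)


@dataclass
class ParsedDocument:
    path: str
    chunks: List[ParsedChunk]
    metadata: Dict[str, Any] = field(default_factory=dict)
    content_hash: str = ""

    def __post_init__(self) -> None:
        if not self.content_hash:
            # Assumption: the document hash covers the chunk contents in order.
            self.content_hash = hash_text("\n".join(c.content for c in self.chunks))


doc = ParsedDocument(path="notes.txt",
                     chunks=[ParsedChunk("first"), ParsedChunk("second")])
print(doc.path, len(doc.chunks), doc.content_hash)
```

Carrying a hash on both the chunk and the document lets a caller detect an unchanged file cheaply and, when the file did change, compare chunk by chunk.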
Change tracking
| Strategy | Method | Best for |
|---|---|---|
| Auto | Git if .git exists, else content-hash | Default |
| Git | git diff --name-status | Repositories |
| Content hash | xxhash per file | Non-git folders |
| Mtime | Modification time | Large trees where speed matters |
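The Auto and Content hash rows above can be illustrated with a short sketch: pick Git when a .git entry exists, otherwise compare per-file digests between two snapshots. All function names here are hypothetical, and hashlib.blake2b stands in for xxhash so the example needs only the standard library.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def pick_strategy(root: Path) -> str:
    """Auto mode as described in the table: Git if .git exists, else content hash."""
    return "git" if (root / ".git").exists() else "content_hash"


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file's relative path to a digest of its bytes."""
    # Stand-in for xxhash: blake2b from the standard library.
    return {
        str(p.relative_to(root)): hashlib.blake2b(p.read_bytes(), digest_size=8).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, str]:
    """Classify paths as added, modified, or deleted between two snapshots."""
    changes = {}
    for path, digest in after.items():
        if path not in before:
            changes[path] = "added"
        elif before[path] != digest:
            changes[path] = "modified"
    for path in before.keys() - after.keys():
        changes[path] = "deleted"
    return changes


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("one")
    (root / "b.txt").write_text("two")
    before = snapshot(root)
    (root / "a.txt").write_text("one, edited")
    (root / "b.txt").unlink()
    (root / "c.txt").write_text("three")
    strategy = pick_strategy(root)
    changes = changed_files(before, snapshot(root))

print(strategy)
print(changes)
```

Hashing reads every file on each run, which is why the table suggests Mtime for large trees where speed matters more than exactness.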
Set change_tracking under [sources.files] in config, not as a flag on indexed index create files. For Files connector config keys, see Config commands.
[sources.files]
change_tracking = "git"
indexed index create files -c my-docs -p ./repo
Performance (benchmarks)
| Operation | Time |
|---|---|
| FileRouter dispatch | ~2.4 µs per file |
| JSON parse | ~11 µs |
| Python AST chunking | ~950 µs |
| Markdown (7 KB) | ~500 ms |
| Large Markdown (24 KB) | ~1.4 s |
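These figures can be combined into a rough estimate for a whole tree. The sketch below is back-of-the-envelope arithmetic only: it assumes sequential processing, takes the table values at face value, and treats every Markdown file as roughly 7 KB. The file counts are made up for the example.

```python
# Per-file timings copied from the benchmark table, in seconds.
ROUTE_S = 2.4e-6
PYTHON_AST_S = 950e-6
MARKDOWN_7KB_S = 0.5


def estimate_seconds(python_files: int, markdown_files: int) -> float:
    """Sum routing and parsing time for a hypothetical mix of files."""
    total_files = python_files + markdown_files
    return (total_files * ROUTE_S
            + python_files * PYTHON_AST_S
            + markdown_files * MARKDOWN_7KB_S)


# 10,000 Python files and 200 Markdown files (illustrative counts).
print(f"{estimate_seconds(10_000, 200):.1f} s")  # prints: 109.5 s
```

In this example the 200 Markdown files account for about 100 s of the total, while the 10,000 Python files take about 9.5 s and routing is negligible. Under these assumptions, Docling-backed parsing rather than routing or AST chunking dominates indexing time.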