Other commands

This page covers the top-level indexed commands (everything not nested under indexed index *, indexed config *, or indexed mcp *), plus supported file formats and the parsing architecture. Full flag tables for index and MCP commands live in Index commands and MCP commands and are not repeated here.

Global options (top-level)

From indexed --help, options before the COMMAND name:

Group	Flag	Description
Options	`--install-completion`	Install shell completion for the current shell
Options	`--show-completion`	Print completion script to copy or customize
Options	`--help`	Show help and exit. Note: the top-level CLI does not accept `-h` (use `--help`)
Usage	`--local`	Use `.indexed/` in the current working directory instead of `~/.indexed/`
Usage	`--simple-output`	Machine-readable JSON for programmatic use
Debug	`--verbose`	Enable INFO-level logging
Debug	`--log-level`	One of `DEBUG`, `INFO`, `WARNING`, `ERROR`
Debug	`--json-logs`	Emit logs as JSON

Subcommands add their own option groups (for example Logging and Storage on indexed index create files). See the --help for the command you are running.
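
Because global options go before the COMMAND name, a script that shells out to indexed has to assemble its arguments in that order. The following is a minimal Python sketch; `build_argv` is a hypothetical helper written for this page, not part of Indexed, and it only uses flags listed above.

```python
import shlex


def build_argv(global_opts, command, command_opts=()):
    """Assemble an argv list: global options, then COMMAND, then its own options.

    Hypothetical helper for illustration only; not part of the indexed CLI.
    """
    return ["indexed", *global_opts, *command, *command_opts]


argv = build_argv(
    global_opts=["--local", "--log-level", "DEBUG"],  # top-level options
    command=["init"],                                  # COMMAND name
    command_opts=["--skip-model"],                     # options owned by `init`
)

# Render the list as a shell-safe command line for logging or review.
command_line = shlex.join(argv)
print(command_line)
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)`, which avoids shell quoting issues.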

`indexed init`

indexed init [OPTIONS]

Download the embedding model and create directories. From indexed init --help:

Option	Default	Description
`--model` / `-m`	`all-MiniLM-L6-v2`	Model to download
`--force` / `-f`	off	Re-download even if already cached
`--skip-model`	off	Skip download; only create dirs and validate config

`indexed migrate`

indexed migrate [OPTIONS]

Migrate a legacy ./data/ directory into the global ~/.indexed/data/ directory.

Option	Description
`--dry-run`	List what would be migrated without copying

`indexed docs`

indexed docs [TOPIC]

Open documentation in the browser. Optional TOPIC narrows the page (e.g. index, config, mcp, confluence, files, jira — see indexed docs --help).

Nested “docs” resources (same idea, different doc sets):

indexed index docs — index documentation in the browser
indexed config docs — configuration documentation
indexed mcp docs — MCP documentation

`indexed license`

indexed license

Show license and terms (indexed license --help).

Connectors (flags and environment)

CLI flags and defaults for indexed index create files|jira|confluence are documented only in Index commands. Environment variables and credentials for Atlassian products are in Config commands — Connector credentials.

Guides: Local files & code, Jira, Confluence.

Supported file formats

Code files (.py, .ts, .go, etc.) are indexed as text, chunked with AST awareness at function and class boundaries. Deeper code-aware chunking with semantic analysis is planned for a future release.

The Files connector parses documents with the pipeline described under Parsing architecture. Files are routed by extension: Docling for rich documents, tree-sitter for code, and structure-aware plaintext parsing for everything else.

Document formats (Docling)

Rich document parsing with layout analysis, table extraction, and optional OCR.

Format	Extensions	Notes
PDF	`.pdf`	Text extraction with layout analysis; OCR for scanned pages
Microsoft Word	`.docx`	Modern Word format
Microsoft PowerPoint	`.pptx`	Slide content extracted as text
Microsoft Excel	`.xlsx`	Spreadsheet content with table structure
HTML	`.html`, `.htm`	Structural parsing with tag semantics
Images	`.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`	OCR-based text extraction (requires `ocr_enabled`)
LaTeX	`.tex`	Scientific document parsing

Code formats (tree-sitter AST)

AST-aware chunking at semantic boundaries (functions, classes, methods). Falls back to line-based chunking for unsupported languages.

Language	Extensions	AST boundaries
Python	`.py`	Functions, classes, methods, decorators
TypeScript	`.ts`, `.tsx`	Functions, classes, interfaces, type aliases
JavaScript	`.js`, `.jsx`	Functions, classes, methods
Java	`.java`	Classes, methods, interfaces, enums
Rust	`.rs`	Functions, impl blocks, structs, enums, traits
Go	`.go`	Functions, methods, structs, interfaces
C	`.c`, `.h`	Functions, structs, enums
C++	`.cpp`, `.cc`, `.cxx`, `.hpp`	Functions, classes, structs, namespaces

Other code files (e.g. .rb, .php, .sh) use line-based splitting.

Plaintext formats

Format	Extensions	Parsing strategy
Markdown	`.md`	Structure-aware chunking via Docling (headings, lists, code blocks)
reStructuredText	`.rst`	Structure-aware chunking via Docling
Plain Text	`.txt`	Paragraph-based splitting
JSON	`.json`	Paragraph-based splitting
YAML	`.yaml`, `.yml`	Paragraph-based splitting
CSV	`.csv`	Paragraph-based splitting
TOML	`.toml`	Paragraph-based splitting
XML	`.xml`	Paragraph-based splitting

Fallback behavior

Files that do not match any specific parser are routed to Docling as a fallback. If Docling cannot parse a file, it is skipped with a warning (unless --fail-fast is set).
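
The skip-or-stop behavior described above can be sketched as a small loop. This is an illustration under stated assumptions, not Indexed's implementation: `parse_all`, `fake_parse`, and the `fail_fast` parameter are hypothetical names that mirror the --fail-fast flag.

```python
import logging

logger = logging.getLogger("files-connector-sketch")


def parse_all(paths, parse, fail_fast=False):
    """Parse each path; skip failures with a warning unless fail_fast is set."""
    parsed, skipped = [], []
    for path in paths:
        try:
            parsed.append(parse(path))
        except Exception as exc:
            if fail_fast:
                # Mirrors --fail-fast: stop at the first file that cannot be parsed.
                raise
            logger.warning("Skipping %s: %s", path, exc)
            skipped.append(path)
    return parsed, skipped


def fake_parse(path):
    # Stand-in parser (hypothetical): rejects one extension to trigger the skip path.
    if path.endswith(".bin"):
        raise ValueError("unsupported format")
    return f"parsed:{path}"


parsed, skipped = parse_all(["a.md", "b.bin", "c.txt"], fake_parse)
```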

Tips

Unsupported formats are skipped — Indexed logs a warning and continues (unless --fail-fast).
Binary files are ignored — videos, compiled binaries, and other non-text files are skipped.
Filter what gets indexed with excluded_extensions in config or include/exclude patterns on the CLI.
OCR is off by default — enable with ocr_enabled: true in configuration (see Config commands) for scanned PDFs and images.

Parsing architecture

The indexed-parsing package is the parsing and chunking engine behind the Files connector. It handles rich documents, source code, and plaintext through specialized parsers.

Overview

┌──────────────┐     ┌─────────────┐     ┌──────────────────────┐
│  Input File  │────▶│ FileRouter  │────▶│  Selected Parser     │
│              │     │             │     │                      │
│  .py, .pdf,  │     │ Extension → │     │  DoclingParser    or │
│  .md, .json  │     │ Strategy    │     │  CodeChunker      or │
│              │     │             │     │  PlaintextParser  or │
│              │     │             │     │  Docling fallback    │
└──────────────┘     └─────────────┘     └──────────┬───────────┘
                                                    │
                                                    ▼
                                          ┌──────────────────┐
                                          │ ParsedDocument   │
                                          │                  │
                                          │  chunks[]        │
                                          │  metadata        │
                                          │  content_hash    │
                                          └──────────────────┘

The pipeline exposes:

parse(path) — parse a file from disk
parse_bytes(data, filename) — parse in-memory bytes

Both return a ParsedDocument with ParsedChunk entries and xxhash content hashing.

FileRouter

The FileRouter maps file extensions to parsing strategies. Routing takes ~2.4 µs per file.

Extension pattern	Strategy	Parser
`.pdf`, `.docx`, `.pptx`, `.xlsx`, `.html`, `.htm`, `.tex`	Document	DoclingParser
`.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`	Image/OCR	DoclingParser (with OCR)
`.py`, `.ts`, `.tsx`, `.js`, `.jsx`, `.java`, `.rs`, `.go`, `.c`, `.h`, `.cpp`, `.cc`, `.cxx`, `.hpp`	Code AST	CodeChunker
`.md`, `.rst`	Structured text	PlaintextParser (Docling-backed)
`.txt`, `.json`, `.yaml`, `.yml`, `.csv`, `.toml`, `.xml`	Plaintext	PlaintextParser
Everything else	Fallback	DoclingParser
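
One way to picture the routing table is as a dictionary keyed by extension, which would also be consistent with dispatch in the microsecond range. The sketch below is not the FileRouter source: `ROUTES`, `FALLBACK`, and `route` are hypothetical names, and the case-insensitive suffix match is an assumption.

```python
from pathlib import Path

# (extensions, (strategy, parser)) pairs copied from the routing table.
_TABLE = [
    ((".pdf", ".docx", ".pptx", ".xlsx", ".html", ".htm", ".tex"),
     ("Document", "DoclingParser")),
    ((".png", ".jpg", ".jpeg", ".tiff", ".bmp"),
     ("Image/OCR", "DoclingParser")),
    ((".py", ".ts", ".tsx", ".js", ".jsx", ".java", ".rs", ".go",
      ".c", ".h", ".cpp", ".cc", ".cxx", ".hpp"),
     ("Code AST", "CodeChunker")),
    ((".md", ".rst"),
     ("Structured text", "PlaintextParser")),
    ((".txt", ".json", ".yaml", ".yml", ".csv", ".toml", ".xml"),
     ("Plaintext", "PlaintextParser")),
]

# Flatten to one dictionary entry per extension for constant-time lookup.
ROUTES = {ext: target for exts, target in _TABLE for ext in exts}
FALLBACK = ("Fallback", "DoclingParser")


def route(path):
    """Return (strategy, parser) for a path, matching on the lowercased suffix."""
    return ROUTES.get(Path(path).suffix.lower(), FALLBACK)


for name in ["report.PDF", "main.rs", "README.md", "script.rb"]:
    print(name, "->", route(name))
```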

DoclingParser

Layout-aware parsing for PDF, DOCX, PPTX, XLSX, HTML, and LaTeX; optional OCR for scans and images.

OCR backends:

Backend	Size	Notes
RapidOCR (default)	~15.5 MB	Bundled ONNX models; no runtime network
EasyOCR (optional)	~88 MB	`pip install docling[easyocr]`; models cached under `~/.EasyOCR/model/`

Enable OCR in your workspace config.toml under [sources.files] (see Configuration guide), for example:

[sources.files]
ocr_enabled = true

Then create the collection:

Terminal

indexed index create files -c scanned-docs -p ./scans

CodeChunker

AST-aware chunking via tree-sitter at semantic boundaries.

Language	Semantic boundaries
Python	`function_definition`, `class_definition`, `decorated_definition`
TypeScript / JavaScript	`function_declaration`, `class_declaration`, `interface_declaration`, `type_alias_declaration`
Java	`class_declaration`, `method_declaration`, `interface_declaration`, `enum_declaration`
Rust	`function_item`, `impl_item`, `struct_item`, `enum_item`, `trait_item`
Go	`function_declaration`, `method_declaration`, `type_declaration`
C / C++	`function_definition`, `struct_specifier`, `class_specifier`, `namespace_definition`

Python AST chunking runs at ~950 µs per file in benchmarks.
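
To illustrate what chunking at semantic boundaries means, here is a small sketch that uses Python's standard-library `ast` module as a stand-in for tree-sitter. It is not the CodeChunker implementation: `chunk_python` is a hypothetical function, it handles Python only, and it splits at top-level definitions only.

```python
import ast

SOURCE = '''\
from functools import cache

@cache
def helper():
    return 42

class Greeter:
    def greet(self):
        return "hi"
'''


def chunk_python(source):
    """Split source into chunks at top-level function and class boundaries."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator so it stays attached to its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append("\n".join(lines[start - 1 : node.end_lineno]))
    return chunks


chunks = chunk_python(SOURCE)
for chunk in chunks:
    print(chunk)
    print("-----")
```

Each chunk holds one complete definition, so a method is never cut in half the way fixed-size or line-based splitting might cut it. Module-level statements such as the import are left out of this sketch.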

PlaintextParser

Markdown and RST — Docling-backed structure-aware chunking
JSON, YAML, CSV, TOML, XML, TXT — paragraph-based splitting at blank lines
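
Paragraph-based splitting at blank lines can be sketched in a few lines of Python. `split_paragraphs` is a hypothetical name, and treating whitespace-only lines as blank is an assumption rather than documented PlaintextParser behavior.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs at runs of one or more blank lines."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [part.strip() for part in parts if part.strip()]


sample = "first: 1\nsecond: 2\n\n\nthird: 3\n   \nfourth: 4\n"
paragraphs = split_paragraphs(sample)
for paragraph in paragraphs:
    print(repr(paragraph))
```

Lines that are adjacent stay in the same chunk, so a compact YAML or TOML file with no blank lines would come out as a single paragraph under this scheme.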

Output format

All parsers produce ParsedChunk (content, metadata, content_hash) and ParsedDocument (path, chunks, metadata, content_hash). A v1 adapter maps these into the indexing engine format.
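
The two output types can be pictured as plain dataclasses with the fields listed above. This is a sketch, not the indexed-parsing definitions: the field types, the defaults, and deriving the document hash from chunk contents are assumptions, and BLAKE2b from the standard library stands in for xxhash so the example has no third-party dependency.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(text):
    # Stand-in hash: stdlib BLAKE2b. The real pipeline is described as using xxhash.
    return hashlib.blake2b(text.encode("utf-8"), digest_size=8).hexdigest()


@dataclass
class ParsedChunk:
    content: str
    metadata: dict = field(default_factory=dict)
    content_hash: str = ""

    def __post_init__(self):
        # Fill in the hash from the content when the caller did not supply one.
        if not self.content_hash:
            self.content_hash = content_hash(self.content)


@dataclass
class ParsedDocument:
    path: str
    chunks: list
    metadata: dict = field(default_factory=dict)
    content_hash: str = ""

    def __post_init__(self):
        # Assumption: hash the concatenated chunk contents for the document hash.
        if not self.content_hash:
            self.content_hash = content_hash("".join(c.content for c in self.chunks))


doc = ParsedDocument("notes.txt", [ParsedChunk("alpha"), ParsedChunk("beta")])
print(doc.path, len(doc.chunks), doc.content_hash)
```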

Change tracking

Strategy	Method	Best for
Auto	Git if `.git` exists, else content-hash	Default
Git	`git diff --name-status`	Repositories
Content hash	xxhash per file	Non-git folders
Mtime	Modification time	Large trees where speed matters

Set change_tracking under [sources.files] in config, not as a flag on indexed index create files. For Files connector config keys, see Config commands.

config.toml

[sources.files]
change_tracking = "git"

Terminal

indexed index create files -c my-docs -p ./repo
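
The Auto and Content hash rows of the table can be sketched as follows. This is not Indexed's code: `resolve_strategy` and `changed_files` are hypothetical functions, the value names "auto" and "content_hash" are assumptions (only "git" appears in the config example), and BLAKE2b stands in for xxhash.

```python
import hashlib
import tempfile
from pathlib import Path


def resolve_strategy(root, configured="auto"):
    """Resolve 'auto' to 'git' when a .git entry exists, else 'content_hash'."""
    if configured != "auto":
        return configured
    return "git" if (Path(root) / ".git").exists() else "content_hash"


def changed_files(root, previous):
    """Return (changed paths, new snapshot) versus a {relative_path: hash} snapshot."""
    current = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.blake2b(path.read_bytes(), digest_size=8).hexdigest()
            current[path.relative_to(root).as_posix()] = digest
    # A file counts as changed when it is new or its hash differs from the snapshot.
    changed = [name for name, digest in current.items() if previous.get(name) != digest]
    return changed, current


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("one")
    (root / "b.txt").write_text("same")
    strategy_before = resolve_strategy(root)          # no .git yet
    first_changed, snapshot = changed_files(root, {})  # everything is new
    (root / "a.txt").write_text("two")
    second_changed, _ = changed_files(root, snapshot)  # only a.txt differs
    (root / ".git").mkdir()
    strategy_after = resolve_strategy(root)            # .git now exists

print(strategy_before, first_changed, second_changed, strategy_after)
```

Hashing reads every file on each run, which is why the table lists Mtime as the option for large trees where speed matters.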

Performance (benchmarks)

Operation	Time
FileRouter dispatch	~2.4 µs per file
JSON parse	~11 µs
Python AST chunking	~950 µs
Markdown (7 KB)	~500 ms
Large Markdown (24 KB)	~1.4 s
