Other commands

Top-level options, init, documentation, license, and file formats.

This page covers the top-level indexed commands (everything outside the indexed index *, indexed config *, and indexed mcp * groups), plus file formats and parsing. Full flag tables for the index and MCP commands live in Index commands and MCP commands, so this page does not repeat them.


Global options (top-level)

From indexed --help, options before the COMMAND name:

| Group | Flag | Description |
| --- | --- | --- |
| Options | --install-completion | Install shell completion for the current shell |
| Options | --show-completion | Print completion script to copy or customize |
| Options | --help | Show help and exit. Note: the top-level CLI does not accept -h (use --help) |
| Usage | --local | Use .indexed/ in the current working directory instead of ~/.indexed/ |
| Usage | --simple-output | Machine-readable JSON for programmatic use |
| Debug | --verbose | Enable INFO-level logging |
| Debug | --log-level | Set the log level: DEBUG, INFO, WARNING, or ERROR |
| Debug | --json-logs | Emit logs as JSON |

Subcommands add their own option groups (for example Logging and Storage on indexed index create files). See the --help for the command you are running.
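
When scripting the CLI, the ordering rule above (global options before the COMMAND name) is easy to get wrong. The following is a minimal sketch of a helper that assembles an argument list in that order. The function name build_indexed_argv is hypothetical, and passing the log level as a separate argument after --log-level is an assumption; check indexed --help for the exact syntax.

```python
from typing import List, Optional, Sequence

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def build_indexed_argv(
    command: Sequence[str],
    *,
    local: bool = False,
    simple_output: bool = False,
    log_level: Optional[str] = None,
) -> List[str]:
    """Build an argv list with global options placed before the COMMAND name.

    Hypothetical helper for illustration; it is not part of the indexed CLI.
    """
    argv = ["indexed"]
    if local:
        argv.append("--local")
    if simple_output:
        argv.append("--simple-output")
    if log_level is not None:
        if log_level not in VALID_LOG_LEVELS:
            raise ValueError(f"unsupported log level: {log_level}")
        # Assumption: the value is passed as a separate argument.
        argv += ["--log-level", log_level]
    # The subcommand and its own options always come last.
    argv += list(command)
    return argv


if __name__ == "__main__":
    print(build_indexed_argv(["init", "--skip-model"], local=True, log_level="DEBUG"))
```

The resulting list can be handed to subprocess.run as-is, which avoids shell quoting issues.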


indexed init

indexed init [OPTIONS]

Download the embedding model and create directories. From indexed init --help:

| Option | Default | Description |
| --- | --- | --- |
| --model / -m | all-MiniLM-L6-v2 | Model to download |
| --force / -f | off | Re-download even if already cached |
| --skip-model | off | Skip download; only create dirs and validate config |

indexed migrate

indexed migrate [OPTIONS]

Migrate legacy ./data/ into global ~/.indexed/data/.

| Option | Description |
| --- | --- |
| --dry-run | List what would be migrated without copying |

indexed docs

indexed docs [TOPIC]

Open documentation in the browser. The optional TOPIC selects a specific page (e.g. index, config, mcp, confluence, files, jira; see indexed docs --help).

Nested “docs” resources (same idea, different doc sets):

  • indexed index docs — index documentation in the browser
  • indexed config docs — configuration documentation
  • indexed mcp docs — MCP documentation

indexed license

indexed license

Show license and terms (indexed license --help).


Connectors (flags and environment)

CLI flags and defaults for indexed index create files|jira|confluence are documented only in Index commands. Environment variables and credentials for Atlassian products are in Config commands — Connector credentials.

Guides: Local files & code, Jira, Confluence.


Supported file formats

Code files (.py, .ts, .go, etc.) are indexed as text and chunked at function and class boundaries using AST-aware chunking. Full code-aware chunking with semantic analysis is planned for a future release.

The Files connector parses documents with the pipeline described under Parsing architecture. Files are routed by extension: Docling for rich documents, tree-sitter for code, and structure-aware plaintext parsing for everything else.

Document formats (Docling)

Rich document parsing with layout analysis, table extraction, and optional OCR.

| Format | Extensions | Notes |
| --- | --- | --- |
| PDF | .pdf | Text extraction with layout analysis; OCR for scanned pages |
| Microsoft Word | .docx | Modern Word format |
| Microsoft PowerPoint | .pptx | Slide content extracted as text |
| Microsoft Excel | .xlsx | Spreadsheet content with table structure |
| HTML | .html, .htm | Structural parsing with tag semantics |
| Images | .png, .jpg, .jpeg, .tiff, .bmp | OCR-based text extraction (requires ocr_enabled) |
| LaTeX | .tex | Scientific document parsing |

Code formats (tree-sitter AST)

AST-aware chunking at semantic boundaries (functions, classes, methods). Falls back to line-based chunking for unsupported languages.

| Language | Extensions | AST boundaries |
| --- | --- | --- |
| Python | .py | Functions, classes, methods, decorators |
| TypeScript | .ts, .tsx | Functions, classes, interfaces, type aliases |
| JavaScript | .js, .jsx | Functions, classes, methods |
| Java | .java | Classes, methods, interfaces, enums |
| Rust | .rs | Functions, impl blocks, structs, enums, traits |
| Go | .go | Functions, methods, structs, interfaces |
| C | .c, .h | Functions, structs, enums |
| C++ | .cpp, .cc, .cxx, .hpp | Functions, classes, structs, namespaces |

Other code files (e.g. .rb, .php, .sh) use line-based splitting.

Plaintext formats

| Format | Extensions | Parsing strategy |
| --- | --- | --- |
| Markdown | .md | Structure-aware chunking via Docling (headings, lists, code blocks) |
| reStructuredText | .rst | Structure-aware chunking via Docling |
| Plain Text | .txt | Paragraph-based splitting |
| JSON | .json | Paragraph-based splitting |
| YAML | .yaml, .yml | Paragraph-based splitting |
| CSV | .csv | Paragraph-based splitting |
| TOML | .toml | Paragraph-based splitting |
| XML | .xml | Paragraph-based splitting |

Fallback behavior

Files that do not match any specific parser are routed to Docling as a fallback. If Docling cannot parse a file, it is skipped with a warning (unless --fail-fast is set).
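
The skip-or-abort behavior described above can be sketched as a small loop. This is an illustration only, not the Files connector's code: parse_all, its parse callback, and the logger name are hypothetical, and the fail_fast parameter stands in for the --fail-fast flag.

```python
import logging
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
logger = logging.getLogger("indexed.sketch")  # hypothetical logger name


def parse_all(
    paths: Iterable[str],
    parse: Callable[[str], T],
    fail_fast: bool = False,
) -> List[T]:
    """Parse every path; skip failures with a warning unless fail_fast is set."""
    parsed: List[T] = []
    for path in paths:
        try:
            parsed.append(parse(path))
        except Exception as exc:
            if fail_fast:
                # Mirrors --fail-fast: the first failure aborts the whole run.
                raise
            logger.warning("Skipping %s: %s", path, exc)
    return parsed
```

With fail_fast left at its default, one unreadable file costs a warning and the rest of the collection is still indexed.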

Tips

  • Unsupported formats are skipped — Indexed logs a warning and continues (unless --fail-fast).
  • Binary files are ignored — videos, compiled binaries, and other non-text files are skipped.
  • Use excluded_extensions in config, or include/exclude patterns on the CLI, to control which files are indexed.
  • OCR is off by default — enable with ocr_enabled: true in configuration (see Config commands) for scanned PDFs and images.

Parsing architecture

The indexed-parsing package is the parsing and chunking engine behind the Files connector. It handles rich documents, source code, and plaintext through specialized parsers.

Overview

┌──────────────┐     ┌─────────────┐     ┌──────────────────────┐
│  Input File  │────▶│ FileRouter  │────▶│  Selected Parser     │
│              │     │             │     │                      │
│  .py, .pdf,  │     │ Extension → │     │  DoclingParser    or │
│  .md, .json  │     │ Strategy    │     │  CodeChunker      or │
│              │     │             │     │  PlaintextParser  or │
│              │     │             │     │  Docling fallback    │
└──────────────┘     └─────────────┘     └──────────┬───────────┘
                                                    │
                                                    ▼
                                          ┌──────────────────┐
                                          │ ParsedDocument   │
                                          │                  │
                                          │  chunks[]        │
                                          │  metadata        │
                                          │  content_hash    │
                                          └──────────────────┘

The pipeline exposes:

  • parse(path) — parse a file from disk
  • parse_bytes(data, filename) — parse in-memory bytes

Both return a ParsedDocument containing ParsedChunk entries; content hashes are computed with xxhash.

FileRouter

The FileRouter maps file extensions to parsing strategies. Routing takes ~2.4 µs per file.

| Extension pattern | Strategy | Parser |
| --- | --- | --- |
| .pdf, .docx, .pptx, .xlsx, .html, .htm, .tex | Document | DoclingParser |
| .png, .jpg, .jpeg, .tiff, .bmp | Image/OCR | DoclingParser (with OCR) |
| .py, .ts, .tsx, .js, .jsx, .java, .rs, .go, .c, .h, .cpp, .cc, .cxx, .hpp | Code AST | CodeChunker |
| .md, .rst | Structured text | PlaintextParser (Docling-backed) |
| .txt, .json, .yaml, .yml, .csv, .toml, .xml | Plaintext | PlaintextParser |
| Everything else | Fallback | DoclingParser |
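
To make the routing table concrete, here is a minimal sketch of extension-based dispatch as a dictionary lookup. It is not the FileRouter implementation: the route function and ROUTES table are hypothetical names, and lowercasing the extension before lookup is an assumption about case handling.

```python
from pathlib import Path
from typing import Dict, Tuple

# (strategy, parser) pairs keyed by extension group, copied from the table above.
_GROUPS = {
    ("Document", "DoclingParser"): [".pdf", ".docx", ".pptx", ".xlsx", ".html", ".htm", ".tex"],
    ("Image/OCR", "DoclingParser"): [".png", ".jpg", ".jpeg", ".tiff", ".bmp"],
    ("Code AST", "CodeChunker"): [
        ".py", ".ts", ".tsx", ".js", ".jsx", ".java", ".rs", ".go",
        ".c", ".h", ".cpp", ".cc", ".cxx", ".hpp",
    ],
    ("Structured text", "PlaintextParser"): [".md", ".rst"],
    ("Plaintext", "PlaintextParser"): [".txt", ".json", ".yaml", ".yml", ".csv", ".toml", ".xml"],
}

# Flatten into one extension -> (strategy, parser) map for constant-time lookup.
ROUTES: Dict[str, Tuple[str, str]] = {
    ext: target for target, exts in _GROUPS.items() for ext in exts
}

FALLBACK = ("Fallback", "DoclingParser")


def route(filename: str) -> Tuple[str, str]:
    """Return the (strategy, parser) pair for a file name."""
    # Assumption: extensions are matched case-insensitively.
    ext = Path(filename).suffix.lower()
    return ROUTES.get(ext, FALLBACK)


if __name__ == "__main__":
    for name in ["report.pdf", "main.rs", "notes.md", "data.yaml", "script.rb"]:
        print(name, route(name))
```

A flat dictionary keeps dispatch to a single hash lookup per file, which fits the microsecond-scale routing cost quoted above.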

DoclingParser

Layout-aware parsing for PDF, DOCX, PPTX, XLSX, HTML, and LaTeX; optional OCR for scans and images.

OCR backends:

| Backend | Size | Notes |
| --- | --- | --- |
| RapidOCR (default) | ~15.5 MB | Bundled ONNX models; no runtime network |
| EasyOCR (optional) | ~88 MB | pip install docling[easyocr]; models cached under ~/.EasyOCR/model/ |

Enable OCR in your workspace config.toml under [sources.files] (see Configuration guide), for example:

[sources.files]
ocr_enabled = true

Then create the collection:

indexed index create files -c scanned-docs -p ./scans

CodeChunker

AST-aware chunking via tree-sitter at semantic boundaries.

| Language | Semantic boundaries |
| --- | --- |
| Python | function_definition, class_definition, decorated_definition |
| TypeScript / JavaScript | function_declaration, class_declaration, interface_declaration, type_alias_declaration |
| Java | class_declaration, method_declaration, interface_declaration, enum_declaration |
| Rust | function_item, impl_item, struct_item, enum_item, trait_item |
| Go | function_declaration, method_declaration, type_declaration |
| C / C++ | function_definition, struct_specifier, class_specifier, namespace_definition |

Python AST chunking runs at ~950 µs per file in benchmarks.
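
The idea of cutting chunks at semantic boundaries can be illustrated without tree-sitter. The sketch below uses Python's standard-library ast module as a stand-in, so it handles Python source only and is not how CodeChunker is implemented. The function name chunk_python_source is hypothetical.

```python
import ast
from typing import List, Tuple

_BOUNDARY_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def chunk_python_source(source: str) -> List[Tuple[str, str]]:
    """Return (name, source_text) for each top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: List[Tuple[str, str]] = []
    for node in tree.body:
        if isinstance(node, _BOUNDARY_NODES):
            # Start at the first decorator so a decorated definition stays
            # in one chunk, like tree-sitter's decorated_definition node.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno  # available on Python 3.8+
            chunks.append((node.name, "\n".join(lines[start - 1:end])))
    return chunks


if __name__ == "__main__":
    sample = (
        "import os\n\n"
        "@decorator\n"
        "def f():\n"
        "    return 1\n\n"
        "class C:\n"
        "    def m(self):\n"
        "        pass\n"
    )
    for name, text in chunk_python_source(sample):
        print(f"--- {name} ---\n{text}")
```

Module-level statements such as imports fall outside every chunk in this sketch; a fuller chunker would need a policy for them.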

PlaintextParser

  • Markdown and RST — Docling-backed structure-aware chunking
  • JSON, YAML, CSV, TOML, XML, TXT — paragraph-based splitting at blank lines
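
Paragraph-based splitting at blank lines can be sketched in a few lines. This is a simplified illustration, not the PlaintextParser code; the function name split_paragraphs is hypothetical, and treating whitespace-only lines as blank is an assumption.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split text into paragraphs separated by one or more blank lines."""
    # Assumption: a line holding only whitespace counts as blank.
    parts = re.split(r"\n\s*\n", text.strip())
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    yaml_text = "name: demo\nversion: 1\n\n\ndeps:\n  - a\n  - b\n"
    for paragraph in split_paragraphs(yaml_text):
        print(repr(paragraph))
```

Because the split looks only at blank lines, a JSON or YAML file with no blank lines becomes a single chunk under this approach.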

Output format

All parsers produce ParsedChunk (content, metadata, content_hash) and ParsedDocument (path, chunks, metadata, content_hash). A v1 adapter maps these into the indexing engine format.
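
The two output types can be pictured as plain data classes. The sketch below mirrors the field names listed above, but the field types, defaults, and the make_document helper are assumptions for illustration. It also uses hashlib.blake2b as a standard-library stand-in for xxhash so that it runs without extra dependencies.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List


def hash_bytes(data: bytes) -> str:
    """Stand-in content hash (blake2b); the real package uses xxhash."""
    return hashlib.blake2b(data, digest_size=8).hexdigest()


@dataclass
class ParsedChunk:
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    content_hash: str = ""

    def __post_init__(self) -> None:
        # Derive the hash from the chunk text when none is supplied.
        if not self.content_hash:
            self.content_hash = hash_bytes(self.content.encode("utf-8"))


@dataclass
class ParsedDocument:
    path: str
    chunks: List[ParsedChunk]
    metadata: Dict[str, Any] = field(default_factory=dict)
    content_hash: str = ""


def make_document(path: str, text: str, chunk_texts: List[str]) -> ParsedDocument:
    """Hypothetical helper: wrap pre-split chunk texts in a ParsedDocument."""
    chunks = [ParsedChunk(content=c, metadata={"index": i}) for i, c in enumerate(chunk_texts)]
    # Assumption: the document hash covers the whole file content.
    return ParsedDocument(path=path, chunks=chunks, content_hash=hash_bytes(text.encode("utf-8")))
```

Keeping a hash on both the document and each chunk lets a caller detect an unchanged file cheaply and, when it has changed, find which chunks actually differ.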

Change tracking

| Strategy | Method | Best for |
| --- | --- | --- |
| Auto | Git if .git exists, else content-hash | Default |
| Git | git diff --name-status | Repositories |
| Content hash | xxhash per file | Non-git folders |
| Mtime | Modification time | Large trees where speed matters |

Set change_tracking under [sources.files] in config, not as a flag on indexed index create files. For Files connector config keys, see Config commands.

config.toml:

[sources.files]
change_tracking = "git"

Terminal:

indexed index create files -c my-docs -p ./repo
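
The Auto strategy and content-hash comparison from the table above can be sketched as follows. This is not the connector's implementation: the function names are hypothetical, the strategy strings other than "git" are assumed spellings, and hashlib.blake2b stands in for xxhash to keep the example standard-library only.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def resolve_strategy(root: Path, configured: str = "auto") -> str:
    """Pick a change-tracking strategy; "auto" prefers Git when .git exists."""
    if configured != "auto":
        return configured
    return "git" if (root / ".git").exists() else "content_hash"


def hash_tree(root: Path) -> Dict[str, str]:
    """Map each file's relative path to a hash of its bytes."""
    hashes: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.blake2b(path.read_bytes(), digest_size=8).hexdigest()
            hashes[path.relative_to(root).as_posix()] = digest
    return hashes


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return paths that are new or whose content hash differs."""
    return sorted(path for path, digest in new.items() if old.get(path) != digest)
```

Deleted files would need a separate pass over keys present in old but missing from new; the sketch leaves that out for brevity.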

Performance (benchmarks)

| Operation | Time |
| --- | --- |
| FileRouter dispatch | ~2.4 µs per file |
| JSON parse | ~11 µs |
| Python AST chunking | ~950 µs |
| Markdown (7 KB) | ~500 ms |
| Large Markdown (24 KB) | ~1.4 s |
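
These figures can be combined into a rough estimate of parsing time for a collection. The sketch below is back-of-the-envelope arithmetic only: it assumes files are processed one after another, that each file costs one routing step plus one parse, and that the benchmark numbers carry over to your files. The names estimate_seconds and PARSE_SECONDS are hypothetical.

```python
from typing import Dict

# Per-file timings in seconds, copied from the benchmark table above.
ROUTING_SECONDS = 2.4e-6
PARSE_SECONDS = {
    "json": 11e-6,
    "python": 950e-6,
    "markdown_7kb": 500e-3,
    "markdown_24kb": 1.4,
}


def estimate_seconds(file_counts: Dict[str, int]) -> float:
    """Estimate sequential parse time: (routing + parse) per file, summed."""
    return sum(
        count * (ROUTING_SECONDS + PARSE_SECONDS[kind])
        for kind, count in file_counts.items()
    )


if __name__ == "__main__":
    mix = {"python": 1000, "markdown_7kb": 200}
    print(f"{estimate_seconds(mix):.1f} s")
```

For 1,000 Python files and 200 Markdown files of about 7 KB each, this comes to roughly 101 seconds, of which about 100 seconds is Markdown parsing. Routing cost is negligible by comparison.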