Local Files & Code

Index any local directory — Markdown, PDFs, DOCX, source code — into a searchable collection.

By the end of this guide, you will have a local directory indexed into a searchable collection, with filters configured to include only the files you care about.

Quick start

Terminal
indexed index create files -c my-docs -p ./documents

For every flag and default, see Index commands (indexed index create files --help).

Prerequisites

What gets indexed

  • Text extracted from each supported file type (see Supported file formats)
  • File path and modification metadata
  • Directory hierarchy
  • Code structure when AST chunking is enabled (function/class boundaries for supported languages)

Create a Collection

Point Indexed at your folder. It recursively scans, parses every supported format, chunks, and embeds — all on your machine.

Terminal
indexed index create files -c project-docs -p ~/work/docs
Indexing collection 'project-docs'...
  Parsed 24 documents
  Created 96 chunks
  Generated embeddings
✓ Collection 'project-docs' created (96 chunks, 5.1 MB)

Indexing is always recursive — all subdirectories are scanned. There is currently no depth-limit flag.
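Because the scan has no depth limit, it can help to preview what lives under a folder before pointing Indexed at it. The following is a minimal sketch, not Indexed's own scanner: `describe_files` is a hypothetical helper that walks a directory at every depth and reports the kind of per-file metadata listed under "What gets indexed" (relative path, modification time, directory hierarchy).

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_files(root):
    """Yield a metadata dict for every file under root, at any depth (hypothetical helper)."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        yield {
            "path": rel.as_posix(),            # path relative to the scanned root
            "modified": mtime.isoformat(),     # last-modified timestamp (UTC)
            "hierarchy": list(rel.parts[:-1]), # enclosing directories, outermost first
            "depth": len(rel.parts) - 1,       # 0 means directly under root
        }


# Build a small throwaway tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "guides" / "api").mkdir(parents=True)
    (root / "README.md").write_text("top level")
    (root / "guides" / "setup.md").write_text("one level down")
    (root / "guides" / "api" / "auth.md").write_text("two levels down")
    records = list(describe_files(root))

for record in records:
    print(record["depth"], record["path"])
```

Sorting the records by `depth` is a quick way to spot deeply nested folders that you may prefer to filter out with `--exclude`.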

Obsidian vaults

Point -p at your vault root (e.g., ~/Documents/ObsidianVault). Indexed handles nested folders, wiki-links in Markdown, and frontmatter automatically.

Verify the collection was created:

Terminal
indexed index inspect project-docs
Collection: project-docs
  Type:       files
  Source:     /Users/you/work/docs
  Documents:  24
  Chunks:     96
  Size:       5.1 MB
  Created:    2026-04-06 10:15:33

Filter what gets indexed

On the CLI, --include and --exclude take regexes matched against the full file path (repeat the flag for each pattern). When both are set, includes are applied first.

Persistent defaults for a workspace live in [sources.files]. The two pattern keys there use different syntaxes: include_patterns are glob-style (for example *.md), while exclude_patterns are regexes (see Files connector). With --respect-gitignore (on by default), common directories such as node_modules and .git are skipped. The Index commands reference lists every indexed index create files flag.

Terminal
# Only index Markdown and text files
indexed index create files -c docs-only -p ~/work/docs \
  --include ".*\.md$" --include ".*\.txt$"

# Skip drafts and work-in-progress files
indexed index create files -c final-docs -p ~/work/docs \
  --exclude ".*\.draft\.md$" --exclude ".*WIP.*"
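To check a set of patterns before running an index, the rule described above (regexes matched against the full path, includes applied first) can be sketched in Python. This is an illustration under stated assumptions, not Indexed's implementation: it assumes the patterns behave like Python's `re.search`, that a file passes if any include pattern matches, and that it is dropped if any exclude pattern matches. `select_paths` is a hypothetical helper.

```python
import re


def select_paths(paths, include=(), exclude=()):
    """Keep paths matching any include regex, then drop those matching any exclude regex."""
    include_res = [re.compile(p) for p in include]
    exclude_res = [re.compile(p) for p in exclude]
    kept = []
    for path in paths:
        # Includes are applied first; with no include patterns every path passes.
        if include_res and not any(r.search(path) for r in include_res):
            continue
        # Excludes then remove anything that survived the include step.
        if any(r.search(path) for r in exclude_res):
            continue
        kept.append(path)
    return kept


paths = [
    "/work/docs/guide.md",
    "/work/docs/notes.txt",
    "/work/docs/plan.draft.md",
    "/work/docs/WIP-roadmap.md",
    "/work/docs/diagram.png",
]

selected = select_paths(
    paths,
    include=[r".*\.md$", r".*\.txt$"],
    exclude=[r".*\.draft\.md$", r".*WIP.*"],
)
print(selected)  # ['/work/docs/guide.md', '/work/docs/notes.txt']
```

Note that `plan.draft.md` passes the include step and is only removed by the exclude step, which is the practical effect of includes being applied first.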

Configuration (OCR, change tracking, chunking)

Tune OCR, change tracking, table extraction, and code chunking under [sources.files] (or indexed config set sources.files.<key> <value>). Defaults and key names are in Config commands; narrative setup is in the Configuration guide.

.indexed/config.toml
[sources.files]
path = "./documents"
include_patterns = ["*.md", "*.pdf", "*.py"]
exclude_patterns = ["\\.tmp$", "/build/"]
fail_fast = false
respect_gitignore = true
ocr_enabled = true
table_structure = true
code_chunking = true
max_chunk_tokens = 512
change_tracking = "auto"

ocr_enabled defaults to off; set it to true for scanned PDFs and images. max_chunk_tokens applies in the parsing pipeline; embedding chunk size is core.v1.indexing.chunk_size (see Config commands).
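Because include_patterns are globs and exclude_patterns are regexes, the same-looking string can mean different things in each list. The sketch below applies the patterns from the sample config with Python's `fnmatch` and `re` modules. It is only an approximation: whether Indexed matches globs against the full path or the file name, and which regex flavor it uses, are assumptions here, and `is_selected` is a hypothetical helper. One detail is certain: in a TOML basic string, `"\\.tmp$"` is read as the regex `\.tmp$`.

```python
import fnmatch
import re

# Values from the sample [sources.files] block, after TOML unescaping.
include_globs = ["*.md", "*.pdf", "*.py"]
exclude_regexes = [r"\.tmp$", r"/build/"]


def is_selected(path):
    """Assumed semantics: glob-match the includes, then regex-search the excludes."""
    included = any(fnmatch.fnmatchcase(path, g) for g in include_globs)
    excluded = any(re.search(r, path) for r in exclude_regexes)
    return included and not excluded


for candidate in ["docs/readme.md", "docs/build/out.md", "notes.tmp", "src/app.py"]:
    print(candidate, is_selected(candidate))
```

In a glob, `*` stands for any run of characters and `.` is literal; in a regex, `.` matches any character and `*` repeats the previous item, which is why `*.md` is a valid glob but not a valid regex.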

Code Files

Indexed reads code files (.py, .ts, .go, .rs, .java, .c, .cpp, etc.) as plain text. When code_chunking is enabled (the default), tree-sitter AST-aware chunking splits that text at semantic boundaries (functions, classes, and methods) rather than at arbitrary line counts.
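To make "semantic boundaries" concrete, here is a small sketch that splits Python source into one chunk per top-level function or class. It uses Python's standard-library `ast` module as a stand-in for tree-sitter, so it only handles Python and is not how Indexed itself chunks code. `chunk_by_definitions` is a hypothetical helper.

```python
import ast


def chunk_by_definitions(source):
    """Return one (name, text) chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact source text spanned by the node.
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


source = '''\
import math

def area(r):
    return math.pi * r * r

class Circle:
    def __init__(self, r):
        self.r = r
'''

for name, text in chunk_by_definitions(source):
    print(f"--- {name} ({len(text.splitlines())} lines)")
    print(text)
```

Each chunk contains a complete definition, so a search hit returns a whole function or class instead of a window that starts or ends partway through one.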

Supported languages and formats

See Supported File Formats for the full list of file types, code languages with AST support, and plaintext formats.

Keep the Index Fresh

When your documents change, update only what's new:

Terminal
indexed index update project-docs
Updating collection 'project-docs'...
  Before: 24 documents, 96 chunks
  After:  28 documents, 112 chunks
✓ Collection 'project-docs' updated

Indexed picks a change tracking strategy from sources.files.change_tracking (auto, git, content_hash, mtime, or none). See Parsing architecture and Config commands.
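As a rough picture of what a content_hash strategy involves, the sketch below hashes every file under a directory and compares two snapshots to find added, modified, and removed files. It illustrates the general technique only; how Indexed stores or compares hashes is not described here, and `snapshot` and `diff_snapshots` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file's relative path to the SHA-256 digest of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_snapshots(old, new):
    """Return (added, modified, removed) relative paths, each sorted."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(k for k in new.keys() & old.keys() if new[k] != old[k])
    return added, modified, removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("alpha")
    (root / "b.md").write_text("beta")
    before = snapshot(root)

    (root / "a.md").write_text("alpha, edited")  # content change
    (root / "b.md").unlink()                     # deletion
    (root / "c.md").write_text("gamma")          # new file
    after = snapshot(root)

print(diff_snapshots(before, after))  # (['c.md'], ['a.md'], ['b.md'])
```

Hashing reads every file but catches any content change, whereas comparing modification times is cheaper but can be misled by tools that rewrite files without changing them or that preserve old timestamps.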

To automate updates:

crontab
# Update all Indexed collections every day at 8am (use the path from `which indexed`)
0 8 * * * /path/to/indexed index update >> ~/.indexed/update.log 2>&1

If you need a clean re-index (e.g., after changing chunk size), use --force:

Terminal
indexed index create files -c project-docs -p ~/work/docs --force

Additional options

All indexed index create files flags and defaults: Index commands. OCR, change tracking, patterns, and excluded_extensions: Files connector and Configuration guide.

Troubleshooting

Unsupported file format error — Check the Supported File Formats list. If your format isn't listed, convert it to PDF or Markdown, which parse most reliably.

No results returned — Verify the collection exists with indexed index inspect <name>, then test from the CLI:

Terminal
indexed index search "your query here" -c project-docs

OCR not working on scanned PDFs — OCR is off by default (sources.files.ocr_enabled). Enable it in config, then re-create or update the collection (see Parsing architecture).

What's Next