Indexing Overview

A collection is the atomic unit of Indexed: a named bundle of documents from one source (files, Jira, or Confluence), chunked and embedded into a local FAISS index you can search with natural language. Parsing, chunking, embedding, search, and updates all work per collection.

For day-to-day CLI tasks, see Create, inspect, update & remove. For workspace paths, credentials, chunk size, and the embedding model, see the Configuration Guide.

Where collections live

By default, collections are stored under ~/.indexed/data/collections/<name>/.

To store a collection under the current project, use ./.indexed/data/collections/<name>/ instead: pass --local on indexed index create …, or prefix the whole command with indexed --local (see indexed --help and Index commands). For search, inspect, update, and remove, use the same indexed --local prefix so every command resolves the same .indexed/ tree.
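The two locations above can be sketched as a small path helper. This is only an illustration of the directory layout described here, not Indexed's own resolution code; the function name `collection_dir` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def collection_dir(name: str, local: bool = False, cwd: Optional[Path] = None) -> Path:
    """Return the directory where a collection is expected to live.

    Hypothetical helper: mirrors the documented layout only.
    """
    if local:
        # Project-local tree: ./.indexed/data/collections/<name>/
        base = (cwd or Path.cwd()) / ".indexed"
    else:
        # Default tree: ~/.indexed/data/collections/<name>/
        base = Path.home() / ".indexed"
    return base / "data" / "collections" / name


if __name__ == "__main__":
    print(collection_dir("my-docs"))
    print(collection_dir("my-docs", local=True))
```

Keeping the choice in one place reflects the advice above: every command that touches a local collection has to resolve the same .indexed/ tree.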

indexed config init decides where config.toml and .env.example live (for example ~/.indexed/ after cd ~, or <repo>/.indexed/ from a repository root). That choice is separate from the --local collection path above; details are in the Configuration Guide.

Files on disk

Each collection directory holds four artifacts that must stay in sync — do not edit them by hand:

~/.indexed/data/collections/my-docs/
├── manifest.json      # collection name, connector type, timestamps, counts, embedding model
├── documents.json     # per-document metadata (id, title, source URL or path)
├── chunks.json        # chunk text and linkage to documents (treat as sensitive)
└── index.faiss        # binary vector index (vectors only — not human-readable text)

The vectors can still be sensitive; treat ~/.indexed/ (or .indexed/) with the same access controls as your sources. If a collection is corrupted, remove it with indexed index remove and recreate it.
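Before deciding to remove and recreate a collection, a quick presence check of the four files listed above can help. The sketch below is a hypothetical helper, not part of Indexed; it only verifies that the files exist and cannot tell whether their contents are still in sync.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("manifest.json", "documents.json", "chunks.json", "index.faiss")


def missing_artifacts(collection_path: Path) -> list[str]:
    """Return the expected file names that are absent from a collection directory."""
    return [name for name in EXPECTED_FILES if not (collection_path / name).is_file()]


if __name__ == "__main__":
    path = Path.home() / ".indexed" / "data" / "collections" / "my-docs"
    missing = missing_artifacts(path)
    print("missing:", missing if missing else "none")
```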

The indexing pipeline

When you create or update a collection, Indexed runs this pipeline on your machine:

1. Parse: the connector reads the source (disk files via Docling / tree-sitter; Jira and Confluence via their APIs) and turns content into structured documents.
2. Chunk: text is split into overlapping segments (defaults and tuning: Configuration Guide). Code can be split on AST boundaries when enabled.
3. Embed: each chunk is encoded with the configured embedding model (default all-MiniLM-L6-v2, 384-dimensional vectors, loaded locally, with no network access during embedding). The runtime is ONNX-based; changing the model is covered in the embedding section of the Configuration Guide.
4. Store: manifest.json (collection metadata), documents.json, chunks.json, and the FAISS index (index.faiss) are written under the collection path above.
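To make the Chunk step concrete, here is a minimal sketch of overlapping segmentation. It is not Indexed's chunker: it splits on characters, and the `size` and `overlap` values are made-up illustrations rather than the real defaults (those are in the Configuration Guide).

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into segments of up to `size` characters.

    Consecutive segments share `overlap` characters, so content near a
    boundary appears in both neighbours.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last segment already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", size=4, overlap=2))
    # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap trades some index size for recall: a sentence that straddles a boundary is still fully contained in at least one segment, provided it is no longer than the overlap.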

Collection lifecycle

| Operation | Command | When to use |
| --- | --- | --- |
| Create | `indexed index create <connector> -c <name> …` | First time indexing a source |
| Inspect | `indexed index inspect [name]` | List collections or inspect one |
| Update | `indexed index update [name]` | Refresh from the source after changes |
| Remove | `indexed index remove <name>` | Delete a collection permanently |

Connectors

| Connector | Source | Typical command | Auth |
| --- | --- | --- | --- |
| `files` | Local paths | `indexed index create files -c <name> -p <path>` | None |
| `jira` | Jira Cloud or Server/DC | `indexed index create jira -c <name> -u <url> -q "<jql>"` | API token (+ email on Cloud) |
| `confluence` | Confluence Cloud or Server/DC | `indexed index create confluence -c <name> -u <url> -q "<cql>"` | API token (+ email on Cloud) |

Connector-specific flags and setup are documented on each connector's page.

Scripting against the CLI: use indexed --simple-output for stable JSON where supported (see Index commands).
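A script might wrap the CLI roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the relative order of --local and --simple-output after `indexed` is a guess (check indexed --help), and the shape of the JSON is not specified here, so the result is returned as parsed without interpretation.

```python
import json
import subprocess


def build_argv(*args: str, local: bool = False, simple_output: bool = False) -> list[str]:
    """Build an `indexed` command line with the global flags placed before the subcommand."""
    argv = ["indexed"]
    if local:
        argv.append("--local")  # resolve collections under ./.indexed/
    if simple_output:
        argv.append("--simple-output")  # JSON output, where the command supports it
    argv.extend(args)
    return argv


def run_json(*args: str, local: bool = False):
    """Run an `indexed` command and parse its stdout as JSON.

    Raises CalledProcessError on a non-zero exit status and
    json.JSONDecodeError if the command did not emit JSON.
    """
    result = subprocess.run(
        build_argv(*args, local=local, simple_output=True),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(build_argv("index", "inspect", "my-docs", local=True, simple_output=True))
```

Passing the arguments as a list avoids shell quoting problems with JQL or CQL queries that contain spaces and quotes.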
