Indexing Overview
A collection is the atomic unit of Indexed: a named bundle of documents from one source (files, Jira, or Confluence), chunked and embedded into a local FAISS index you can search with natural language. Parsing, chunking, embedding, search, and updates all work per collection.
For day-to-day CLI tasks, see Create, inspect, update & remove. For workspace paths, credentials, chunk size, and the embedding model, see the Configuration Guide.
Where collections live
By default, collections are stored under ~/.indexed/data/collections/<name>/.
To store a collection under the current project, use ./.indexed/data/collections/<name>/ instead: pass --local on indexed index create …, or prefix the whole command with indexed --local (see indexed --help and Index commands). For search, inspect, update, and remove, use the same indexed --local prefix so every command resolves the same .indexed/ tree.
indexed config init decides where config.toml and .env.example live (for example ~/.indexed/ after cd ~, or <repo>/.indexed/ from a repository root). That is separate from the --local collection path above — details are in the configuration guide.
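The two storage locations described above can be sketched in a few lines of Python. This is an illustration of the documented path layout only, not Indexed's own resolution code; the function name `collection_dir` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def collection_dir(
    name: str,
    local: bool = False,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Path:
    """Return the directory where a collection is expected to live.

    local=False mirrors the default ~/.indexed/ tree;
    local=True mirrors the project-level ./.indexed/ tree used with --local.
    cwd and home are injectable so the function is easy to test.
    """
    if local:
        root = (cwd if cwd is not None else Path.cwd()) / ".indexed"
    else:
        root = (home if home is not None else Path.home()) / ".indexed"
    return root / "data" / "collections" / name


if __name__ == "__main__":
    print(collection_dir("my-docs"))              # global location
    print(collection_dir("my-docs", local=True))  # project-local location
```

A script that mixes the two modes would look in different trees, which is why the same `--local` choice should be used for every command that touches a given collection.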
Files on disk
Each collection directory holds four artifacts that must stay in sync — do not edit them by hand:
~/.indexed/data/collections/my-docs/
├── manifest.json # collection name, connector type, timestamps, counts, embedding model
├── documents.json # per-document metadata (id, title, source URL or path)
├── chunks.json # chunk text and linkage to documents (treat as sensitive)
└── index.faiss # binary vector index (vectors only, not human-readable text)

The vectors can still be sensitive; treat ~/.indexed/ (or .indexed/) with the same access controls as your sources. If a collection is corrupted, remove it with indexed index remove and recreate it.
The indexing pipeline
When you create or update a collection, Indexed runs this pipeline on your machine:
- Parse — the connector reads the source (disk files via Docling / tree-sitter; Jira and Confluence via their APIs) and turns content into structured documents.
- Chunk — text is split into overlapping segments (defaults and tuning: Configuration Guide). Code can be split on AST boundaries when enabled.
- Embed — each chunk is encoded with the configured embedding model (default all-MiniLM-L6-v2, 384-dimensional vectors, loaded locally, with no network access during embedding). The runtime is ONNX-based; changing the model is covered in the embedding section of the Configuration Guide.
- Store — documents.json, chunks.json, metadata, and the FAISS index are written under the collection path above.
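To make the Chunk step concrete, here is a minimal sketch of splitting text into overlapping segments. It is character-based and uses made-up defaults for `size` and `overlap`; Indexed's actual chunker, units, and default values may differ (see the Configuration Guide), so treat this as an illustration of the overlap idea only.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list:
    """Split text into segments of up to `size` characters.

    Consecutive segments share `overlap` characters, so content that
    straddles a boundary appears in full in at least one segment.
    The default values are illustrative assumptions.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", size=4, overlap=2):
        print(piece)
```

With `size=4` and `overlap=2`, the string `abcdefghij` yields `abcd`, `cdef`, `efgh`, `ghij`: each segment repeats the last two characters of the previous one.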
Collection lifecycle
| Operation | Command | When to use |
|---|---|---|
| Create | indexed index create <connector> -c <name> … | First time indexing a source |
| Inspect | indexed index inspect [name] | List collections or inspect one |
| Update | indexed index update [name] | Refresh from the source after changes |
| Remove | indexed index remove <name> | Delete a collection permanently |
Connectors
| Connector | Source | Typical command | Auth |
|---|---|---|---|
| files | Local paths | indexed index create files -c <name> -p <path> | None |
| jira | Jira Cloud or Server/DC | indexed index create jira -c <name> -u <url> -q "<jql>" | API token (+ email on Cloud) |
| confluence | Confluence Cloud or Server/DC | indexed index create confluence -c <name> -u <url> -q "<cql>" | API token (+ email on Cloud) |
Connector-specific flags and setup:
Scripting against the CLI: use indexed --simple-output for stable JSON where supported (see Index commands).
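When scripting, it can be convenient to assemble the create commands from the connector table programmatically. The sketch below only builds argument lists using the flags shown on this page (`-c`, `-p`, `-u`, `-q`, and the `indexed --local` prefix); it does not run anything, does not handle credentials, and the helper name `create_command` is hypothetical.

```python
import shlex
from typing import List, Optional


def create_command(
    connector: str,
    name: str,
    *,
    path: Optional[str] = None,
    url: Optional[str] = None,
    query: Optional[str] = None,
    local: bool = False,
) -> List[str]:
    """Build the argv for `indexed index create` as shown in the connector table."""
    argv = ["indexed"]
    if local:
        argv.append("--local")  # prefix form described under "Where collections live"
    argv += ["index", "create", connector, "-c", name]

    if connector == "files":
        if path is None:
            raise ValueError("the files connector needs a path")
        argv += ["-p", path]
    elif connector in ("jira", "confluence"):
        if url is None or query is None:
            raise ValueError(connector + " needs both a URL and a query")
        argv += ["-u", url, "-q", query]  # JQL for jira, CQL for confluence
    else:
        raise ValueError("unknown connector: " + connector)
    return argv


if __name__ == "__main__":
    argv = create_command("files", "my-docs", path="./docs")
    # shlex.join produces a shell-safe string for logging or copy-paste.
    print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run` avoids shell-quoting problems with JQL or CQL queries that contain spaces or quotes.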