
Understand collections — the atomic unit of Indexed — and how the indexing pipeline works.

Indexing Overview

A collection is the atomic unit of Indexed: a named bundle of documents from one source (files, Jira, or Confluence), chunked and embedded into a local FAISS index you can search with natural language. Parsing, chunking, embedding, search, and updates all work per collection.

For day-to-day CLI tasks, see Create, inspect, update & remove. For workspace paths, credentials, chunk size, and the embedding model, see the Configuration Guide.

Where collections live

By default, collections are stored under ~/.indexed/data/collections/<name>/.

To store a collection inside the current project, at ./.indexed/data/collections/<name>/, either pass --local to indexed index create … or prefix the whole command with indexed --local (see indexed --help and Index commands). Use the same indexed --local prefix for search, inspect, update, and remove so that every command resolves the same .indexed/ tree.
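The two locations described above can be sketched in Python. This only illustrates the documented directory layout; it is not Indexed's own resolution code, and the helper name collection_dir is hypothetical.

```python
from pathlib import Path
from typing import Optional


def collection_dir(name: str, local: bool = False, cwd: Optional[Path] = None) -> Path:
    """Return the directory where a collection is expected to live.

    Hypothetical helper that mirrors the documented layout; not part of Indexed.
    """
    # With local=True the path is rooted at the current project directory,
    # otherwise at the user's home directory.
    base = (cwd or Path.cwd()) if local else Path.home()
    return base / ".indexed" / "data" / "collections" / name


if __name__ == "__main__":
    print(collection_dir("my-docs"))              # default (home) location
    print(collection_dir("my-docs", local=True))  # project-local location
```

A helper like this can be handy in scripts that need to check where a collection should be before calling the CLI.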

indexed config init decides where config.toml and .env.example live (for example ~/.indexed/ after cd ~, or <repo>/.indexed/ from a repository root). That is separate from the --local collection path above — details are in the configuration guide.

Files on disk

Each collection directory holds four artifacts that must stay in sync — do not edit them by hand:

~/.indexed/data/collections/my-docs/
├── manifest.json      # collection name, connector type, timestamps, counts, embedding model
├── documents.json     # per-document metadata (id, title, source URL or path)
├── chunks.json        # chunk text and linkage to documents (treat as sensitive)
└── index.faiss        # binary vector index (vectors only — not human-readable text)

The vectors can still be sensitive; treat ~/.indexed/ (or .indexed/) with the same access controls as your sources. If a collection is corrupted, remove it with indexed index remove and recreate it.
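Before deciding to remove and recreate a collection, a quick read-only check that all four files are present can help. The sketch below assumes only the file names from the listing above; the function name missing_artifacts is hypothetical, and the check does not validate the contents of the files or whether they are in sync.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_ARTIFACTS = ("manifest.json", "documents.json", "chunks.json", "index.faiss")


def missing_artifacts(collection_path: Path) -> list[str]:
    """Return the expected artifact names that are absent from the directory.

    Read-only: nothing in the collection directory is modified.
    """
    return [
        name
        for name in EXPECTED_ARTIFACTS
        if not (collection_path / name).is_file()
    ]


if __name__ == "__main__":
    path = Path.home() / ".indexed" / "data" / "collections" / "my-docs"
    print(missing_artifacts(path))
```

An empty list only means the files exist. A collection can still be corrupted in other ways, in which case the remedy stays the same: remove it with indexed index remove and recreate it.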

The indexing pipeline

When you create or update a collection, Indexed runs this pipeline on your machine:

  1. Parse — the connector reads the source (disk files via Docling / tree-sitter; Jira and Confluence via their APIs) and turns content into structured documents.
  2. Chunk — text is split into overlapping segments (defaults and tuning: Configuration Guide). Code can be split on AST boundaries when enabled.
  3. Embed — each chunk is encoded with the configured embedding model (default all-MiniLM-L6-v2, 384-dimensional vectors, loaded locally — no network during embedding). Runtime is ONNX-based; changing the model is covered in the embedding section of the config guide.
  4. Store — documents.json, chunks.json, metadata, and the FAISS index are written under the collection path above.
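To make the Chunk step concrete, here is a minimal sketch of splitting text into overlapping segments. It is an assumption-laden illustration: it works on fixed-size character windows, and the chunk_size and overlap values are hypothetical. Indexed's actual chunker, its units, and its defaults are described in the Configuration Guide and may differ.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows; consecutive windows share `overlap` characters.

    Illustrative only: the sizes are hypothetical, not Indexed's defaults.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap means text that straddles a boundary appears in two adjacent chunks, so a query matching that text can still retrieve a chunk containing it in full.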

Collection lifecycle

| Operation | Command | When to use |
| --- | --- | --- |
| Create | indexed index create <connector> -c <name> … | First time indexing a source |
| Inspect | indexed index inspect [name] | List collections or inspect one |
| Update | indexed index update [name] | Refresh from the source after changes |
| Remove | indexed index remove <name> | Delete a collection permanently |

Connectors

| Connector | Source | Typical command | Auth |
| --- | --- | --- | --- |
| files | Local paths | indexed index create files -c <name> -p <path> | None |
| jira | Jira Cloud or Server/DC | indexed index create jira -c <name> -u <url> -q "<jql>" | API token (+ email on Cloud) |
| confluence | Confluence Cloud or Server/DC | indexed index create confluence -c <name> -u <url> -q "<cql>" | API token (+ email on Cloud) |

Connector-specific flags and setup are covered in each connector's own guide.

Scripting against the CLI: use indexed --simple-output for stable JSON where supported (see Index commands).
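For scripts, a thin wrapper that assembles the argument list keeps the global flags in one place. The sketch below uses only the commands and flags named on this page. The helper names index_command and run_json are hypothetical, the relative order of --local and --simple-output is an assumption, and the shape of the JSON output is not specified here, so run_json simply returns whatever was parsed.

```python
import json
import subprocess

# Operations listed in the collection lifecycle table.
OPERATIONS = {"create", "inspect", "update", "remove"}


def index_command(operation, *args, local=False, simple_output=False):
    """Build an argv list for `indexed index <operation> ...` (hypothetical helper)."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    argv = ["indexed"]
    if local:
        argv.append("--local")          # global prefix, as described under "Where collections live"
    if simple_output:
        argv.append("--simple-output")  # assumed to be accepted alongside --local
    return argv + ["index", operation, *args]


def run_json(argv):
    """Run the command and parse stdout as JSON.

    Raises CalledProcessError if the command fails and json.JSONDecodeError
    if the command does not emit JSON (the flag applies only where supported).
    """
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(index_command("create", "files", "-c", "my-docs", "-p", "./docs"))
    print(index_command("inspect", "my-docs", local=True, simple_output=True))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with JQL or CQL queries that contain spaces and quotes.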