Supported File Formats
File formats that Indexed can parse and index using the Files connector.
The Files connector uses Unstructured to parse documents. The following formats are supported:
Document Formats
| Format | Extensions | Notes |
|---|---|---|
| Markdown | .md | Full Markdown parsing including frontmatter |
| Plain Text | .txt | Indexed as-is |
| PDF | .pdf | Text extraction from PDF documents |
| Microsoft Word | .docx | Modern Word format |
| Microsoft PowerPoint | .pptx | Slide content extracted as text |
| HTML | .html, .htm | HTML content with tags stripped |
| reStructuredText | .rst | Common in Python documentation |
| Rich Text Format | .rtf | Legacy rich text |
| OpenDocument Text | .odt | LibreOffice/OpenOffice documents |
| EPUB | .epub | E-book format |
Data Formats
| Format | Extensions | Notes |
|---|---|---|
| CSV | .csv | Comma-separated values |
| TSV | .tsv | Tab-separated values |
| JSON | .json | JSON documents |
| XML | .xml | XML documents |
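Before indexing a directory, you can estimate how much of it falls into the formats listed above. The function below is a minimal sketch, not part of Indexed: `classify_files` and `SUPPORTED_EXTS` are hypothetical names, the extension list is copied from the two tables, and it assumes extension matching is case-insensitive, which these docs do not state. It judges files by extension only, so a file marked "supported" can still fail to parse.

```bash
# Hypothetical helper, not an Indexed command.
# Extensions copied from the tables above. Assumption: matching is
# case-insensitive (the docs do not say), so extensions are lowercased first.
SUPPORTED_EXTS="md txt pdf docx pptx html htm rst rtf odt epub csv tsv json xml"

# classify_files DIR
# Prints "supported<TAB>file" or "skipped<TAB>file" for every regular file
# under DIR, judged by extension alone. It never invokes Indexed or
# Unstructured, so a "supported" file can still fail to parse.
classify_files() {
  dir=${1:-.}
  find "$dir" -type f | while IFS= read -r file; do
    name=${file##*/}          # basename, so dots in directory names are ignored
    verdict=skipped
    case "$name" in
      *.*)
        # Text after the last dot, lowercased.
        ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
        # Space-delimited membership test against SUPPORTED_EXTS.
        case " $SUPPORTED_EXTS " in
          *" $ext "*) verdict=supported ;;
        esac
        ;;
    esac
    printf '%s\t%s\n' "$verdict" "$file"
  done
}
```

For a quick summary, pipe the output through `cut -f1 | sort | uniq -c`, for example `classify_files ./data | cut -f1 | sort | uniq -c`.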
Tips
- Unsupported formats are skipped: if Indexed encounters a file it can't parse, it logs a warning and moves on (unless `--fail-fast` is set).
- Binary files are ignored: images, videos, compiled binaries, and other non-text files are skipped automatically.
- Use include/exclude patterns to filter: if your directory contains many unsupported files, use `--include` to target specific extensions:

```bash
indexed index create files -c docs -p ./data \
  --include ".*\.(md|txt|pdf|docx)$"
```

See Index Your Local Docs for the full guide.
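To check an `--include` pattern before running a full index, you can list the files it would select. This is a rough preview sketch with two labeled assumptions: `preview_include` is a hypothetical helper, not an Indexed command, and it assumes Indexed matches `--include` against the file path using a regex flavor compatible with POSIX extended regular expressions (`grep -E`). The docs above do not confirm either point, so treat any mismatch with Indexed's actual behavior as authoritative.

```bash
# Hypothetical helper, not an Indexed command.
# preview_include DIR REGEX
# Lists regular files under DIR whose full path matches REGEX as a POSIX
# extended regular expression. Assumption: Indexed applies --include to the
# file path with a compatible regex flavor.
preview_include() {
  # "--" stops grep from treating a pattern that starts with "-" as an option.
  find "$1" -type f | grep -E -- "$2"
}
```

For example, `preview_include ./data '.*\.(md|txt|pdf|docx)$'` lists the Markdown, text, PDF, and Word files under `./data`. Single quotes keep the shell from interpreting `$`. Note that `grep` exits with status 1 when nothing matches.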