Supported File Formats

File formats that Indexed can parse and index using the Files connector.

The Files connector uses Unstructured to parse documents. The following formats are supported:

Document Formats

Format | Extensions | Notes
Markdown | .md | Full Markdown parsing, including frontmatter
Plain Text | .txt | Indexed as-is
PDF | .pdf | Text extraction from PDF documents
Microsoft Word | .docx | Modern Word format
Microsoft PowerPoint | .pptx | Slide content extracted as text
HTML | .html, .htm | HTML content with tags stripped
reStructuredText | .rst | Common in Python documentation
Rich Text Format | .rtf | Legacy rich text
OpenDocument Text | .odt | LibreOffice/OpenOffice documents
EPUB | .epub | E-book format

Data Formats

Format | Extensions | Notes
CSV | .csv | Comma-separated values
TSV | .tsv | Tab-separated values
JSON | .json | JSON documents
XML | .xml | XML documents
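
To check a directory against the tables above before indexing, a small script can sort files into supported and unsupported groups by extension. The following is a minimal sketch, not part of Indexed itself: the names is_supported and split_by_support are hypothetical, and the case-insensitive comparison is an assumption about how extensions should be treated.

```python
from pathlib import Path

# Extensions copied from the Document Formats and Data Formats tables.
DOCUMENT_EXTENSIONS = {
    ".md", ".txt", ".pdf", ".docx", ".pptx", ".html",
    ".htm", ".rst", ".rtf", ".odt", ".epub",
}
DATA_EXTENSIONS = {".csv", ".tsv", ".json", ".xml"}
SUPPORTED_EXTENSIONS = DOCUMENT_EXTENSIONS | DATA_EXTENSIONS


def is_supported(path):
    """Return True if the file's final extension is listed in the tables."""
    # Lowercasing is an assumption; Indexed may treat case differently.
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(root):
    """Walk root recursively and return (supported, unsupported) file lists."""
    supported, unsupported = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            (supported if is_supported(path) else unsupported).append(path)
    return supported, unsupported
```

Calling split_by_support("./data") returns two lists of paths; a long second list suggests the directory holds many files the connector would not parse.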

Tips

  • Unsupported formats are skipped. If Indexed encounters a file it can't parse, it logs a warning and moves on, unless --fail-fast is set.
  • Binary files are ignored. Images, videos, compiled binaries, and other non-text files are skipped automatically.
  • Use include/exclude patterns to filter. If your directory contains many unsupported files, pass --include with a regular expression that targets specific extensions:
indexed index create files -c docs -p ./data \
  --include ".*\.(md|txt|pdf|docx)$"
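
Before running the command, it can help to preview which files a pattern would select. The sketch below applies the same regular expression with Python's re module. It assumes the pattern is matched against the file path using Python-compatible regex semantics, which this page does not confirm. The names matches_include and preview_include are hypothetical helpers, not part of the Indexed CLI.

```python
import re
from pathlib import Path

# The same pattern passed to --include in the command above.
INCLUDE_PATTERN = r".*\.(md|txt|pdf|docx)$"


def matches_include(path, pattern=INCLUDE_PATTERN):
    """Return True if the POSIX-style path matches the include pattern."""
    # Assumption: the pattern is tested against the path string and is
    # case-sensitive, so "README.MD" would not match.
    return re.search(pattern, Path(path).as_posix()) is not None


def preview_include(root, pattern=INCLUDE_PATTERN):
    """List the files under root that the pattern would select."""
    return [
        path
        for path in sorted(Path(root).rglob("*"))
        if path.is_file() and matches_include(path, pattern)
    ]
```

For example, preview_include("./data") lists the candidate files. Note that the trailing $ anchors the extension to the end of the path, so a backup file such as notes.md.bak is not selected.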

See Index Your Local Docs for the full guide.