Supported File Formats

The Files connector uses Unstructured to parse documents. The following formats are supported:

Document Formats

Format	Extensions	Notes
Markdown	`.md`	Full Markdown parsing including frontmatter
Plain Text	`.txt`	Indexed as-is
PDF	`.pdf`	Text extraction from PDF documents
Microsoft Word	`.docx`	Modern Word format
Microsoft PowerPoint	`.pptx`	Slide content extracted as text
HTML	`.html`, `.htm`	HTML content with tags stripped
reStructuredText	`.rst`	Common in Python documentation
Rich Text Format	`.rtf`	Legacy rich text
OpenDocument Text	`.odt`	LibreOffice/OpenOffice documents
EPUB	`.epub`	E-book format
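
Before indexing a large directory, it can help to see how many files fall outside the table above. The following is a minimal sketch, not part of Indexed itself: `SUPPORTED_EXTENSIONS` and `tally_extensions` are hypothetical names, and the extension set is simply copied from the table. Lowercasing the extension is a choice made by this sketch; whether Indexed treats `.PDF` the same as `.pdf` is not stated here.

```python
from collections import Counter
from pathlib import Path

# Extensions copied from the table above; the name is illustrative only.
SUPPORTED_EXTENSIONS = {
    ".md", ".txt", ".pdf", ".docx", ".pptx",
    ".html", ".htm", ".rst", ".rtf", ".odt", ".epub",
}


def tally_extensions(root):
    """Count files under root, grouped by whether the extension is in the table."""
    supported, unsupported = Counter(), Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Assumption of this sketch: compare extensions case-insensitively.
        ext = path.suffix.lower()
        bucket = supported if ext in SUPPORTED_EXTENSIONS else unsupported
        bucket[ext or "(no extension)"] += 1
    return supported, unsupported
```

Calling `tally_extensions("./data")` returns two counters. If the second one is large, narrowing the run with an include pattern is probably worthwhile.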

Unsupported formats are skipped. If Indexed encounters a file it can't parse, it logs a warning and moves on (unless `--fail-fast` is set).
Binary files are ignored. Images, videos, compiled binaries, and other non-text files are automatically skipped.
Use include/exclude patterns to filter. If your directory has many unsupported files, use `--include` to target specific extensions:

indexed index create files -c docs -p ./data \
  --include ".*\.(md|txt|pdf|docx)$"
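
To check what a pattern like this selects before running the command, the regex can be tried against a few sample paths. This is a rough sketch under one stated assumption: that the pattern is applied to each file path with Python `re` search semantics. The actual matching rules of Indexed may differ. `preview_include` and the sample paths are hypothetical.

```python
import re

# The pattern from the command above. Assumption: it is applied to each
# file path with Python `re` search semantics; Indexed may differ.
INCLUDE_PATTERN = r".*\.(md|txt|pdf|docx)$"


def preview_include(paths, pattern=INCLUDE_PATTERN):
    """Split paths into (matched, unmatched) lists for the given regex."""
    compiled = re.compile(pattern)
    matched = [p for p in paths if compiled.search(p)]
    unmatched = [p for p in paths if not compiled.search(p)]
    return matched, unmatched


# Made-up sample paths, for illustration only.
sample = ["guide.md", "notes/todo.txt", "report.PDF", "slides.pptx", "archive.md.bak"]
matched, unmatched = preview_include(sample)
print(matched)    # ['guide.md', 'notes/todo.txt']
print(unmatched)  # ['report.PDF', 'slides.pptx', 'archive.md.bak']
```

Under this assumption the pattern is case-sensitive and anchored at the end of the path, so `report.PDF` and `archive.md.bak` are not selected. Uppercase extensions would need to be added to the alternation explicitly.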

See Index Your Local Docs for the full guide.