File Extractor Node
The File Extractor Node processes raw documents (such as PDFs, spreadsheets, and images) and extracts their textual content. It functions as a "translator," converting binary files into plain text that can be read and analyzed by Large Language Models (LLMs).
Configuration
This node has a simplified configuration, focused solely on identifying which files should be processed.

Input
You need to select the variable that contains the list of files. Typically, these files come from the Input node.
- Field: "Input".
- What to select: Look for the variable of type "Files" defined at the beginning of your flow (e.g., {{ input.uploaded_documents }}).
Supported Formats
The platform supports automatic extraction from a wide variety of formats:
| Category | Supported Extensions | Notes |
|---|---|---|
| Documents | .pdf, .docx, .doc, .txt, .rtf | Extraction of structured text. |
| Spreadsheets | .xlsx, .xls, .csv | Converts tables into readable text. |
| Presentations | .pptx, .ppt | Extracts text from slides. |
| Images | .png, .jpg, .jpeg, .tiff | Uses OCR (Optical Character Recognition) to read text within the image. |
| Others | .html, .xml, .json | - |
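If you want to check files before they reach the flow, the table above can be mirrored in a small lookup. This is only a minimal sketch in Python: the names `SUPPORTED_EXTENSIONS` and `category_for` are hypothetical helpers, not part of the platform, and the platform performs its own format detection regardless.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the "Supported Formats" table.
# This mapping is an illustration, not a platform API.
SUPPORTED_EXTENSIONS = {
    "Documents": {".pdf", ".docx", ".doc", ".txt", ".rtf"},
    "Spreadsheets": {".xlsx", ".xls", ".csv"},
    "Presentations": {".pptx", ".ppt"},
    "Images": {".png", ".jpg", ".jpeg", ".tiff"},
    "Others": {".html", ".xml", ".json"},
}


def category_for(filename: str) -> Optional[str]:
    """Return the table category for a file name, or None if unsupported."""
    # Compare case-insensitively so "REPORT.PDF" matches ".pdf".
    extension = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if extension in extensions:
            return category
    return None
```

A call such as `category_for("contract.docx")` returns the category name, while a file with an extension outside the table returns `None` and can be rejected before upload.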
Output Variables
After processing, this node generates a structured output ready to be sent to an LLM.
- contents: A list containing the text extracted from each file.
How to use it in an LLM node
In the next node (usually an LLM), you can reference the extracted content like this:
Analyze the following documents and provide a summary:
{{ file_extractor.contents }}
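Because contents is a list, the LLM receives the text of several files at once. How the platform renders that list inside the prompt is not specified here, so the following Python sketch only illustrates one plausible local equivalent: a hypothetical `build_prompt` helper that joins the extracted texts under numbered separators so the model can tell the documents apart.

```python
from typing import List


def build_prompt(instruction: str, contents: List[str]) -> str:
    """Combine an instruction with a list of extracted texts.

    Hypothetical helper: the separator format is an assumption,
    not the platform's actual rendering of the contents variable.
    """
    sections = [
        f"--- Document {index} ---\n{text.strip()}"
        for index, text in enumerate(contents, start=1)
    ]
    return instruction + "\n\n" + "\n\n".join(sections)


if __name__ == "__main__":
    extracted = ["Text of the first file.", "Text of the second file."]
    print(build_prompt(
        "Analyze the following documents and provide a summary:",
        extracted,
    ))
```

Labeling each document explicitly tends to help when the instruction asks for per-document results, such as one summary per file.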
Example Flow
A common use case is creating an assistant that analyzes resumes or contracts:
- Input Node: Defines a file field of type "Files".
- File Extractor: Receives {{ input.files }}.
- LLM: Receives {{ extractor.contents }} with the instruction: "Extract the name and expiration date from these documents".
- Output: Returns the structured data.