CPF/CNPJ Extractor Node
The CPF Extractor node identifies, validates, and extracts CPF (Individual Taxpayer Registry) and CNPJ (National Corporate Taxpayer Registry) numbers from text that has already been extracted from documents.
Unlike a simple text search, this node validates the check digits of each number it finds and applies conflict detection logic.
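To illustrate what check-digit validation involves, here is a minimal sketch of the standard CPF check-digit rule (modulo 11). The function name `cpf_check_digits_ok` is hypothetical, and rejecting numbers made of a single repeated digit is a common convention assumed here; the node's internal implementation is not documented in this page and may differ.

```python
def cpf_check_digits_ok(cpf: str) -> bool:
    """Return True if the two CPF check digits match the first nine digits.

    Illustrative sketch only; not the node's actual implementation.
    """
    # Keep digits only, so both "123.456.789-09" and "12345678909" are accepted.
    digits = [int(c) for c in cpf if c.isdigit()]

    # Assumption: reject wrong lengths and repeated-digit sequences
    # such as 111.111.111-11, which pass the arithmetic but are not real CPFs.
    if len(digits) != 11 or len(set(digits)) == 1:
        return False

    # First check digit uses weights 10..2 over 9 digits;
    # second check digit uses weights 11..2 over 10 digits.
    for n in (9, 10):
        total = sum(d * w for d, w in zip(digits[:n], range(n + 1, 1, -1)))
        expected = (total * 10) % 11 % 10
        if digits[n] != expected:
            return False
    return True
```

With this sketch, `cpf_check_digits_ok("123.456.789-09")` returns `True`, while `cpf_check_digits_ok("111.111.111-11")` returns `False`.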
Prerequisites
This node does not read files directly. It needs to receive the text content that has already been extracted from a document. Therefore, the standard flow is:
- Input (Receives the file)
- File Extractor (Converts PDF/Image to text)
- CPF Extractor (Reads the text and searches for CPF and CNPJ numbers)
Configuration
Configuration requires only one step: connecting the node to its data source.

Input
You must select the variable that contains the text content of the files.
- Field: "Input".
- What to select: Look for the output of the previous extraction node, usually {{ extrator_de_arquivos.contents }}.
What Does It Detect?
The node identifies both formatted and unformatted patterns:
- CPF: 123.456.789-00 or 12345678900
- CNPJ: 12.345.678/0001-00 or 12345678000100
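The formatted and unformatted patterns can be approximated with regular expressions. The sketch below is an assumption about how such matching might work, not the node's actual patterns; the names `CPF_RE`, `CNPJ_RE`, and `find_candidates` are hypothetical. It only finds candidates by shape and does not validate check digits.

```python
import re

# A CPF is either 000.000.000-00 or 11 consecutive digits.
# The lookarounds stop a match from starting or ending inside a longer digit run,
# so an unformatted 14-digit CNPJ is not misread as containing a CPF.
CPF_RE = re.compile(r"(?<!\d)(?:\d{3}\.\d{3}\.\d{3}-\d{2}|\d{11})(?!\d)")

# A CNPJ is either 00.000.000/0000-00 or 14 consecutive digits.
CNPJ_RE = re.compile(r"(?<!\d)(?:\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2}|\d{14})(?!\d)")


def find_candidates(text: str) -> dict:
    """Return CPF and CNPJ candidates in the order they appear in the text."""
    return {
        "cpfs": CPF_RE.findall(text),
        "cnpjs": CNPJ_RE.findall(text),
    }
```

Candidates found this way would still need check-digit validation before being reported as valid numbers.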
Conflict Detection
The node raises a Conflict alert, marking a document as conflicting, when:
- It finds multiple different CPFs in a document that should belong to a single person.
- It finds inconsistent formatting.
This makes the alert useful for automated document screening.
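The two conflict rules above could be sketched as follows. This is an illustration under stated assumptions, not the node's logic: the function name `detect_conflicts` is hypothetical, "inconsistent formatting" is interpreted here as a mix of formatted and unformatted CPFs in the same document, and the message "Inconsistent formatting" is invented for the example (only "Multiple CPFs found" appears in the node's documented output).

```python
import re


def detect_conflicts(cpfs: list) -> dict:
    """Apply two illustrative conflict rules to the CPFs found in one document."""
    conflicts = []

    # Rule 1: more than one distinct CPF, comparing digits only so that
    # "123.456.789-00" and "12345678900" count as the same number.
    distinct = {re.sub(r"\D", "", cpf) for cpf in cpfs}
    if len(distinct) > 1:
        conflicts.append("Multiple CPFs found")

    # Rule 2 (assumed interpretation): some CPFs carry punctuation, others do not.
    has_punctuation = {("." in cpf or "-" in cpf) for cpf in cpfs}
    if len(has_punctuation) > 1:
        conflicts.append("Inconsistent formatting")  # hypothetical message

    return {"conflict": bool(conflicts), "conflict_list": conflicts}
```

For example, `detect_conflicts(["111.222.333-44", "999.888.777-66"])` flags the document because it contains two different CPFs.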
Output Variables
The node outputs a list named results, with one entry per file containing the data found in that file.
Output Example (JSON)
[
  {
    "filename": "contract_joao.pdf",
    "cpfs": ["123.456.789-00"],
    "cnpjs": [],
    "conflict": false
  },
  {
    "filename": "strange_document.pdf",
    "cpfs": ["111.222.333-44", "999.888.777-66"],
    "conflict": true,
    "conflict_list": ["Multiple CPFs found"]
  }
]
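A downstream step might use this output to route conflicting files to manual review. The sketch below assumes the results arrive as a JSON string shaped like the example above; the helper name `files_needing_review` is hypothetical. Because the example shows that some keys can be absent from an entry, the code reads optional keys with `.get`.

```python
import json

# Sample payload copied from the output example above.
results_json = """
[
  {"filename": "contract_joao.pdf", "cpfs": ["123.456.789-00"],
   "cnpjs": [], "conflict": false},
  {"filename": "strange_document.pdf",
   "cpfs": ["111.222.333-44", "999.888.777-66"],
   "conflict": true, "conflict_list": ["Multiple CPFs found"]}
]
"""

results = json.loads(results_json)


def files_needing_review(results: list) -> list:
    """Return (filename, reasons) pairs for every entry flagged as conflicting."""
    return [
        (entry["filename"], entry.get("conflict_list", []))
        for entry in results
        if entry.get("conflict")
    ]


for filename, reasons in files_needing_review(results):
    print(f"{filename}: {', '.join(reasons)}")
```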