Invoice data extraction with LlamaParse and OpenAI

Go to Workflow
37,004 views
Built by Jimleuk Jimleuk
Created on June 05, 2026

Description

This n8n workflow automates the process of parsing and extracting data from PDF invoices. With this workflow, accounts and finance people can realise huge time and cost savings in their busy schedules.

Read the Blog: https://blog.n8n.io/how-to-extract-data-from-pdf-to-excel-spreadsheet-advance-parsing-with-n8n-io-and-llamaparse/

How it works

This workflow will watch an email inbox for incoming invoices from suppliers
It will download the attached PDFs and processing them through a third party service called LlamaParse.
LlamaParse is specifically designed to handle and convert complex PDF data structures such as tables to markdown.
Markdown is easily to process for LLM models and so the data extraction by our AI agent is more accurate and reliable.
The workflow exports the extracted data from the AI agent to Google Sheets once the job complete.

Requirements

The criteria of the email trigger must be configured to capture emails with attachments.
The gmail label "invoice synced" must be created before using this workflow.
A LlamaIndex.ai account to use the LlamaParse service.
An OpenAI account to use GPT for AI work.
Google Sheets to save the output of the data extraction process although this can be replaced for whatever your needs.

Customizing this workflow

This workflow uses Gmail and Google Sheets but these can easily be swapped out for equivalent services such as Outlook and Excel.

Not using Excel? Simple redirect the output of the AI agent to your accounting software of choice.

Nodes Used (6)

Basic LLM Chain
@n8n/n8n-nodes-langchain.chainLlm
Gmail
n8n-nodes-base.gmail
Google Sheets
n8n-nodes-base.googleSheets
HTTP Request
n8n-nodes-base.httpRequest
OpenAI Model
@n8n/n8n-nodes-langchain.lmOpenAi
Structured Output Parser
@n8n/n8n-nodes-langchain.outputParserStructured