Automate Invoice Processing with OCR, GPT-4 & Salesforce Opportunity Creation

Go to Workflow
4 views
Built by Le Nguyen Le Nguyen
Created on June 06, 2026

Description

PDF Invoice Extractor (AI)

End-to-end pipeline: Watch Drive ➜ Download PDF ➜ OCR text ➜ AI normalize to JSON ➜ Upsert Buyer (Account) ➜ Create Opportunity ➜ Map Products ➜ Create OLI via Composite API ➜ Archive to OneDrive.

Node by node (what it does & key setup)

1) Google Drive Trigger
Purpose**: Fire when a new file appears in a specific Google Drive folder.
Key settings**:
Event: fileCreated
Folder ID: google drive folder id
Polling: everyMinute
Creds: googleDriveOAuth2Api
Output**: Metadata { id, name, ... } for the new file.

2) Download File From Google
Purpose**: Get the file binary for processing and archiving.
Key settings**:
Operation: download
File ID: ={{ $json.id }}
Creds: googleDriveOAuth2Api
Output**: Binary (default key: data) and original metadata.

3) Extract from File
Purpose**: Extract text from PDF (OCR as needed) for AI parsing.
Key settings**:
Operation: pdf
OCR: enable for scanned PDFs (in options)
Output**: JSON with OCR text at {{ $json.text }}.

4) Message a model (AI JSON Extractor)
Purpose: Convert OCR text into **strict normalized JSON array (invoice schema).
Key settings**:
Node: @n8n/n8n-nodes-langchain.openAi
Model: gpt-4.1 (or gpt-4.1-mini)
Message role: system (the strict prompt; references {{ $json.text }})
jsonOutput: true
Creds: openAiApi
Output (per item): $.message.content → the parsed **JSON (ensure it’s an array).

5) Create or update an account (Salesforce)
Purpose: Upsert **Buyer as Account using an external ID.
Key settings**:
Resource: account
Operation: upsert
External Id Field: tax_id__c
External Id Value: ={{ $json.message.content.buyer.tax_id }}
Name: ={{ $json.message.content.buyer.name }}
Creds: salesforceOAuth2Api
Output: Account record (captures Id) for downstream **Opportunity.

6) Create an opportunity (Salesforce)
Purpose**: Create Opportunity linked to the Buyer (Account).
Key settings**:
Resource: opportunity
Name: ={{ $('Message a model').item.json.message.content.invoice.code }}
Close Date: ={{ $('Message a model').item.json.message.content.invoice.issue_date }}
Stage: Closed Won
Amount: ={{ $('Message a model').item.json.message.content.summary.grand_total }}
AccountId: ={{ $json.id }} (from Upsert Account output)
Creds: salesforceOAuth2Api
Output**: Opportunity Id for OLI creation.

7) Build SOQL (Code / JS)
Purpose: Collect unique product **codes from AI JSON and build a SOQL query for PricebookEntry by Pricebook2Id.
Key settings**:
pricebook2Id (hardcoded in script): e.g., 01sxxxxxxxxxxxxxxx
Source lines: $('Message a model').first().json.message.content.products
Output**: { soql, codes }

8) Query PricebookEntries (Salesforce)
Purpose**: Fetch PricebookEntry.Id for each Product2.ProductCode.
Key settings**:
Resource: search
Query: ={{ $json.soql }}
Creds: salesforceOAuth2Api
Output**: Items with Id, Product2.ProductCode (used for mapping).

9) Code in JavaScript (Build OLI payloads)
Purpose: Join lines with PBE results and Opportunity Id ➜ build **OpportunityLineItem payloads.
Inputs**:
OpportunityId: ={{ $('Create an opportunity').first().json.id }}
Lines: ={{ $('Message a model').first().json.message.content.products }}
PBE rows: from previous node items
Output**: { body: { allOrNone:false, records:[{ OpportunityLineItem... }] } }
Notes**:
Converts discount_total ➜ per-unit if needed (currently commented for standard pricing).
Throws on missing PBE mapping or empty lines.

10) Create Opportunity Line Items (HTTP Request)
Purpose**: Bulk create OLIs via Salesforce Composite API.
Key settings**:
Method: POST
URL: https://<your-instance>.my.salesforce.com/services/data/v65.0/composite/sobjects
Auth: salesforceOAuth2Api (predefined credential)
Body (JSON): ={{ $json.body }}
Output**: Composite API results (per-record statuses).

11) Update File to One Drive
Purpose: Archive the **original PDF in OneDrive.
Key settings**:
Operation: upload
File Name: ={{ $json.name }}
Parent Folder ID: onedrive folder id
Binary Data: true (from the Download node)
Creds: microsoftOneDriveOAuth2Api
Output**: Uploaded file metadata.

Data flow (wiring)

Google Drive Trigger → Download File From Google
Download File From Google
→ Extract from File
→ Update File to One Drive
Extract from File → Message a model
Message a model
→ Create or update an account
Create or update an account → Create an opportunity
Create an opportunity → Build SOQL
Build SOQL → Query PricebookEntries
Query PricebookEntries → Code in JavaScript
Code in JavaScript → Create Opportunity Line Items

Quick setup checklist

🔐 Credentials: Connect Google Drive, OneDrive, Salesforce, OpenAI.
📂 IDs:
Drive Folder ID (watch)
OneDrive Parent Folder ID (archive)
Salesforce Pricebook2Id (in the JS SOQL builder)
🧠 AI Prompt: Use the strict system prompt; jsonOutput = true.
🧾 Field mappings:
Buyer tax id/name → Account upsert fields
Invoice code/date/amount → Opportunity fields
Product name must equal your Product2.ProductCode in SF.
✅ Test: Drop a sample PDF → verify:
AI returns array JSON only
Account/Opportunity created
OLI records created
PDF archived to OneDrive

Notes & best practices

If PDFs are scans, enable OCR in Extract from File.
If AI returns non-JSON, keep “Return only a JSON array” as the last line of the prompt and keep jsonOutput enabled.
Consider adding validation on parsing.warnings to gate Salesforce writes.
For discounts/taxes in OLI:
Standard OLI fields don’t support per-line discount amounts directly; model them in UnitPrice or custom fields.
Replace the Composite API URL with your org’s domain or use the Salesforce node’s Bulk Upsert for simplicity.

Nodes Used (6)

Code
n8n-nodes-base.code
Google Drive
n8n-nodes-base.googleDrive
HTTP Request
n8n-nodes-base.httpRequest
Microsoft OneDrive
n8n-nodes-base.microsoftOneDrive
OpenAI
@n8n/n8n-nodes-langchain.openAi
Salesforce
n8n-nodes-base.salesforce