Process OCR Documents from Google Drive into Searchable Knowledge Base with OpenAI & Pinecone

Built by

osama goda

Created on June 13, 2026

Description

How it works
This workflow automates a complete retrieval-augmented generation (RAG) ingestion pipeline. When a new OCR JSON file is added to a Google Drive folder, the workflow extracts the lesson metadata, parses and cleans the Arabic text, splits it into chunks, generates OpenAI embeddings for each chunk, and stores them in a Pinecone vector index. After processing, the file is automatically moved to an archive folder so it is not ingested a second time.
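The parse, clean, and chunk stages described above can be sketched in plain Python. This is only an illustration, not the code inside the workflow's Code node: the OCR JSON layout (a `lesson` object plus a `pages` list with `text` fields), the function names, and the cleaning rules (dropping Arabic diacritics and tatweel) are all assumptions, and `split_recursive` is a simplified stand-in for the Recursive Character Text Splitter node that does not implement chunk overlap.

```python
import json
import re
import unicodedata

# Arabic diacritics (tashkeel, U+064B..U+0652) and the tatweel/kashida (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
_WHITESPACE = re.compile(r"\s+")


def clean_arabic(text: str) -> str:
    """Normalize OCR text: NFKC, strip diacritics and tatweel, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = _DIACRITICS.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def split_recursive(text, chunk_size=500, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator first; recurse to finer ones for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(split_recursive(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


def ingest(ocr_json: str, chunk_size: int = 500) -> list:
    """Turn one OCR JSON document (hypothetical schema) into chunk records with metadata."""
    doc = json.loads(ocr_json)
    metadata = doc.get("lesson", {})  # assumed location of the lesson metadata
    pages = [clean_arabic(p.get("text", "")) for p in doc.get("pages", [])]
    body = "\n\n".join(p for p in pages if p)
    return [
        {"text": chunk, "metadata": {**metadata, "chunk_index": i}}
        for i, chunk in enumerate(split_recursive(body, chunk_size))
    ]
```

Each record returned by `ingest` pairs a chunk of cleaned text with the lesson metadata, which is the shape an embedding and vector-store step would typically consume.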

Set up steps
Follow the sticky notes inside the workflow for detailed instructions.

Connect your Google Drive credentials.
Replace the input folder ID and archive folder ID with your own.
Connect your OpenAI account for embeddings.
Connect your Pinecone API key and select your index.

The workflow is ready to run once the credentials and both folder IDs are configured.
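One setup detail worth checking when selecting the Pinecone index: the index dimension has to match the vector size produced by the embedding model. The description does not say which OpenAI model the workflow uses, so the sketch below simply lists the default output sizes of the common OpenAI embedding models; `check_index_dimension` is a hypothetical helper, not part of n8n or Pinecone.

```python
# Default output dimensions of OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_index_dimension(model: str, index_dimension: int) -> None:
    """Raise ValueError if the index dimension does not match the model's output size."""
    expected = EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        raise ValueError(f"Unknown embedding model: {model!r}")
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the index expects {index_dimension}"
        )


# Example: passes silently because the sizes agree.
check_index_dimension("text-embedding-3-small", 1536)
```

A mismatch here usually shows up as a failed upsert, so running a check like this before the first ingestion can save a debugging round.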

Nodes Used (6)

Code (n8n-nodes-base.code)
Default Data Loader (@n8n/n8n-nodes-langchain.documentDefaultDataLoader)
Embeddings OpenAI (@n8n/n8n-nodes-langchain.embeddingsOpenAi)
Google Drive (n8n-nodes-base.googleDrive)
Pinecone Vector Store (@n8n/n8n-nodes-langchain.vectorStorePinecone)
Recursive Character Text Splitter (@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter)
