Auto-Update Knowledge Base with Drive, LlamaIndex & Azure OpenAI Embeddings

Built by Khairul Muhtadin
Created on June 05, 2026

Description

This workflow automatically ingests Google Drive documents, parses them with LlamaIndex, and stores Azure OpenAI embeddings in an in-memory vector store, cutting manual update time from ~30 minutes to under 2 minutes per document.

Why Use This Workflow?

Cost Reduction: Eliminates the monthly cloud fee you would otherwise pay just to store your knowledge base.

Ideal For

Knowledge Managers / Documentation Teams: Automatically keep product docs and SOPs in sync when source files change on Google Drive.
Support Teams: Ensure the searchable KB is always up to date after doc edits, speeding up agent onboarding and resolution time.
Developer / AI Teams: Populate an in-memory vector store for experiments, rapid prototyping, or local RAG demos.

How It Works

Trigger: Google Drive Trigger watches a specific document or folder for updates.
Data Collection: The updated file is downloaded from Google Drive.
Processing: The file is uploaded to LlamaIndex cloud via an HTTP Request to create a parsing job.
Intelligence Layer: Workflow polls LlamaIndex job status (Wait + Monitor loop). If parsing status equals SUCCESS, the result is retrieved as markdown.
Output & Delivery: Parsed markdown is loaded into LangChain's Default Data Loader, passed to Azure OpenAI embeddings (deployment "3small"), then inserted into an in-memory vector store.
Storage & Logging: Vector store holds embeddings in memory (good for prototyping). Optionally persist to an external vector DB for production.
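
To make the last two steps concrete, here is a minimal Python sketch of what an in-memory vector store does: it keeps (text, embedding) pairs in a list and ranks them by cosine similarity. The names `InMemoryStore`, `add`, and `search` are hypothetical, and the two-dimensional vectors are toy values rather than real Azure OpenAI embeddings. It is an illustration of the idea, not how n8n's Simple Vector Store node is implemented.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryStore:
    """Hypothetical toy store: all rows are lost when the process exits."""
    rows: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str, embedding: list[float]) -> None:
        self.rows.append((text, embedding))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        # Rank stored rows by similarity to the query, most similar first.
        ranked = sorted(self.rows, key=lambda r: cosine(query, r[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = InMemoryStore()
store.add("refund policy", [1.0, 0.0])    # toy embedding
store.add("shipping times", [0.0, 1.0])   # toy embedding
top = store.search([0.9, 0.1], k=1)
```

Because nothing is written to disk, a restart empties the store, which is why the in-memory option suits prototyping and a persistent vector DB suits production.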

Setup Guide

Prerequisites

| Requirement | Type | Purpose |
|-------------|------|---------|
| n8n instance | Essential | Import and execute the workflow |
| Google Drive OAuth2 | Essential | Watch and download documents from Google Drive |
| LlamaIndex Cloud API | Essential | Parse and convert documents to structured markdown |
| Azure OpenAI Account | Essential | Generate embeddings (the workflow uses a deployment named "3small") |
| Persistent Vector DB (e.g., Pinecone) | Optional | Persist embeddings for production-scale search |

Installation Steps

Import the workflow JSON: open your n8n instance and import the file.
Configure credentials:
Azure OpenAI: Provide Endpoint, API Key and set deployment name.
LlamaIndex API: Create an HTTP Header Auth credential in n8n. Header Name: Authorization. Header Value: Bearer YOUR_API_KEY.
Google Drive OAuth2: Create OAuth 2.0 credentials in Google Cloud Console, enable Drive API, and configure the Google Drive OAuth2 credential in n8n.
Update environment-specific values:
Replace the workflow's Google Drive fileId with the file ID or folder ID you want to watch (do not publish real IDs when sharing the workflow).
Customize settings:
Polling interval (Wait node): adjust for faster or slower job status checks.
Target file or folder: toggled on the Google Drive Trigger node.
Embedding model: change Azure OpenAI deployment if needed.
Test execution:
Save changes and trigger a sample file update on Drive. Verify each node runs and the vector store receives embeddings.
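
For testing the LlamaIndex credential outside n8n, the HTTP Header Auth setup above amounts to a single header. The Python sketch below builds it. The environment variable name `LLAMA_CLOUD_API_KEY` and the function name are assumptions made for this example; reading the key from the environment keeps it out of source files.

```python
import os


def llamaindex_auth_header(api_key: str) -> dict[str, str]:
    """Build the header from the setup steps: Authorization: Bearer <key>."""
    if not api_key:
        raise ValueError("API key is empty; set it before calling the API")
    return {"Authorization": f"Bearer {api_key}"}


# LLAMA_CLOUD_API_KEY is an assumed variable name; YOUR_API_KEY is a placeholder.
headers = llamaindex_auth_header(os.environ.get("LLAMA_CLOUD_API_KEY", "YOUR_API_KEY"))
```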

Technical Details

Core Nodes

| Node | Purpose | Key Configuration |
|------|---------|-------------------|
| Knowledge Base Updated Trigger (Google Drive Trigger) | Triggers on file/folder changes | Set trigger type to specific file or folder; configure OAuth2 credential |
| Download Knowledge Document (Google Drive) | Downloads file binary | Operation: download; ensure OAuth2 credential is selected |
| Parse Document via LlamaIndex (HTTP Request) | Uploads file to LlamaIndex parsing endpoint | POST multipart/form-data to /parsing/upload; use HTTP Header Auth credential |
| Monitor Document Processing (HTTP Request) | Polls parsing job status | GET /parsing/job/{{jobId}}; check status field |
| Check Parsing Completion (If) | Branches on job status | Condition: {{$json.status}} equals SUCCESS |
| Retrieve Parsed Content (HTTP Request) | Fetches parsed markdown result | GET /parsing/job/{{jobId}}/result/markdown |
| Default Data Loader (LangChain) | Loads parsed markdown into document format | Use as document source for embeddings |
| Embeddings Azure OpenAI | Generates embeddings for documents | Credentials: Azure OpenAI; Model/Deployment: 3small |
| Insert Data to Store (vectorStoreInMemory) | Stores documents + embeddings | Use memory store for prototyping; switch to DB for persistence |
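
The three LlamaIndex HTTP Request nodes differ only in method and path. As a rough Python sketch, the helper below assembles the paths listed in the table for a given job ID. The base URL is a deliberate placeholder (the table gives only relative paths), and `parsing_urls` is a hypothetical name; substitute the base URL configured in your own HTTP Request nodes.

```python
# Placeholder host: replace with the base URL used in your HTTP Request nodes.
BASE_URL = "https://llamaindex.example/api"


def parsing_urls(job_id: str, base_url: str = BASE_URL) -> dict[str, str]:
    """Return the upload, status, and result URLs for one parsing job."""
    base = base_url.rstrip("/")
    return {
        "upload": f"{base}/parsing/upload",                        # POST multipart/form-data
        "status": f"{base}/parsing/job/{job_id}",                  # GET, read the status field
        "result": f"{base}/parsing/job/{job_id}/result/markdown",  # GET parsed markdown
    }


urls = parsing_urls("job-123")
```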

Workflow Logic

On Drive change, the file binary is downloaded and sent to LlamaIndex.
The workflow enters a monitor loop: Monitor Document Processing fetches the job status and the If node checks it. If the status is not SUCCESS, the Wait node delays before the next check.
When parsing completes, the workflow retrieves markdown, loads documents, creates embeddings via Azure OpenAI, and inserts data into an in-memory vector store.
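
The monitor loop described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the workflow's actual implementation: `wait_for_parsing` is a hypothetical name, `fetch_status` stands in for the Monitor Document Processing request, and the `max_checks` limit is an addition that the workflow as described does not have. The simulated job returns canned statuses so the sketch runs without any network calls.

```python
import time
from typing import Callable


def wait_for_parsing(
    fetch_status: Callable[[], str],
    wait_seconds: float = 60.0,
    max_checks: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the job reports SUCCESS; return the number of checks made."""
    for check in range(1, max_checks + 1):
        if fetch_status() == "SUCCESS":  # the If node's condition
            return check
        sleep(wait_seconds)              # the Wait node
    raise TimeoutError(f"job not finished after {max_checks} checks")


# Simulated job: two PENDING responses, then SUCCESS. Sleeping is stubbed out.
responses = iter(["PENDING", "PENDING", "SUCCESS"])
checks = wait_for_parsing(lambda: next(responses), sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the loop testable; in real use the default `time.sleep` plays the role of the Wait node.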

Customization Options

Basic Adjustments:
Poll Delay: Adjust the Wait node interval (default: one minute) to balance speed against API quota.
Target Scope: Switch the trigger from a single file to a folder to auto-handle many docs.
Embedding Model: Swap Azure deployment for a different model name as needed.

Advanced Enhancements:
Persistent Vector DB Integration: Replace vectorStoreInMemory with Pinecone or Milvus for production search.
Notification: Add Slack or email nodes to notify when parsing completes or fails.
Summarization: Add an LLM summarization step to generate chunk-level summaries.

Scaling option:
Batch uploads and chunking to reduce embedding calls; use a queue (Redis or n8n queue patterns) and horizontal workers for high throughput.
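
As a minimal sketch of the chunking and batching idea, the Python below splits parsed markdown into overlapping character windows and groups the chunks so that each embedding request can carry several inputs. The function names, the character-based splitting, and the sizes are all assumptions for illustration; a production setup would more likely split on tokens or markdown headings.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last window is never pure overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Group items into lists of at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


chunks = chunk_text("a" * 2500, size=1000, overlap=100)
groups = batches(chunks, batch_size=2)  # one embedding request per group
```

Here a 2,500-character document becomes three chunks sent in two requests instead of three.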

Performance & Optimization

| Metric | Expected Performance | Optimization Tips |
|--------|----------------------|-------------------|
| Execution time (per doc) | ~10s–2min (depends on file size & LlamaIndex processing) | Chunk large docs; run embeddings in batches |
| API calls (per doc) | 3–8 (upload, poll(s), retrieve, embedding calls) | Increase poll interval; consolidate requests |
| Error handling | Retries via Wait loop and If checks | Add exponential backoff, failure notifications, and retry limits |
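
The exponential backoff and retry limit suggested in the table can be expressed as a fixed schedule of wait times. The Python sketch below computes one; the function name and the default values (5-second base, doubling, 60-second cap, 6 retries) are assumptions chosen for the example, not settings taken from the workflow.

```python
def backoff_delays(
    base: float = 5.0,
    factor: float = 2.0,
    cap: float = 60.0,
    max_retries: int = 6,
) -> list[float]:
    """Wait times in seconds for each retry: base * factor**attempt, capped."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


delays = backoff_delays()
# → [5.0, 10.0, 20.0, 40.0, 60.0, 60.0]
```

Once the schedule is exhausted, the workflow should stop polling and send a failure notification rather than loop indefinitely.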

Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| Authentication errors | Invalid/missing credentials | Reconfigure n8n Credentials; do not paste API keys directly into nodes |
| File not found | Incorrect fileId or permissions | Verify Drive fileId and OAuth scopes; share file with the service account if needed |
| Parsing stuck in PENDING | LlamaIndex processing delay or rate limit | Increase Wait node interval, monitor LlamaIndex dashboard, add retry limits |
| Embedding failures | Model/deployment mismatch or quota limits | Confirm Azure deployment name (3small) and subscription quotas |

Created by: khmuhtadin
Category: Knowledge Management
Tags: google-drive, llamaindex, azure-openai, embeddings, knowledge-base, vector-store


Nodes Used (5)

Default Data Loader
@n8n/n8n-nodes-langchain.documentDefaultDataLoader
Embeddings Azure OpenAI
@n8n/n8n-nodes-langchain.embeddingsAzureOpenAi
Google Drive
n8n-nodes-base.googleDrive
HTTP Request
n8n-nodes-base.httpRequest
Simple Vector Store
@n8n/n8n-nodes-langchain.vectorStoreInMemory