Build a support knowledge base from historical Gmail emails with OpenAI and PostgreSQL

Go to Workflow
0 views
Built by Vivekanand M Vivekanand M
Created on June 08, 2026

Description

KB Builder โ€” Historical Emails
n8n Workflow Template

๐Ÿ“˜ Description

This workflow automates the process of building a structured Knowledge Base (KB) from your existing Gmail inbox by processing historical email threads, extracting customer-support conversation pairs, classifying them with AI, generating vector embeddings, and storing everything in a PostgreSQL database โ€” ready to power AI-assisted draft generation in downstream workflows.

The workflow is triggered manually and fetches the last N emails from your connected Gmail account. Each email is parsed, filtered, and deduplicated before being processed. A thread-level fetch retrieves the full conversation context, separating customer messages from support replies. An AI classification step categorises each thread into a defined scenario type. Vector embeddings are generated for semantic similarity search. Qualified threads are inserted into three database tables โ€” kb_data for clean Q&A pairs, scenario_patterns for reusable handling logic, and corrections for diverse real-world examples โ€” with strict deduplication to keep the KB clean and non-redundant.

โš™๏ธ What This Workflow Does (Step-by-Step)

๐Ÿ“ฅ Manual Trigger โ€” Run on demand whenever you want to import a batch of historical emails into the KB. No scheduling required.

๐Ÿ“ฌ Fetch Emails from Gmail โ€” Pulls the last N emails (configurable, default 100) from your Gmail account using OAuth2. Returns full email metadata including thread IDs, sender, subject, labels, and body.

๐Ÿ” Parse & Filter โ€” Extracts structured fields from each email: sender address, subject, body (cleaned of links and quoted history), date, and thread ID. Filters out emails from your own domain (outbound support replies) and auto-generated emails (no-reply, notifications, newsletters). Only genuine inbound customer emails proceed.

๐Ÿงต Fetch Full Thread โ€” For each qualified email, fetches the entire Gmail thread using the thread ID. Retrieves all messages in the conversation to identify the original customer message and the latest support reply.

๐Ÿ—‚๏ธ Assemble Thread Context โ€” Reconstructs the conversation by separating customer messages from support replies. Builds a clean conversationText block used as AI input. Flags threads with no support reply for conditional handling.

๐Ÿค– AI Classification โ€” Calls GPT-4o-mini (or Claude) with the assembled conversation. Returns a structured JSON output containing: scenario category, a concise Q&A pair, a handling pattern description, key entities, sentiment, resolution status, and a summary.

๐Ÿ”ข Generate Embeddings โ€” Calls the OpenAI Embeddings API to generate a 1536-dimension vector for both the KB entry and the correction record. Used for semantic similarity search in the downstream draft-generation workflow.

๐Ÿ” Duplicate Detection & DB Insert โ€” Performs cosine similarity checks against existing records before inserting. KB entries and scenario patterns are blocked if a match exceeds 92% similarity. Corrections are inserted freely (deduped at 92%) since diverse examples improve AI draft quality. New records are written to three tables: kb_data, scenario_patterns, and corrections.

๐Ÿงฉ Prerequisites

Gmail account** โ€” OAuth2 credentials connected in n8n. The account must be the support inbox you want to import from.
OpenAI API key** โ€” Used for GPT-4o-mini classification ($0.002โ€“0.005 per thread) and text-embedding-3-small for vector generation ($0.0001 per record).
PostgreSQL database** โ€” With the pgvector extension enabled. Must have the three tables set up per the schema below.
n8n instance** โ€” Self-hosted or cloud. Requires the PostgreSQL and OpenAI nodes.

๐Ÿ—„๏ธ Database Schema

Table: kb_data
| Field | Type | Notes |
|---|---|---|
| id | SERIAL PRIMARY KEY | Auto-increment |
| thread_id | TEXT | Gmail thread ID |
| subject | TEXT | Email subject |
| category | TEXT | AI-assigned scenario type |
| question | TEXT | Customer issue / question |
| answer | TEXT | Support resolution |
| entities | TEXT | Key entities extracted by AI |
| sentiment | TEXT | Customer sentiment |
| resolution_status | TEXT | Resolved / Unresolved |
| embedding | vector(1536) | OpenAI embedding for similarity search |
| source | TEXT | historical_import |
| created_at | TIMESTAMP | Insert timestamp |

Table: scenario_patterns
| Field | Type | Notes |
|---|---|---|
| id | SERIAL PRIMARY KEY | Auto-increment |
| category | TEXT | Scenario type |
| pattern_description | TEXT | Handling logic summary |
| example_subject | TEXT | Representative subject line |
| embedding | vector(1536) | OpenAI embedding |
| source | TEXT | historical_import |
| created_at | TIMESTAMP | Insert timestamp |

Table: corrections
| Field | Type | Notes |
|---|---|---|
| id | SERIAL PRIMARY KEY | Auto-increment |
| gmail_thread_id | TEXT | Gmail thread ID |
| original_email_body | TEXT | Customer's original message |
| human_sent_text | TEXT | Actual support reply sent |
| ai_draft_text | TEXT | NULL for historical imports |
| diff_summary | TEXT | Import note or live diff |
| classification | TEXT | Scenario category |
| embedding | vector(1536) | OpenAI embedding |
| source | TEXT | historical_import |
| created_at | TIMESTAMP | Insert timestamp |

๐Ÿ’ฐ Cost Estimate

| Item | Estimated Cost |
|---|---|
| Gmail OAuth2 | Free |
| GPT-4o-mini (100 threads classification) | ~$0.20โ€“0.50 |
| text-embedding-3-small (100 records ร— 2 embeddings) | ~$0.02 |
| PostgreSQL + pgvector (self-hosted) | ~$5โ€“15/mo |
| n8n self-hosted (AWS t3.small) | ~$10โ€“15/mo |
| Total per 100-email import run | ~$0.25โ€“0.75 |

โš™๏ธ Setup Instructions

Gmail โ€” Connect your Gmail account in n8n using OAuth2. Ensure the account is the support inbox. Grant read permissions for messages and threads.

PostgreSQL โ€” Create a new database, enable the pgvector extension (CREATE EXTENSION IF NOT EXISTS vector), then create the three tables using the schema above.

OpenAI โ€” Add your OpenAI API key as an n8n credential. Used for both the chat completion node (classification) and the HTTP Request node (embeddings).

Customise the AI prompt โ€” Open the AI Classification node and update the system prompt to reflect your business type, support tone, and the scenario categories relevant to your domain (e.g. Refund Request, Order Status, Technical Issue, Billing Query).

Set your domain filter โ€” In the Parse & Filter node, update the YOUR_DOMAIN variable to your support team's email domain so outbound replies are correctly excluded from customer email processing.

Set fetch limit โ€” In the Fetch Emails node, set the limit parameter to the number of historical emails you want to import per run. Start with 5โ€“10 to validate the pipeline before running at scale.

Run manually โ€” Click Execute Workflow. Monitor the output of each node to verify parsing, classification, and DB inserts are working correctly.

๐Ÿ’ก Key Benefits

โœ” Converts your inbox into a structured KB โ€” no manual tagging or categorisation required
โœ” AI classification assigns scenario types and extracts Q&A pairs automatically
โœ” Vector embeddings enable semantic similarity search in downstream draft workflows
โœ” Smart deduplication keeps KB and scenario tables clean โ€” no near-duplicate entries
โœ” Corrections table accumulates diverse examples โ€” improves AI draft quality over time
โœ” Thread-aware โ€” reconstructs full conversations, not just individual emails
โœ” Modular AI node โ€” swap GPT-4o-mini for Claude or any other model with minimal changes
โœ” One-time historical import feeds directly into live draft-generation workflows

๐Ÿ‘ฅ Perfect For

Support teams wanting to automate email draft generation with AI trained on their own history
Founders or small teams building a knowledge base from years of accumulated support emails
Developers building AI-powered support automation on top of Gmail
Anyone who wants their AI drafts to sound like their own team โ€” not generic templates

๐Ÿ”— Related Workflows

This workflow is Step 1 in a two-workflow system:

KB Builder โ€” Historical Emails* *(this workflow) โ€” Imports historical threads to seed the KB
AI Draft Generator* *(coming soon) โ€” Monitors new inbound emails, retrieves semantically similar KB entries and corrections, and generates personalised reply drafts delivered to your inbox for one-click review and send

Nodes Used (5)

AI Agent
@n8n/n8n-nodes-langchain.agent
Code
n8n-nodes-base.code
Gmail
n8n-nodes-base.gmail
OpenAI Chat Model
@n8n/n8n-nodes-langchain.lmChatOpenAi
Postgres
n8n-nodes-base.postgres