Analyze legal contract risk with Google Gemini hybrid RAG and Supabase

Built by Divyanshu Gupta
Created on June 05, 2026

Description


🚀 What This Workflow Does
This workflow transforms any PDF legal contract into a detailed AI-powered risk report — in under 5 minutes. Upload a contract, and the system automatically splits it into clauses, analyses each one using Hybrid RAG (semantic + keyword search), scores risk as HIGH / MEDIUM / LOW, and delivers plain-English explanations with safer alternative wording.

🔥 Why Hybrid RAG?
Most dangerous clauses don't use obvious legal keywords.
"The Client accepts full responsibility for all third-party claims" is an indemnification clause — but keyword search misses it.
Hybrid RAG combines:
Vector Search (pgvector): finds semantically similar risky patterns
BM25 Keyword Search: catches explicit legal red flags
RRF Reranking: merges both results with clause-type boosting
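The merge step can be sketched with Reciprocal Rank Fusion (RRF), where each result list contributes 1 / (k + rank) per document. This is a minimal illustration, not the workflow's actual Code node: the function name `rrfFuse`, the field names `id` and `clause_type`, the constant k = 60, and the boost value are all assumptions.

```javascript
// Reciprocal Rank Fusion with a clause-type boost (illustrative sketch).
// k = 60 is a commonly used RRF constant; the boost multiplier is hypothetical.
function rrfFuse(vectorHits, keywordHits, { k = 60, targetType = null, typeBoost = 1.0 } = {}) {
  const fused = new Map();

  // Each list adds 1 / (k + rank) for every document it contains (rank starts at 1).
  for (const hits of [vectorHits, keywordHits]) {
    hits.forEach((hit, index) => {
      const entry = fused.get(hit.id) ?? { ...hit, score: 0 };
      entry.score += 1 / (k + index + 1);
      fused.set(hit.id, entry);
    });
  }

  // Multiply the fused score when the knowledge-base entry matches the classified clause type.
  const ranked = [...fused.values()].map((entry) => ({
    ...entry,
    score: targetType && entry.clause_type === targetType ? entry.score * typeBoost : entry.score,
  }));

  return ranked.sort((a, b) => b.score - a.score);
}

// Hypothetical hits from the vector search and the keyword search.
const vectorHits = [
  { id: "kb-1", clause_type: "indemnification" },
  { id: "kb-2", clause_type: "termination" },
];
const keywordHits = [
  { id: "kb-2", clause_type: "termination" },
  { id: "kb-3", clause_type: "indemnification" },
];

const ranked = rrfFuse(vectorHits, keywordHits, { targetType: "indemnification", typeBoost: 1.5 });
console.log(ranked.map((r) => `${r.id}: ${r.score.toFixed(4)}`));
```

A document found by both searches (here `kb-2`) accumulates two contributions, so it can outrank a boosted document found by only one search unless the boost is large.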

🔍 What It Does
Accepts a PDF contract via webhook (with async job_id tracking)
Splits contract into individual numbered clauses
Classifies each clause type using Google Gemini (indemnification, IP, termination, etc.)
Generates vector embeddings and searches a Supabase knowledge base
Scores each clause HIGH / MEDIUM / LOW using a regex first pass plus AI
Uses an AI Agent (Gemini Flash) to explain each risk in plain language and suggest safer wording
Aggregates all results into a single JSON report
Saves report to Supabase (frontend polls for result asynchronously)
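The clause-splitting step above could look roughly like the following. It is a sketch under a stated assumption: clauses start at the beginning of a line with a number followed by "." or ")" (for example "1.", "2.1." or "3)"). The function name `splitClauses` and the output fields `number` and `text` are hypothetical, and real contracts often need more tolerant rules.

```javascript
// Split contract text into numbered clauses (illustrative sketch).
// Assumes each clause begins a line with a heading such as "1.", "2.1." or "3)".
function splitClauses(text) {
  const heading = /^[ \t]*(\d+(?:\.\d+)*)[.)][ \t]+/gm;
  const starts = [...text.matchAll(heading)];

  return starts.map((match, i) => {
    // A clause body runs from the end of its heading to the start of the next heading.
    const bodyStart = match.index + match[0].length;
    const bodyEnd = i + 1 < starts.length ? starts[i + 1].index : text.length;
    return { number: match[1], text: text.slice(bodyStart, bodyEnd).trim() };
  });
}

const sample = [
  "1. Payment. The Client shall pay within 30 days.",
  "2. Indemnity. The Client accepts full responsibility for all third-party claims.",
  "2.1. Scope. This applies worldwide.",
].join("\n");

console.log(splitClauses(sample));
```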

⚙️ Architecture (Two Pipelines)
Pipeline 1 — Ingestion: Builds the knowledge base of risky clause patterns in Supabase
Pipeline 2 — Query: Analyses new contracts against the knowledge base

Both pipelines run in the same workflow — the branch splits at Extract Embedding.

🧠 Key Technical Decisions
Async architecture: Frontend fires request + polls Supabase. No timeout issues.
job_id tracking: Preserved across all nodes via ...$json spread
RRF Reranking: Combines vector + BM25 scores with type-based boost multipliers
Regex Risk Scorer: First-pass risk classification before expensive LLM call
Gemini Flash: Fast, cost-efficient LLM for per-clause annotation
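Two of these decisions, the regex first pass and the job_id spread, can be illustrated together. This is a sketch rather than the workflow's actual rule set: the patterns, the function names `scoreClause` and `scoreItems`, and the fields `clause_text` and `regex_risk` are assumptions. Only the item shape (`{ json: {...} }`) follows how n8n passes data between nodes.

```javascript
// Hypothetical first-pass patterns, checked from most to least severe.
const RISK_PATTERNS = [
  { level: "HIGH", pattern: /\b(indemnif\w*|unlimited liability|full responsibility|sole discretion)\b/i },
  { level: "MEDIUM", pattern: /\b(automatically renew\w*|auto-renew\w*|non-compete|exclusiv\w*)\b/i },
];

// Return the level of the first matching pattern, or LOW when nothing matches.
function scoreClause(text) {
  const hit = RISK_PATTERNS.find(({ pattern }) => pattern.test(text));
  return hit ? hit.level : "LOW";
}

// Spreading item.json keeps job_id and every other upstream field on the output item.
function scoreItems(items) {
  return items.map((item) => ({
    json: { ...item.json, regex_risk: scoreClause(item.json.clause_text) },
  }));
}

const items = [
  { json: { job_id: "job-123", clause_text: "The Client accepts full responsibility for all third-party claims." } },
  { json: { job_id: "job-123", clause_text: "This agreement will automatically renew each year." } },
  { json: { job_id: "job-123", clause_text: "Notices must be sent by email." } },
];

console.log(scoreItems(items).map((item) => item.json));
```

Inside an n8n Code node the same mapping would typically be applied to `$input.all()` and returned directly.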

📦 Requirements
Google Gemini API key: for clause classification + embeddings + AI Agent
Supabase project: with pgvector extension enabled
Supabase tables: legal_clauses (knowledge base) + reports (results)
Supabase functions: match_clauses() + keyword_search_clauses()
Frontend (optional): HTML/CSS/JS web app hosted on Netlify
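Supabase exposes database functions over its REST API at `/rest/v1/rpc/<function_name>`, so an HTTP Request node can call `match_clauses()` with a POST. The sketch below only builds the request. The helper name `buildRpcRequest`, the project URL, the key placeholder, and the argument names `query_embedding` and `match_count` are assumptions, since the source does not give the function signatures.

```javascript
// Build the pieces of a Supabase RPC call (illustrative sketch; nothing is sent here).
function buildRpcRequest(projectUrl, apiKey, functionName, args) {
  return {
    method: "POST",
    // Strip trailing slashes so the path is not doubled up.
    url: `${projectUrl.replace(/\/+$/, "")}/rest/v1/rpc/${functionName}`,
    headers: {
      apikey: apiKey,
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(args),
  };
}

// Hypothetical project URL, key, and argument names.
const request = buildRpcRequest("https://example-project.supabase.co/", "YOUR_SUPABASE_KEY", "match_clauses", {
  query_embedding: [0.12, -0.03, 0.88],
  match_count: 5,
});

console.log(request.url);
```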

💡 Example Use Cases
Freelancers reviewing client contracts before signing
Startups evaluating vendor or investor agreements
Legal ops teams standardising contract review at scale
Business owners catching risky clauses without legal fees

🎯 Output
Per-clause: risk_level, plain-English explanation, risk_reason, safer_alternative, key_obligations, legal_area
Summary: overall_risk_score, risk_distribution, legal_areas map, high_risk_clauses list
Stored as JSON in Supabase reports table, keyed by job_id
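The aggregation into a single report might be sketched as follows. The output field names come from the list above, but the scoring formula is an assumption: here each level maps to a hypothetical weight (HIGH = 100, MEDIUM = 50, LOW = 0) and the overall score is the rounded mean. The `clause_number` field and the choice to store a count per legal area are also assumptions.

```javascript
// Hypothetical weights used to turn per-clause levels into a 0-100 overall score.
const WEIGHTS = { HIGH: 100, MEDIUM: 50, LOW: 0 };

// Combine per-clause annotations into one report object keyed by job_id.
function buildReport(jobId, clauses) {
  const risk_distribution = { HIGH: 0, MEDIUM: 0, LOW: 0 };
  const legal_areas = {};
  let total = 0;

  for (const clause of clauses) {
    risk_distribution[clause.risk_level] += 1;
    legal_areas[clause.legal_area] = (legal_areas[clause.legal_area] ?? 0) + 1;
    total += WEIGHTS[clause.risk_level];
  }

  return {
    job_id: jobId,
    overall_risk_score: clauses.length ? Math.round(total / clauses.length) : 0,
    risk_distribution,
    legal_areas,
    high_risk_clauses: clauses.filter((c) => c.risk_level === "HIGH").map((c) => c.clause_number),
    clauses,
  };
}

const annotated = [
  { clause_number: "1", risk_level: "LOW", legal_area: "payment" },
  { clause_number: "2", risk_level: "HIGH", legal_area: "indemnification" },
  { clause_number: "3", risk_level: "MEDIUM", legal_area: "termination" },
];

console.log(JSON.stringify(buildReport("job-123", annotated), null, 2));
```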

Nodes Used (6)

AI Agent: @n8n/n8n-nodes-langchain.agent
Code: n8n-nodes-base.code
Google Gemini: @n8n/n8n-nodes-langchain.googleGemini
Google Gemini Chat Model: @n8n/n8n-nodes-langchain.lmChatGoogleGemini
HTTP Request: n8n-nodes-base.httpRequest
Supabase: n8n-nodes-base.supabase