Generate consensus-based answers using Claude, GPT, Grok and Gemini

19 views

Built by

Yehor EGMS

Created on June 13, 2026

Description

The original LLM Council concept was introduced by Andrej Karpathy and published as an open-source repository demonstrating multi-model consensus and ranking.
This workflow is my adaptation of that original idea, reimplemented and structured as a production-ready n8n template. Original repository - https://github.com/karpathy/llm-council

This n8n template implements the LLM Council pattern: a single user question is processed in parallel by multiple large language models, independently evaluated by peer models, and then synthesized into one high-quality, consensus-driven final answer.
It is designed for use cases where answer quality, balance, and reduced single-model bias are critical.

📌 Section 1: Trigger & Input

⚡ When Chat Message Received (Chat Trigger)
Purpose:
Receives a user’s message and initiates the entire workflow.

How it works:

A user sends a chat message

The message is stored as the Original Question

The same input is forwarded simultaneously to multiple LLM pipelines

Why it matters:
Provides a clean, unified entry point for all downstream multi-model logic.

📌 Section 2: Stage 1 — Parallel LLM Responses

🤖 Basic LLM Chains (x4)
Models used:

Anthropic Claude

OpenAI GPT

xAI Grok

Google Gemini

Purpose:
Each model independently generates its own response to the same question.

Key characteristics:

Identical prompt structure for all models

Independent reasoning paths

No shared context between models

Why it matters:
Produces diverse perspectives, reasoning styles, and solution approaches.

📌 Section 3: Stage 2 — Response Anonymization

🧾 Set Nodes (Response A / B / C / D)
Purpose:
Stores model outputs in an anonymized format:

Response A

Response B

Response C

Response D

Why it matters:
Prevents evaluator models from knowing which LLM authored which response, reducing bias during evaluation.

📌 Section 4: Stage 3 — Peer Evaluation & Ranking

📊 Evaluation Chains (Claude / GPT / Grok / Gemini)
Purpose:
Each model acts as a reviewer and:

Analyzes all four anonymized responses

Describes strengths and weaknesses of each

Produces a strict FINAL RANKING from best to worst

Ranking format (strict):

FINAL RANKING:
Response B
Response A
Response D
Response C

Why it matters:
Creates multiple independent quality assessments from different model perspectives.

📌 Section 5: Stage 4 — Ranking Aggregation

🧮 Code Node (JavaScript)
Purpose:
Aggregates all peer rankings by:

Parsing ranking positions

Calculating average position per response

Counting evaluation occurrences

Sorting responses by best average score

Output includes:

Aggregated rankings

Best response label

Best average score

Why it matters:
Transforms subjective rankings into a structured, quantitative consensus.

📌 Section 6: Stage 5 — Final Consensus Answer

🧠 Chairman LLM Chain
Purpose:
One model acts as the Council Chairman and:

Reviews all original responses

Considers peer rankings and aggregated scores

Identifies consensus patterns and disagreements

Produces a single, clear, high-quality final answer

Why it matters:
Delivers a refined response that reflects collective model intelligence rather than a simple average.

📊 Workflow Overview
Stage Node / Logic Purpose
1 Chat Trigger Receive user question
2 LLM Chains Generate independent responses
3 Set Nodes Anonymize outputs
4 Evaluation Chains Peer review & ranking
5 Code Node Aggregate rankings
6 Chairman LLM Final synthesized answer
🎯 Key Benefits

🧠 Multi-model intelligence — avoids reliance on a single LLM
⚖️ Reduced bias — anonymized peer evaluation
📊 Quality-driven selection — ranking-based consensus
🔁 Modular architecture — easy to add or replace models
🌍 Language-flexible — input and output languages configurable
🧩 Production-ready logic — clear stages, deterministic ranking

🚀 Ideal Use Cases

High-stakes decision support

Complex technical or architectural questions

Strategy and research synthesis

AI assistants requiring higher trust and reliability

Comparing and selecting the best LLM-generated answers

Nodes Used (11)

Anthropic Chat Model

@n8n/n8n-nodes-langchain.lmChatAnthropic

Basic LLM Chain

@n8n/n8n-nodes-langchain.chainLlm

Code

n8n-nodes-base.code

Gmail

n8n-nodes-base.gmail

Google Gemini Chat Model

@n8n/n8n-nodes-langchain.lmChatGoogleGemini

OpenAI Chat Model

@n8n/n8n-nodes-langchain.lmChatOpenAi

OpenRouter Chat Model

@n8n/n8n-nodes-langchain.lmChatOpenRouter

Slack

n8n-nodes-base.slack

n8n-nodes-base.telegram

WhatsApp Business Cloud

n8n-nodes-base.whatsApp

xAI Grok Chat Model

@n8n/n8n-nodes-langchain.lmChatXAiGrok

Generate consensus-based answers using Claude, GPT, Grok and Gemini

Description

Nodes Used (11)

Select Nodes to Filter