Generate consensus-based answers using Claude, GPT, Grok and Gemini

Built by Yehor EGMS
Created on June 13, 2026

Description

The original LLM Council concept was introduced by Andrej Karpathy and published as an open-source repository demonstrating multi-model consensus and ranking.
This workflow is my adaptation of that idea, reimplemented and structured as a production-ready n8n template. Original repository: https://github.com/karpathy/llm-council

This n8n template implements the LLM Council pattern: a single user question is processed in parallel by multiple large language models, independently evaluated by peer models, and then synthesized into one high-quality, consensus-driven final answer.
It is designed for use cases where answer quality, balance, and reduced single-model bias are critical.

📌 Section 1: Trigger & Input

⚡ When Chat Message Received (Chat Trigger)
Purpose:
Receives a user's message and initiates the entire workflow.

How it works:

A user sends a chat message

The message is stored as the Original Question

The same input is forwarded simultaneously to multiple LLM pipelines

Why it matters:
Provides a clean, unified entry point for all downstream multi-model logic.

📌 Section 2: Stage 1 - Parallel LLM Responses

🤖 Basic LLM Chains (x4)
Models used:

Anthropic Claude

OpenAI GPT

xAI Grok

Google Gemini

Purpose:
Each model independently generates its own response to the same question.

Key characteristics:

Identical prompt structure for all models

Independent reasoning paths

No shared context between models

Why it matters:
Produces diverse perspectives, reasoning styles, and solution approaches.
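
The fan-out described above can be sketched in plain JavaScript. This is a minimal illustration, not the template's actual node configuration: `councilMembers`, its stub functions, and `collectResponses` are hypothetical names, and the stubs stand in for real model calls.

```javascript
// Hypothetical stand-ins for the four model chains. In n8n each would be a
// Basic LLM Chain node with its own chat model attached.
const councilMembers = {
  claude: async (question) => `Claude's answer to: ${question}`,
  gpt: async (question) => `GPT's answer to: ${question}`,
  grok: async (question) => `Grok's answer to: ${question}`,
  gemini: async (question) => `Gemini's answer to: ${question}`,
};

// Send the same question to every member at once and collect the answers.
// Each call receives only the question, so no context is shared between models.
async function collectResponses(question, members) {
  const entries = Object.entries(members);
  const answers = await Promise.all(entries.map(([, ask]) => ask(question)));
  return entries.map(([model], i) => ({ model, answer: answers[i] }));
}
```

Because `Promise.all` rejects as soon as one call fails, a real workflow might prefer `Promise.allSettled` so that one unavailable model does not block the rest of the council.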

📌 Section 3: Stage 2 - Response Anonymization

🧾 Set Nodes (Response A / B / C / D)
Purpose:
Stores model outputs in an anonymized format:

Response A

Response B

Response C

Response D

Why it matters:
Prevents evaluator models from knowing which LLM authored which response, reducing bias during evaluation.
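
As a rough sketch of the anonymization step, the function below assigns neutral labels to model outputs and keeps the label-to-model mapping separate. The `{ model, answer }` input shape and the `anonymize` name are assumptions made for illustration; the template itself uses Set nodes for this.

```javascript
// Assign neutral labels (Response A, Response B, ...) to model outputs.
// `responses` is assumed to be an array of { model, answer } objects.
function anonymize(responses) {
  const labeled = []; // what the evaluators get to see
  const key = {};     // label -> model name, kept away from the evaluators
  responses.forEach(({ model, answer }, i) => {
    const label = `Response ${String.fromCharCode(65 + i)}`; // 65 is "A"
    labeled.push({ label, answer });
    key[label] = model;
  });
  return { labeled, key };
}
```

Only `labeled` would be passed on to the evaluation stage; `key` is useful afterwards if the winning label needs to be traced back to a model.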

📌 Section 4: Stage 3 - Peer Evaluation & Ranking

📊 Evaluation Chains (Claude / GPT / Grok / Gemini)
Purpose:
Each model acts as a reviewer and:

Analyzes all four anonymized responses

Describes strengths and weaknesses of each

Produces a strict FINAL RANKING from best to worst

Ranking format (strict):

FINAL RANKING:
Response B
Response A
Response D
Response C
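
One way to request this format is to state it explicitly in the reviewer prompt. The sketch below is an assumption about how such a prompt could be assembled; the wording and the `buildReviewPrompt` helper are hypothetical and not taken from the template.

```javascript
// Build the prompt a reviewer model receives. `labeled` is assumed to be an
// array of { label, answer } objects that carry no model names.
function buildReviewPrompt(question, labeled) {
  const responsesText = labeled
    .map(({ label, answer }) => `${label}:\n${answer}`)
    .join("\n\n");
  const labels = labeled.map(({ label }) => label).join(", ");
  return [
    `Question:\n${question}`,
    `Anonymized responses:\n${responsesText}`,
    `Describe the strengths and weaknesses of each response (${labels}).`,
    "End with a line reading exactly FINAL RANKING: followed by one response label per line, best first.",
  ].join("\n\n");
}
```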


Why it matters:
Creates multiple independent quality assessments from different model perspectives.
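
Each reviewer's free-text output has to be turned back into an ordered list before it can be aggregated. A minimal parsing sketch follows, assuming the review ends with the strict FINAL RANKING block shown above; `parseFinalRanking` is a hypothetical helper name.

```javascript
// Extract the ordered labels that follow the "FINAL RANKING:" header.
// Returns [] when the header is missing so callers can skip that reviewer.
function parseFinalRanking(reviewText) {
  const marker = "FINAL RANKING:";
  const start = reviewText.lastIndexOf(marker);
  if (start === -1) return [];
  const tail = reviewText.slice(start + marker.length);
  const matches = tail.match(/Response [A-Z]/g) || [];
  // Drop repeats so each response keeps only its first (best) position.
  return [...new Set(matches)];
}
```

Searching only the text after the last marker keeps labels mentioned in the strengths-and-weaknesses discussion from being mistaken for ranking entries.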

📌 Section 5: Stage 4 - Ranking Aggregation

🧮 Code Node (JavaScript)
Purpose:
Aggregates all peer rankings by:

Parsing ranking positions

Calculating average position per response

Counting evaluation occurrences

Sorting responses by best average score

Output includes:

Aggregated rankings

Best response label

Best average score

Why it matters:
Transforms subjective rankings into a structured, quantitative consensus.
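
The aggregation steps listed above can be sketched as a pure function. This is an illustrative version, not the template's actual Code node: the input shape (one array of labels per reviewer, best first), the output field names, and the alphabetical tie-break are all assumptions.

```javascript
// Combine several rankings into average positions per response.
// `rankings` is assumed to be an array of label arrays, each ordered best first.
function aggregateRankings(rankings) {
  const totals = {}; // label -> { sum of positions, number of evaluations }
  for (const ranking of rankings) {
    ranking.forEach((label, index) => {
      if (!totals[label]) totals[label] = { sum: 0, count: 0 };
      totals[label].sum += index + 1; // positions are 1-based
      totals[label].count += 1;
    });
  }
  const aggregated = Object.entries(totals)
    .map(([label, { sum, count }]) => ({
      label,
      averagePosition: sum / count,
      evaluations: count,
    }))
    // Lower average is better; ties fall back to label order (an assumption)
    // so the result stays deterministic.
    .sort(
      (a, b) =>
        a.averagePosition - b.averagePosition || a.label.localeCompare(b.label)
    );
  const best = aggregated[0] || null;
  return {
    aggregated,
    bestResponse: best ? best.label : null,
    bestAverage: best ? best.averagePosition : null,
  };
}
```

Inside an n8n Code node, the same function could be fed with rankings parsed from the incoming items and its result returned as a single output item.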

📌 Section 6: Stage 5 - Final Consensus Answer

🧠 Chairman LLM Chain
Purpose:
One model acts as the Council Chairman and:

Reviews all original responses

Considers peer rankings and aggregated scores

Identifies consensus patterns and disagreements

Produces a single, clear, high-quality final answer

Why it matters:
Delivers a refined response that reflects collective model intelligence rather than a simple average.
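
To make the Chairman's inputs concrete, the sketch below assembles them into one prompt. The prompt wording, the `buildChairmanPrompt` name, and the field names are hypothetical; the template's actual Chairman prompt may differ.

```javascript
// Assemble everything the Chairman model needs into a single prompt.
// Assumed inputs: `labeled` is [{ label, answer }], `reviews` is an array of
// reviewer texts, `aggregated` is [{ label, averagePosition }].
function buildChairmanPrompt({ question, labeled, reviews, aggregated }) {
  const responsesText = labeled
    .map(({ label, answer }) => `${label}:\n${answer}`)
    .join("\n\n");
  const reviewsText = reviews
    .map((review, i) => `Reviewer ${i + 1}:\n${review}`)
    .join("\n\n");
  const scoresText = aggregated
    .map(
      ({ label, averagePosition }) =>
        `${label}: average position ${averagePosition.toFixed(2)}`
    )
    .join("\n");
  return [
    "You are the Chairman of an LLM Council.",
    `Original Question:\n${question}`,
    `Council responses:\n${responsesText}`,
    `Peer reviews:\n${reviewsText}`,
    `Aggregated rankings (lower is better):\n${scoresText}`,
    "Identify where the responses agree and disagree, then write one clear final answer.",
  ].join("\n\n");
}
```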

📊 Workflow Overview

Section | Node / Logic | Purpose
1 | Chat Trigger | Receive user question
2 | LLM Chains | Generate independent responses
3 | Set Nodes | Anonymize outputs
4 | Evaluation Chains | Peer review & ranking
5 | Code Node | Aggregate rankings
6 | Chairman LLM | Final synthesized answer

🎯 Key Benefits

🧠 Multi-model intelligence: avoids reliance on a single LLM
⚖️ Reduced bias: anonymized peer evaluation
📊 Quality-driven selection: ranking-based consensus
🔁 Modular architecture: easy to add or replace models
🌍 Language-flexible: input and output languages configurable
🧩 Production-ready logic: clear stages, deterministic ranking

🚀 Ideal Use Cases

High-stakes decision support

Complex technical or architectural questions

Strategy and research synthesis

AI assistants requiring higher trust and reliability

Comparing and selecting the best LLM-generated answers

Nodes Used (11)

Anthropic Chat Model: @n8n/n8n-nodes-langchain.lmChatAnthropic
Basic LLM Chain: @n8n/n8n-nodes-langchain.chainLlm
Code: n8n-nodes-base.code
Gmail: n8n-nodes-base.gmail
Google Gemini Chat Model: @n8n/n8n-nodes-langchain.lmChatGoogleGemini
OpenAI Chat Model: @n8n/n8n-nodes-langchain.lmChatOpenAi
OpenRouter Chat Model: @n8n/n8n-nodes-langchain.lmChatOpenRouter
Slack: n8n-nodes-base.slack
Telegram: n8n-nodes-base.telegram
WhatsApp Business Cloud: n8n-nodes-base.whatsApp
xAI Grok Chat Model: @n8n/n8n-nodes-langchain.lmChatXAiGrok