Evaluation Metric: Summarization

Built by

Jimleuk

Created on June 05, 2026

Description

This n8n template demonstrates how to calculate the "Summarization" evaluation metric, which in this scenario measures the LLM's accuracy and faithfulness when producing summaries of an incoming YouTube transcript.

The scoring approach is adapted from https://cloud.google.com/vertex-ai/generative-ai/docs/models/metrics-templates#pointwise_summarization_quality

How it works
This evaluation works best for AI summarization workflows.
For our scoring, we simply compare the generated response to the original transcript.
A key factor is to look out for information in the response that is not mentioned in the transcript.
A high score indicates that the summary adheres to and aligns with the transcript, whereas a low score could signal an inadequate prompt or model hallucination.
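
The comparison described above can be sketched outside n8n as a small "LLM as judge" helper. This is only an illustration under stated assumptions, not the template's actual prompt or node configuration: the prompt wording, the 1 to 5 scale, the JSON reply shape, and the function names `build_judge_prompt` and `parse_judge_reply` are all hypothetical. No model is called here; the sketch only builds the judge prompt and validates a reply.

```python
import json

# Hypothetical judge prompt. The template's real prompt wording is not shown
# in this description, so treat this text as a placeholder.
JUDGE_PROMPT = """You are evaluating a summary against its source transcript.
Penalize any information in the summary that the transcript does not support.
Reply with JSON: {{"score": <integer 1-5>, "reasoning": "<short explanation>"}}

Transcript:
{transcript}

Summary:
{summary}
"""


def build_judge_prompt(transcript: str, summary: str) -> str:
    """Fill the judge prompt with the original transcript and the generated summary."""
    return JUDGE_PROMPT.format(transcript=transcript.strip(), summary=summary.strip())


def parse_judge_reply(reply: str, low: int = 1, high: int = 5) -> dict:
    """Parse the judge's JSON reply and check the score is inside the assumed scale."""
    data = json.loads(reply)
    score = int(data["score"])
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the expected range {low}-{high}")
    return {"score": score, "reasoning": str(data.get("reasoning", ""))}


if __name__ == "__main__":
    prompt = build_judge_prompt("The video explains how to bake bread.", "A bread baking guide.")
    print(prompt)
    # A made-up reply, standing in for what a judge model might return.
    print(parse_judge_reply('{"score": 4, "reasoning": "Faithful but brief."}'))
```

Validating the score range before recording it keeps a malformed judge reply from silently skewing the metric, which is a similar role to the one a structured output parser plays inside the workflow.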

Requirements
n8n version 1.94+
Check out this Google Sheet for sample data: https://docs.google.com/spreadsheets/d/1YOnu2JJjlxd787AuYcg-wKbkjyjyZFgASYVV0jsij5Y/edit?usp=sharing

Nodes Used (6)

Basic LLM Chain: @n8n/n8n-nodes-langchain.chainLlm
Evaluation: n8n-nodes-base.evaluation
Google Drive: n8n-nodes-base.googleDrive
Google Gemini Chat Model: @n8n/n8n-nodes-langchain.lmChatGoogleGemini
OpenAI Chat Model: @n8n/n8n-nodes-langchain.lmChatOpenAi
Structured Output Parser: @n8n/n8n-nodes-langchain.outputParserStructured
