Evaluation metric example: Categorization


Built by

David Roberts

Created on June 05, 2026

Description

AI evaluation in n8n

This is a template for n8n's evaluation feature.

Evaluation is a technique for gaining confidence that your AI workflow performs reliably: you run a test dataset containing a range of inputs through the workflow.

By calculating a metric (score) for each input, you can see where the workflow is performing well and where it isn't.
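To make the idea concrete, here is a minimal sketch of per-input scoring written in plain Python rather than n8n. It is not how n8n implements evaluation: the `evaluate` helper, the `toy_workflow` stand-in, and the three sample tickets are all hypothetical and exist only for illustration.

```python
from statistics import mean
from typing import Callable


def evaluate(dataset: list[dict], workflow: Callable[[str], str],
             metric: Callable[[str, str], float]) -> list[dict]:
    """Run every test case through the workflow and score each one."""
    results = []
    for case in dataset:
        actual = workflow(case["input"])
        results.append({
            "input": case["input"],
            "actual": actual,
            "score": metric(actual, case["expected"]),
        })
    return results


def toy_workflow(text: str) -> str:
    """Hypothetical stand-in for the AI workflow: a naive keyword rule."""
    return "billing" if "invoice" in text.lower() else "technical"


# Hypothetical test dataset: inputs paired with the expected answers.
dataset = [
    {"input": "My invoice is wrong", "expected": "billing"},
    {"input": "App crashes on login", "expected": "technical"},
    {"input": "I was charged twice", "expected": "billing"},
]

results = evaluate(dataset, toy_workflow, lambda a, e: 1.0 if a == e else 0.0)
average = mean(r["score"] for r in results)

# The per-input scores show which cases fail, not just the overall average.
failures = [r["input"] for r in results if r["score"] == 0.0]
```

Keeping the score per input, rather than only the average, is what lets you see which inputs the workflow handles badly.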

How it works

This template shows how to calculate a workflow evaluation metric: whether a category matches the expected one.

The workflow takes support tickets and generates a category and a priority for each, which are then compared with the correct answers in the dataset.

We use an evaluation trigger to read in our dataset.
It is wired up in parallel with the regular trigger so that the workflow can be started from either one.
Once the category is generated by the agent, we check whether it matches the expected one in the dataset.
Finally, we pass this information back to n8n as a metric.
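The matching check in the steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the template's actual node configuration: the function name `category_match`, the field names `category` and `expected_category`, and the case- and whitespace-insensitive comparison are all hypothetical choices.

```python
def category_match(actual: str, expected: str) -> int:
    """Return 1 if the generated category matches the expected one, else 0.

    Assumption: differences in case and surrounding whitespace are ignored.
    A strict comparison would use `actual == expected` instead.
    """
    return int(actual.strip().lower() == expected.strip().lower())


# Hypothetical agent output and dataset row for a single support ticket.
agent_output = {"category": "Billing", "priority": "high"}
dataset_row = {"expected_category": "billing"}

# The numeric result is the kind of value that would be reported as the metric.
metric = {
    "category_match": category_match(
        agent_output["category"], dataset_row["expected_category"]
    )
}
```

Returning 1 or 0 instead of a boolean keeps the metric numeric, so it can be averaged across the whole dataset.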

Nodes Used (4)

AI Agent

@n8n/n8n-nodes-langchain.agent

Evaluation

n8n-nodes-base.evaluation

OpenAI Chat Model

@n8n/n8n-nodes-langchain.lmChatOpenAi

Structured Output Parser

@n8n/n8n-nodes-langchain.outputParserStructured
