Multi-Modal Expense Tracking with GPT-4, Gemini OCR, and Voice via Telegram

Built by Oussama Oussama
Created on June 08, 2026

Description

This n8n template creates an intelligent expense tracking system 🤖 that processes text, voice, and receipt images through Telegram. The assistant automatically categorizes expenses, handles currency conversions, and maintains financial records in Google Sheets while providing smart spending insights 💡.

Use Cases:

🗣️ Personal expense tracking via Telegram chat
🧾 Receipt scanning and data extraction
💱 Multi-currency expense management
📂 Automated financial categorization
🎙️ Voice-to-expense logging
📊 Daily/weekly/monthly spending analysis

How it works:

Multi-Input Processing: Telegram trigger captures text messages, voice notes, and receipt images.
Content Analysis: A Switch node routes different input types (text, audio, images) to appropriate processors.
Voice Processing: ElevenLabs converts voice messages to text for expense extraction.
Receipt OCR: Google Gemini analyzes receipt images to extract amounts and descriptions.
Expense Classification: An LLM determines if the input is an expense or a general query.
Expense Parsing: For multiple expenses, the AI splits and normalizes each item.
Currency Conversion: An exchange rate API converts foreign currencies to USD.
Smart Categorization: The AI agent assigns expenses to predefined categories with emojis.
Data Storage: Google Sheets stores all expense records with automatic totals.
Intelligent Responses: The agent provides spending summaries, alerts, and financial insights.
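
The routing and conversion steps above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual node configuration: the function names `route_message` and `to_usd` and the `SAMPLE_RATES` table are hypothetical, and the rates are made-up placeholders rather than live data from an exchange rate API. The only external detail relied on is that Telegram messages carry a `text`, `voice`, or `photo` field depending on their type.

```python
from decimal import Decimal, ROUND_HALF_UP


def route_message(message: dict) -> str:
    """Pick a processing branch from the fields present on a Telegram message."""
    if "voice" in message:
        return "audio"   # would go to speech-to-text
    if "photo" in message:
        return "image"   # would go to receipt OCR
    if "text" in message:
        return "text"    # would go straight to expense classification
    return "unsupported"


# Hypothetical rates, expressed as units of each currency per 1 USD.
SAMPLE_RATES = {
    "USD": Decimal("1"),
    "EUR": Decimal("0.8"),
    "MAD": Decimal("10"),
}


def to_usd(amount, currency: str, rates=SAMPLE_RATES) -> Decimal:
    """Convert an amount to USD, rounded to cents."""
    rate = rates[currency.upper()]
    usd = Decimal(str(amount)) / rate
    return usd.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(route_message({"photo": [{"file_id": "p1"}], "caption": "lunch"}))
print(to_usd(50, "MAD"))
```

Using `Decimal` instead of floats is a design choice for this sketch: it keeps rounding to cents predictable, which matters once totals are summed in a spreadsheet.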

Requirements:

๐ŸŒ Telegram Bot API access
🤖 OpenAI, Gemini, or any other AI model
🗣️ ElevenLabs API for voice processing
๐Ÿ“ Google Sheets API access
💹 Exchange rate API access

Good to know:

⚠️ Daily spending alerts trigger when expenses exceed 100 USD.
🏷️ Supports 12 predefined expense categories with emoji indicators.
🔄 Automatic currency detection and conversion to USD.
🎤 Voice messages are processed through speech-to-text.
📸 Receipt images are analyzed using computer vision.
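
One way to picture the daily alert is as a check on the day's summed total. The sketch below assumes that interpretation (the description only says expenses "exceed 100 USD") and assumes each stored record has a `date` and an `amount_usd` field; both field names and the helper names are hypothetical, not taken from the workflow.

```python
from datetime import date

DAILY_ALERT_THRESHOLD_USD = 100.0  # threshold stated in the description


def daily_total(expenses: list, day: date) -> float:
    """Sum the USD amounts of all expenses recorded on the given day."""
    return sum(e["amount_usd"] for e in expenses if e["date"] == day)


def should_alert(expenses: list, day: date,
                 threshold: float = DAILY_ALERT_THRESHOLD_USD) -> bool:
    """Return True when the day's total is strictly above the threshold."""
    return daily_total(expenses, day) > threshold


# Hypothetical records, shaped like rows read back from the sheet.
sample = [
    {"date": date(2026, 6, 8), "amount_usd": 60.0},
    {"date": date(2026, 6, 8), "amount_usd": 45.5},
    {"date": date(2026, 6, 9), "amount_usd": 100.0},
]
print(should_alert(sample, date(2026, 6, 8)))
```

The comparison is strict, so a day totaling exactly 100 USD does not alert in this sketch; whether the real workflow behaves the same way depends on its prompt.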

Customizing this workflow:

✏️ Modify expense categories in the system prompt.
📈 Adjust spending alert thresholds.
💵 Change the base currency from USD to your preferred currency.
✅ Add additional expense validation rules.
🔗 Integrate with other financial platforms.
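
The customizations listed above could be gathered in one settings object, roughly as follows. Everything here is an illustrative assumption: `TrackerSettings`, `build_category_prompt`, and `validate_expense` are hypothetical names, the two categories are placeholders (the template's 12 categories are not listed in this description), and the validation rules are examples of what you might add rather than rules the workflow already enforces.

```python
from dataclasses import dataclass, field


@dataclass
class TrackerSettings:
    base_currency: str = "USD"
    daily_alert_threshold: float = 100.0
    # Placeholder categories; replace with the ones from your system prompt.
    categories: dict = field(
        default_factory=lambda: {"Food": "🍔", "Transport": "🚕"}
    )


def build_category_prompt(settings: TrackerSettings) -> str:
    """Render the category list as text that could be pasted into a system prompt."""
    lines = [f"- {emoji} {name}" for name, emoji in settings.categories.items()]
    return "Assign each expense to exactly one of these categories:\n" + "\n".join(lines)


def validate_expense(expense: dict, settings: TrackerSettings) -> list:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    amount = expense.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        problems.append("amount must be a positive number")
    currency = expense.get("currency", "")
    if not (isinstance(currency, str) and len(currency) == 3 and currency.isalpha()):
        problems.append("currency must be a 3-letter code")
    if expense.get("category") not in settings.categories:
        problems.append("unknown category")
    return problems


settings = TrackerSettings(base_currency="EUR", daily_alert_threshold=75.0)
print(build_category_prompt(settings))
print(validate_expense({"amount": -5, "currency": "euro", "category": "Pets"}, settings))
```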

Nodes Used (8)

AI Agent: @n8n/n8n-nodes-langchain.agent
Azure OpenAI Chat Model: @n8n/n8n-nodes-langchain.lmChatAzureOpenAi
Basic LLM Chain: @n8n/n8n-nodes-langchain.chainLlm
Calculator: @n8n/n8n-nodes-langchain.toolCalculator
Google Gemini: @n8n/n8n-nodes-langchain.googleGemini
HTTP Request: n8n-nodes-base.httpRequest
MCP Client Tool: @n8n/n8n-nodes-langchain.mcpClientTool
Telegram: n8n-nodes-base.telegram