Transcribing Telegram voice messages using Whisper and Gemini with a fallback mechanism

Built by Yehor EGMS
Created on June 13, 2026

Description

🎙️ n8n Workflow: Voice Message Transcription with Access Control

This n8n workflow enables automated transcription of voice messages in Telegram groups with built-in access control and intelligent fallback mechanisms. It's designed for teams that need to convert audio messages to text while maintaining security and handling various audio formats.

📌 Section 1: Trigger & Access Control

⚑ Receive Message (Telegram Trigger)
Purpose: Captures incoming messages from users in your Telegram group.

How it works: When a user sends a message (voice, audio, or text), the workflow is triggered and the sender's information is captured.

Benefit: Serves as the entry point for the entire transcription pipeline.

🔐 Sender Verification
Purpose: Validates whether the sender has permission to use the transcription service.

Logic:
Check sender against authorized users list
If authorized → Proceed to next step
If not authorized → Send "Access denied" message and stop workflow
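
The check above amounts to an allowlist lookup on the sender's Telegram user ID. A minimal Python sketch, assuming the update has the standard Bot API shape (the sender under message.from.id). The `AUTHORIZED_USER_IDS` set, its placeholder IDs, and the `is_authorized` helper are hypothetical names for illustration; the workflow itself may do this with an IF node rather than code.

```python
# Hypothetical allowlist: replace with the numeric Telegram user IDs you trust.
AUTHORIZED_USER_IDS = {111111111, 222222222}


def is_authorized(update: dict, allowed_ids: set = AUTHORIZED_USER_IDS) -> bool:
    """Return True if the message sender's ID is in the allowlist."""
    sender = update.get("message", {}).get("from", {})
    return sender.get("id") in allowed_ids


# Example: an authorized sender gets no denial message.
update = {"message": {"from": {"id": 111111111}, "text": "hi"}}
reply = None if is_authorized(update) else "Access denied"
```

Matching on the numeric ID rather than the username is a deliberate choice in this sketch, since usernames can be changed by the user.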

Benefit: Prevents unauthorized users from consuming AI credits and accessing the service.

📌 Section 2: Message Type Detection

🎡 Audio/Voice Recognition
Purpose: Identifies the type of incoming message and audio format.

Why it's needed: Telegram delivers each kind of content in a different field of the message:
Voice notes (the voice field)
Audio files (the audio field, used for standard audio attachments)
Text messages (the text field, no audio content)

Process:
Check if message contains audio/voice content
If no audio file detected → Send "No audio file found" message
If audio detected → Assign file ID and proceed to format detection
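
The detection step can be sketched in Python as follows, assuming the Bot API message shape in which voice notes arrive under `voice` and audio files under `audio`, each carrying a `file_id`. The `extract_file_id` helper is a hypothetical name, not a node from the workflow.

```python
from typing import Optional


def extract_file_id(message: dict) -> Optional[str]:
    """Return the file_id of a voice note or audio file, or None for other messages."""
    for field in ("voice", "audio"):
        attachment = message.get(field)
        if attachment and "file_id" in attachment:
            return attachment["file_id"]
    return None


# Example: a text-only message has no audio, so the user gets a notice.
voice_message = {"voice": {"file_id": "abc", "mime_type": "audio/ogg"}}
text_message = {"text": "hello"}
file_id = extract_file_id(text_message)
reply = "No audio file found" if file_id is None else None
```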

🧩 File Type Determination (IF Node)
Purpose: Identifies the specific audio format for proper processing.

Supported formats:
OGG (Telegram voice messages)
MPEG/MP3
MP4/M4A
Other audio formats

Logic:

If format recognized → Proceed to transcription
If format not recognized → Send "File format not recognized" message

Benefit: Ensures compatibility with transcription services by validating file types upfront.
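
One way to express the format check is a lookup on the attachment's MIME type. The sketch below is an assumption about how such a check could look, not the workflow's actual IF-node conditions. The `SUPPORTED_MIME_TYPES` mapping and the `detect_extension` helper are hypothetical and cover only the three named formats.

```python
from typing import Optional

# Hypothetical mapping from MIME type to a file extension for the upload.
SUPPORTED_MIME_TYPES = {
    "audio/ogg": "ogg",
    "audio/mpeg": "mp3",
    "audio/mp4": "m4a",
}


def detect_extension(mime_type: Optional[str]) -> Optional[str]:
    """Return the extension for a recognized MIME type, else None."""
    if not mime_type:
        return None
    # Drop parameters such as "; codecs=opus" and normalize case.
    base_type = mime_type.split(";")[0].strip().lower()
    return SUPPORTED_MIME_TYPES.get(base_type)


extension = detect_extension("audio/ogg; codecs=opus")
reply = None if extension else "File format not recognized"
```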

📌 Section 3: Primary Transcription (OpenAI)

📥 File Download
Purpose: Downloads the audio file from Telegram for processing.
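
Inside n8n the Telegram node handles the download. For readers reproducing the step outside n8n, the Bot API does it in two calls: getFile resolves a file ID to a file path, and the content is then fetched from the file endpoint. The Python sketch below only builds the two URLs; the helper names and the token in the test values are made up for illustration.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.telegram.org"


def get_file_url(bot_token: str, file_id: str) -> str:
    """URL of the getFile call that resolves a file_id to a file_path."""
    query = urlencode({"file_id": file_id})
    return f"{API_ROOT}/bot{bot_token}/getFile?{query}"


def download_url(bot_token: str, file_path: str) -> str:
    """URL from which the file content itself can be fetched."""
    # quote() keeps "/" intact, so nested paths stay valid.
    return f"{API_ROOT}/file/bot{bot_token}/{quote(file_path)}"
```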

🤖 OpenAI Transcription
Purpose: Transcribes audio to text using OpenAI's Whisper API.

Why OpenAI: High-quality transcription with cost-effective pricing.

Process:
Send downloaded file to OpenAI transcription API
Simultaneously send notification: "Transcription started"
If successful → Assign transcribed text to variable and proceed
If error occurs → Trigger fallback mechanism

Benefit: Fast, accurate transcription with multi-language support.

📌 Section 4: Fallback Transcription (Gemini)

🛟 Gemini Backup Transcription
Purpose: Provides a safety net if OpenAI transcription fails.

Process:
Receives file only if OpenAI node returns an error
Downloads and processes the same audio file
Sends to Google Gemini for transcription
Assigns transcribed text to the same text variable

Benefit: Ensures high reliability: if one service fails, the other takes over automatically.
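
The primary-then-fallback pattern can be sketched in Python as below. This is a generic illustration: in n8n the same effect presumably comes from routing the OpenAI node's error output to the Gemini node. The two stub functions stand in for the real service calls and are hypothetical.

```python
from typing import Callable

Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(audio: bytes, primary: Transcriber, fallback: Transcriber) -> str:
    """Try the primary transcriber; on any error, use the fallback."""
    try:
        return primary(audio)
    except Exception:
        # The fallback receives the same audio bytes the primary was given.
        return fallback(audio)


# Stand-ins for the real OpenAI and Gemini calls, used only to show the flow.
def failing_primary(audio: bytes) -> str:
    raise RuntimeError("primary service unavailable")


def stub_fallback(audio: bytes) -> str:
    return "transcribed by fallback"


text = transcribe_with_fallback(b"fake-audio", failing_primary, stub_fallback)
```

Both paths return a plain string, which mirrors the workflow's choice of writing either result into the same text variable so that later steps do not need to know which service answered.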

📌 Section 5: Message Length Handling

📏 Text Length Check (IF Node)
Purpose: Determines if the transcribed text exceeds Telegram's character limit.

Logic:

If text ≤ 4,000 characters → Send directly to Telegram
If text > 4,000 characters → Split into chunks

Why: Telegram caps a text message at 4,096 characters; the 4,000-character threshold used here stays safely under that limit.

✂️ Text Splitting (Code Node)
Purpose: Breaks long transcriptions into 4,000-character segments.

Process:
Receives text longer than 4,000 characters
Splits text into chunks of ≤4,000 characters
Maintains readability by avoiding mid-word breaks
Outputs array of text chunks
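
The splitting steps above can be sketched as follows. This is a Python illustration under stated assumptions, not the workflow's actual Code node. The `split_text` name is hypothetical, and the function breaks at the last space or newline before the limit, falling back to a hard cut only when a single run of characters is longer than the limit.

```python
def split_text(text: str, limit: int = 4000) -> list:
    """Split text into chunks of at most `limit` characters, preferring whitespace breaks."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Last space or newline at or before index `limit`.
        cut = max(remaining.rfind(" ", 0, limit + 1), remaining.rfind("\n", 0, limit + 1))
        if cut <= 0:
            cut = limit  # no whitespace found: hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


# Example with a small limit so the break is easy to see.
parts = split_text("aaa bbb ccc", limit=7)
```

Whitespace at each break point is dropped, so joining the chunks with a single space restores the original word sequence.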

📌 Section 6: Response Delivery

💬 Send Transcription (Telegram Node)
Purpose: Delivers the transcribed text back to the Telegram group.

Behavior:
Short messages: Sent as a single message
Long messages: Sent as multiple sequential messages

Benefit: Users receive complete transcriptions regardless of length, ensuring no content is lost.

📊 Workflow Overview Table

| Step | Node Name | Purpose |
|---------|-----------|---------|
| 1. Trigger | Receive Message | Captures incoming Telegram messages |
| 2. Access Control | Sender Verification | Validates user permissions |
| 3. Detection | Audio/Voice Recognition | Identifies message type and audio format |
| 4. Validation | File Type Check | Verifies supported audio formats |
| 5. Download | File Download | Retrieves audio file from Telegram |
| 6. Primary AI | OpenAI Transcription | Main transcription service |
| 7. Fallback AI | Gemini Transcription | Backup transcription service |
| 8. Processing | Text Length Check | Determines if splitting is needed |
| 9. Splitting | Code Node | Breaks long text into chunks |
| 10. Response | Send to Telegram | Delivers transcribed text |

🎯 Key Benefits

🔐 Secure access control: Only authorized users can trigger transcriptions
💰 Cost management: Prevents unauthorized credit consumption
🎵 Multi-format support: Handles various Telegram audio types
🛡️ High reliability: Dual-AI fallback ensures transcription success
📱 Telegram-optimized: Automatically handles message length limits
🌍 Multi-language: Both AI services support numerous languages
⚡ Real-time notifications: Users receive status updates during processing
🔄 Automatic chunking: Long transcriptions are intelligently split
🧠 Smart routing: Files are processed through the optimal path
📊 Complete delivery: No content loss regardless of transcription length

🚀 Use Cases

Team meetings: Transcribe voice notes from team discussions
Client communications: Convert client voice messages to searchable text
Documentation: Create text records of verbal communications
Accessibility: Make audio content accessible to all team members
Multi-language teams: Leverage AI transcription for various languages

Nodes Used (4)

Code: n8n-nodes-base.code
Google Gemini: @n8n/n8n-nodes-langchain.googleGemini
OpenAI: @n8n/n8n-nodes-langchain.openAi
Telegram: n8n-nodes-base.telegram