Convert PDF, DOC, and Images to Markdown using Datalab.to API

Built by Joseph Joseph
Created on June 05, 2026

Description

This n8n workflow converts PDF documents, Word documents (.doc, .docx), and images (.png, .jpg, .webp) to clean markdown text using the datalab.to API. The output is suited to AI agents, LLM processing, and preparing data for vector databases in retrieval-augmented generation (RAG) pipelines.

Workflow Description

Input
Trigger Node: Form trigger or webhook to accept file uploads
Supported Formats: PDF documents, Word documents (.doc/.docx), and images (PNG, JPG, WEBP)
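
The supported formats above lend themselves to a simple allow-list check before anything is uploaded. The following is a minimal sketch, not part of the workflow itself: the function name `validate_upload` and the 50 MB cap are hypothetical, and the real size limit depends on your datalab.to plan.

```python
from pathlib import Path

# Extensions taken from the supported formats listed above.
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".png", ".jpg", ".webp"}

# Hypothetical size cap (assumption); check your datalab.to plan for the real limit.
MAX_FILE_BYTES = 50 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file is acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_FILE_BYTES:
        problems.append(f"file exceeds {MAX_FILE_BYTES} bytes")
    return problems
```

Returning a list of problems rather than raising on the first one lets a form trigger report every issue with an upload in a single response.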

Processing Steps
File Validation: Check file type and size constraints
HTTP Request Node:
  Method: POST to https://api.datalab.to/v1/marker
  Headers: X-API-Key with your datalab.to API key
  Body: Multipart form data with the file
Response Processing: Extract the converted markdown text
Output Formatting: Clean and structure the markdown for downstream use
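
Outside n8n, the HTTP Request and Response Processing steps above could look roughly like the sketch below. It uses only the Python standard library and builds the request without sending it. The URL and header name come from the steps above; the form field name `file`, the `markdown` response key, and both function names are assumptions, so check them against the datalab.to documentation before relying on them.

```python
import json
import urllib.request
import uuid

# Endpoint as given in the processing steps above.
MARKER_URL = "https://api.datalab.to/v1/marker"


def build_marker_request(filename, file_bytes, api_key, content_type="application/pdf"):
    """Build (but do not send) a multipart POST carrying one file."""
    boundary = uuid.uuid4().hex
    # The form field name "file" is an assumption, not confirmed by the source.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        MARKER_URL,
        data=head + file_bytes + tail,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract_markdown(response_body):
    """Pull the markdown text out of a JSON response body."""
    payload = json.loads(response_body)
    # The "markdown" key is an assumption about the response shape.
    markdown = payload.get("markdown")
    if not markdown:
        raise ValueError("response contained no markdown")
    return markdown.strip()
```

Passing the result of `build_marker_request` to `urllib.request.urlopen` would send it. If the API answers asynchronously, for example with a job URL to poll rather than the finished text, `extract_markdown` would apply to the final polled response instead of the first one.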

Output
Clean, structured markdown text ready for:
Inclusion in LLM prompts
Vector database ingestion
AI agent knowledge base processing
Document analysis workflows
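
For vector database ingestion in particular, the markdown is usually split into smaller pieces first. One simple approach, shown here as an illustrative sketch rather than something the workflow does itself, is to cut the text at headings. The function name `split_markdown_by_heading` and the chunk shape are hypothetical.

```python
import re


def split_markdown_by_heading(markdown: str) -> list[dict]:
    """Split markdown into one chunk per heading-delimited section."""
    chunks = []
    title, lines = "", []
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A new heading closes the section collected so far.
            text = "\n".join(lines).strip()
            if text:
                chunks.append({"title": title, "text": text})
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    # Flush whatever follows the last heading.
    text = "\n".join(lines).strip()
    if text:
        chunks.append({"title": title, "text": text})
    return chunks
```

Sections with no body text are skipped, and text before the first heading is kept under an empty title. The sketch does not recognize fenced code blocks, so a line starting with `#` inside a fence would be treated as a heading.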

Setup Instructions
Get API Access: Sign up at datalab.to to obtain your API key
Configure Credentials:
  Create a new Header Auth credential in n8n
  Set the header name to X-API-Key and the value to your API key
Import Workflow: Import the workflow into n8n and attach the credential to the HTTP Request node; it is then ready to process files

Use Cases
AI Workflows: Convert documents for LLM analysis and processing
RAG Systems: Prepare clean text for vector database ingestion
Content Management: Batch convert files to searchable markdown format
Document Processing: Extract text from mixed file types in automated pipelines

The workflow handles the complexity of different file formats while delivering consistent, AI-ready markdown output for your automation needs.

Nodes Used (1)

HTTP Request
n8n-nodes-base.httpRequest