Convert PDF, DOC, and Images to Markdown using Datalab.to API
This n8n workflow converts PDF files, Word documents (.doc/.docx), and images (.png, .jpg, .webp) to clean markdown text using the datalab.to API. It is suited to AI agents, LLM processing, and RAG (Retrieval-Augmented Generation) data preparation for vector databases.
Workflow Description
Input
Trigger Node: Form trigger or webhook to accept file uploads
Supported Formats: PDF documents, Word documents (.doc/.docx), and images (PNG, JPG, WEBP)
Processing Steps
File Validation: Check file type and size constraints
HTTP Request Node:
Method: POST to https://api.datalab.to/v1/marker
Headers: X-API-Key with your datalab.to API key
Body: Multipart form data with the file
Response Processing: Extract the converted markdown text
Output Formatting: Clean and structure the markdown for downstream use
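Outside n8n, the processing steps above can be sketched in plain Python. This is a minimal illustration, not the workflow's actual node configuration: the multipart field name `file`, the `markdown` response key, the 50 MB size cap, and the extra `.jpeg` extension are all assumptions, so check them against the datalab.to documentation before relying on them. The endpoint and header name are the ones given in the steps above.

```python
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request

API_URL = "https://api.datalab.to/v1/marker"  # endpoint as given in the steps above
# Formats listed in this description; ".jpeg" is added here as an assumption.
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".png", ".jpg", ".jpeg", ".webp"}
MAX_BYTES = 50 * 1024 * 1024  # assumed size cap, not a documented limit


def validate_upload(filename: str, size_bytes: int) -> None:
    """File Validation: raise ValueError for an unsupported type or size."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0 or size_bytes > MAX_BYTES:
        raise ValueError(f"file size out of range: {size_bytes} bytes")


def build_marker_request(filename: str, data: bytes, api_key: str) -> Request:
    """HTTP Request: build (but do not send) a multipart POST with the file."""
    validate_upload(filename, len(data))
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        # "file" is an assumed form field name.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return Request(
        API_URL,
        data=head + data + tail,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract_markdown(payload: dict) -> str:
    """Response Processing: pull the markdown text out of a decoded JSON body."""
    markdown = payload.get("markdown")  # assumed response key
    if not isinstance(markdown, str):
        raise ValueError("response did not contain markdown text")
    return markdown.strip()
```

The request object can be sent with `urllib.request.urlopen(request)` and the JSON body decoded with `json.load` before calling `extract_markdown`. Building the request separately from sending it keeps validation testable without network access.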
Output
Clean, structured markdown text ready for:
Inclusion in LLM prompts
Vector database ingestion
AI agent knowledge base processing
Document analysis workflows
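What "clean" means depends on the downstream consumer. As one hedged illustration, the output formatting step might normalize line endings and whitespace before the text is handed to an LLM or a vector database. The function name `clean_markdown` and the specific rules are assumptions for this sketch, not part of the workflow itself.

```python
import re


def clean_markdown(text: str) -> str:
    """Normalize whitespace in converted markdown (illustrative rules only)."""
    # Use Unix line endings throughout.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Strip trailing whitespace. Note: this also removes markdown hard
    # line breaks written as two trailing spaces.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Collapse runs of blank lines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # End with exactly one newline.
    return text.strip() + "\n"
```

These rules are applied to the whole document, including fenced code blocks, so a pipeline that needs whitespace inside code preserved exactly would have to skip those regions.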
Setup Instructions
Get API Access: Sign up at datalab.to to obtain your API key
Configure Credentials:
Create a new credential in n8n
Add a generic header named X-API-Key with your API key as the value
Import Workflow: Import the workflow into n8n and select the credential in the HTTP Request node; it is then ready to process files
Use Cases
AI Workflows: Convert documents for LLM analysis and processing
RAG Systems: Prepare clean text for vector database ingestion
Content Management: Batch convert files to searchable markdown format
Document Processing: Extract text from mixed file types in automated pipelines
The workflow handles the complexity of different file formats while delivering consistent, AI-ready markdown output for your automation needs.