Let's Build: a document intelligence pipeline using Transloadit
The document intelligence market is booming. Companies like Reducto.ai are gaining traction by helping enterprises extract structured data from PDFs and scanned documents. Their value proposition is compelling: upload a document, define a schema, get clean JSON back.
That said, if you want product-level control – custom routing, storage, compliance, and tight integration with the rest of your file workflows – it’s often more powerful to build your own Reducto-style experience on top of primitives.
Transloadit provides exactly those primitives. In this guide, we’ll show you how to combine our document processing Robots with the 🤖 /ai/chat Robot to create a flexible, schema-driven pipeline that you can shape into your own document AI product.
If you want a turnkey product, a dedicated document AI vendor can be a great fit. But if you need to blend document extraction with uploads, conversions, storage, and downstream workflows, building on Transloadit gives you more leverage.
What document intelligence really means
At its core, document intelligence involves three key capabilities:
- Parse – extract text from documents using OCR, preserving layout and structure.
- Split – break multi-page documents into manageable chunks.
- Extract – pull structured data that matches a predefined schema.
Let’s build each of these using Transloadit's versatile toolkit.
Setting up your TypeScript project
First, set up a project using the Transloadit Node SDK:
yarn init -y
yarn add transloadit
Note
The v4 Node SDK requires Node.js 20 or newer. If you are upgrading from v3 or CommonJS, see the migration guide.
Create your client:
import { Transloadit } from 'transloadit'
const transloadit = new Transloadit({
authKey: process.env.TRANSLOADIT_AUTH_KEY!,
authSecret: process.env.TRANSLOADIT_AUTH_SECRET!,
})
Step 1: Document parsing with OCR (optional)
The 🤖 /document/ocr Robot extracts text from PDFs, including scanned PDFs. It supports multiple providers and can return results with layout coordinates or plain text. If your source isn’t a PDF, convert it first using the 🤖 /document/convert Robot.
If you are using a PDF-capable model such as Claude Sonnet 4, you can skip OCR and send the PDF directly to the 🤖 /ai/chat Robot. OCR is still useful when you need layout coordinates or want to normalize non-PDF files first.
const parseResult = await transloadit.createAssembly({
params: {
steps: {
ocr_extract: {
robot: '/document/ocr',
use: ':original',
provider: 'gcp',
format: 'json',
granularity: 'full',
result: true,
},
},
},
files: {
document: './invoice.pdf',
},
waitForCompletion: true,
})
const ocrResults = parseResult.results.ocr_extract
The granularity: 'full' option returns bounding box coordinates for each text block, which is
useful for understanding layout.
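The Step’s output arrives as a result file; the exact JSON shape depends on the provider and granularity you picked. A minimal sketch for fetching and inspecting it:
// Sketch: download the OCR result file and inspect its JSON payload.
const ocrFile = ocrResults[0]
const ocrJson = await fetch(ocrFile.ssl_url).then((response) => response.json())
console.log(JSON.stringify(ocrJson, null, 2))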
Step 2: Document splitting
For large documents, the 🤖 /document/split Robot lets you extract specific pages:
const splitResult = await transloadit.createAssembly({
params: {
steps: {
first_pages: {
robot: '/document/split',
use: ':original',
pages: ['1-5'],
},
remaining_pages: {
robot: '/document/split',
use: ':original',
pages: ['6-'],
},
},
},
files: {
document: './large-report.pdf',
},
waitForCompletion: true,
})
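Each split Step emits its own PDF files under its Step name in the Assembly results, so you can pick up the chunks like this (a minimal sketch):
// Sketch: collect the page-range chunks produced by the two split Steps above.
const chunks = [
  ...(splitResult.results.first_pages ?? []),
  ...(splitResult.results.remaining_pages ?? []),
]
for (const chunk of chunks) {
  console.log(chunk.name, chunk.ssl_url)
}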
Step 3: Schema-driven data extraction with AI
Here’s where the magic happens. The 🤖 /ai/chat Robot can
process documents and return structured JSON that matches your schema. This is directly comparable
to Reducto’s Extract API. Claude Sonnet 4 supports PDFs, so we’ll use
model: 'anthropic/claude-4-sonnet-20250514' below. When you set format: 'json', the output is a
JSON file in the Assembly results.
Zod v4 ships a native z.toJSONSchema() helper. The snippets below use a small helper that calls it
when available and falls back to zod-to-json-schema for Zod v3 projects.
If your account does not have shared AI credentials configured, create AI Template
Credentials in the Transloadit dashboard (for OpenAI, Anthropic, or Google) and reference them
via credentials.
For a quick start, you can omit credentials and set test_credentials: true to use
Transloadit-provided test keys. While this is convenient for demos, shared keys can be rate-limited,
so production workloads should supply their own credentials.
import { z } from 'zod'
import { zodToJsonSchema } from 'zod-to-json-schema'
const toJsonSchema = (schema: z.ZodTypeAny) =>
typeof (z as { toJSONSchema?: (schema: z.ZodTypeAny) => unknown }).toJSONSchema === 'function'
? (z as { toJSONSchema: (schema: z.ZodTypeAny) => unknown }).toJSONSchema(schema)
: zodToJsonSchema(schema)
const invoiceSchema = z.object({
invoice_number: z.string(),
vendor_name: z.string(),
vendor_address: z.string().optional(),
invoice_date: z.string().optional(),
due_date: z.string().optional(),
total_amount: z.number(),
currency: z.string().optional(),
line_items: z
.array(
z.object({
description: z.string(),
quantity: z.number().optional(),
unit_price: z.number().optional(),
total: z.number().optional(),
}),
)
.optional(),
tax_amount: z.number().optional(),
payment_terms: z.string().optional(),
})
const extractionResult = await transloadit.createAssembly({
params: {
steps: {
extract_data: {
robot: '/ai/chat',
use: ':original',
credentials: 'my_ai_credentials',
model: 'anthropic/claude-4-sonnet-20250514',
format: 'json',
schema: JSON.stringify(toJsonSchema(invoiceSchema)),
messages: `Extract all invoice data from this document.
Be precise with amounts and dates.
If a field is not present, omit it from the response.`,
result: true,
},
},
},
files: {
invoice: './invoice.pdf',
},
waitForCompletion: true,
})
const extractedFile = extractionResult.results.extract_data[0]
const invoiceData = await fetch(extractedFile.ssl_url).then((response) => response.json())
console.log(`Invoice #${invoiceData.invoice_number}: $${invoiceData.total_amount}`)
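Because the schema already exists as a Zod object, you can also validate the model’s output before passing it downstream:
// Validate the AI output against the same Zod schema we sent to the model.
const parsed = invoiceSchema.safeParse(invoiceData)
if (!parsed.success) {
  throw new Error(`Extraction did not match the schema: ${parsed.error.message}`)
}
const invoice = parsed.data // typed as z.infer<typeof invoiceSchema>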
Building a complete pipeline
Now let’s combine everything into a production-ready pipeline that:
- optionally extracts text with OCR,
- splits large files,
- extracts structured data with AI, and
- stores results in S3.
import { z } from 'zod'
import { zodToJsonSchema } from 'zod-to-json-schema'
import { Transloadit } from 'transloadit'
const toJsonSchema = (schema: z.ZodTypeAny) =>
typeof (z as { toJSONSchema?: (schema: z.ZodTypeAny) => unknown }).toJSONSchema === 'function'
? (z as { toJSONSchema: (schema: z.ZodTypeAny) => unknown }).toJSONSchema(schema)
: zodToJsonSchema(schema)
const financialDocumentSchema = z.object({
document_type: z.enum(['invoice', 'receipt', 'statement', 'contract']),
document_date: z.string(),
parties: z.array(
z.object({
name: z.string(),
role: z.enum(['vendor', 'customer', 'signatory']),
address: z.string().optional(),
}),
),
amounts: z.array(
z.object({
description: z.string(),
value: z.number(),
currency: z.string(),
}),
),
key_terms: z.array(z.string()).optional(),
summary: z.string(),
})
type FinancialDocument = z.infer<typeof financialDocumentSchema>
const transloadit = new Transloadit({
authKey: process.env.TRANSLOADIT_AUTH_KEY!,
authSecret: process.env.TRANSLOADIT_AUTH_SECRET!,
})
async function processFinancialDocument(filePath: string) {
const result = await transloadit.createAssembly({
params: {
steps: {
pdf_verified: {
robot: '/file/filter',
use: ':original',
accepts: [['${file.mime}', '==', 'application/pdf']],
},
non_pdf: {
robot: '/file/filter',
use: ':original',
accepts: [['${file.mime}', '!=', 'application/pdf']],
},
pdf_converted: {
robot: '/document/convert',
use: 'non_pdf',
format: 'pdf',
},
extract_structured: {
robot: '/ai/chat',
use: ['pdf_verified', 'pdf_converted'],
credentials: 'ai_credentials',
model: 'anthropic/claude-4-sonnet-20250514',
format: 'json',
schema: JSON.stringify(toJsonSchema(financialDocumentSchema)),
messages: 'Extract the structured data from this document.',
result: true,
},
store_results: {
robot: '/s3/store',
use: [':original', 'extract_structured'],
credentials: 's3_credentials',
path: 'documents/${file.name}/',
},
},
},
files: {
document: filePath,
},
waitForCompletion: true,
})
if (result.ok !== 'ASSEMBLY_COMPLETED') {
throw new Error(`Assembly failed: ${result.error}`)
}
const extractedFile = result.results.extract_structured[0]
return fetch(extractedFile.ssl_url).then((response) => response.json())
}
const data = await processFinancialDocument('./contract.pdf')
console.log(`Processed ${data.document_type}: ${data.summary}`)
The /document/convert → PDF path supports the following input types:
- Word: .doc, .docx
- PowerPoint: .ppt, .pptx, .pps, .ppz, .pot
- Excel: .xls, .xlsx, .xla
- OpenDocument: .odt, .ott, .odd, .oda
- Web/markup: .html, .xhtml, .xml, Markdown (.md)
- Text & rich text: .txt, .csv, .rtf, .rtx, .tex/LaTeX
- Images: .jpg, .jpeg, .png, .gif, .svg
- Vector/print: .ai, .eps, .ps
If you need OCR output for layout-aware workflows, insert a /document/ocr Step (named ocr_text below; the name is our choice), point the extraction Step’s use at it, and include it in your storage targets.
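A sketch of the modified Steps in the pipeline above (whether you feed the model the OCR output or keep pointing it at the PDF Steps depends on your model and use case):
// Sketch: add a layout-aware OCR Step and wire it into extraction and storage.
ocr_text: {
  robot: '/document/ocr',
  use: ['pdf_verified', 'pdf_converted'],
  provider: 'gcp',
  format: 'json',
  granularity: 'full',
  result: true,
},
extract_structured: {
  robot: '/ai/chat',
  use: 'ocr_text', // previously ['pdf_verified', 'pdf_converted']
  credentials: 'ai_credentials',
  model: 'anthropic/claude-4-sonnet-20250514',
  format: 'json',
  schema: JSON.stringify(toJsonSchema(financialDocumentSchema)),
  messages: 'Extract the structured data from this document.',
  result: true,
},
store_results: {
  robot: '/s3/store',
  use: [':original', 'extract_structured', 'ocr_text'],
  credentials: 's3_credentials',
  path: 'documents/${file.name}/',
},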
Processing multiple document types
With the 🤖 /file/filter Robot, you can route PDFs
directly to the model while converting everything else to PDF first. Using != makes the non‑PDF
branch explicit:
// Reuse invoiceSchema + toJsonSchema from above.
const multiTypeResult = await transloadit.createAssembly({
params: {
steps: {
pdf_verified: {
robot: '/file/filter',
use: ':original',
accepts: [['${file.mime}', '==', 'application/pdf']],
},
non_pdf: {
robot: '/file/filter',
use: ':original',
accepts: [['${file.mime}', '!=', 'application/pdf']],
},
pdf_converted: {
robot: '/document/convert',
use: 'non_pdf',
format: 'pdf',
},
extract_data: {
robot: '/ai/chat',
use: ['pdf_verified', 'pdf_converted'],
credentials: 'ai_credentials',
model: 'anthropic/claude-4-sonnet-20250514',
format: 'json',
schema: JSON.stringify(toJsonSchema(invoiceSchema)),
messages: 'Extract data from this document.',
result: true,
},
},
},
files: {
doc1: './receipt.pdf',
doc2: './receipt-photo.jpg',
},
waitForCompletion: true,
})
Flow overview:
:original
├─ pdf_verified (file/filter: mime == pdf) ──▶ /ai/chat
└─ non_pdf (file/filter: mime != pdf) ──▶ /document/convert (pdf) ──▶ /ai/chat
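Because Transloadit runs each uploaded file through the pipeline, the extract_data Step emits one JSON result per document. A sketch for collecting them, reusing the fetch pattern from earlier:
// One structured JSON result per uploaded document.
const extractedDocs = await Promise.all(
  (multiTypeResult.results.extract_data ?? []).map((file) =>
    fetch(file.ssl_url).then((response) => response.json()),
  ),
)
console.log(`Extracted ${extractedDocs.length} document(s)`)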
Using Templates for reusability
For production use, save your Assembly Instructions as a Template and reference it by ID:
const result = await transloadit.createAssembly({
params: {
template_id: 'your-invoice-extraction-template',
fields: {
custom_prompt: 'Focus on extracting payment terms and due dates.',
},
},
files: {
document: './invoice.pdf',
},
waitForCompletion: true,
})
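Inside the Template, fields are available as Assembly Variables, so the prompt can be parameterized. A sketch of how the /ai/chat Step in that Template could pick up the custom_prompt field we pass above:
// Sketch: inside the Template's Assembly Instructions, reference the field in the prompt.
extract_data: {
  robot: '/ai/chat',
  use: ':original',
  credentials: 'ai_credentials',
  model: 'anthropic/claude-4-sonnet-20250514',
  format: 'json',
  // schema and the remaining parameters as in the earlier examples
  messages: 'Extract all invoice data from this document. ${fields.custom_prompt}',
  result: true,
},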
Handling large document batches
To process a large batch of documents efficiently, keep concurrency limited:
import pMap from 'p-map'
async function processBatch(files: string[]): Promise<Map<string, FinancialDocument>> {
const concurrency = 5
const batchResults = await pMap(
files,
async (file) => {
const result = await transloadit.createAssembly({
params: {
template_id: 'document-extraction-template',
},
files: { document: file },
waitForCompletion: true,
})
// Assumes the template includes a step named `extract_structured`.
const extractedFile = result.results.extract_structured[0]
const data = await fetch(extractedFile.ssl_url).then((response) => response.json())
return {
file,
data,
}
},
{ concurrency },
)
return new Map(batchResults.map(({ file, data }) => [file, data]))
}
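Usage then looks like this (the file names are just examples):
const processed = await processBatch(['./invoice-jan.pdf', './invoice-feb.pdf'])
for (const [file, doc] of processed) {
  console.log(`${file}: ${doc.document_type} -> ${doc.summary}`)
}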
Why teams choose Transloadit for document AI
- A single API covers ingest, conversion, OCR, AI extraction, and delivery.
- Import/export integrations for cloud storage (S3, Azure, Google Cloud Storage, Dropbox, and more).
- Assembly Instructions let you version, reuse, and branch pipelines without glue code.
- A single vendor for documents and broader file workloads (previews, thumbnails, virus scanning, image/audio/video processing).
Transloadit vs. dedicated document AI platforms
Think of Reducto as a focused product and Transloadit as a composable platform. You can replicate the core extraction flow and then extend it with everything around it.
| Feature | Transloadit | Reducto.ai |
|---|---|---|
| OCR / text extraction | ✅ 🤖 /document/ocr (PDFs) | ✅ Parse API |
| Document splitting | ✅ 🤖 /document/split | ✅ Split API |
| Schema-based extraction | ✅ 🤖 /ai/chat with JSON schema | ✅ Extract API |
| Storage integrations | ✅ Import/export to cloud storage | External tooling |
| Workflow orchestration | ✅ Assembly Instructions + Templates | External tooling |
| AI providers | ✅ OpenAI, Anthropic, Google (with credentials) | — |
| Broader file workloads | ✅ Image/video/audio + previews + security | Document-focused stack |
When to use this approach
This Transloadit-based approach is a strong fit when:
- you want to build your own document AI product or internal platform,
- you want one vendor for ingest, conversion, extraction, and delivery,
- you need to combine document intelligence with broader media workflows, and
- you want flexibility in choosing AI providers and predictable pricing.
Ready to build?
Document intelligence doesn’t have to mean adding another specialized vendor. With the 🤖 /document/ocr, 🤖 /document/split, and 🤖 /ai/chat Robots, you can build sophisticated extraction pipelines that rival dedicated platforms.
Start by exploring our AI Robot docs and see how far you can take your document workflows inside one platform.