Let's Build: AI invoice parser with Transloadit
Processing invoices is a core workflow for finance teams, but extracting fields by hand is slow and error-prone. In this Let’s Build, we’ll assemble a small invoice parsing service from Transloadit’s primitives: route files by type, normalize them to PDF, and ask an AI model for structured JSON.
What we’re building
- Accept common invoice formats (PDF, DOCX, images)
- Convert non-PDFs to PDF with 🤖 /document/convert
- Extract structured fields with 🤖 /ai/chat
Schema first (Zod)
We’ll use Zod to define the invoice schema, convert it to JSON Schema, and hand that to the AI step. This keeps the output predictable: the model is asked for a fixed set of keys and value types.
```ts
import { z } from 'zod'
import { zodToJsonSchema } from 'zod-to-json-schema'

// Every field is optional: the model is told to omit keys it cannot find.
const invoiceSchema = z.object({
  vendor_name: z.string().optional(),
  invoice_number: z.string().optional(),
  invoice_date: z.string().optional(),
  due_date: z.string().optional(),
  currency: z.string().optional(),
  subtotal: z.number().optional(),
  tax: z.number().optional(),
  total: z.number().optional(),
  line_items: z
    .array(
      z.object({
        description: z.string().optional(),
        quantity: z.number().optional(),
        unit_price: z.number().optional(),
        total: z.number().optional(),
      }),
    )
    .optional(),
})

// Prefer Zod's built-in converter (z.toJSONSchema, added in Zod 4) when it exists,
// and fall back to the zod-to-json-schema package on Zod 3.
const toJsonSchema = (schema: z.ZodTypeAny): unknown =>
  typeof (z as { toJSONSchema?: typeof zodToJsonSchema }).toJSONSchema === 'function'
    ? (z as { toJSONSchema: typeof zodToJsonSchema }).toJSONSchema(schema)
    : zodToJsonSchema(schema)

const invoiceJsonSchema = toJsonSchema(invoiceSchema)
```
Pipeline overview
PDF invoices ──▶ /ai/chat
Other docs ──▶ /document/convert (pdf) ──▶ /ai/chat
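The two /file/filter steps in the Assembly snippet below split uploads on `${file.mime}`, so each file takes exactly one branch. As a local sketch of that routing rule (the `Route` type and `routeFor` helper are hypothetical, not part of the Transloadit SDK), the decision boils down to one exact comparison:

```typescript
// Hypothetical helper mirroring the two /file/filter conditions:
// PDFs go straight to /ai/chat, everything else is converted to PDF first.
type Route = 'ai_chat' | 'convert_then_ai_chat'

function routeFor(mime: string): Route {
  // Same test as the filter steps: exact equality with 'application/pdf'.
  return mime === 'application/pdf' ? 'ai_chat' : 'convert_then_ai_chat'
}

const sampleMimes = [
  'application/pdf',
  'image/png',
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
]

for (const mime of sampleMimes) {
  console.log(`${mime} -> ${routeFor(mime)}`)
}
```

Because the second filter is the exact negation of the first (`!=` versus `==`), no file is dropped and none is processed twice.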
Assembly snippet
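```ts
// Assumes `transloadit` is an already-configured Transloadit client, `filePath`
// points at a local invoice file, and `invoiceJsonSchema` comes from the schema step.
const assembly = await transloadit.createAssembly({
  params: {
    steps: {
      // PDFs skip conversion. The single quotes are deliberate: Transloadit,
      // not JavaScript, interpolates ${file.mime}.
      pdf_verified: {
        robot: '/file/filter',
        use: ':original',
        accepts: [['${file.mime}', '==', 'application/pdf']],
      },
      // Everything else (DOCX, images, ...) is routed to conversion.
      non_pdf: {
        robot: '/file/filter',
        use: ':original',
        accepts: [['${file.mime}', '!=', 'application/pdf']],
      },
      pdf_converted: {
        robot: '/document/convert',
        use: 'non_pdf',
        format: 'pdf',
      },
      // Each upload reaches this step through exactly one of the two branches.
      extract_invoice: {
        robot: '/ai/chat',
        use: ['pdf_verified', 'pdf_converted'],
        model: 'anthropic/claude-4-sonnet-20250514',
        format: 'json',
        schema: JSON.stringify(invoiceJsonSchema),
        messages: 'Extract invoice fields. Omit keys if unknown.',
        result: true, // include this step's output in the Assembly results
      },
    },
  },
  files: {
    document: filePath,
  },
  // Resolve only after the Assembly has finished processing.
  waitForCompletion: true,
})
```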
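With `waitForCompletion: true`, the resolved Assembly status carries a `results` object keyed by step name, and `result: true` on `extract_invoice` puts that step's output there. A minimal sketch for picking out its file URL, assuming each result entry exposes an `ssl_url` field; the `AssemblyResult` and `AssemblyStatus` types and the `firstResultUrl` helper are trimmed-down, hypothetical shapes, not the SDK's own types:

```typescript
// Assumed, trimmed-down shapes: real Assembly status objects carry many more fields.
interface AssemblyResult {
  name?: string
  ssl_url?: string | null
}

interface AssemblyStatus {
  ok?: string
  results?: Record<string, AssemblyResult[]>
}

// Return the HTTPS URL of the first file produced by `stepName`,
// or undefined if that step produced nothing.
function firstResultUrl(status: AssemblyStatus, stepName: string): string | undefined {
  if (status.ok !== 'ASSEMBLY_COMPLETED') {
    throw new Error(`Assembly not completed: ${status.ok ?? 'unknown state'}`)
  }
  const files = status.results?.[stepName] ?? []
  // Normalize a null URL to undefined so callers only check one "missing" value.
  return files[0]?.ssl_url ?? undefined
}
```

From there you could fetch the URL and run the body through `invoiceSchema.safeParse` before storing it, so a malformed model response is caught in your code rather than in your database.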
Try the example app
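The runnable version of this post lives at example_apps/invoice-parser/run.ts. Running a .ts file directly with node relies on Node.js's built-in type stripping, which is enabled by default since Node.js 23.6 (Node.js 22.6 and later accept the --experimental-strip-types flag).
```shell
node example_apps/invoice-parser/run.ts
```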
Suggested inputs for visuals
Use the bundled samples or swap in your own invoice PDFs for screenshots:
- _assets/demos/inputs/scan.pdf
- _assets/demos/inputs/first_document.pdf
- _assets/demos/inputs/second_document.pdf
Next steps
- Add multi-currency normalization in your application layer.
- Store results in your database and add human review queues.
- Expand the schema to support multiple vendors and locales.
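For the review-queue idea, one option is to flag extractions whose numbers do not reconcile before anyone relies on them. A minimal sketch, assuming amounts are plain decimals in a single currency; the `Invoice` type mirrors `invoiceSchema` by hand, and `reviewReasons` plus the 0.01 tolerance are hypothetical choices, not part of the example app:

```typescript
// Hand-written mirror of invoiceSchema (every field optional, as in the Zod schema).
interface LineItem {
  description?: string
  quantity?: number
  unit_price?: number
  total?: number
}

interface Invoice {
  vendor_name?: string
  invoice_number?: string
  invoice_date?: string
  due_date?: string
  currency?: string
  subtotal?: number
  tax?: number
  total?: number
  line_items?: LineItem[]
}

// Assumed tolerance for rounding differences, in the invoice's currency units.
const TOLERANCE = 0.01

const close = (a: number, b: number): boolean => Math.abs(a - b) <= TOLERANCE

// Return human-readable reasons to send an invoice to manual review;
// an empty array means none of these checks failed.
function reviewReasons(inv: Invoice): string[] {
  const reasons: string[] = []
  if (inv.total === undefined) reasons.push('missing total')
  // Shape check only: three uppercase letters, like ISO 4217 codes.
  if (inv.currency !== undefined && !/^[A-Z]{3}$/.test(inv.currency)) {
    reasons.push(`currency "${inv.currency}" does not look like an ISO 4217 code`)
  }
  if (
    inv.subtotal !== undefined &&
    inv.tax !== undefined &&
    inv.total !== undefined &&
    !close(inv.subtotal + inv.tax, inv.total)
  ) {
    reasons.push('subtotal + tax does not equal total')
  }
  // Only compare line items when every item has a total to add up.
  const items = inv.line_items ?? []
  if (inv.subtotal !== undefined && items.length > 0 && items.every((i) => i.total !== undefined)) {
    const sum = items.reduce((acc, i) => acc + (i.total ?? 0), 0)
    if (!close(sum, inv.subtotal)) reasons.push('line item totals do not sum to subtotal')
  }
  return reasons
}

console.log(reviewReasons({ currency: 'usd', subtotal: 100, tax: 8.5, total: 110 }))
```

Because every schema field is optional, each check only runs when the fields it needs were actually extracted; a missing total is the one omission treated as a reason on its own.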
