Integrating OCR in the browser with tesseract.js

Optical Character Recognition (OCR) has traditionally been a server-side task, requiring users to upload documents to a server for processing. However, with advancements in web technologies, it's now possible to perform text recognition directly in the browser. This shift towards browser-based OCR offers immediate feedback, enhanced privacy, and reduced server load. In this article, we'll explore how to integrate OCR into your web applications using the open-source Tesseract.js library, enabling instant text recognition without leaving the browser.
Why browser-based OCR?
Performing OCR in the browser offers several benefits:
- Immediate Feedback: Recognition starts as soon as a file is selected, with no upload or server round trip.
- Enhanced Privacy: Sensitive documents never leave the user's device, addressing privacy concerns.
- Reduced Server Load: Offloading processing to the client reduces server costs and resource usage.
- Offline Capabilities: Once the library, WASM core, and language data are cached or self-hosted, OCR works without an internet connection.
Introducing tesseract.js: a powerful open-source OCR library
Tesseract.js is an open-source JavaScript library that brings the Tesseract OCR engine, compiled to WebAssembly, to web applications. Version 6.0.0 introduces significant improvements in memory management, runtime performance, and overall stability. The library now focuses on core text recognition, with all output formats except 'text' disabled by default to keep recognition fast.
What’s new in tesseract.js v6.0.0
Tesseract.js v6.0.0 comes with several key improvements:
- Fixed memory leaks for more stable long-running sessions.
- Reduced runtime and memory usage for faster text recognition.
- Output formats other than 'text' are disabled by default to streamline performance.
- Simplified API initialization for easier integration.
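Because only 'text' is returned by default, other formats have to be requested explicitly. The sketch below assumes that `worker.recognize` accepts an output-selection object as its third argument (as described in the Tesseract.js API docs) and that keys such as 'hocr' and 'blocks' are valid there; `buildOutputOptions` is a hypothetical helper name, not part of the library.

```javascript
// Hypothetical helper: build the output-selection object for worker.recognize.
// 'text' stays enabled; every format named in `formats` is switched on as well.
function buildOutputOptions(formats = []) {
  const output = { text: true }
  for (const format of formats) {
    output[format] = true
  }
  return output
}

// Usage sketch (inside an async function, with an already created worker):
//   const { data } = await worker.recognize(image, {}, buildOutputOptions(['hocr']))
//   console.log(data.text, data.hocr)
```

Requesting only the formats you actually read keeps the performance benefit of the new default.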
Browser compatibility and requirements
Tesseract.js requires a modern browser with WebAssembly (WASM) support. Most current browsers support WASM, including:
- Chrome/Chromium 57+
- Firefox 52+
- Safari 11+
- Edge 79+
Note: If you host the Tesseract.js WebAssembly files yourself, make sure your server serves them with the MIME type 'application/wasm'.
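Rather than relying on version numbers alone, you can check for WebAssembly at runtime before loading the library. This is a minimal sketch; `supportsWebAssembly` is a hypothetical helper name, and it only confirms that the standard `WebAssembly` API is present and can validate an empty module.

```javascript
// Returns true when the environment exposes a working WebAssembly API.
function supportsWebAssembly() {
  try {
    if (typeof WebAssembly !== 'object' || typeof WebAssembly.validate !== 'function') {
      return false
    }
    // Smallest valid module: the "\0asm" magic number followed by version 1
    const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00])
    return WebAssembly.validate(emptyModule)
  } catch {
    return false
  }
}

// Usage sketch: show a fallback message instead of loading the OCR code.
//   if (!supportsWebAssembly()) {
//     resultElement.textContent = 'Your browser does not support in-browser OCR.'
//   }
```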
Getting started with tesseract.js
Installation
You can add Tesseract.js to your project using npm:
npm install tesseract.js
Or include it via CDN:
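<script src="https://unpkg.com/tesseract.js@6.0.0/dist/tesseract.min.js"></script>
Note: By default, Tesseract.js downloads its worker script, WASM core, and language data from a CDN at runtime, so no extra setup is needed. If you prefer to self-host these files, point the library at them with the workerPath, corePath, and langPath options.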
Basic example: recognizing text from an image
Below is a simple example demonstrating how to perform OCR on an image:
<input type="file" id="imageInput" accept="image/*" />
<div id="result"></div>
<script>
  async function performOCR(file) {
    // createWorker is async and loads and initializes the language itself,
    // so the load(), loadLanguage(), and initialize() calls from older versions are not needed.
    // The second argument is the OCR engine mode (1 = LSTM only).
    const worker = await Tesseract.createWorker('eng', 1, {
      logger: (msg) => console.log('Worker progress:', msg),
    })
    try {
      const {
        data: { text },
      } = await worker.recognize(file)
      return text
    } catch (error) {
      console.error('OCR Error:', error)
      throw error
    } finally {
      // Always release the worker, even when recognition fails
      await worker.terminate()
    }
  }

  document.getElementById('imageInput').addEventListener('change', async (e) => {
    const file = e.target.files[0]
    const resultElement = document.getElementById('result')
    if (!file) return
    if (!file.type.startsWith('image/')) {
      resultElement.textContent = 'Please select an image file.'
      return
    }
    resultElement.textContent = 'Processing...'
    try {
      const text = await performOCR(file)
      resultElement.textContent = text
    } catch (error) {
      resultElement.textContent = `Error: ${error.message}`
    }
  })
</script>
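The logger callback in the example above only writes to the console. A small formatter can turn those messages into text for the page. This sketch assumes each logger message carries a `status` string and a `progress` number between 0 and 1; `formatProgress` is a hypothetical helper name.

```javascript
// Turn a logger message into a short status line such as "recognizing text: 42%".
function formatProgress(message) {
  const status = typeof message?.status === 'string' ? message.status : 'working'
  const progress = Number.isFinite(message?.progress) ? message.progress : 0
  // Clamp to [0, 1] so unexpected values never produce odd percentages
  const percent = Math.round(Math.min(Math.max(progress, 0), 1) * 100)
  return `${status}: ${percent}%`
}

// Usage sketch: pass it to createWorker as part of the logger option.
//   logger: (msg) => { resultElement.textContent = formatProgress(msg) }
```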
Handling multiple languages
Tesseract.js supports various languages. Here’s how you can perform OCR on images containing text in multiple languages:
async function performMultilingualOCR(file, languages = ['eng', 'deu']) {
  // Multiple languages are passed as one '+'-joined string, e.g. 'eng+deu'
  const languageString = languages.join('+')
  const worker = await Tesseract.createWorker(languageString, 1, {
    logger: (msg) => console.log('Worker progress:', msg),
  })
  try {
    const {
      data: { text },
    } = await worker.recognize(file)
    return text
  } finally {
    // Release the worker whether or not recognition succeeded
    await worker.terminate()
  }
}
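When the language list comes from user input, it helps to clean it up before joining it. The following is a sketch only: `buildLanguageString` is a hypothetical helper, and the pattern it uses for language codes (letters with optional underscore-separated suffixes, such as 'eng' or 'chi_sim') is an assumption about the code format, not a check against the languages Tesseract.js actually ships.

```javascript
// Deduplicate and sanity-check language codes, then join them with '+'.
function buildLanguageString(languages) {
  if (!Array.isArray(languages)) {
    throw new TypeError('languages must be an array of language codes')
  }
  const unique = new Set()
  for (const language of languages) {
    const code = String(language).trim()
    // Assumed shape of a language code: "eng", "deu", "chi_sim", ...
    if (!/^[a-z]{3,}(_[a-z]+)*$/i.test(code)) {
      throw new Error(`Invalid language code: "${code}"`)
    }
    unique.add(code)
  }
  if (unique.size === 0) {
    throw new Error('At least one language is required.')
  }
  return [...unique].join('+')
}
```

Each additional language adds download size and recognition time, so passing only the languages you expect is usually the better trade-off.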
Performance optimization
A few additional techniques can improve both OCR speed and accuracy.
Image preprocessing
Enhance image quality before OCR by applying filters or resizing the image:
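function preprocessImage(file) {
  return new Promise((resolve, reject) => {
    const img = new Image()
    const objectUrl = URL.createObjectURL(file)
    img.onload = () => {
      URL.revokeObjectURL(objectUrl) // free the temporary URL
      const canvas = document.createElement('canvas')
      const ctx = canvas.getContext('2d')
      // Cap the width to bound processing time; tune this for your images,
      // since shrinking small text too far reduces accuracy
      const maxWidth = 1000
      const scale = img.width > maxWidth ? maxWidth / img.width : 1
      canvas.width = Math.round(img.width * scale)
      canvas.height = Math.round(img.height * scale)
      // The filter must be set before drawing; it only affects later draw calls
      ctx.filter = 'grayscale(100%) contrast(150%)'
      ctx.drawImage(img, 0, 0, canvas.width, canvas.height)
      // PNG is lossless, so no compression artifacts are added around the text
      canvas.toBlob(
        (blob) => (blob ? resolve(blob) : reject(new Error('Could not encode image.'))),
        'image/png'
      )
    }
    img.onerror = () => {
      URL.revokeObjectURL(objectUrl)
      reject(new Error('Could not load image.'))
    }
    img.src = objectUrl
  })
}

async function optimizedOCR(file) {
  const processedImage = await preprocessImage(file)
  return performOCR(processedImage)
}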
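The canvas `filter` property may not be available in every browser, in which case the image is drawn unfiltered. As a fallback, the same grayscale and contrast adjustment can be applied directly to the pixel data. This is a sketch under stated assumptions: `grayscaleWithContrast` is a hypothetical helper, `pixels` is expected to be the `Uint8ClampedArray` from `ImageData.data` (RGBA order), and the luminance weights are the common Rec. 601 values rather than anything Tesseract.js prescribes.

```javascript
// Convert RGBA pixel data to grayscale and stretch contrast around mid-gray, in place.
// Relies on Uint8ClampedArray to clamp results to the 0..255 range.
function grayscaleWithContrast(pixels, contrast = 1.5) {
  for (let i = 0; i < pixels.length; i += 4) {
    const luma = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2]
    const adjusted = (luma - 128) * contrast + 128
    pixels[i] = adjusted // red
    pixels[i + 1] = adjusted // green
    pixels[i + 2] = adjusted // blue
    // pixels[i + 3] (alpha) is left unchanged
  }
  return pixels
}

// Usage sketch with a 2D canvas context:
//   const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height)
//   grayscaleWithContrast(imageData.data)
//   ctx.putImageData(imageData, 0, 0)
```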
Memory management
Efficiently managing the worker's lifecycle is crucial, especially when processing multiple images:
async function batchProcessImages(files) {
  // Create one worker and reuse it for every image;
  // starting a worker per file repeats the language download and setup
  const worker = await Tesseract.createWorker('eng', 1, {
    logger: (msg) => console.log('Worker progress:', msg),
  })
  const results = []
  try {
    for (const file of files) {
      const {
        data: { text },
      } = await worker.recognize(file)
      results.push(text)
    }
  } finally {
    // Free the worker's memory even if one of the images fails
    await worker.terminate()
  }
  return results
}
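In the batch function above, a single unreadable image aborts the whole run. A variant that records failures per file might look like the sketch below. It assumes `worker.recognize` rejects its promise for input it cannot process; `recognizeEach` is a hypothetical helper, and the caller stays responsible for creating and terminating the worker.

```javascript
// Recognize files one by one with a shared worker, collecting a result or an
// error message for each file instead of stopping at the first failure.
async function recognizeEach(worker, files) {
  const results = []
  for (const file of files) {
    try {
      const { data } = await worker.recognize(file)
      results.push({ ok: true, text: data.text })
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error)
      results.push({ ok: false, error: message })
    }
  }
  return results
}
```

Results stay in the same order as the input files, so they can be matched back to file names by index.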
Error handling and validation
Robust error handling is essential for a smooth user experience. The example below adds file type and size validation along with proper error reporting:
async function validateAndPerformOCR(file) {
  const MAX_SIZE = 5 * 1024 * 1024 // 5MB
  const SUPPORTED_TYPES = ['image/jpeg', 'image/png', 'image/webp']
  if (!SUPPORTED_TYPES.includes(file.type)) {
    throw new Error('Unsupported file type. Please use JPEG, PNG, or WebP images.')
  }
  if (file.size > MAX_SIZE) {
    throw new Error('File size exceeds 5MB limit.')
  }
  let worker
  try {
    worker = await Tesseract.createWorker('eng', 1, {
      logger: (msg) => console.log('Worker progress:', msg),
    })
    const {
      data: { text },
    } = await worker.recognize(file)
    return text
  } catch (error) {
    // Keep the original error available to callers via `cause`
    throw new Error(`OCR processing failed: ${error.message}`, { cause: error })
  } finally {
    // Terminate only if the worker was actually created
    if (worker) await worker.terminate()
  }
}
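Validation does not protect against a recognition call that takes far longer than expected, for example on a very large or noisy image. One option is a generic timeout wrapper, sketched here with plain promises. `withTimeout` and its `onTimeout` callback are hypothetical names, and nothing in it is specific to Tesseract.js.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
// `onTimeout` runs once when the timeout fires, e.g. to terminate a worker.
function withTimeout(promise, ms, onTimeout = () => {}) {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      onTimeout()
      reject(new Error(`OCR timed out after ${ms} ms`))
    }, ms)
  })
  // Whichever settles first wins; the timer is always cleared afterwards
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Usage sketch (inside an async function, with an already created worker):
//   const { data } = await withTimeout(worker.recognize(file), 30000, () => worker.terminate())
```

Terminating the worker in `onTimeout` matters because rejecting the wrapper promise does not stop the recognition job by itself.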
Security considerations and best practices
When implementing browser-based OCR, consider the following guidelines:
- Tell users that processing happens locally, so they know their documents are not uploaded.
- Validate file types and sizes to prevent unexpected behavior.
- Monitor memory usage and clean up worker instances appropriately.
- Downscale very large images before recognition to keep memory use and processing time manageable.
- Provide clear, real-time feedback during processing.
- Handle errors gracefully with user-friendly messages.
Conclusion
Tesseract.js v6.0.0 provides a powerful solution for implementing OCR directly in web browsers with improved performance, robust memory management, and a simplified API. By following the best practices outlined in this guide, you can build efficient, secure, and user-friendly OCR applications that respect user privacy and deliver rapid results.
If you need a more advanced OCR solution with server-side processing and support for various document formats, consider checking out Transloadit's Document OCR service.