Optical Character Recognition (OCR) unlocks text content within images and PDFs, enabling features like searchable documents, automated data entry, and content analysis. In this DevTip, we'll build a document OCR tool using GCP OCR and Node.js to efficiently extract text from images and PDFs in your applications.

Introduction

GCP OCR, powered by the Google Cloud Vision API, provides robust image analysis capabilities, including OCR for text extraction. Integrating this service into your Node.js application allows you to process images and PDFs programmatically and extract text data efficiently.

This guide walks you through setting up the Google Cloud Vision API, authenticating your application, and writing Node.js code to perform OCR on images and PDFs.

Prerequisites

Ensure you have the following:

  • A Google Cloud account with billing enabled
  • Node.js and npm installed on your machine
  • Basic familiarity with JavaScript and Node.js

Setting up the Google Cloud Vision API

1. Create a GCP project

  1. Go to the Google Cloud Console.
  2. Click on the project dropdown and select New Project.
  3. Enter a project name and click Create.

2. Enable the Vision API

  1. In the Cloud Console, navigate to APIs & Services > Library.
  2. Search for Cloud Vision API.
  3. Click on Cloud Vision API and then click Enable.
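
Alternatively, if you have the gcloud CLI installed and authenticated, the API can be enabled from the command line (replace YOUR_PROJECT_ID with your own project ID):

gcloud services enable vision.googleapis.com --project=YOUR_PROJECT_ID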

Authentication setup

  1. Create a service account key:

    • Go to IAM & Admin > Service Accounts
    • Create a new service account or select an existing one
    • Create a new key (JSON format)
    • Download and securely store the JSON key file
  2. Set up authentication in your application:

    const vision = require('@google-cloud/vision')
    
    const client = new vision.ImageAnnotatorClient({
      keyFilename: 'path/to/your/service-account-key.json',
    })
    

    Or use environment variables:

    export GOOGLE_APPLICATION_CREDENTIALS="path/to/your/service-account-key.json"
    
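    With the environment variable set, the client library resolves the credentials automatically, so the client can be constructed without any options. A minimal sketch:

    const vision = require('@google-cloud/vision')
    
    // Credentials are picked up from GOOGLE_APPLICATION_CREDENTIALS
    const client = new vision.ImageAnnotatorClient()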

Installing the Google Cloud Vision client library

Initialize a new Node.js project and install the necessary library:

mkdir ocr-project
cd ocr-project
npm init -y
npm install @google-cloud/vision@4.3.2 # Latest stable version as of March 2024

Writing the Node.js code

Create an index.js file in your project directory:


// index.js

const vision = require('@google-cloud/vision');
const path = require('path');
const fs = require('fs');

// Creates a client
const client = new vision.ImageAnnotatorClient({
  keyFilename: path.join(__dirname, 'path/to/your/service-account-key.json'),
});

async function extractTextFromImage(imagePath) {
  try {
    const [result] = await client.textDetection(imagePath);
    const detections = result.textAnnotations || [];
    // The first annotation contains the full extracted text; the rest are individual words
    return detections.length ? detections[0].description : '';
  } catch (error) {
    console.error('Error during text detection:', error);
    if (error.code === 8) {
      // gRPC code 8 (RESOURCE_EXHAUSTED) indicates the API quota was exceeded
      console.error('API quota exceeded');
    }
    throw error;
  }
}

// Example usage
extractTextFromImage('images/sample.jpg')
  .then(text => console.log('Extracted text:', text))
  .catch(error => console.error('Error:', error));
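
For dense, multi-paragraph documents such as scans, the client also exposes documentTextDetection, which returns a structured fullTextAnnotation. A minimal sketch, reusing the client created above:

// DOCUMENT_TEXT_DETECTION is optimized for dense text such as scanned documents
async function extractDenseText(imagePath) {
  const [result] = await client.documentTextDetection(imagePath);
  const fullText = result.fullTextAnnotation;
  return fullText ? fullText.text : '';
}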

Processing PDF files

To extract text from PDFs, upload the file to a Google Cloud Storage bucket and use the asynchronous batch annotation feature:


async function extractTextFromPDF(gcsSourceUri, gcsDestinationUri) {
  const client = new vision.ImageAnnotatorClient();

  const inputConfig = {
    mimeType: 'application/pdf',
    gcsSource: {
      uri: gcsSourceUri
    }
  };

  const outputConfig = {
    gcsDestination: {
      uri: gcsDestinationUri
    },
    batchSize: 1
  };

  const features = [{ type: 'DOCUMENT_TEXT_DETECTION' }];
  const request = {
    requests: [{
      inputConfig,
      features,
      outputConfig,
    }]
  };

  const [operation] = await client.asyncBatchAnnotateFiles(request);
  const [filesResponse] = await operation.promise();

  return filesResponse;
}
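
The extracted text is not returned inline; the operation writes JSON result files (AnnotateFileResponse objects) to the gcsDestinationUri. A sketch of how you might call the function and read the output back with the @google-cloud/storage package — the bucket and object names here are placeholders:

const { Storage } = require('@google-cloud/storage');

async function runPdfOcr() {
  // Bucket and object names are placeholders
  await extractTextFromPDF(
    'gs://your-bucket/input/sample.pdf',
    'gs://your-bucket/output/'
  );

  // Each output object is an AnnotateFileResponse serialized as JSON
  const storage = new Storage();
  const [files] = await storage.bucket('your-bucket').getFiles({ prefix: 'output/' });

  for (const file of files) {
    const [contents] = await file.download();
    const { responses } = JSON.parse(contents.toString());
    for (const response of responses) {
      if (response.fullTextAnnotation) {
        console.log(response.fullTextAnnotation.text);
      }
    }
  }
}

runPdfOcr().catch(console.error);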

API limitations and pricing

  • Free tier: 1,000 units per month
  • PDF processing limit: 2,000 pages per file
  • API pricing: $1.50 per 1,000 units after free tier
  • PDF processing: 5 pages = 1 unit
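
As a worked example using the figures above: a 100-page PDF consumes 20 units (100 ÷ 5), so processing 100 such PDFs in a month uses 2,000 units, of which the first 1,000 are free and the remaining 1,000 cost roughly $1.50. Always check the current Cloud Vision pricing page, as these figures may change.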

Regional configuration

For data residency requirements, you can specify regional endpoints:

const client = new vision.ImageAnnotatorClient({
  apiEndpoint: 'eu-vision.googleapis.com', // European Union
  // or 'us-vision.googleapis.com' // United States
})

Troubleshooting

Common issues and solutions

  1. Authentication Errors

    • Verify the service account key file path
    • Ensure the service account has proper permissions
    • Check if the API is enabled in your project
  2. PDF Processing Issues

    • Confirm the PDF file is under 2,000 pages
    • Verify GCS bucket permissions
    • Check PDF file format compatibility
  3. Rate Limiting

    • Implement exponential backoff for retries (see the sketch after this list)
    • Monitor quota usage in the GCP Console
    • Consider batch processing for large volumes
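
A minimal retry helper with exponential backoff might look like the following; the retry count and delays are illustrative, not official recommendations:

async function withRetries(fn, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // gRPC code 8 (RESOURCE_EXHAUSTED) signals rate limiting or quota exhaustion
      if (error.code !== 8 || attempt === maxRetries - 1) {
        throw error;
      }
      const delayMs = 2 ** attempt * 1000; // 1s, 2s, 4s, 8s, ...
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}

// Usage:
// const text = await withRetries(() => extractTextFromImage('images/sample.jpg'));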

Best practices

  1. Error Handling

async function processDocument(filePath) {
  try {
    const result = await extractTextFromImage(filePath);
    return result;
  } catch (error) {
    if (error.code === 'ENOENT') {
      throw new Error('File not found');
    }
    if (error.code === 7) {
      throw new Error('Permission denied: check the service account permissions');
    }
    if (error.code === 8) {
      throw new Error('API quota exceeded');
    }
    throw error;
  }
}

  2. Batch Processing

async function processBatch(files, concurrency = 3) {
  const results = [];
  for (let i = 0; i < files.length; i += concurrency) {
    const batch = files.slice(i, i + concurrency);
    const batchResults = await Promise.all(
      batch.map(file => processDocument(file).catch(error => ({ error })))
    );
    results.push(...batchResults);
    await new Promise(resolve => setTimeout(resolve, 1000)); // Rate limiting
  }
  return results;
}
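
For example, the batch helper could be driven like this (the file paths are placeholders):

const files = ['images/invoice-1.jpg', 'images/invoice-2.jpg', 'images/receipt.png'];

processBatch(files)
  .then(results => {
    results.forEach((result, i) => {
      if (result && result.error) {
        console.error(`Failed to process ${files[i]}:`, result.error.message);
      } else {
        console.log(`Text from ${files[i]}:`, result);
      }
    });
  })
  .catch(error => console.error('Batch failed:', error));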

Conclusion

Integrating GCP OCR into your Node.js applications gives you robust, scalable text extraction. By automating OCR for images and PDFs, you can enhance your application's functionality, streamline workflows, and provide more value to your users.

Transloadit also offers a Document OCR robot as part of our Artificial Intelligence service for seamless and scalable OCR processing.