Extracting text from images is a common requirement in modern applications, whether for processing scanned documents, enhancing accessibility, or automating data entry. AWS Rekognition provides text detection that you can call directly from your Node.js applications. For synchronous operations, the service supports JPEG and PNG images of up to 5MB when passed as raw bytes and up to 15MB when read from S3, and it can detect up to 100 words per image.

Introduction to AWS Rekognition and text detection

AWS Rekognition is an image and video analysis service that uses deep learning to identify objects, scenes, text, and faces. Its text detection returns each detected line and word together with a confidence score and a bounding box. In this guide, you will learn how to integrate it into your Node.js application for reliable text extraction.

Setting up AWS Rekognition

Before you begin, ensure that you set up your AWS credentials and have the necessary IAM permissions. Create an IAM user and attach a policy like the following to allow access to the rekognition:DetectText action:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["rekognition:DetectText"],
      "Resource": "*"
    }
  ]
}

Configure your AWS credentials using one of these methods:

  • Environment variables (set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION).
  • AWS credentials file (located at ~/.aws/credentials on Linux/Mac or C:\Users\YourUser\.aws\credentials on Windows).
  • Explicit configuration within your code.
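If you go the environment-variable route, it can help to fail fast when a variable is missing instead of waiting for the first API call to be rejected. The following is a minimal sketch; findMissingEnvVars is a hypothetical helper name, not part of the AWS SDK, and the list of required variables simply mirrors the ones named above.

```javascript
// Variables named in the list above; adjust if you rely on a different setup.
const REQUIRED_ENV_VARS = ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_REGION']

// Hypothetical helper: returns the names of required variables that are unset or blank.
const findMissingEnvVars = (env = process.env, required = REQUIRED_ENV_VARS) =>
  required.filter((name) => !env[name] || env[name].trim() === '')

const missing = findMissingEnvVars()
if (missing.length > 0) {
  // A warning rather than an error, because the SDK may still find
  // credentials elsewhere (for example in the shared credentials file).
  console.warn(`Missing environment variables: ${missing.join(', ')}`)
}
```

The check only warns, since credentials from the shared file or explicit configuration would still be valid.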

Installing the AWS SDK for Node.js

Install the AWS SDK package for Rekognition using npm:

npm install @aws-sdk/client-rekognition

Integrating AWS SDK into a Node.js application

Import the necessary classes from the AWS SDK and configure the Rekognition client. When you omit the credentials option, the SDK resolves credentials through its default provider chain, which covers both the environment variables and the shared credentials file. Pass explicit credentials only if you need to override that:

import { RekognitionClient, DetectTextCommand } from '@aws-sdk/client-rekognition'

const client = new RekognitionClient({
  region: process.env.AWS_REGION || 'us-west-2',
  // Optional: uncomment to override the default credential provider chain.
  // credentials: {
  //   accessKeyId: 'YOUR_ACCESS_KEY_ID',
  //   secretAccessKey: 'YOUR_SECRET_ACCESS_KEY',
  // },
})

Performing text detection with AWS Rekognition

You can detect text from both local files and images stored in S3. Below is an example function that loads a local image file and returns detected text lines with their confidence scores and bounding box information:

import fs from 'fs'
import { RekognitionClient, DetectTextCommand } from '@aws-sdk/client-rekognition'

// Assumes 'client' is already configured as shown above
const detectTextFromImage = async (imagePath) => {
  const imageBytes = fs.readFileSync(imagePath)
  const params = {
    Image: { Bytes: imageBytes },
  }

  try {
    const data = await client.send(new DetectTextCommand(params))
    const textLines = data.TextDetections.filter((text) => text.Type === 'LINE').map((text) => ({
      text: text.DetectedText,
      confidence: text.Confidence,
      boundingBox: text.Geometry.BoundingBox,
    }))
    return textLines
  } catch (err) {
    if (err.name === 'InvalidImageFormatException') {
      throw new Error('Invalid image format. Only JPEG and PNG are supported.')
    } else if (err.name === 'ImageTooLargeException') {
      throw new Error('Image size exceeds the 5MB limit for images sent as bytes.')
    } else if (err.name === 'ThrottlingException') {
      throw new Error('API request rate exceeded. Please retry with backoff.')
    }
    throw err
  }
}
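The BoundingBox values that Rekognition returns are ratios of the overall image width and height, not pixel coordinates. To draw a box on the image or crop a region, you need to scale them. The sketch below assumes you already know the image dimensions in pixels; toPixelBox is a hypothetical helper name, not part of the SDK.

```javascript
// Hypothetical helper: converts a ratio-based BoundingBox into pixel values.
// `box` has the shape returned in Geometry.BoundingBox: { Left, Top, Width, Height }.
const toPixelBox = (box, imageWidth, imageHeight) => ({
  left: Math.round(box.Left * imageWidth),
  top: Math.round(box.Top * imageHeight),
  width: Math.round(box.Width * imageWidth),
  height: Math.round(box.Height * imageHeight),
})
```

For example, a box with Left 0.1 and Width 0.5 on a 1000-pixel-wide image starts at pixel 100 and spans 500 pixels.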

Analyzing images from S3

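If your image is stored in an S3 bucket, you can pass the bucket name and object key instead of the image bytes. The bucket must be in the same region as the Rekognition client, and the calling identity also needs s3:GetObject permission on the object: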

import { RekognitionClient, DetectTextCommand } from '@aws-sdk/client-rekognition'

const detectTextFromS3Image = async (bucket, key) => {
  const params = {
    Image: {
      S3Object: {
        Bucket: bucket,
        Name: key,
      },
    },
  }

  try {
    const data = await client.send(new DetectTextCommand(params))
    const textLines = data.TextDetections.filter((text) => text.Type === 'LINE').map((text) => ({
      text: text.DetectedText,
      confidence: text.Confidence,
      boundingBox: text.Geometry.BoundingBox,
    }))
    return textLines
  } catch (err) {
    if (err.name === 'InvalidImageFormatException') {
      throw new Error('Invalid image format. Only JPEG and PNG are supported.')
    } else if (err.name === 'ImageTooLargeException') {
      throw new Error('Image exceeds the 15MB limit.')
    } else if (err.name === 'ThrottlingException') {
      throw new Error('API request rate exceeded. Please retry with backoff.')
    }
    throw err
  }
}
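The ThrottlingException message above asks the caller to retry with backoff. The SDK already retries some failures on its own (the client accepts a maxAttempts option), but if you want explicit control, a small wrapper along these lines is one way to do it. This is a sketch under stated assumptions: withBackoff, the set of retryable error names, and the default delays are all illustrative choices, not SDK APIs.

```javascript
// Error names treated as retryable; this list is an assumption for illustration.
const RETRYABLE_ERRORS = new Set(['ThrottlingException', 'ProvisionedThroughputExceededException'])

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Hypothetical helper: calls `fn` and retries it on retryable errors.
// `retries` is the number of retries after the first attempt.
const withBackoff = async (fn, { retries = 3, baseDelayMs = 200, sleep = defaultSleep } = {}) => {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (!RETRYABLE_ERRORS.has(err.name) || attempt >= retries) throw err
      // Exponential backoff with jitter: random delay in [0, baseDelayMs * 2^attempt).
      const delayMs = Math.random() * baseDelayMs * 2 ** attempt
      await sleep(delayMs)
    }
  }
}
```

Wrap the raw client.send(new DetectTextCommand(params)) call rather than the functions above: they rethrow throttling errors as a generic Error, so the original error name would no longer be visible to the wrapper.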

Practical example and code snippets

The following complete example processes a local image file. It verifies the existence of the file, sends the image to AWS Rekognition for text detection, and logs the detected text lines along with their confidence scores and positions.

import { RekognitionClient, DetectTextCommand } from '@aws-sdk/client-rekognition'
import fs from 'fs'

const client = new RekognitionClient({
  region: process.env.AWS_REGION || 'us-west-2',
})

const processImage = async (imagePath) => {
  if (!fs.existsSync(imagePath)) {
    throw new Error('Image file not found.')
  }

  const imageBytes = fs.readFileSync(imagePath)
  const params = {
    Image: { Bytes: imageBytes },
  }

  try {
    const data = await client.send(new DetectTextCommand(params))
    console.log('Detected text lines:')

    data.TextDetections.filter((text) => text.Type === 'LINE').forEach((text) => {
      console.log(`- Text: ${text.DetectedText}`)
      console.log(`  Confidence: ${text.Confidence.toFixed(2)}%`)
      console.log(
        `  Position: (${text.Geometry.BoundingBox.Left.toFixed(2)}, ${text.Geometry.BoundingBox.Top.toFixed(2)})`,
      )
    })

    return data.TextDetections
  } catch (err) {
    if (err.name === 'InvalidImageFormatException') {
      console.error('Error: Invalid image format. Only JPEG and PNG are supported.')
    } else if (err.name === 'ImageTooLargeException') {
      console.error('Error: Image size exceeds the 5MB limit for images sent as bytes.')
    } else if (err.name === 'ThrottlingException') {
      console.error('Error: API request rate exceeded. Please retry with backoff.')
    } else {
      console.error('Error detecting text:', err.message)
    }
    throw err
  }
}

// Usage example
processImage('path/to/your/image.jpg')
  .then((results) => {
    console.log(`Found ${results.length} text elements`)
  })
  .catch((err) => {
    console.error('Failed to process image:', err.message)
  })
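Note that TextDetections contains both LINE and WORD entries, so the count printed by the usage example includes each word as well as each line. Every WORD carries a ParentId that matches the Id of the LINE it belongs to, which makes it possible to rebuild the hierarchy. A minimal sketch, where groupWordsByLine is a hypothetical helper name:

```javascript
// Hypothetical helper: nests WORD detections under their parent LINE.
const groupWordsByLine = (textDetections = []) => {
  const lines = new Map()

  // First pass: register every LINE by its Id.
  for (const detection of textDetections) {
    if (detection.Type === 'LINE') {
      lines.set(detection.Id, { text: detection.DetectedText, words: [] })
    }
  }

  // Second pass: attach each WORD to the LINE referenced by its ParentId.
  for (const detection of textDetections) {
    if (detection.Type === 'WORD' && lines.has(detection.ParentId)) {
      lines.get(detection.ParentId).words.push({
        text: detection.DetectedText,
        confidence: detection.Confidence,
      })
    }
  }

  return [...lines.values()]
}
```

You could pass the array returned by processImage straight into this function to get one entry per line with its words attached.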

AWS Rekognition service limits and pricing

When using AWS Rekognition for text detection, keep these key points in mind:

  • The maximum image size for synchronous operations is 15MB for images stored in S3 and 5MB for images passed as raw bytes.
  • Supported image formats are JPEG and PNG.
  • The service can detect up to 100 words per image.
  • A free tier is available. AWS has listed it as 5,000 images per month for the first 12 months, so check the pricing page for the current allowance.
  • After the free tier, pricing starts at $0.0010 per image.

For more details, refer to the AWS Rekognition Pricing page and the service limits documentation.
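Because every rejected request still costs a round trip, it can be worth checking the format and size locally before calling the API. The sketch below inspects the leading bytes of a buffer for the standard JPEG and PNG signatures and compares its length to a configurable cap. The helper names are hypothetical, and the default of 5MB (interpreted here as 5 × 1024 × 1024 bytes, which is an assumption) reflects the limit for images sent as raw bytes.

```javascript
// File signatures: JPEG files start with FF D8 FF, PNG files with an 8-byte header.
const JPEG_MAGIC = Buffer.from([0xff, 0xd8, 0xff])
const PNG_MAGIC = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])

const startsWith = (buffer, magic) =>
  buffer.length >= magic.length && buffer.subarray(0, magic.length).equals(magic)

// Hypothetical helper: returns 'jpeg', 'png', or null for anything else.
const sniffImageFormat = (buffer) => {
  if (startsWith(buffer, JPEG_MAGIC)) return 'jpeg'
  if (startsWith(buffer, PNG_MAGIC)) return 'png'
  return null
}

// Hypothetical helper: checks format first, then size against `maxBytes`.
const validateImageBytes = (buffer, maxBytes = 5 * 1024 * 1024) => {
  const format = sniffImageFormat(buffer)
  if (!format) return { ok: false, reason: 'unsupported-format' }
  if (buffer.length > maxBytes) return { ok: false, reason: 'too-large' }
  return { ok: true, format }
}
```

A signature check only confirms the file header, so a corrupt image can still pass it and be rejected by the service; keep the error handling shown earlier in place.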

Conclusion

AWS Rekognition offers text detection that integrates easily into your Node.js applications. By following the steps above, you can set up your AWS credentials, install the AWS SDK, and detect text in both local and S3-based images while handling the most common failure cases.

Interested in automating your image analysis workflow further? Check out how Transloadit's Image Describe Robot leverages similar technology to help you process images at scale.