Recognize text in images
🤖/image/ocr recognizes text in images and returns it in a machine-readable format.
With this Robot you can detect and extract text from images using optical character recognition (OCR).
For example, you can use the results to obtain the content of traffic signs, name tags, package labels, and more. You can also pass the text down to other Robots to filter images that contain (or do not contain) certain phrases. For images of dense documents, results may vary and be less accurate than for small pieces of text in photos.
Warning: Transloadit aims to be deterministic, but this Robot uses third-party AI services. The providers (AWS, GCP) will evolve their models over time, giving different responses for the same input images. Avoid relying on exact responses in your tests and application.
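One way to follow this advice in a test suite is to compare recognized text against the expected text with some tolerance instead of checking for exact equality. The sketch below is a minimal illustration using only the Python standard library; the helper names `normalize` and `roughly_matches` and the `0.8` threshold are assumptions made for this example, not part of Transloadit.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so small formatting drift is tolerated."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def roughly_matches(recognized: str, expected: str, threshold: float = 0.8) -> bool:
    """Return True when both strings are similar enough after normalization.

    The threshold is a hypothetical starting point; tune it for your own images.
    """
    ratio = difflib.SequenceMatcher(
        None, normalize(recognized), normalize(expected)
    ).ratio()
    return ratio >= threshold
```

A test can then assert `roughly_matches(actual_text, "speed limit 50")` and keep passing if a provider's model later returns slightly different spacing, casing, or a single misread character.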
Usage example
Recognize text in an uploaded image and save it to a text file:
{
  "steps": {
    "recognized": {
      "robot": "/image/ocr",
      "use": ":original",
      "provider": "gcp",
      "format": "text"
    }
  }
}
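If you generate these instructions from application code rather than writing them by hand, the same structure can be built as a dictionary and serialized to JSON. This is a minimal sketch: the function name `build_ocr_instructions` and its arguments are hypothetical, and how the resulting JSON is submitted to Transloadit is outside the scope of this example.

```python
import json


def build_ocr_instructions(step_name: str = "recognized",
                           provider: str = "gcp",
                           fmt: str = "text") -> dict:
    """Return instructions equivalent to the JSON usage example above."""
    return {
        "steps": {
            step_name: {
                "robot": "/image/ocr",
                "use": ":original",  # run OCR on the uploaded file
                "provider": provider,
                "format": fmt,
            }
        }
    }


if __name__ == "__main__":
    # Print the instructions as indented JSON.
    print(json.dumps(build_ocr_instructions(), indent=2))
```

Calling `build_ocr_instructions(provider="aws")` yields the same Step with the other provider, which makes it easy to compare results for your own images.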
Parameters
- use
  String / Array of Strings / Object ⋅ required
  Specifies which Step(s) to use as input.
  - You can pick any names for Steps except ":original" (reserved for user uploads handled by Transloadit).
  - You can provide several Steps as input with arrays:
    "use": [ ":original", "encoded", "resized" ]
  💡 That’s likely all you need to know about use, but you can view Advanced use cases.
- provider
  String ⋅ required
  Which AI provider to leverage. Valid values are "aws" and "gcp".
  Transloadit outsources this task and abstracts the interface, so you can expect the same data structures, but different latencies and different information being returned. Cloud vendors each have areas in which they shine, and we recommend trying both to see which yields the best results for your use case. AWS supports detection for the following languages: English, Arabic, Russian, German, French, Italian, Portuguese and Spanish. GCP supports a wider range of languages, with varying levels of support, which are listed in the official documentation.
- granularity
  String ⋅ default: "full"
  Whether to return a full response including coordinates for the text ("full"), or a flat list of the extracted phrases ("list"). This parameter has no effect if the format parameter is set to "text".
- format
  String ⋅ default: "json"
  In what format to return the extracted text.
  - "json" returns a JSON file.
  - "meta" does not return a file, but stores the data inside Transloadit's file object (under ${file.meta.recognized_text}, which is an array of strings) that's passed around between encoding Steps, so that you can use the values to burn the data into videos, filter on them, etc.
  - "text" returns the recognized text as a plain UTF-8 encoded text file.
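The allowed values listed above can be checked locally before instructions are submitted. The following is a rough sketch only: `check_ocr_step` and the constants are hypothetical names introduced for this example, and the check mirrors just the rules stated in this parameter list. Transloadit's own validation remains authoritative.

```python
VALID_PROVIDERS = {"aws", "gcp"}
VALID_GRANULARITIES = {"full", "list"}
VALID_FORMATS = {"json", "meta", "text"}


def check_ocr_step(step: dict) -> list[str]:
    """Return a list of human-readable problems found in an /image/ocr Step."""
    problems = []

    # "use" and "provider" are the two required parameters.
    for required in ("use", "provider"):
        if required not in step:
            problems.append(f"missing required parameter: {required}")

    provider = step.get("provider")
    if provider is not None and provider not in VALID_PROVIDERS:
        problems.append(f"invalid provider: {provider!r}")

    # Fall back to the documented defaults when a parameter is omitted.
    granularity = step.get("granularity", "full")
    if granularity not in VALID_GRANULARITIES:
        problems.append(f"invalid granularity: {granularity!r}")

    fmt = step.get("format", "json")
    if fmt not in VALID_FORMATS:
        problems.append(f"invalid format: {fmt!r}")

    # Per the parameter description, granularity is ignored for text output.
    if fmt == "text" and "granularity" in step:
        problems.append("granularity has no effect when format is 'text'")

    return problems
```

An empty list means the Step is consistent with the rules described here; any returned strings can be logged or raised as errors by the calling code.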
Related blog posts
- Introducing the OCR Robot for easy text extraction August 26, 2021
- Celebrating transloadit’s 2021 milestones and progress January 31, 2022
- Introducing text extraction from PDFs with AI Robot November 9, 2023