
Recognize objects in images
🤖/image/describe recognizes objects in images and returns them as English words.
You can use the labels that we return to automatically classify images in your application. You can also pass the labels on to other Robots to filter images that contain (or do not contain) certain content.
Warning: Transloadit aims to be deterministic, but this Robot uses third-party AI services. The providers (AWS, GCP) will evolve their models over time, giving different responses for the same input images. Avoid relying on exact responses in your tests and application.
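As a minimal sketch, Assembly Instructions that run this Robot on uploaded files might look like the following. The Step name "described" is an arbitrary choice, and the parameter values are only one possible combination of the options documented below.

```json
{
  "steps": {
    ":original": {
      "robot": "/upload/handle"
    },
    "described": {
      "robot": "/image/describe",
      "use": ":original",
      "provider": "aws",
      "granularity": "list",
      "format": "json"
    }
  }
}
```

With these settings, each uploaded image would be expected to yield a JSON file containing a flat list of descriptions.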
Parameters
-
use
String / Array of Strings / Object ⋅ required
Specifies which Step(s) to use as input.
- You can pick any names for Steps except ":original" (reserved for user uploads handled by Transloadit).
- You can provide several Steps as input with arrays:
"use": [ ":original", "encoded", "resized" ]
💡 That’s likely all you need to know about use, but you can view advanced use cases:
- Step bundling. Some Robots can gather several Step results for a single invocation. For example, 🤖/file/compress would normally create one archive for each file passed to it. If you set bundle_steps to true, however, it will create one archive containing all the result files from all Steps you give it. To enable bundling, provide an object like the one below to the use parameter:
"use": { "steps": [ ":original", "encoded", "resized" ], "bundle_steps": true }
This is also a crucial parameter for 🤖/video/adaptive; otherwise you'll generate one playlist for each viewing quality.
Keep in mind that all input Steps must be present in your Template. If one of them is missing (for instance, because it is rejected by a filter), no result is generated because the Robot waits indefinitely for all input Steps to be finished.
Here’s a demo that showcases Step bundling.
- Group by original. Sticking with the 🤖/file/compress example, you can set group_by_original to true in order to create a separate archive for each of your uploaded or imported files, instead of creating one archive containing all originals (or one per resulting file). This is important for 🤖/media/playlist, where you'd typically set:
"use": { "steps": [ "segmented" ], "bundle_steps": true, "group_by_original": true }
- Fields. You can be more discriminatory by only using files that match a field name by setting the fields property. When this array is specified, the corresponding Step will only be executed for files submitted through one of the given field names, which correspond with the strings in the name attribute of the HTML file input field tag, for instance. When using a back-end SDK, it corresponds with myFieldName1 in e.g.: $transloadit->addFile('myFieldName1', './chameleon.jpg').
This parameter is set to true by default, meaning all fields are accepted.
Example:
"use": { "steps": [ ":original" ], "fields": [ "myFieldName1" ] }
- Use as. Sometimes Robots take several inputs. For instance, 🤖/video/merge can create a slideshow from audio and images. You can map different Steps to the appropriate inputs.
Example:
"use": { "steps": [ { "name": "audio_encoded", "as": "audio" }, { "name": "images_resized", "as": "image" } ] }
Sometimes the ordering is important, for instance, with our concat Robots. In these cases, you can add an index that starts at 1. You can also optionally filter by the multipart field name, as in this example, where all files come from the same source (end-user uploads) but have different <input> names:
Example:
"use": { "steps": [ { "name": ":original", "fields": "myFirstVideo", "as": "video_1" }, { "name": ":original", "fields": "mySecondVideo", "as": "video_2" }, { "name": ":original", "fields": "myThirdVideo", "as": "video_3" } ] }
For times when it is not apparent where we should put the file, you can use Assembly Variables to be specific. For instance, you may want to pass a text file to 🤖/image/resize to burn the text into an image, but you are burning multiple texts, so it is ambiguous where the text file goes. You can specify it via ${use.text_1} to indicate the first text file that was passed.
Example:
"watermarked": { "robot": "/image/resize", "use": { "steps": [ { "name": "resized", "as": "base" }, { "name": "transcribed", "as": "text" } ] }, "text": [ { "text": "Hi there", "valign": "top", "align": "left" }, { "text": "From the 'transcribed' Step: ${use.text_1}", "valign": "bottom", "align": "right", "x_offset": 16, "y_offset": -10 } ] }
-
provider
String ⋅ required
Which AI provider to leverage. Valid values are "aws" and "gcp".
Transloadit outsources this task and abstracts the interface so you can expect the same data structures, but different latencies and information being returned. Different cloud vendors have different areas they shine in, and we recommend trying them out to see what yields the best results for your use case.
-
granularity
String ⋅ default: "full"
Whether to return a full-blown response ("full"), or a flat list of descriptions ("list").
-
format
String ⋅ default: "json"
In what format to return the descriptions. "json" returns a JSON file. "meta" does not return a file, but stores the data inside Transloadit's file object (under ${file.meta.descriptions}) that's passed around between encoding Steps, so that you can use the values to burn the data into videos, filter on them, etc.
-
explicit_descriptions
Boolean ⋅ default: false
Whether to return only explicit or only non-explicit descriptions of the provided image. Explicit descriptions include labels for nudity, violence, etc. If set to false, only non-explicit descriptions (such as human or chair) will be returned. If set to true, only explicit descriptions will be returned.
The possible descriptions depend on the chosen provider. The list of labels from AWS can be found in their documentation. GCP labels the image based on five categories, as described in their documentation.
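The "meta" format described above makes it possible to filter on the recognized labels in a later Step. The sketch below is an illustration under several assumptions: the Step names "described" and "without_chairs" are arbitrary, the label "Chair" and its casing are hypothetical (actual labels depend on the provider), and the "declines" condition with the "includes" operator should be checked against the 🤖/file/filter documentation. It also assumes that the describing Step passes the input image along with the added metadata.

```json
{
  "steps": {
    ":original": {
      "robot": "/upload/handle"
    },
    "described": {
      "robot": "/image/describe",
      "use": ":original",
      "provider": "aws",
      "granularity": "list",
      "format": "meta"
    },
    "without_chairs": {
      "robot": "/file/filter",
      "use": "described",
      "declines": [
        ["${file.meta.descriptions}", "includes", "Chair"]
      ]
    }
  }
}
```

Because provider models evolve over time, a filter like this is best treated as a heuristic and not as an exact match on a fixed set of labels.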
Demos
- Automatically make a slideshow from recognized objects in an image
- Recognize and reject certain objects in images
- Recognize and reject nudity in images
Related blog posts
- 🧠 Tech Preview of our new AI bots (February 17, 2020)
- Introducing the OCR Robot (August 26, 2021)
- Transloadit Milestones of 2021 (January 31, 2022)
- Let's Build: An Image Alt-Text Generator (May 9, 2022)
- Block unwanted files (July 25, 2022)