Tech preview: new AI Robots for enhanced media processing
We're happy to share that we are launching a new line of AI bots in Tech Preview. We've had one AI Robot in production for some time: /image/facedetect. It's powered by internal software and we've been pleased with its performance. Today's announced bots, however, are powered by external cloud services.
It's hard to miss the AI advancements that the Big Five are making. With access to virtually unlimited data, models can be trained to achieve unparalleled accuracy. We felt that offering these AI capabilities right inside our encoding pipelines could add tremendous value to customers seeking to further automate their media processing.
We've tested and mapped out AI offerings by the Google Cloud Platform (GCP) and Amazon Web Services (AWS), and started drawing Venn diagrams to pinpoint overlapping functionality.
Our idea was to offer an abstraction over the lowest common denominator. In other words: with a single API, our customers can plug in, say, image recognition of either provider, and get back uniform responses. It would be just a matter of specifying either provider: 'aws' or provider: 'gcp' to switch from one provider to the next.
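As a minimal sketch of what that switch could look like from client code, here is a small Python helper. The Step keys mirror the /image/describe example later in this post; the helper name describe_step is made up for illustration and is not part of any SDK.

```python
import json

SUPPORTED_PROVIDERS = ("aws", "gcp")


def describe_step(provider: str) -> dict:
    """Build an /image/describe Step for the given provider (hypothetical helper)."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return {
        "use": ":original",
        "robot": "/image/describe",
        "provider": provider,
        "format": "meta",
        "granularity": "list",
    }


aws_step = describe_step("aws")
gcp_step = describe_step("gcp")

# The two Steps differ in exactly one key.
changed_keys = [key for key in aws_step if aws_step[key] != gcp_step[key]]
print(changed_keys)
print(json.dumps(gcp_step, indent=2))
```

Everything except the provider value stays the same, which is the point of the abstraction.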
Why let Transloadit wrap this?
It goes without saying that Transloadit cannot beat or even meet the pricing of the AI providers themselves, so if you need to process massive amounts of data, consider integrating with GCP/AWS directly.
However, if your use case doesn't revolve around squeezing every last penny out of every last byte, there are four reasons why our customers may want to use these AI services in conjunction with Transloadit:
1. LEGO-like composability
You can drop AI in existing encoding pipelines, mixing and matching with 73 features (Robots) to create workflows unique to your business. All of this without writing imperative boilerplate code to string it all together, which would result in more moving parts and points of failure.
Transloadit offers an integral solution that can be wielded with a single deterministic JSON recipe. With twenty lines of declarative instructions, you could order Transloadit to pass a video through these Robots:
- /speech/transcribe: turn the video into human-readable text
- /text/translate: translate the text into Japanese
- /text/speak: synthesize the Japanese into spoken language
- /video/merge: merge the new spoken Japanese as an audio track over the original video
Essentially, you have now made Transloadit translate a video automatically 😄 It's probably not ready for prime time, but this does illustrate how powerful our Assembly line can be. For code samples around our declarative composability, check further down.
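The four-Robot chain above could be sketched as follows, using Python to assemble the recipe. Only the "robot" and "use" keys are shown; the Step names are arbitrary, the real parameters of each Robot (target language, voice, and so on) are not specified in this post and are left out, and the way the merge Step references two inputs is an assumption.

```python
import json

# Robot names come from the list above. Each entry is (step name, robot, use).
PIPELINE = [
    ("transcribed", "/speech/transcribe", ":original"),
    ("translated", "/text/translate", "transcribed"),
    ("spoken", "/text/speak", "translated"),
    # Assumption: the merge Step takes both the original video and the new audio.
    ("merged", "/video/merge", [":original", "spoken"]),
]


def build_steps(pipeline):
    """Turn (name, robot, use) tuples into a "steps" dictionary."""
    steps = {":original": {"robot": "/upload/handle"}}
    for name, robot, use in pipeline:
        steps[name] = {"robot": robot, "use": use}
    return {"steps": steps}


recipe = build_steps(PIPELINE)
print(json.dumps(recipe, indent=2))
```

Each Step names the Step it consumes via "use", which is what lets the chain be declared rather than programmed.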
2. Easily compare and switch between AI providers
The vendors use different notations for languages (when translating), they structure their responses differently and they have different docs, SDKs, formats, settings, etc.
Transloadit abstracts all of this and accepts uniform input, delivering uniform output — no matter the provider.
After having used Amazon, you can see how Google describes the same image without changing anything but the provider parameter. This way, you can easily compare results and latencies in your app, to see what best suits your use case. And that could change, of course. These AIs are constantly learning and improving for the majority of cases, but if one of your own customers has an unlucky minority case, you could offer to switch in a heartbeat.
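Because the output is uniform, comparing two providers can be as simple as comparing two lists. A minimal sketch, assuming you have already collected the descriptions from one run per provider; the label lists below are made-up sample data, not real provider output.

```python
def compare_labels(labels_a, labels_b):
    """Summarize how two providers' label lists overlap."""
    a, b = set(labels_a), set(labels_b)
    return {
        "shared": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }


# Made-up sample data for illustration.
aws_labels = ["Water", "Bridge", "Building", "Canal"]
gcp_labels = ["Water", "Canal", "Waterway", "Bridge"]

report = compare_labels(aws_labels, gcp_labels)
print(report)
```

If the terms you filter on only show up in one provider's list, that is a strong hint about which provider fits your use case.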
3. Possibly cut down on vendors
If you are either:
- already using Transloadit for your media processing
- in the market for an AI feature but would also like to augment that with automated image optimization, encoding, or leverage any of our other 73 features
... this saves you the hassle of integrating with yet another provider. We already indicated the engineering costs associated with many moving parts, but there is also separate billing to consider, SLAs to monitor, and support desks to deal with.
4. Hassle-free 💆♀️
We automatically sanitize and clean up inputs. For instance, while AWS will accept any audio file to transcribe, depending on settings, Google will want it in the PCM format with signed 16-bit, 1-channel, little-endian encoding. With Transloadit, you just throw any audio (or video!) at us, and we'll make sure it gets converted to whatever format the AI provider you picked prefers.
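To make that format requirement concrete, here is a small sketch that uses only the Python standard library to produce and check a mono, signed 16-bit, little-endian PCM recording. It is not how Transloadit converts files; it only illustrates the target format, and the sample rate is an arbitrary choice for this example.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000  # assumption: an arbitrary rate chosen for this example


def make_test_tone(seconds=0.1, frequency=440.0):
    """Return WAV bytes holding a mono, signed 16-bit PCM sine tone."""
    frame_count = int(SAMPLE_RATE * seconds)
    samples = (
        int(32767 * math.sin(2 * math.pi * frequency * i / SAMPLE_RATE))
        for i in range(frame_count)
    )
    # "<h" packs each sample as a little-endian signed 16-bit integer.
    frames = b"".join(struct.pack("<h", s) for s in samples)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # 1 channel (mono)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return buffer.getvalue()


def is_mono_16bit_pcm(wav_bytes):
    """Check the WAV header for 1 channel and 16-bit samples."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnchannels() == 1 and wav.getsampwidth() == 2


tone = make_test_tone()
print(is_mono_16bit_pcm(tone))
```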
Features we are launching today
Today, we are launching two Robots in Tech Preview:
Our /image/describe Robot. Input an image and get back a list of objects that were detected: Tree, Car, House, etc. We can return it as a text file, JSON file, or pass it to another Robot for processing. Common use cases include automatically flagging (in)appropriate content, providing alt captions for images, and/or making images searchable.
Our /speech/transcribe Robot. Input an audio or video recording and get back human-readable text. We can return it as a text file, JSON file, or pass it to another Robot for processing. Common use cases include automated subtitling, or making audio/video searchable.
We're launching them in conjunction with an upgrade to:
Our /file/filter Robot. Pass it a file and criteria, and this Robot acts as a gatekeeper, optionally passing files through to another Step, like exporting. We changed it so that it now also takes an includes operator.
With the newly added includes operator, you can now start automatically rejecting (or flagging) undesired content like so:
"described": {
"use": ":original",
"robot": "/image/describe",
"provider": "aws",
"format": "meta",
"granularity": "list"
},
"filtered": {
"use": "described",
"robot": "/file/filter",
"declines": [
[ "${file.meta.descriptions}", "includes", [ "Naked", "Sex" ] ]
]
},
"exported": {
"use": "filtered",
"robot": "/s3/store",
"credentials": "YOUR_AWS_CREDENTIALS"
}
Now, if I wanted to only allow pictures of cars for my used cars sales website, and I preferred Google's image recognition, I'd just change:
- "declines" to "accepts"
- [ "Naked", "Sex" ] to [ "Car", "Tires" ]
- "provider": "aws" to "provider": "gcp"
And that's it! ✨
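If you keep your Steps in code rather than in a Template, the same three changes can be applied programmatically. A minimal Python sketch; the function name to_allow_list is hypothetical, and the Steps dictionary simply mirrors the JSON shown above.

```python
import copy

# The Steps from the example above, as a Python dictionary.
REJECT_STEPS = {
    "described": {
        "use": ":original",
        "robot": "/image/describe",
        "provider": "aws",
        "format": "meta",
        "granularity": "list",
    },
    "filtered": {
        "use": "described",
        "robot": "/file/filter",
        "declines": [["${file.meta.descriptions}", "includes", ["Naked", "Sex"]]],
    },
    "exported": {
        "use": "filtered",
        "robot": "/s3/store",
        "credentials": "YOUR_AWS_CREDENTIALS",
    },
}


def to_allow_list(steps, terms, provider):
    """Return a copy that accepts `terms` instead of declining a block list."""
    updated = copy.deepcopy(steps)
    updated["described"]["provider"] = provider
    del updated["filtered"]["declines"]
    updated["filtered"]["accepts"] = [
        ["${file.meta.descriptions}", "includes", terms]
    ]
    return updated


car_steps = to_allow_list(REJECT_STEPS, ["Car", "Tires"], "gcp")
print(car_steps["filtered"])
```

The original dictionary is left untouched, so both variants can live side by side.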
We also have a full code sample featuring our /image/describe Robot further down, as well as links to demos for our /speech/transcribe and /image/facedetect Robots.
What AI features are planned?
Besides the two Robots launched today in Tech Preview, our Venn diagrams have shown us we should also build the following:
- /image/ocr: input an image, get back any human-readable text that it had on it, like name/traffic signs
- /document/ocr: input a PDF, get back any human-readable text that it had on it, so that documents can be made searchable if they aren't already
- /text/translate: input human-readable text and get it back in a different language
- /text/speak: input human-readable text and get back an audio file with a recording of synthesized speech
Missing something on this list? We're happy to take suggestions for more!
What about pricing?
We track input and output bytes passing through these Robots, and subtract that from your regular Transloadit plan, so there is no need for any extra subscriptions. We do charge a minimum fee of 1MB per transaction: if you submit a 100KB image, and a 2KB text file is returned, even though that adds up to 102KB, we still subtract 1MB from your plan. On our Startup Plan (10GB for $49/mo) that would have cost $0.0049, on our Medium Business Plan $0.00166. More info on our Pricing page.
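The arithmetic behind the Startup Plan figure can be sketched in a few lines of Python. This assumes decimal units (1GB = 1,000,000,000 bytes), which is what makes the example come out at $0.0049; the function names are made up for illustration.

```python
MINIMUM_BILLED_BYTES = 1_000_000  # the 1MB minimum per transaction

# Assumption: decimal units, so 10GB is 10,000,000,000 bytes.
STARTUP_PLAN_BYTES = 10 * 1_000_000_000
STARTUP_PLAN_PRICE_USD = 49.0


def billed_bytes(input_bytes, output_bytes):
    """Bytes subtracted from the plan: input plus output, but at least 1MB."""
    return max(MINIMUM_BILLED_BYTES, input_bytes + output_bytes)


def transaction_cost(input_bytes, output_bytes, plan_bytes, plan_price_usd):
    """Price of one transaction, expressed as its share of the monthly plan."""
    share_of_plan = billed_bytes(input_bytes, output_bytes) / plan_bytes
    return share_of_plan * plan_price_usd


# A 100KB image in, a 2KB text file out: 102KB is rounded up to 1MB.
cost = transaction_cost(100_000, 2_000, STARTUP_PLAN_BYTES, STARTUP_PLAN_PRICE_USD)
print(f"${cost:.4f}")
```

Anything above 1MB is counted as-is, so a 3MB input with a 0.5MB result is billed as 3.5MB.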
What about other providers like Microsoft Azure?
We feel there's enough value here to start offering this as Tech Preview today. Since we abstract the providers, we're not dependent on a single offering. So should GCP be shut down in 2023 (just kidding! we think!) or AWS raise prices on us, there are options. Integrations remain the same when switching providers. In fact, we are looking into adding Microsoft Azure into the mix as well.
And, just like we are powering our /image/facedetect AI Robot ourselves, when it becomes feasible in the future to run high-quality transcription AI ourselves, you may find we add a provider: 'transloadit' to the /speech/transcribe Robot, offered at a lower price.
What does "tech preview" mean?
It means that you can start using this tech today! We might still make changes to the API and pricing (but we do not foresee them outside of adding more features).
How do I get started?
After signing up, pick a programming language of choice, and crank out an integration! Sounds hard? Let's look at a demo.
{
"steps": {
":original": {
"robot": "/upload/handle"
},
"described": {
"use": ":original",
"robot": "/image/describe",
"provider": "aws",
"format": "meta",
"granularity": "list",
"result": true
},
"filtered": {
"result": true,
"use": "described",
"robot": "/file/filter",
"accepts": [["${file.meta.descriptions}", "includes", "Building"]]
},
"exported": {
"use": "filtered",
"robot": "/s3/store",
"credentials": "YOUR_AWS_CREDENTIALS",
"url_prefix": "https://demos.transloadit.com/"
}
}
}
# Prerequisites: brew install curl jq || sudo apt install curl jq
# To avoid tampering, use Signature Authentication
echo '{
"auth": {
"key": "YOUR_TRANSLOADIT_KEY"
},
"steps": {
":original": {
"robot": "/upload/handle"
},
"described": {
"use": ":original",
"robot": "/image/describe",
"provider": "aws",
"format": "meta",
"granularity": "list",
"result": true
},
"filtered": {
"result": true,
"use": "described",
"robot": "/file/filter",
"accepts": [["${file.meta.descriptions}", "includes", "Building"]]
},
"exported": {
"use": "filtered",
"robot": "/s3/store",
"credentials": "YOUR_AWS_CREDENTIALS",
"url_prefix": "https://demos.transloadit.com/"
}
}
}' | curl \
--request POST \
--form 'params=<-' \
--form myfile1=@./prinsengracht.jpg \
--form myfile2=@./chameleon.jpg \
https://api2.transloadit.com/assemblies | jq
// Install via Swift Package Manager:
// dependencies: [
// .package(url: "https://github.com/transloadit/TransloaditKit", .upToNextMajor(from: "3.0.0"))
// ]
// Or via CocoaPods:
// pod 'Transloadit', '~> 3.0.0'
// Auth
let credentials = Credentials(key: "YOUR_TRANSLOADIT_KEY")
// Init
let transloadit = Transloadit(credentials: credentials, session: URLSession.shared)
// Add files to upload
let filesToUpload: [URL] = ...
// Execute
let assembly = transloadit.assembly(steps: [_originalStep, describedStep, filteredStep, exportedStep], andUpload: filesToUpload) { result in
switch result {
case .success(let assembly):
print("Retrieved \(assembly)")
case .failure(let error):
print("Assembly error \(error)")
}
}.pollAssemblyStatus { result in
switch result {
case .success(let assemblyStatus):
print("Received assemblystatus \(assemblyStatus)")
case .failure(let error):
print("Caught polling error \(error)")
}
}
<!-- This pulls Uppy from our CDN -->
<!-- For smaller self-hosted bundles, install Uppy and plugins manually: -->
<!-- npm i --save @uppy/core @uppy/dashboard @uppy/remote-sources @uppy/transloadit ... -->
<link
href="https://releases.transloadit.com/uppy/v4.3.0/uppy.min.css"
rel="stylesheet"
/>
<button id="browse">Select Files</button>
<script type="module">
import {
Uppy,
Dashboard,
ImageEditor,
RemoteSources,
Transloadit,
} from 'https://releases.transloadit.com/uppy/v4.3.0/uppy.min.mjs'
const uppy = new Uppy()
.use(Transloadit, {
waitForEncoding: true,
alwaysRunAssembly: true,
assemblyOptions: {
params: {
// To avoid tampering, use Signature Authentication:
// https://transloadit.com/docs/topics/signature-authentication/
auth: {
key: 'YOUR_TRANSLOADIT_KEY',
},
// It's often better to store encoding instructions in your account
// and use a template_id instead of adding these steps inline
steps: {
':original': {
robot: '/upload/handle',
},
described: {
use: ':original',
robot: '/image/describe',
provider: 'aws',
format: 'meta',
granularity: 'list',
result: true,
},
filtered: {
result: true,
use: 'described',
robot: '/file/filter',
accepts: [['${file.meta.descriptions}', 'includes', 'Building']],
},
exported: {
use: 'filtered',
robot: '/s3/store',
credentials: 'YOUR_AWS_CREDENTIALS',
url_prefix: 'https://demos.transloadit.com/',
},
},
},
},
})
.use(Dashboard, { trigger: '#browse' })
.use(ImageEditor, { target: Dashboard })
.use(RemoteSources, {
companionUrl: 'https://api2.transloadit.com/companion',
})
.on('complete', ({ transloadit }) => {
// Due to waitForEncoding: true, this is fired after encoding is done.
// Alternatively, set waitForEncoding to false and provide a notify_url
console.log(transloadit) // Array of Assembly Statuses
transloadit.forEach((assembly) => {
console.log(assembly.results) // Array of all encoding results
})
})
.on('error', (error) => {
console.error(error)
})
</script>
// yarn add transloadit || npm i transloadit
// Import
const Transloadit = require('transloadit')
const main = async () => {
// Init
const transloadit = new Transloadit({
authKey: 'YOUR_TRANSLOADIT_KEY',
authSecret: 'MY_TRANSLOADIT_SECRET',
})
// Set Encoding Instructions
const options = {
files: {
myfile_1: './prinsengracht.jpg',
myfile_2: './chameleon.jpg',
},
params: {
steps: {
':original': {
robot: '/upload/handle',
},
described: {
use: ':original',
robot: '/image/describe',
provider: 'aws',
format: 'meta',
granularity: 'list',
result: true,
},
filtered: {
result: true,
use: 'described',
robot: '/file/filter',
accepts: [['${file.meta.descriptions}', 'includes', 'Building']],
},
exported: {
use: 'filtered',
robot: '/s3/store',
credentials: 'YOUR_AWS_CREDENTIALS',
url_prefix: 'https://demos.transloadit.com/',
},
},
},
}
// Execute
const result = await transloadit.createAssembly(options)
// Show results
console.log({ result })
}
main().catch(console.error)
# [sudo] npm install transloadify -g
# Auth
export TRANSLOADIT_KEY="YOUR_TRANSLOADIT_KEY"
# Save Encoding Instructions
echo '{
"steps": {
":original": {
"robot": "/upload/handle"
},
"described": {
"use": ":original",
"robot": "/image/describe",
"provider": "aws",
"format": "meta",
"granularity": "list",
"result": true
},
"filtered": {
"result": true,
"use": "described",
"robot": "/file/filter",
"accepts": [["${file.meta.descriptions}", "includes", "Building"]]
},
"exported": {
"use": "filtered",
"robot": "/s3/store",
"credentials": "YOUR_AWS_CREDENTIALS",
"url_prefix": "https://demos.transloadit.com/"
}
}
}' > ./steps.json
# Execute
transloadify \
--input "prinsengracht.jpg" \
--input "chameleon.jpg" \
--steps "./steps.json" \
--output "./output.example"
// composer require transloadit/php-sdk
use transloadit\Transloadit;
$transloadit = new Transloadit([
"key" => "YOUR_TRANSLOADIT_KEY",
"secret" => "MY_TRANSLOADIT_SECRET",
]);
// Start the Assembly
$response = $transloadit->createAssembly([
"files" => ["prinsengracht.jpg", "chameleon.jpg"],
"params" => [
"steps" => [
":original" => [
"robot" => "/upload/handle",
],
"described" => [
"use" => ":original",
"robot" => "/image/describe",
"provider" => "aws",
"format" => "meta",
"granularity" => "list",
"result" => true,
],
"filtered" => [
"result" => true,
"use" => "described",
"robot" => "/file/filter",
// Single quotes keep PHP from treating ${...} as variable interpolation
"accepts" => [['${file.meta.descriptions}', "includes", "Building"]],
],
"exported" => [
"use" => "filtered",
"robot" => "/s3/store",
"credentials" => "YOUR_AWS_CREDENTIALS",
"url_prefix" => "https://demos.transloadit.com/",
],
],
],
]);
# gem install transloadit
# $ irb -rubygems
# >> require 'transloadit'
# => true
transloadit = Transloadit.new(
:key => "YOUR_TRANSLOADIT_KEY"
)
# Set Encoding Instructions
_original = transloadit.step(":original", "/upload/handle", {})
described = transloadit.step("described", "/image/describe", {
:use => ":original",
:provider => "aws",
:format => "meta",
:granularity => "list",
:result => true
})
filtered = transloadit.step("filtered", "/file/filter", {
:result => true,
:use => "described",
:accepts => [["${file.meta.descriptions}", "includes", "Building"]]
})
exported = transloadit.step("exported", "/s3/store", {
:use => "filtered",
:credentials => "YOUR_AWS_CREDENTIALS",
:url_prefix => "https://demos.transloadit.com/"
})
# Keep a reference to the Assembly so it can be started below
assembly = transloadit.assembly(
:steps => [_original, described, filtered, exported]
)
# Add files to upload
files = []
files.push(File.open("prinsengracht.jpg"))
files.push(File.open("chameleon.jpg"))
# Start the Assembly
response = assembly.create! *files
until response.finished?
sleep 1; response.reload!
end
if !response.error?
# handle success
end
# pip install pytransloadit
from transloadit import client
tl = client.Transloadit('YOUR_TRANSLOADIT_KEY', 'MY_TRANSLOADIT_SECRET')
assembly = tl.new_assembly()
# Set Encoding Instructions
assembly.add_step(":original", "/upload/handle", {})
assembly.add_step("described", "/image/describe", {
'use': ':original',
'provider': 'aws',
'format': 'meta',
'granularity': 'list',
'result': True
})
assembly.add_step("filtered", "/file/filter", {
'result': True,
'use': 'described',
'accepts': [['${file.meta.descriptions}', 'includes', 'Building']]
})
assembly.add_step("exported", "/s3/store", {
'use': 'filtered',
'credentials': 'YOUR_AWS_CREDENTIALS',
'url_prefix': 'https://demos.transloadit.com/'
})
# Add files to upload
assembly.add_file(open('prinsengracht.jpg', 'rb'))
assembly.add_file(open('chameleon.jpg', 'rb'))
# Start the Assembly
assembly_response = assembly.create(retries=5, wait=True)
print(assembly_response.data.get('assembly_ssl_url'))
# or:
print(assembly_response.data['assembly_ssl_url'])
// go get gopkg.in/transloadit/go-sdk.v1
package main
import (
"context"
"fmt"
"github.com/transloadit/go-sdk"
)
func main() {
// Create client
options := transloadit.DefaultConfig
options.AuthKey = "YOUR_TRANSLOADIT_KEY"
options.AuthSecret = "MY_TRANSLOADIT_SECRET"
client := transloadit.NewClient(options)
// Initialize new Assembly
assembly := transloadit.NewAssembly()
// Set Encoding Instructions
assembly.AddStep(":original", map[string]interface{}{
"robot": "/upload/handle",
})
assembly.AddStep("described", map[string]interface{}{
"use": ":original",
"robot": "/image/describe",
"provider": "aws",
"format": "meta",
"granularity": "list",
"result": true,
})
assembly.AddStep("filtered", map[string]interface{}{
"result": true,
"use": "described",
"robot": "/file/filter",
"accepts": [][]string{{"${file.meta.descriptions}", "includes", "Building"}},
})
assembly.AddStep("exported", map[string]interface{}{
"use": "filtered",
"robot": "/s3/store",
"credentials": "YOUR_AWS_CREDENTIALS",
"url_prefix": "https://demos.transloadit.com/",
})
// Add files to upload
// The first argument is the form field name, the second the file path
assembly.AddFile("myfile_1", "prinsengracht.jpg")
assembly.AddFile("myfile_2", "chameleon.jpg")
// Start the Assembly
info, err := client.StartAssembly(context.Background(), assembly)
if err != nil {
panic(err)
}
// All files have now been uploaded and the Assembly has started but no
// results are available yet since the conversion has not finished.
// WaitForAssembly provides functionality for polling until the Assembly
// has ended.
info, err = client.WaitForAssembly(context.Background(), info)
if err != nil {
panic(err)
}
fmt.Printf("You can check some results at: ")
fmt.Printf(" - %s\n", info.Results[":original"][0].SSLURL)
fmt.Printf(" - %s\n", info.Results["described"][0].SSLURL)
fmt.Printf(" - %s\n", info.Results["filtered"][0].SSLURL)
fmt.Printf(" - %s\n", info.Results["exported"][0].SSLURL)
}
// implementation 'com.transloadit.sdk:transloadit:1.0.0'
import com.transloadit.sdk.Assembly;
import com.transloadit.sdk.Transloadit;
import com.transloadit.sdk.exceptions.LocalOperationException;
import com.transloadit.sdk.exceptions.RequestException;
import com.transloadit.sdk.response.AssemblyResponse;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
public class Main {
public static void main(String[] args) {
// Initialize the Transloadit client
Transloadit transloadit = new Transloadit("YOUR_TRANSLOADIT_KEY", "MY_TRANSLOADIT_SECRET");
Assembly assembly = transloadit.newAssembly();
// Set Encoding Instructions
Map<String, Object> _originalStepOptions = new HashMap();
assembly.addStep(":original", "/upload/handle", _originalStepOptions);
Map<String, Object> describedStepOptions = new HashMap();
describedStepOptions.put("use", ":original");
describedStepOptions.put("provider", "aws");
describedStepOptions.put("format", "meta");
describedStepOptions.put("granularity", "list");
describedStepOptions.put("result", true);
assembly.addStep("described", "/image/describe", describedStepOptions);
Map<String, Object> filteredStepOptions = new HashMap();
filteredStepOptions.put("result", true);
filteredStepOptions.put("use", "described");
filteredStepOptions.put("accepts", new String[][] { { "${file.meta.descriptions}", "includes", "Building" } });
assembly.addStep("filtered", "/file/filter", filteredStepOptions);
Map<String, Object> exportedStepOptions = new HashMap();
exportedStepOptions.put("use", "filtered");
exportedStepOptions.put("credentials", "YOUR_AWS_CREDENTIALS");
exportedStepOptions.put("url_prefix", "https://demos.transloadit.com/");
assembly.addStep("exported", "/s3/store", exportedStepOptions);
// Add files to upload
assembly.addFile(new File("prinsengracht.jpg"));
assembly.addFile(new File("chameleon.jpg"));
// Start the Assembly
try {
AssemblyResponse response = assembly.save();
// Wait for Assembly to finish executing
while (!response.isFinished()) {
response = transloadit.getAssemblyByUrl(response.getSslUrl());
}
System.out.println(response.getId());
System.out.println(response.getUrl());
System.out.println(response.json());
} catch (RequestException | LocalOperationException e) {
// Handle exception here
}
}
}
We're uploading two photos, one of which contains bridges:
When I check the meta.descriptions of the results of the described Step, I'll see:
[ 'Water',
'Outdoors',
'Bridge',
'Building',
'Canal',
'Castle',
'Architecture',
'Fort' ]
https://demos.transloadit.com/53/fae6219071430cb7b794cf9f3513c2/prinsengracht.jpg
Only photos whose descriptions include 'Building' are allowed through, so the photo of our chameleon would not have been saved on S3 in the proper location.
In such cases you could choose to:
- gracefully ignore
- error out hard
- pipe unrecognized images to an export Step that uses a different directory, like ./flagged-for-review/
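To illustrate the third option, here is a small Python sketch that emulates an includes check locally and picks a destination per file. The matching rule (a single string or a list of strings, where one match is enough) is an assumption about how the operator behaves, the ./accepted/ directory name is hypothetical, and the chameleon labels are made up.

```python
def includes(descriptions, wanted):
    """Emulate an `includes` check: true if any wanted term is present.

    Assumption: `wanted` may be a single string or a list of strings, and a
    single match is enough. The real Robot's matching rules may differ.
    """
    terms = [wanted] if isinstance(wanted, str) else list(wanted)
    return any(term in descriptions for term in terms)


def route(filename, descriptions, wanted="Building"):
    """Pick a destination directory for a file based on its descriptions."""
    if includes(descriptions, wanted):
        return f"./accepted/{filename}"  # hypothetical directory name
    return f"./flagged-for-review/{filename}"


canal = ["Water", "Outdoors", "Bridge", "Building", "Canal",
         "Castle", "Architecture", "Fort"]
chameleon = ["Animal", "Reptile", "Lizard"]  # made-up labels for illustration

print(route("prinsengracht.jpg", canal))
print(route("chameleon.jpg", chameleon))
```

In a real Assembly, the same split would be expressed with two /file/filter Steps, one with accepts and one with declines, each feeding its own export Step.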
Here are more related AI demos with code samples for all major platforms:
Docs
Since these Robots remain in Tech Preview for now, we could still change the implementation, but we've already written preliminary documentation:
Have fun!
We're happy to expand this post, our docs, and how the bots work, based on your feedback. Just leave a comment below or on Twitter.