
Synthesize speech in documents
🤖/text/speak synthesizes speech in documents.
You can use the audio that we return in your application, or you can pass the audio down to other Robots, for example to add a voice track to a video.
Another common use case is making your product accessible to people with a reading disability.
Usage example
Synthesize speech from uploaded text documents, using a female voice in American English:
{
  "steps": {
    "synthesized": {
      "robot": "/text/speak",
      "use": ":original",
      "provider": "aws",
      "voice": "female-1",
      "target_language": "en-US"
    }
  }
}

Parameters
output_meta (Record<string, boolean> | boolean | Array<string>)
Allows you to specify a set of metadata that is more expensive on CPU power to calculate, and thus is disabled by default to keep your Assemblies processing fast.
For images, you can add "has_transparency": true in this object to extract whether the image contains transparent parts, and "dominant_colors": true to extract an array of hexadecimal color codes from the image.
For videos, you can add the "colorspace": true parameter to extract the colorspace of the output video.
For audio, you can add "mean_volume": true to get a single value representing the mean average volume of the audio file.
You can also set this to false to skip metadata extraction and speed up transcoding.

result (boolean, default: false)
Whether the results of this Step should be present in the Assembly Status JSON.
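As a sketch of the above, since 🤖/text/speak produces audio, you could request the mean volume of the synthesized result (the Step layout is borrowed from the usage example; the other metadata keys apply to image or video outputs):

```json
{
  "steps": {
    "synthesized": {
      "robot": "/text/speak",
      "use": ":original",
      "provider": "aws",
      "output_meta": { "mean_volume": true }
    }
  }
}
```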
queue ("batch")
Setting the queue to "batch" manually downgrades the priority of jobs for this Step to avoid consuming Priority job slots for jobs that don't need zero queue waiting times.
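For instance, if your speech synthesis jobs can tolerate some queue waiting time, a minimal sketch (reusing the Step from the usage example) would move them to the batch queue like this:

```json
{
  "steps": {
    "synthesized": {
      "robot": "/text/speak",
      "use": ":original",
      "provider": "aws",
      "queue": "batch"
    }
  }
}
```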
force_accept (boolean, default: false)
Force a Robot to accept a file type it would have ignored.
By default, Robots ignore files they are not familiar with. 🤖/video/encode, for example, will happily ignore input images.
With the force_accept parameter set to true, you can force Robots to accept all files thrown at them. This will typically lead to errors and should only be used for debugging or combating edge cases.

use (string | Array<string> | Array<object> | object)
Specifies which Step(s) to use as input.
- You can pick any names for Steps except ":original" (reserved for user uploads handled by Transloadit).
- You can provide several Steps as input with arrays:
  { "use": [ ":original", "encoded", "resized" ] }

Tip: That's likely all you need to know about use, but you can view Advanced use cases.
prompt (string | null)
Which text to speak. You can also set this to null and supply an input text file.

provider (aws | gcp | replicate | fal | transloadit) — required
Which AI provider to leverage.
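As a hedged sketch of supplying the text inline rather than via an uploaded document, you could set prompt to a string (whether "use" can then be omitted is an assumption here — consult the Robot's documentation; the gcp provider is simply one of the documented options):

```json
{
  "steps": {
    "synthesized": {
      "robot": "/text/speak",
      "prompt": "Welcome to our service.",
      "provider": "gcp",
      "target_language": "en-US"
    }
  }
}
```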
Transloadit outsources this task and abstracts the interface, so you can expect the same data structures, but different latencies and different information returned. Different cloud vendors shine in different areas, and we recommend trying them out to see what yields the best results for your use case.
target_language (string, default: "en-US")
The written language of the document. This will also be the language of the spoken text.
The language should be specified in the BCP-47 format, such as "en-GB", "de-DE", or "fr-FR". Please consult the list of supported languages and voices.

voice (female-1 | female-2 | female-3 | female-child-1 | male-1 | male-child-1, default: "female-1")
The voice to be used for speech synthesis. Please consult the list of supported languages and voices.
ssml (boolean, default: false)
Supply Speech Synthesis Markup Language (SSML) instead of raw text to gain more control over how your text is voiced, including pauses and pronunciations.
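As a sketch, enabling ssml tells the Robot to interpret the input document as SSML markup rather than raw text. The uploaded file would then contain markup such as `<speak>Hello <break time="500ms"/> world.</speak>` (the `<speak>` and `<break>` tags are standard SSML; support for individual tags varies by provider):

```json
{
  "steps": {
    "synthesized": {
      "robot": "/text/speak",
      "use": ":original",
      "provider": "aws",
      "ssml": true
    }
  }
}
```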