Merge video, audio, images into one video
🤖/video/merge composes a new video by adding an audio track to existing still image(s) or video.
This Robot is able to generate a video from:
- An image and an audio file
- A video and an audio file
- Several images
Merging an audio file and an image
To merge an audio file and an image to create a video, pass both the audio file and the image to an
Assembly Step via the use parameter. For this to work, just use the as-syntax:
{
  "steps": {
    "merged": {
      "robot": "/video/merge",
      "use": {
        "steps": [
          { "name": ":original", "as": "audio" },
          { "name": ":original", "as": "image" }
        ],
        "bundle_steps": true
      },
      "ffmpeg_stack": "v6.0.0"
    }
  }
}
Suppose you've uploaded both an image and an audio file using the same upload form. In the example
above, the system will correctly identify the files if you use the same Step name twice
(":original" in this case). However, you can use any other valid Assembly Step name instead of
":original".
If you're using multiple file input fields, you can tell Transloadit which field supplies the audio
file and which supplies the image. For instance, if you have two file input fields named the_image
and the_audio, the following Assembly Instructions will make it work:
{
  "steps": {
    "merged": {
      "robot": "/video/merge",
      "use": {
        "steps": [
          { "name": ":original", "fields": "the_audio", "as": "audio" },
          { "name": ":original", "fields": "the_image", "as": "image" }
        ],
        "bundle_steps": true
      },
      "ffmpeg_stack": "v6.0.0"
    }
  }
}
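If you generate Assembly Instructions programmatically, the field-to-role mapping above can be built from a small helper. The following Python sketch is only an illustration: the helper name build_merge_use is hypothetical and not part of any Transloadit SDK, and the field names are the ones assumed in the example above.

```python
import json


def build_merge_use(field_roles):
    """Build the "use" object for a /video/merge Step.

    field_roles maps a form field name to the role ("audio", "image" or
    "video") that the file uploaded through that field should play.
    This helper is hypothetical; it only assembles plain JSON-ready dicts.
    """
    allowed = {"audio", "image", "video"}
    steps = []
    for field, role in field_roles.items():
        if role not in allowed:
            raise ValueError(f"unsupported role: {role!r}")
        steps.append({"name": ":original", "fields": field, "as": role})
    return {"steps": steps, "bundle_steps": True}


instructions = {
    "steps": {
        "merged": {
            "robot": "/video/merge",
            "use": build_merge_use({"the_audio": "audio", "the_image": "image"}),
            "ffmpeg_stack": "v6.0.0",
        }
    }
}

# Serialize to the JSON form shown in the examples above.
print(json.dumps(instructions, indent=2))
```

The printed JSON has the same structure as the hand-written Assembly Instructions, so it can be used wherever those are accepted.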
Merging an audio file and a video
You can merge a video file without sound with an audio track. Just label the video as video and
the audio file as audio with the as key in the JSON.
Suppose you use two file input fields in the same upload form: one for a video file and the other
for an audio file. Specify which field is for the video and which is for the audio by using the
name attribute of each input field. Use the value of this attribute for the fields key in the
JSON:
{
  "steps": {
    "merged": {
      "robot": "/video/merge",
      "use": {
        "steps": [
          { "name": ":original", "fields": "the_video", "as": "video" },
          { "name": ":original", "fields": "the_audio", "as": "audio" }
        ],
        "bundle_steps": true
      },
      "ffmpeg_stack": "v6.0.0"
    }
  }
}
You can also supply the video and audio file using other Assembly Steps, and leave out
the fields attribute.
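As a rough sketch of that variant, the merge Step below takes its inputs from two earlier Steps instead of from form fields. The Step names video_encoded and audio_encoded are hypothetical; they are assumed to be defined elsewhere in the same Assembly Instructions and to produce one video and one audio file respectively.

```python
import json

# Hypothetical names of earlier Steps in the same Assembly.
VIDEO_STEP = "video_encoded"
AUDIO_STEP = "audio_encoded"

merge_step = {
    "robot": "/video/merge",
    "use": {
        "steps": [
            # No "fields" key: the role is tied to the Step that produced the file.
            {"name": VIDEO_STEP, "as": "video"},
            {"name": AUDIO_STEP, "as": "audio"},
        ],
        "bundle_steps": True,
    },
    "ffmpeg_stack": "v6.0.0",
}

print(json.dumps({"merged": merge_step}, indent=2))
```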
Warning: When merging audio and video files, we recommend setting a target format and codecs
via preset or via ffmpeg.codec:v, ffmpeg.codec:a, and ffmpeg.f. Otherwise, merging will
default to backwards-compatible, but less desirable legacy codecs.
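One way to follow this advice is to set the three FFmpeg options explicitly in the Step's ffmpeg object. The sketch below is an assumption-laden illustration: libx264, aac and mp4 are standard FFmpeg encoder and container names, but whether they are the right choice (or available) for your FFmpeg stack is not confirmed by this page, so treat the values as placeholders.

```python
import json

# Illustrative values only; pick the codecs and container your players need.
ffmpeg_overrides = {
    "codec:v": "libx264",  # video codec (assumed choice)
    "codec:a": "aac",      # audio codec (assumed choice)
    "f": "mp4",            # container format (assumed choice)
}

merge_step = {
    "robot": "/video/merge",
    "use": {
        "steps": [
            {"name": ":original", "fields": "the_video", "as": "video"},
            {"name": ":original", "fields": "the_audio", "as": "audio"},
        ],
        "bundle_steps": True,
    },
    "ffmpeg": ffmpeg_overrides,
    "ffmpeg_stack": "v6.0.0",
}

print(json.dumps({"steps": {"merged": merge_step}}, indent=2))
```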
Merging several images to generate a video
It's possible to create a video from images with Transloadit. Just label all images as image
using the as key in the JSON:
{
  "steps": {
    "merged": {
      "robot": "/video/merge",
      "use": {
        "steps": [{ "name": ":original", "as": "image" }],
        "bundle_steps": true
      },
      "framerate": "1/10",
      "duration": 8.5,
      "ffmpeg_stack": "v6.0.0"
    }
  }
}
This will work fine in a multi-file upload context. Files are sorted by their basename. So if you
name them 01.jpeg and 02.jpeg, they will be merged in the correct order.
You can also supply your images using other Assembly Steps, for example the results of 🤖/image/resize Steps.
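A minimal sketch of such a chain is shown below. The Step name resized, the 1280x720 target size, and the framerate and duration values are illustrative assumptions (the duration of 15 seconds assumes three uploaded images at 5 seconds each).

```python
import json

instructions = {
    "steps": {
        # Hypothetical Step name; width and height are illustrative values.
        "resized": {
            "robot": "/image/resize",
            "use": ":original",
            "width": 1280,
            "height": 720,
            "resize_strategy": "pad",
        },
        "merged": {
            "robot": "/video/merge",
            "use": {
                # Take the resized images instead of the raw uploads.
                "steps": [{"name": "resized", "as": "image"}],
                "bundle_steps": True,
            },
            "framerate": "1/5",
            "duration": 15.0,
            "ffmpeg_stack": "v6.0.0",
        },
    }
}

print(json.dumps(instructions, indent=2))
```

Resizing all images to one size first means every frame of the slideshow has the same dimensions before the merge Step runs.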
Parameters
- use
  String / Array of Strings / Object ⋅ required
  Specifies which Step(s) to use as input.
  - You can pick any names for Steps except ":original" (reserved for user uploads handled by Transloadit).
  - You can provide several Steps as input with arrays:
    "use": [ ":original", "encoded", "resized" ]
  💡 That's likely all you need to know about use, but you can view Advanced use cases.
- output_meta
  Object / Boolean ⋅ default: {}
  Allows you to specify a set of metadata that is more expensive on CPU power to calculate, and thus is disabled by default to keep your Assemblies processing fast.
  For images, you can add "has_transparency": true in this object to extract whether the image contains transparent parts, and "dominant_colors": true to extract an array of hexadecimal color codes from the image.
  For videos, you can add "colorspace": true to extract the colorspace of the output video.
  For audio, you can add "mean_volume": true to get a single value representing the mean volume of the audio file.
  You can also set this to false to skip metadata extraction and speed up transcoding.
- preset
  String ⋅ required
  Generates the video according to pre-configured video presets.
  If you specify your own FFmpeg parameters using the Robot's ffmpeg parameter and you have not specified a preset, then the default "flash" preset is not applied. This is to prevent you from having to override each of the flash preset's values manually.
- width
  Integer (1-1920) ⋅ required
  Width of the new video, in pixels.
  If the value is not specified and the preset parameter is available, the width supplied by the preset is used.
- height
  Integer (1-1080) ⋅ required
  Height of the new video, in pixels.
  If the value is not specified and the preset parameter is available, the height supplied by the preset is used.
- resize_strategy
  String ⋅ default: "pad"
  If the given width/height parameters are bigger than the input image's dimensions, then the resize_strategy determines how the image will be resized to match the provided width/height. See the available resize strategies.
- background
  String ⋅ default: "#00000000" ⋅ required
  The background color of the resulting video in the "rrggbbaa" format (red, green, blue, alpha) when used with the "pad" resize strategy. The default color is black.
- framerate
  String ⋅ default: "1/5"
  When merging images to generate a video, this is the input framerate. A value of "1/5" means each image is given 5 seconds before the next frame appears (the inverse of a framerate of "5"). Likewise for "1/10", "1/20", etc. A value of "5" means there are 5 frames per second.
- image_durations
  Float[] ⋅ default: []
  When merging images to generate a video, this allows you to define how long (in seconds) each image will be shown inside of the video. So if you pass 3 images and define [2.4, 5.6, 9], the first image will be shown for 2.4s, the second image for 5.6s and the last one for 9s. The duration parameter will automatically be set to the sum of the image_durations, so 17 in our example. It can still be overridden, though, in which case the last image will be shown until the defined duration is reached.
- duration
  Float ⋅ default: 5.0
  When merging images to generate a video, or when merging audio and video, this is the desired target duration in seconds. The float value can take one decimal digit. If you want all images to be displayed exactly once, then you can set the duration according to this formula: duration = numberOfImages / framerate. This also works for the inverse framerate values like 1/5.
  If you set this value to null (default), then the duration of the input audio file will be used when merging images with an audio file.
  When merging audio files and video files, the duration of the longest video or audio file is used by default.
- audio_delay
  Float ⋅ default: 0.0
  When merging a video and an audio file, or when merging images and an audio file to generate a video, this is the desired delay in seconds before the audio file starts playing. For example, if you merge a video file without sound and an audio file, and you want the audio to start playing after 5 seconds rather than immediately, this is the parameter to use.
- replace_audio
  Boolean ⋅ default: false
  Determines whether the audio of the video should be replaced with a provided audio file.
- vstack
  Boolean ⋅ default: false
  Stacks the input media vertically. All streams need to have the same pixel format and width, so consider using a /video/encode Step before using this parameter to enforce this.
- ffmpeg
  Object ⋅ required
  A parameter object to be passed to FFmpeg. If a preset is used, the options specified here are merged on top of the ones from the preset and take precedence over them. For available options, see the FFmpeg documentation.
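The arithmetic behind duration, framerate and image_durations can be checked with a few lines of Python. This is only a sketch of the formulas stated above; the function names are hypothetical, and rounding to one decimal reflects the note that duration takes one decimal digit.

```python
from fractions import Fraction


def slideshow_duration(number_of_images, framerate):
    """Apply duration = numberOfImages / framerate.

    framerate is a string such as "1/5" (one image every 5 seconds)
    or "5" (5 frames per second), as used by the framerate parameter.
    """
    return round(float(number_of_images / Fraction(framerate)), 1)


def duration_from_image_durations(image_durations):
    """Sum of per-image durations, which the duration parameter defaults to."""
    return round(sum(image_durations), 1)


print(slideshow_duration(3, "1/5"))                   # 15.0
print(slideshow_duration(10, "5"))                    # 2.0
print(duration_from_image_durations([2.4, 5.6, 9]))   # 17.0
```

So three images at a framerate of "1/5" need a duration of 15 seconds to each be shown exactly once, and the image_durations example from the list above adds up to 17 seconds.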
FFmpeg parameters
- ffmpeg_stack
  String ⋅ default: "v5.0.0"
  Selects the FFmpeg stack version to use for encoding. These versions reflect real FFmpeg versions. We currently recommend using "v6.0.0".
  Supported values: "v5.0.0", "v6.0.0".
  A full comparison of video presets, per stack, can be found here.
Demos
- Add an audio track to video footage
- Encode a zooming effect onto an image
- Merge audio into video at a specific time
- Take a scrolling screenshot of a website automatically (by using a URL)
Related blog posts
- Introducing video merge Robot: image & audio to video August 7, 2013
- A happy 2014 from Transloadit! January 14, 2014
- On upgrades & goodbyes August 8, 2014
- Kicking Transloadit into gear for the new year February 1, 2015
- Enhancing FFmpeg for superior encoding performance July 30, 2015
- Happy 2016 from Transloadit December 31, 2015
- New pricing model for future Transloadit customers February 7, 2018
- Mastering audio sync with Transloadit's audio delay March 12, 2019
- Tutorial: using /video/merge to develop video slideshows June 14, 2019
- No-code real-time video uploading with Bubble & Transloadit August 2, 2019
- Let's Build: video from album art with Transloadit October 10, 2021
- Automatically generate music previews from Spotify November 16, 2021
- Build a Reddit video subtitling bot with Transloadit February 10, 2022
- Let's Build: music card generator with Transloadit May 5, 2022
- Creating engaging audio visualizations with Transloadit April 2, 2023