A /video/merge tutorial

Recently, I showcased the /video/merge Robot with an example in which video and sound are merged with the help of the audio_delay parameter. Now I'd like to present a more in-depth tutorial that pairs the same Robot with some of the other powerful Robots available at Transloadit.

This time, we will cover another unique use case: creating a video slideshow composed of multiple images and merging it with an audio recording. I'll go through every step from beginning to end before presenting the final result.

The /video/merge Robot stitching two files together

Preparing our files

As usual, before creating our Assembly, we need to prepare the necessary files. As a practical example, I'll turn sections of a PDF document, one of the machine learning cheatsheets provided by Stanford University, into one coherent slideshow. I want to combine the pages of the PDF into a video and then narrate over that video. It's a little educational project for an upcoming assignment of mine 😉

After that, we will store the result in Dropbox, so it can be accessed anytime.

Taking a look at the Robots

Let's get down to business by taking a quick look at all of the Robots that will be of use to us in this Assembly:

The /upload/handle Robot will import the recorded audio file, while the /http/import Robot will import the PDF file. We then use the /document/thumbs Robot to generate images from the PDF, after which the /video/merge Robot will combine the images and the audio into one video file. Finally, a well-used favorite of mine, the /dropbox/store Robot, will let me store the final result!

Part 1 - Importing our files

Here are the first two Steps of the Template I created, one for uploading and one for importing over HTTP:

"steps": {
  ":original": {
    "robot": "/upload/handle"
  },
  "document": {
    "robot": "/http/import",
    "url": "https://github.com/afshinea/stanford-cs-229-machine-learning/raw/master/en/cheatsheet-supervised-learning.pdf"
  }
}

First, we need my recorded audio file. For this, I'm using the /upload/handle Robot, which handles files uploaded to the Assembly, so I can programmatically upload the recording from my local machine. I recommend reading our recent Re-loadit post for a deeper look into the Robot!

Once we have the audio file, the /http/import Robot fetches the PDF directly from the GitHub URL given in the Step above. We can then do whatever we want with that file in any later Step of the Assembly.
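If you want to start this Template from a script rather than a form, the request has to be signed. Below is a minimal Python sketch that builds the two form fields, params and signature, for creating an Assembly from a saved Template. It assumes Transloadit's signature scheme as we understand it (an HMAC-SHA384 hex digest of the JSON params, keyed with your auth secret and prefixed with "sha384:"), and the key, secret, and Template ID are placeholders; check the API docs before relying on it.

```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone


def build_assembly_fields(auth_key: str, auth_secret: str, template_id: str,
                          valid_for: timedelta = timedelta(hours=1)) -> dict:
    """Return the 'params' and 'signature' form fields for a new Assembly.

    Assumption: the signature is an HMAC-SHA384 hex digest of the exact
    JSON string sent as 'params', keyed with the auth secret and prefixed
    with 'sha384:'. Verify this against the current API docs.
    """
    # The signed request stops being valid after this UTC timestamp.
    expires = (datetime.now(timezone.utc) + valid_for).strftime("%Y/%m/%d %H:%M:%S+00:00")
    params = json.dumps({
        "auth": {"key": auth_key, "expires": expires},
        "template_id": template_id,  # the saved Template holding our five Steps
    })
    digest = hmac.new(auth_secret.encode("utf-8"), params.encode("utf-8"),
                      hashlib.sha384).hexdigest()
    return {"params": params, "signature": "sha384:" + digest}


# Placeholder values, not real credentials.
fields = build_assembly_fields("MY_AUTH_KEY", "MY_AUTH_SECRET", "MY_TEMPLATE_ID")
print(fields["params"])
print(fields["signature"])
```

You would then send these fields, together with the audio recording as a file field, in a multipart POST to https://api2.transloadit.com/assemblies, or let one of the official SDKs do the signing for you.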

Part 2 - Generating images from PDF

The third Step in the Assembly, after importing our files, uses the /document/thumbs Robot. This Step looks as follows in my Assembly Instructions:

"thumbnail": {
  "use": [
    "document"
  ],
  "robot": "/document/thumbs",
  "result": true,
  "resize_strategy": "fit"
}

The Robot will now output one image for each page of the PDF. I set the result parameter to true so that I can check on the Assemblies page whether the Robot has split the PDF the way I want. The resize_strategy is set to 'fit' for this example, which means our images keep their aspect ratio and are resized based on the larger side of the image.
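To make the fit strategy concrete, here is a small Python sketch of the arithmetic. It assumes the image is scaled, up or down, until it just fits a target box. The fit_within helper is hypothetical, and so are the 1920 x 1080 box and the 612 x 792 page size, since the Step above leaves width and height at the Robot's defaults.

```python
def fit_within(width: int, height: int, max_width: int, max_height: int) -> tuple[int, int]:
    """Scale width x height so it fits inside max_width x max_height.

    The same factor is applied to both sides, so the aspect ratio is kept.
    Whichever side reaches its limit first decides the scale factor.
    """
    scale = min(max_width / width, max_height / height)
    return round(width * scale), round(height * scale)


# A portrait page of 612 x 792 points fitted into a hypothetical 1920 x 1080 box:
# the height is the limiting side, so the result is (835, 1080).
print(fit_within(612, 792, 1920, 1080))
```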

Part 3 - Merging everything together

The fourth Step is where the /video/merge Robot comes into play. We pass the :original (audio recording) and thumbnail Steps of the Template to the Robot for merging.

"merged": {
  "use": {
    "steps": [
      {
        "name": ":original",
        "as": "audio"
      },
      {
        "name": "thumbnail",
        "as": "image"
      }
    ],
    "bundle_steps": true
  },
  "robot": "/video/merge",
  "result": true,
  "duration": 40,
  "ffmpeg_stack": "v6.0.0",
  "framerate": "1/10",
  "preset": "webm-1080p",
  "resize_strategy": "fit"
}

Two Steps are used for the merging process: my audio file, :original from Part 1, and the set of images created from my PDF file, thumbnail from Part 2. Note that bundle_steps is set to true in the use parameter (see the full Template in Part 4), so that the Robot receives the files from both Steps together and merges them into one video. The PDF I'm using has four pages, and since there's quite a bit of content on each of them, I want every image to stay on screen for at least ten seconds.

Because of this, I set the duration parameter to 40 seconds and the framerate to 1/10, meaning a new image is displayed every ten seconds. WebM is a handy format that I recommend because its files are often smaller than comparable MP4 files, so we'll select it here through the preset parameter. This should merge our audio and images into one nice video.
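The numbers fit together as simple arithmetic: a framerate of 1/10 means one frame every ten seconds, so a 40-second video has room for exactly four images, one per PDF page. The sketch below uses a hypothetical helper, slideshow_timing, and Python's fractions module to keep the framerate exact.

```python
from fractions import Fraction


def slideshow_timing(duration_s: int, framerate: str) -> tuple[Fraction, Fraction]:
    """Return (seconds each image stays on screen, number of images shown)."""
    rate = Fraction(framerate)        # e.g. "1/10" -> one frame every ten seconds
    seconds_per_image = 1 / rate      # the inverse of the framerate
    images_shown = duration_s * rate  # total frames that fit in the duration
    return seconds_per_image, images_shown


seconds, images = slideshow_timing(40, "1/10")
print(seconds, images)  # prints: 10 4
```

If the PDF had more pages, you would raise duration (or the framerate) so that images_shown matches the page count.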

Part 4 - The final result

Finally, the fifth and last Step in my Assembly exports the result to Dropbox. The use parameter of the exported Step points at the merged Step, so it knows which file we want to export. The credentials parameter refers to my Template Credentials (my_videos), which give my Assembly access to my Dropbox, and the path parameter specifies the name of the resulting video and where I want it stored. Here, ${file.id} gives us a unique 32-character ID, so there's no chance of conflicts when I export my results.

"steps": {
  ":original": {
    "robot": "/upload/handle"
  },
  "document": {
    "robot": "/http/import",
    "url": "https://github.com/afshinea/stanford-cs-229-machine-learning/raw/master/en/cheatsheet-supervised-learning.pdf"
  },
  "thumbnail": {
    "use": [
      "document"
    ],
    "robot": "/document/thumbs",
    "result": true,
    "resize_strategy": "fit"
  },
  "merged": {
    "use": {
      "steps": [
        {
          "name": ":original",
          "as": "audio"
        },
        {
          "name": "thumbnail",
          "as": "image"
        }
      ],
      "bundle_steps": true
    },
    "robot": "/video/merge",
    "result": true,
    "duration": 40,
    "ffmpeg_stack": "v6.0.0",
    "framerate": "1/10",
    "preset": "webm-1080p",
    "resize_strategy": "fit"
  },
  "exported": {
    "use": [
      "merged"
    ],
    "robot": "/dropbox/store",
    "credentials": "my_videos",
    "path": "Videos/${file.id}.${file.ext}"
  }
}
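To see why the export path cannot collide, here is a small sketch of how a value like Videos/${file.id}.${file.ext} expands. The expand_path helper is hypothetical and only mimics the placeholder substitution; the random ID comes from uuid4 as a stand-in for the ID Transloadit assigns, which is likewise 32 characters long.

```python
import re
import uuid


def expand_path(template: str, file_vars: dict[str, str]) -> str:
    """Replace each ${file.<name>} placeholder with file_vars[<name>]."""
    return re.sub(r"\$\{file\.(\w+)\}", lambda m: file_vars[m.group(1)], template)


# Stand-ins: uuid4().hex also yields 32 hexadecimal characters, and "webm"
# is the extension we expect from the webm-1080p preset.
file_vars = {"id": uuid.uuid4().hex, "ext": "webm"}
print(expand_path("Videos/${file.id}.${file.ext}", file_vars))
```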

The video file produced from this Template can be seen below.

Let's look back at everything this Assembly does. I upload an audio file and fetch a PDF. The PDF is split into a separate image for each page. The images and audio are merged into one WebM video with a set duration and framerate. A fairly complex workflow, all in a few lines of JSON!
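The recap above is really a small dependency graph: each Step's use parameter names the Steps it reads from. The Python sketch below recovers a valid order from the Template's use parameters with the standard library's graphlib (Python 3.9+). It only illustrates that graph; it is not how Transloadit schedules Steps, and the used_steps and execution_order helpers are hypothetical.

```python
from graphlib import TopologicalSorter

# The Steps from the Template above, trimmed to the keys that matter here.
STEPS = {
    ":original": {"robot": "/upload/handle"},
    "document": {"robot": "/http/import"},
    "thumbnail": {"robot": "/document/thumbs", "use": ["document"]},
    "merged": {
        "robot": "/video/merge",
        "use": {"steps": [{"name": ":original", "as": "audio"},
                          {"name": "thumbnail", "as": "image"}],
                "bundle_steps": True},
    },
    "exported": {"robot": "/dropbox/store", "use": ["merged"]},
}


def used_steps(step: dict) -> list[str]:
    """Names of the Steps a Step reads from, for the 'use' shapes in this Template."""
    use = step.get("use", [])
    if isinstance(use, dict):
        use = use.get("steps", [])
    if isinstance(use, str):
        use = [use]
    return [entry if isinstance(entry, str) else entry["name"] for entry in use]


def execution_order(steps: dict) -> list[str]:
    """Order the Steps so that every Step comes after the Steps it uses."""
    graph = {name: used_steps(step) for name, step in steps.items()}
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STEPS))
```

Steps with no dependency on each other, such as :original and document, can appear in either order, which is why independent imports could in principle run side by side.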

Take a look at the docs for the /video/merge Robot yourself and get creative by combining it with the wide array of other features available here at Transloadit.