July 25, 2022

How to automate content moderation using Transloadit (NSFW)

Joseph Grabski

Content Lead · Rochester, United Kingdom

Security needs to be at the forefront of every developer's mind. Unfortunately, not every user has good intentions with their file uploads, and it's important to have measures in place to deal with these kinds of files. Today, we're going to take a look at several approaches you can take to stop different types of malicious files from reaching your servers and end users.

Photo from Michael Geiger on Unsplash

Why is content moderation important?

Content moderation is critical for any online business. Users don't want to be exposed to graphic content unwillingly, and advertisers don't want to be shown next to it. Effective content moderation protects both your brand and your users, yet it can be expensive to hire a full-time team of content moderators. Instead, you can leverage Transloadit to automatically spot and remove unwanted content, as well as viruses and copyrighted content. Let's find out how 👇

Automatically reject NSFW content

You can utilise our /image/describe Robot to generate a list of tags based on the image provided, and then filter out files that contain tags such as gore, hateful symbols or nudity.

You can check out the full list of tags provided by both AWS and GCP.

Now, let's see a Template that utilises this:

{
  "steps": {
    ":original": {
      "robot": "/upload/handle",
      "result": true
    },
    "described": {
      "use": ":original",
      "robot": "/image/describe",
      "result": true,
      "explicit_descriptions": true,
      "format": "meta",
      "granularity": "list",
      "provider": "aws"
    },
    "filtered": {
      "use": "described",
      "robot": "/file/filter",
      "result": true,
      "declines": [
        ["${file.meta.descriptions}", "includes", "Hate Symbols"],
        ["${file.meta.descriptions}", "includes", "Explicit Nudity"],
        ["${file.meta.descriptions}", "includes", "Visually Disturbing"]
      ],
      "error_msg": "One file contains explicit content!",
      "error_on_decline": true
    }
  }
}

Here we're generating a list of descriptors for our image, and if our filter detects any of our top-level categories in this list, the file will be rejected and our Assembly stopped.
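
To make the decline logic concrete, here is a minimal local sketch in Python of what the three `declines` conditions amount to. It is only an illustration, not how the /file/filter Robot is implemented: the function names are hypothetical, and it assumes that `includes` behaves like simple membership in the list of descriptions.

```python
from typing import Optional, Sequence

# Top-level categories taken from the Template above.
BLOCKED_CATEGORIES = ("Hate Symbols", "Explicit Nudity", "Visually Disturbing")


def first_blocked_category(
    descriptions: Sequence[str],
    blocked: Sequence[str] = BLOCKED_CATEGORIES,
) -> Optional[str]:
    """Return the first blocked category found in descriptions, or None.

    Assumption: "includes" is treated as exact list membership here.
    """
    for category in blocked:
        if category in descriptions:
            return category
    return None


def should_decline(
    descriptions: Sequence[str],
    blocked: Sequence[str] = BLOCKED_CATEGORIES,
) -> bool:
    """A file is declined as soon as any one blocked category matches."""
    return first_blocked_category(descriptions, blocked) is not None


if __name__ == "__main__":
    # Hypothetical description lists, for illustration only.
    print(should_decline(["Person", "Explicit Nudity"]))
    print(should_decline(["Dog", "Beach"]))
```

The point of the sketch is that the conditions are combined as "any of": a single matching category is enough for the file to be rejected.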

Stopping copyrighted uploads

Users may also try to upload images that they don't, or more importantly, you don't hold the copyright to. We can inspect the file's metadata and check for copyright information. If there is none, the file upload can proceed.

This Template relies on the image carrying copyright information in its metadata, which a user can of course strip out. In effect, the check depends on security through obscurity: a user is less likely to remove copyright information if they don't know that you're checking for it.

{
  "steps": {
    ":original": {
      "robot": "/upload/handle"
    },
    "filter": {
      "use": ":original",
      "robot": "/file/filter",
      "result": true,
      "accepts": [["${file.meta.copyright}", "empty", ""]],
      "error_on_decline": false
    }
  }
}
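
As a rough local equivalent of the `accepts` condition above, the following Python sketch accepts an upload only when its metadata carries no copyright notice. The `accepts_upload` helper and the sample metadata are hypothetical, and treating a whitespace-only value as empty is an assumption; the Robot's own definition of `empty` may differ.

```python
from typing import Any, Mapping


def accepts_upload(meta: Mapping[str, Any]) -> bool:
    """Accept only when no copyright notice is present in the metadata.

    Assumption: a missing key, an empty string, and a whitespace-only
    string all count as "empty".
    """
    copyright_notice = meta.get("copyright")
    if copyright_notice is None:
        return True
    return str(copyright_notice).strip() == ""


if __name__ == "__main__":
    # Hypothetical metadata dictionaries, for illustration only.
    print(accepts_upload({"width": 800, "height": 600}))
    print(accepts_upload({"copyright": "(c) Example Studio"}))
```

Because the check passes whenever the field is absent, it only catches files whose copyright metadata has been left intact.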

Detecting malware

Naturally, the last thing you want your users uploading is malware. Whether it reaches other users or your system-critical machines, it's important that viruses get stopped as early as possible.

Our /file/virusscan Robot detects malware for you and automatically cancels the Assembly, stopping dangerous files from ever reaching your systems.

However, you'll need a premium plan to use this Robot. If you're interested, you can sign up for one below, and get access to a whole array of powerful Robots just like this one 💪

{
  "steps": {
    ":original": {
      "robot": "/upload/handle"
    },
    "virus_scanned": {
      "use": ":original",
      "robot": "/file/virusscan",
      "result": true,
      "error_on_decline": true
    }
  }
}

Excluding known files

Suppose a user is uploading thousands of copies of the same file and, for one reason or another, it's getting through your other layers of defence. As a last resort, you can simply take the file's hash and add it to a denylist, and we'll automatically filter it out for you.

Here is a Template that aims to do exactly this.

{
  "steps": {
    ":original": {
      "robot": "/upload/handle"
    },
    "hash_files": {
      "use": ":original",
      "robot": "/file/hash",
      "algorithm": "sha1"
    },
    "filter_files": {
      "use": "hash_files",
      "robot": "/file/filter",
      "result": true,
      "declines": [["${file.meta.hash}", "===", "cea335bf9254c7838ba5d764a5dae5d4dd1f6fe5"]]
    }
  }
}

In short, we generate a hash for each uploaded file. If it equals the hash of a file we're looking to avoid, the file will not pass the filter.
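
If you want to work out a file's hash yourself before adding it to the denylist, a short Python sketch along these lines should do. It uses the standard `hashlib` module to compute a SHA-1 hex digest, and it also shows one way to generate a `declines` array with one condition per denied hash. The helper names are hypothetical, and the sketch assumes that `${file.meta.hash}` holds a lowercase hex digest, as in the Template above.

```python
import hashlib
import json
from typing import Iterable, List

# The example hash from the Template above.
DENYLIST = {"cea335bf9254c7838ba5d764a5dae5d4dd1f6fe5"}


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a lowercase hex string."""
    return hashlib.sha1(data).hexdigest()


def is_denied(data: bytes, denylist: Iterable[str] = DENYLIST) -> bool:
    """True when the file's SHA-1 hash appears in the denylist."""
    return sha1_hex(data) in set(denylist)


def build_declines(denylist: Iterable[str]) -> List[List[str]]:
    """Build one ["${file.meta.hash}", "===", <hash>] condition per hash."""
    return [["${file.meta.hash}", "===", h] for h in sorted(denylist)]


if __name__ == "__main__":
    # Print a "declines" array that could be pasted into the filter Step.
    print(json.dumps(build_declines(DENYLIST), indent=2))
```

Generating the array from a single list of hashes keeps the denylist in one place as it grows, rather than editing the Template by hand for every new file.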

Wrapping up

As you can see, there are many different approaches you can take to stop unwanted file uploads. Whether you use only one method or a combination of several, you can sleep easy at night knowing your systems are that much safer.

We're constantly looking to help make things more secure at Transloadit, so for your next read maybe take a look at our security page to see some of the steps we're taking to accomplish this.
