Input filtering and sanitization are critical aspects of web application security and data integrity. In PHP, several excellent open-source libraries simplify these tasks, providing robust solutions to common security threats like cross-site scripting (XSS) and malicious file uploads. Let's explore some powerful, actively maintained libraries that can enhance your PHP projects with effective filtering capabilities.

Why input filtering matters

Filtering ensures that only safe and expected data enters your application. Without proper filtering and sanitization, your application could be vulnerable to attacks such as XSS, SQL injection, or processing harmful file uploads. Effective filtering helps maintain data integrity, protects your users, and prevents security breaches.
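
As a minimal sketch of that idea, the snippet below uses only PHP's built-in filter extension rather than any of the libraries covered later. The function name `filterSignup` and the field names are hypothetical and chosen purely for illustration. Validation rejects unexpected input on the way in, while escaping is a separate step applied on the way out.

```php
<?php
declare(strict_types=1);

/**
 * Validate an email address and an age with PHP's built-in filters.
 * Returns the normalized values, or null if either input is rejected.
 * (Hypothetical helper, for illustration only.)
 */
function filterSignup(string $email, string $age): ?array
{
    $cleanEmail = filter_var(trim($email), FILTER_VALIDATE_EMAIL);
    $cleanAge = filter_var($age, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 18, 'max_range' => 99],
    ]);

    // filter_var() returns false when a value fails validation
    if ($cleanEmail === false || $cleanAge === false) {
        return null;
    }

    return ['email' => $cleanEmail, 'age' => $cleanAge];
}

var_dump(filterSignup('john@example.com', '28') !== null); // bool(true)
var_dump(filterSignup('invalid-email', '28') === null);    // bool(true)
var_dump(filterSignup('john@example.com', '17') === null); // bool(true)

// Escaping is a separate concern, applied when the data is output as HTML
echo htmlspecialchars('<b>hi</b>', ENT_QUOTES, 'UTF-8'), "\n"; // &lt;b&gt;hi&lt;/b&gt;
```

The built-in filters cover simple scalar checks; the libraries below become useful once rules need to be composed, reused, or applied to HTML and structured data.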

HTML Purifier for secure HTML content

When dealing with user-submitted HTML (like comments or rich text editor content), HTML Purifier is an essential tool. It's specifically designed to sanitize HTML content, ensuring it's standards-compliant and safe from XSS attacks by removing malicious code. The library is actively maintained, supports modern PHP versions (including PHP 8.x), and is highly configurable.

Installing HTML Purifier

The recommended way to install HTML Purifier is via Composer:

composer require ezyang/htmlpurifier

Using HTML Purifier

Here's a basic example of how to sanitize HTML input:

<?php
require_once 'vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);

$dirty_html = '<script>alert("XSS Attack!");</script><p style="color: blue;" onclick="alert(\'another attack\')">This is safe content.</p>';
$clean_html = $purifier->purify($dirty_html);

echo $clean_html;
// Outputs something like: <p style="color:#0000FF;">This is safe content.</p>
// Note: The script element and the onclick attribute are removed. With the default configuration the
// style attribute is kept, but its CSS is filtered and normalized.
?>

Advanced configuration

HTML Purifier offers extensive configuration options to tailor the filtering rules to your specific needs:

<?php
require_once 'vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// Allow only specific HTML elements and attributes
// (style must be allowed explicitly on p, otherwise CSS.AllowedProperties below has nothing to act on)
$config->set('HTML.Allowed', 'p[style],b,i,em,strong,a[href|title],ul,ol,li,br');

// Ensure links open in a new tab and add rel="noopener noreferrer"
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Nofollow', true); // Adds rel="nofollow"
$config->set('HTML.TargetNoreferrer', true); // Adds rel="noreferrer"
$config->set('HTML.TargetNoopener', true); // Adds rel="noopener"


// Allow specific CSS properties (e.g., text-align)
$config->set('CSS.AllowedProperties', 'text-align');

// Create custom definitions if needed (advanced)
// $def = $config->getHTMLDefinition(true);
// $def->addAttribute('a', 'data-custom', 'Text'); // Example: Allow a custom data attribute

$purifier = new HTMLPurifier($config);

$dirty_html = '<a href="http://example.com" onclick="badJs()" title="Example">Click Me</a><p style="text-align:center; color:red;">Centered text</p>';
$clean_html = $purifier->purify($dirty_html);

// Outputs something like:
// <a href="http://example.com" title="Example" target="_blank" rel="nofollow noopener noreferrer">Click Me</a><p style="text-align:center;">Centered text</p>
echo $clean_html;
?>

Modern validation libraries for PHP

For general data validation and filtering beyond HTML content (like validating form inputs, API parameters, or file uploads), PHP offers several modern, actively maintained libraries.

Respect\Validation

Respect\Validation is a popular validation library with a fluent, chainable interface that keeps validation rules easy to read and write. The examples below are written against its 2.x API (assert() and validate()); other major versions may differ.

Installation

composer require respect/validation

Basic usage

<?php
require_once 'vendor/autoload.php';
use Respect\Validation\Validator as v;
use Respect\Validation\Exceptions\NestedValidationException;

// Basic string validation
$username = 'johndoe123';
try {
    v::alnum()->noWhitespace()->length(3, 15)->assert($username);
    echo "Username is valid.\n";
} catch (NestedValidationException $exception) {
    echo "Username validation failed: " . $exception->getFullMessage() . "\n";
}

// Email validation
$email = 'invalid-email';
if (v::email()->validate($email)) {
    echo "Email is valid.\n";
} else {
    echo "Email is invalid.\n";
}

// Numeric validation with range
$age = 17;
if (v::numericVal()->positive()->between(18, 99)->validate($age)) {
    echo "Age is valid.\n";
} else {
    echo "Age is invalid (must be between 18 and 99).\n";
}

// Basic file property validation (checks if path exists and is a file)
$filePath = '/path/to/your/file.txt'; // Replace with an actual path for testing
if (v::file()->validate($filePath)) {
    echo "File path points to a file.\n";
} else {
    echo "File path is not a valid file.\n";
}

// More specific file validation (e.g., check extension, mimetype, size)
// Note: These often require checking properties from $_FILES in a web context
$allowedExtensions = ['jpg', 'png', 'gif'];
$fileExtension = 'jpg';
if (v::in($allowedExtensions)->validate($fileExtension)) {
    echo "File extension is allowed.\n";
}
?>

Filtering arrays of data

Respect\Validation excels at validating structured data like arrays (e.g., $_POST data).

<?php
require_once 'vendor/autoload.php';
use Respect\Validation\Validator as v;
use Respect\Validation\Exceptions\NestedValidationException;

$userData = [
    'username' => 'john_doe',
    'email' => 'john@example.com',
    'age' => 28,
    'homepage' => 'invalid-url'
];

$userValidator = v::key('username', v::stringType()->length(3, 32))
                  ->key('email', v::email())
                  ->key('age', v::numericVal()->between(18, 99))
                  ->key('homepage', v::url(), false); // 'false' makes the key optional; if present, it must still be a valid URL

try {
    $userValidator->assert($userData);
    echo "User data is valid!\n";
} catch (NestedValidationException $exception) {
    echo "User data validation failed:\n";
    // Get specific error messages
    print_r($exception->getMessages());
    /* Example output (exact wording may vary by library version):
    Array
    (
        [homepage] => homepage must be a URL
    )
    */
}
?>

Symfony Validator component

The Symfony Validator component provides a robust and flexible validation framework, particularly well-suited for object-oriented applications and those already using the Symfony ecosystem. It supports validation using attributes (PHP 8+), annotations, YAML, or XML.

Installation

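composer require symfony/validator symfony/property-access symfony/property-info symfony/mime
# symfony/validator is the core package; property-access and property-info are optional helpers,
# and symfony/mime is needed by the File constraint's mimeTypes check used further below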

Basic usage with attributes (PHP 8+)

<?php
require_once 'vendor/autoload.php';

use Symfony\Component\Validator\Constraints as Assert;
use Symfony\Component\Validator\Validation;

class User
{
    #[Assert\NotBlank(message: "Username cannot be blank.")]
    #[Assert\Length(min: 3, max: 32, minMessage: "Username must be at least {{ limit }} characters long.")]
    private string $username = '';

    #[Assert\NotBlank]
    #[Assert\Email(message: "The email '{{ value }}' is not a valid email.")]
    private string $email = '';

    #[Assert\Range(min: 18, max: 99, notInRangeMessage: "Age must be between {{ min }} and {{ max }}.")]
    private ?int $age = null; // Use nullable type for optional fields

    // --- Getters and Setters ---
    public function getUsername(): string { return $this->username; }
    public function setUsername(string $username): void { $this->username = $username; }
    public function getEmail(): string { return $this->email; }
    public function setEmail(string $email): void { $this->email = $email; }
    public function getAge(): ?int { return $this->age; }
    public function setAge(?int $age): void { $this->age = $age; }
}

// Create validator instance
$validator = Validation::createValidatorBuilder()
    ->enableAttributeMapping()
    ->getValidator();

$user = new User();
$user->setUsername('jo'); // Too short
$user->setEmail('invalid-email');
$user->setAge(15); // Too young

$violations = $validator->validate($user);

if (count($violations) > 0) {
    echo "Validation failed:\n";
    foreach ($violations as $violation) {
        echo "- Property '{$violation->getPropertyPath()}': {$violation->getMessage()}\n";
    }
} else {
    echo "User object is valid!\n";
}
?>

Validating file uploads

Symfony Validator can also validate uploaded files. Within the framework these are typically Symfony\Component\HttpFoundation\File\UploadedFile objects, but the constraints can be applied to file paths or SplFileInfo objects too.

<?php
require_once 'vendor/autoload.php';

use Symfony\Component\Validator\Constraints as Assert;
use Symfony\Component\Validator\Validation;

// In a framework context, $file is usually a Symfony\Component\HttpFoundation\File\UploadedFile
// taken from the request. For standalone usage, you can validate SplFileInfo objects or paths directly.
// Note: the mimeTypes option needs the symfony/mime package (composer require symfony/mime).
class Document
{
    #[Assert\File(
        maxSize: '5M', // 5 megabytes
        mimeTypes: ['application/pdf', 'application/x-pdf'],
        mimeTypesMessage: 'Please upload a valid PDF document (max 5MB).'
    )]
    // In a real app, this would likely be an UploadedFile object or SplFileInfo
    public $file;
}

$validator = Validation::createValidatorBuilder()
    ->enableAttributeMapping()
    ->getValidator();

$doc = new Document();
// Simulate an invalid file (e.g., wrong type or too large)
// In a real scenario, you'd pass the actual UploadedFile object or SplFileInfo
// For demonstration, let's validate a path to a non-PDF file:
$doc->file = new \SplFileInfo(__FILE__); // Using this script file as an example

$violations = $validator->validate($doc);

if (count($violations) > 0) {
    echo "File validation failed:\n";
    foreach ($violations as $violation) {
        echo "- Property '{$violation->getPropertyPath()}': {$violation->getMessage()}\n";
    }
} else {
    echo "File is valid!\n";
}
?>

When to use each library

  • HTML Purifier: The go-to choice specifically for sanitizing user-generated HTML content to prevent XSS attacks. Use it for processing input from rich text editors, comments, or any field where users can submit HTML.
  • Respect\Validation: Excellent for general-purpose input validation (strings, numbers, emails, arrays, etc.) with a clear, fluent API. Great for validating form data, API request parameters, and basic file properties.
  • Symfony Validator: Ideal for applications built with an object-oriented approach, especially those using the Symfony framework or wanting robust validation integrated with objects and attributes/annotations. It offers powerful features for complex validation scenarios, including validation groups and custom constraints.

Practical use cases

Secure file upload validation (using Respect\Validation)

This example demonstrates validating properties of an uploaded file from the $_FILES superglobal.

<?php
require_once 'vendor/autoload.php';
use Respect\Validation\Validator as v;
use Respect\Validation\Exceptions\NestedValidationException;

function validateUploadedFile(array $file): bool {
    // Basic checks for upload errors and existence
    if ($file['error'] !== UPLOAD_ERR_OK || !is_uploaded_file($file['tmp_name'])) {
        error_log("File upload error code: " . $file['error']);
        return false;
    }

    // Chain key() rules rather than using keySet(): a $_FILES entry also carries 'name', 'error'
    // (and 'full_path' on PHP 8.1+), which keySet() would reject as unexpected keys.
    $validator = v::key('type', v::in(['image/jpeg', 'image/png', 'image/gif'])) // Client-supplied MIME type. Note: This can be spoofed!
        ->key('size', v::intVal()->positive()->lessThan(5 * 1024 * 1024)) // 5MB max size
        ->key('tmp_name', v::file()->readable()); // Temp file must exist and be readable

    try {
        $validator->assert($file);
        // IMPORTANT: Add server-side MIME type check using fileinfo for better security
        $finfo = finfo_open(FILEINFO_MIME_TYPE);
        if (!$finfo) {
             error_log("Failed to open fileinfo database");
             return false; // Or handle error appropriately
        }
        $mime = finfo_file($finfo, $file['tmp_name']);
        finfo_close($finfo);

        if (!in_array($mime, ['image/jpeg', 'image/png', 'image/gif'], true)) {
            error_log("Invalid file type detected by finfo: " . $mime);
            return false;
        }
        return true;
    } catch (NestedValidationException $exception) {
        error_log("Invalid file properties: " . $exception->getFullMessage());
        return false;
    } catch (\Exception $e) { // Catch other potential errors
        error_log("File validation failed: " . $e->getMessage());
        return false;
    }
}

// --- Example Usage ---
// Simulate a POST request with a file upload
// In a real script, you'd use $_FILES directly.
/*
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_FILES['upload'])) {
    if (validateUploadedFile($_FILES['upload'])) {
        // Securely move the file: generate a unique name and derive the extension from the
        // detected MIME type, never from the client-supplied filename (which could end in .php)
        $extensions = ['image/jpeg' => 'jpg', 'image/png' => 'png', 'image/gif' => 'gif'];
        $mime = (new finfo(FILEINFO_MIME_TYPE))->file($_FILES['upload']['tmp_name']);
        $safe_filename = bin2hex(random_bytes(16)) . '.' . $extensions[$mime];
        $destination = 'uploads/' . $safe_filename; // Ensure 'uploads' directory exists and is writable
        if (move_uploaded_file($_FILES['upload']['tmp_name'], $destination)) {
            echo "File uploaded successfully to: " . htmlspecialchars($destination);
        } else {
            error_log("Failed to move uploaded file '{$_FILES['upload']['tmp_name']}' to '$destination'");
            echo "Error processing file upload.";
        }
    } else {
        echo "Invalid file upload detected.";
    }
} else {
     // Handle cases where the form wasn't submitted correctly or file wasn't uploaded
     // echo "No file uploaded or invalid request.";
}
*/
?>

User input sanitization for database storage

Combine validation with sanitization before storing data.

<?php
require_once 'vendor/autoload.php';
use Respect\Validation\Validator as v;
use Respect\Validation\Exceptions\NestedValidationException;

// Setup HTML Purifier
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,b,i,em,strong,br'); // Allow only basic formatting
$purifier = new HTMLPurifier($config);

// Simulate POST data
$postData = [
    'username' => ' test_user ', // Contains extra whitespace
    'email' => 'invalid email', // Fails validation, so the catch block below runs
    'bio' => '<script>alert("bad")</script>This is a <strong>bio</strong> with <a href="#">link</a>.'
];

// Define validation rules
$validator = v::keySet(
    v::key('username', v::stringType()->alnum('_-')->noWhitespace()->length(3, 20)),
    v::key('email', v::email()),
    v::key('bio', v::stringType()->length(0, 500)) // Validate length before sanitizing
);

// 1. Normalize the raw input first, so surrounding whitespace does not fail noWhitespace()
$input = [
    'username' => trim($postData['username']),
    'email' => trim($postData['email']),
    'bio' => $postData['bio']
];

try {
    // 2. Validate the normalized input
    $validator->assert($input);

    // 3. Sanitize HTML after validation passes
    $validatedData = [
        'username' => $input['username'],
        'email' => $input['email'], // Already validated format
        'bio' => $purifier->purify($input['bio']) // Sanitize HTML
    ];

    // 4. $validatedData can now be stored; still use prepared statements for the query itself
    echo "User data validated and sanitized successfully!\n";
    print_r($validatedData);
    // Example: $db->insert('users', $validatedData);

} catch (NestedValidationException $exception) {
    echo "Invalid user data:\n";
    // Log the detailed errors for debugging, show generic message to user
    error_log("Validation errors: " . $exception->getFullMessage());
    print_r($exception->getMessages()); // Show specific messages for development/debugging
}
?>
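
The commented-out `$db->insert(...)` call above is only a placeholder. One way to perform the actual write is a PDO prepared statement, sketched here under a few assumptions: an in-memory SQLite database (which requires the pdo_sqlite extension), a hypothetical `users` table, and a hard-coded stand-in for `$validatedData`.

```php
<?php
declare(strict_types=1);

// In-memory SQLite keeps the sketch self-contained; swap the DSN for your real database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema, for illustration only
$pdo->exec('CREATE TABLE users (username TEXT NOT NULL, email TEXT NOT NULL, bio TEXT NOT NULL)');

// Stand-in for the validated and sanitized data; the bio deliberately contains SQL metacharacters
$validatedData = [
    'username' => 'test_user',
    'email'    => 'test@example.com',
    'bio'      => "This is a <strong>bio</strong>. O'Reilly'); DROP TABLE users; --",
];

// Named placeholders keep the values out of the SQL text entirely
$stmt = $pdo->prepare('INSERT INTO users (username, email, bio) VALUES (:username, :email, :bio)');
$stmt->execute($validatedData);

// The row comes back byte-for-byte as stored, and the table still exists
$stored = $pdo->query('SELECT username, email, bio FROM users')->fetch(PDO::FETCH_ASSOC);
var_dump($stored === $validatedData); // bool(true)
```

Validation and HTML sanitization decide what is acceptable to store; the prepared statement is what protects the query itself from SQL injection.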

Best practices for filtering and sanitization in PHP

  1. Defense in Depth: Combine multiple layers of validation (client-side for UX, server-side for security) and sanitization.
  2. Validate Everything: Treat all external input (POST, GET, headers, cookies, file uploads, API data) as untrusted.
  3. Server-Side is Crucial: Never rely solely on client-side validation; it can be easily bypassed.
  4. Whitelist, Don't Blacklist: Specify exactly what is allowed (e.g., allowed characters, allowed HTML tags) rather than trying to list everything that isn't.
  5. Contextual Output Escaping: Always escape data appropriately for the context where it will be displayed (HTML, JavaScript, SQL, etc.) to prevent XSS and other injection attacks. Use functions like htmlspecialchars() for HTML context. Use prepared statements for SQL.
  6. Use Established Libraries: Leverage well-maintained libraries like HTML Purifier, Respect\Validation, or Symfony Validator instead of rolling your own filtering logic.
  7. Keep Libraries Updated: Regularly update your dependencies (composer update) to patch security vulnerabilities.
  8. Use Type Hints and Strict Typing: Enable PHP's strict typing (declare(strict_types=1);) and use type hints for function arguments and return types to catch errors early.
  9. Handle Errors Gracefully: Provide clear error messages to users upon validation failure but avoid exposing sensitive system details. Log detailed errors for developers.
  10. Log Validation Failures: Monitor logs for repeated validation failures, which could indicate malicious activity or application bugs.
  11. Secure File Uploads: Beyond basic validation, check file types using server-side tools (like finfo), store uploaded files outside the webroot if possible, use non-predictable filenames, and set appropriate permissions.
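
Items 4, 5, and 8 from the list above can be sketched with plain PHP (8.0 or later). The helper names `pickSortColumn`, `escapeHtml`, and `escapeJs` are hypothetical and are not part of any library mentioned in this article.

```php
<?php
declare(strict_types=1);

// Allowlist: accept only values the application explicitly knows about.
function pickSortColumn(string $requested, array $allowed, string $default): string
{
    return in_array($requested, $allowed, true) ? $requested : $default;
}

// Escaping for an HTML text or attribute context.
function escapeHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Escaping for a value embedded in an inline <script> block.
function escapeJs(mixed $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

echo pickSortColumn('email', ['username', 'email'], 'username'), "\n";          // email
echo pickSortColumn('password; DROP', ['username', 'email'], 'username'), "\n"; // username
echo escapeHtml('<a href="x">Tom & Jerry</a>'), "\n";
echo escapeJs('</script>'), "\n";
```

The same value needs different escaping depending on where it ends up, which is why escaping belongs at output time rather than at input time.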

Transloadit's file filtering

For complex file processing workflows, including robust filtering based on metadata before processing begins, a cloud-based service can be beneficial. Transloadit offers powerful file filtering capabilities through its 🤖 /file/filter Robot. You can define conditions based on file properties like MIME type, size, dimensions, and more within an Assembly.

Here's an example of filtering conditions within a Transloadit Assembly Step:

{
  "steps": {
    ":original": {
      "robot": "/upload/handle"
    },
    "filter_images": {
      "use": ":original",
      "robot": "/file/filter",
      "accepts": [
        ["${file.mime}", "regex", "image"],
        ["${file.meta.width}", ">=", 100]
      ],
      "declines": [["${file.size}", ">", 10485760]]
    }
  }
}

This filter accepts only image files that are at least 100 pixels wide and declines any file larger than 10MB (10 * 1024 * 1024 bytes).
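
If you assemble these instructions from PHP, a plain array plus `json_encode()` is enough to produce the same JSON. This is only a sketch of building the payload; it does not call the Transloadit API or SDK, and the variable names are illustrative.

```php
<?php
declare(strict_types=1);

$maxBytes = 10 * 1024 * 1024; // 10485760 bytes

// Single quotes keep PHP from interpolating the ${...} placeholders
$steps = [
    ':original' => ['robot' => '/upload/handle'],
    'filter_images' => [
        'use' => ':original',
        'robot' => '/file/filter',
        'accepts' => [
            ['${file.mime}', 'regex', 'image'],
            ['${file.meta.width}', '>=', 100],
        ],
        'declines' => [
            ['${file.size}', '>', $maxBytes],
        ],
    ],
];

$json = json_encode(
    ['steps' => $steps],
    JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
);
echo $json, "\n";
```

Computing the size limit in code avoids hand-typing the byte count and keeps the threshold easy to change.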

You can even use JavaScript-based conditions for more complex logic:

// Example condition string used in 'accepts' or 'declines' array
'${file.meta.width > file.meta.height && file.size < 500000}'

This condition would match files that are wider than they are tall and smaller than 500,000 bytes.

Explore our PHP SDK to easily integrate Transloadit's file processing and filtering capabilities into your PHP applications.