Ensuring file integrity is crucial in software development, especially when automating deployments through CI/CD pipelines. One powerful tool for this purpose is b2sum, a hashing utility based on the BLAKE2 algorithm. Let's explore how you can leverage b2sum in your command-line workflows to enhance security and reliability.

Introduction to b2sum

b2sum is a command-line utility implementing the BLAKE2 hashing algorithm. Compared to traditional hashing algorithms like MD5 or SHA-1, BLAKE2 offers significant advantages:

  • Speed: In software, faster than MD5, SHA-1, SHA-2, and SHA-3 on 64-bit platforms (CPUs with hardware SHA acceleration can narrow the gap).
  • Security: Provides security similar to SHA-3, including immunity to length extension attacks and indifferentiability from a random oracle.
  • Versatility: Comes in two variants, BLAKE2b (digests up to 512 bits) and BLAKE2s (digests up to 256 bits).

BLAKE2 variants

BLAKE2 comes in two main variants, each optimized for different use cases:

  • BLAKE2b: Optimized for 64-bit platforms, producing hash values up to 512 bits. This is the variant implemented by the b2sum utility found in GNU coreutils, making it ideal for modern server environments and CI/CD pipelines.
  • BLAKE2s: Optimized for 32-bit platforms and constrained environments, producing hash values up to 256 bits.
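The digest length of BLAKE2b is adjustable. As a minimal sketch, assuming the GNU coreutils b2sum and its -l/--length option (a size in bits, a multiple of 8, at most 512), the following compares the default digest with a 256-bit one. The sample file is a throwaway created only for illustration.

```shell
# Create a throwaway sample file (illustrative content only)
sample=$(mktemp)
printf 'hello\n' > "$sample"

# Default: 512-bit BLAKE2b digest, printed as 128 hex characters
full=$(b2sum "$sample" | cut -d' ' -f1)

# -l/--length takes the digest size in bits (a multiple of 8, up to 512)
short=$(b2sum -l 256 "$sample" | cut -d' ' -f1)

echo "512-bit digest: ${#full} hex characters"
echo "256-bit digest: ${#short} hex characters"

rm -f "$sample"
```

Note that a 256-bit BLAKE2b digest is still BLAKE2b; it is not interchangeable with a BLAKE2s digest of the same length.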

Installation

b2sum has been part of GNU coreutils since version 8.26, so it comes pre-installed on most modern Linux distributions. To verify that it's available and check its version:

b2sum --version

If b2sum is not installed, you can typically install it as part of the coreutils package:

  • Ubuntu/Debian:

    sudo apt-get update
    sudo apt-get install coreutils
    
  • CentOS/RHEL:

    sudo yum install coreutils
    
  • macOS (using Homebrew):

    brew install coreutils
    

    Note: On macOS, coreutils commands are often prefixed with g (e.g., gb2sum) to avoid conflicts with native BSD utilities. You might need to adjust your PATH or use the prefixed command.
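If the same script has to run on both Linux runners and macOS machines, one option is to detect which command name is present. The following is a minimal sketch; find_b2sum and the B2SUM variable are hypothetical names chosen for this example.

```shell
# Print the name of whichever BLAKE2 command is available:
# b2sum (typical on Linux) or gb2sum (Homebrew coreutils on macOS).
find_b2sum() {
  local candidate
  for candidate in b2sum gb2sum; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "Neither b2sum nor gb2sum found; install coreutils first." >&2
  return 1
}

# B2SUM is a hypothetical variable that later commands in a script could use
if B2SUM=$(find_b2sum); then
  "$B2SUM" --version | head -n 1
fi
```

Later commands can then call "$B2SUM" instead of hard-coding either name.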

Integrating b2sum into CI/CD pipelines

Integrating b2sum into your CI/CD pipeline involves generating hashes for build artifacts and verifying them at deployment time. Here's a practical approach:

Step 1: Generate hashes

After building your artifacts (e.g., a compiled binary, a zipped archive), generate a BLAKE2b hash and save it to a file:

# Example: generate hash for my-app.tar.gz
b2sum my-app.tar.gz > my-app.tar.gz.b2

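This command calculates the BLAKE2b hash of my-app.tar.gz and redirects the output (the hash followed by the filename, exactly as it was given on the command line) to my-app.tar.gz.b2. Because verification later looks the file up under that recorded name, generate the hash from the directory that holds the artifact. Store this .b2 file alongside your artifact, for example in an artifact repository or other access-controlled storage.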
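When a build produces several artifacts, a single manifest can hold all of their hashes. The sketch below assumes a hypothetical dist directory and a manifest named B2SUMS; both names, and the two placeholder archives, are illustrative only.

```shell
# Hypothetical layout: a scratch directory with two placeholder archives
workdir=$(mktemp -d)
mkdir -p "$workdir/dist"
printf 'first\n'  > "$workdir/dist/app-linux.tar.gz"
printf 'second\n' > "$workdir/dist/app-macos.tar.gz"

# Run b2sum inside the artifact directory so the manifest stores bare filenames,
# which lets it be checked wherever the directory is later copied to.
(
  cd "$workdir/dist" || exit 1
  b2sum -- *.tar.gz > B2SUMS
)

# Each manifest line is "<hash>  <filename>"
cat "$workdir/dist/B2SUMS"

# Verification must also run from the directory the names are relative to
(cd "$workdir/dist" && b2sum -c B2SUMS)
```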

Step 2: Verify hashes

Before deploying or using the artifact, verify its integrity using the generated hash file:

# Example: verify the integrity of my-app.tar.gz using its hash file
b2sum -c my-app.tar.gz.b2

The -c (or --check) flag tells b2sum to read hash sums from the specified file and check them. If the file my-app.tar.gz matches the hash stored in my-app.tar.gz.b2, b2sum will output:

my-app.tar.gz: OK

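If the file has been tampered with or corrupted, b2sum prints my-app.tar.gz: FAILED along with a warning that the computed checksum did not match. If the file is missing, it reports that the file could not be opened. In both cases b2sum exits with a non-zero status code, which can be used to halt the CI/CD pipeline.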
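To see both outcomes side by side, the following sketch hashes a throwaway file, checks it, modifies it, and checks it again. It also shows the --status and --strict options of the GNU b2sum, which are handy when a script only needs the exit code. The file contents are placeholders.

```shell
# Throwaway artifact and hash file, for illustration only
demo=$(mktemp -d)
printf 'release build\n' > "$demo/my-app.tar.gz"
(cd "$demo" && b2sum my-app.tar.gz > my-app.tar.gz.b2)

# Unmodified file: the check succeeds, so the status stays 0
ok_status=0
(cd "$demo" && b2sum -c my-app.tar.gz.b2) || ok_status=$?

# Simulate corruption by appending one byte, then check again
printf 'x' >> "$demo/my-app.tar.gz"
bad_status=0
(cd "$demo" && b2sum -c my-app.tar.gz.b2) || bad_status=$?

# --status suppresses all output; --strict also rejects malformed checksum lines
if ! (cd "$demo" && b2sum --status --strict -c my-app.tar.gz.b2); then
  echo "Integrity check failed; stopping." >&2
fi

echo "clean: $ok_status, modified: $bad_status"
```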

CI/CD integration example

Here's a practical example using GitHub Actions to integrate b2sum into your workflow:

name: Verify Build Artifacts

on: [push, pull_request]

jobs:
  build_and_verify:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4 # Use the latest major version

      - name: Set up environment
        # Add steps to set up your build environment (e.g., install Node.js, Java, etc.)
        run: echo "Setting up build environment..."

      - name: Build artifact
        run: |
          echo "Building application..."
          # Replace with your actual build commands
          mkdir -p dist
          echo "Build output" > dist/app.txt
          # Create an archive of the build output
          tar -czf artifact.tar.gz ./dist

      - name: Generate hash
        id: generate_hash # Give the step an ID to reference its output
        run: |
          b2sum artifact.tar.gz > artifact.tar.gz.b2
          echo "Generated hash file artifact.tar.gz.b2"
          # Optionally, output the hash value itself for logging
          HASH_VALUE=$(cut -d' ' -f1 artifact.tar.gz.b2)
          echo "hash_value=$HASH_VALUE" >> "$GITHUB_OUTPUT"

      - name: Verify hash
        run: |
          echo "Verifying hash for artifact.tar.gz..."
          b2sum -c artifact.tar.gz.b2
          echo "Verification successful!"

      - name: Upload artifact and hash
        uses: actions/upload-artifact@v4 # Use the latest major version
        with:
          name: build-artifact
          path: |
            artifact.tar.gz
            artifact.tar.gz.b2
          retention-days: 7 # Optional: Adjust artifact retention period

This workflow:

  1. Checks out the source code.
  2. Sets up the build environment (placeholder).
  3. Builds the application and creates a tar.gz archive.
  4. Generates a BLAKE2 hash of the archive and saves it to a .b2 file. It also outputs the hash value itself.
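  5. Verifies the hash immediately as a sanity check that the hash file is well-formed and matches the archive. The verification that catches corruption in transit is the one repeated after the artifact is downloaded in a later stage.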
  6. Uploads both the artifact (artifact.tar.gz) and its hash file (artifact.tar.gz.b2) for later use in deployment stages.
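A later job or deployment script can also check a downloaded archive against the hash value that the generate step exposed as an output. The following is a minimal sketch, assuming the expected hash reaches the script as a plain string; verify_against is a hypothetical helper name, and the temporary file stands in for the downloaded artifact.

```shell
# verify_against <expected-hash> <file>: hypothetical helper.
# Builds a one-line checksum list on the fly and feeds it to b2sum -c via stdin.
verify_against() {
  local expected=$1 file=$2
  printf '%s  %s\n' "$expected" "$file" | b2sum --status -c -
}

# Throwaway file standing in for the downloaded artifact
artifact=$(mktemp)
printf 'Build output\n' > "$artifact"

# In CI this value would come from the earlier step's output, not be recomputed here
expected=$(b2sum "$artifact" | cut -d' ' -f1)

if verify_against "$expected" "$artifact"; then
  echo "artifact matches expected hash"
else
  echo "artifact does NOT match expected hash" >&2
fi
```

Passing the hash through a separate channel from the artifact means that a corrupted or replaced download cannot carry its own matching hash file with it.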

Automating integrity checks

For more complex scenarios or reuse across different pipelines, you can create a robust shell script to handle integrity verification:

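#!/bin/bash

# verify-integrity.sh: checks the integrity of a file using its corresponding .b2 hash file.

set -euo pipefail # Exit on error, undefined variable, or pipe failure

# Default to an empty string so that `set -u` does not abort before the usage message
ARTIFACT_PATH="${1:-}"

# Check if artifact path is provided
if [ -z "$ARTIFACT_PATH" ]; then
  echo "Usage: $0 <path/to/artifact>" >&2
  exit 1
fi

HASH_FILE="${ARTIFACT_PATH}.b2"

# Check if artifact file exists
if [ ! -f "$ARTIFACT_PATH" ]; then
  echo "Error: Artifact file '$ARTIFACT_PATH' not found." >&2
  exit 1
fi

# Check if hash file exists
if [ ! -f "$HASH_FILE" ]; then
  echo "Error: Hash file '$HASH_FILE' not found." >&2
  exit 1
fi

echo "Verifying integrity of '$ARTIFACT_PATH' using '$HASH_FILE'..."

# b2sum -c resolves the filename recorded in the hash file relative to the current
# directory, so run the check from the artifact's directory. This assumes the hash
# was generated there (e.g. `b2sum my-app.tar.gz > my-app.tar.gz.b2`).
ARTIFACT_DIR=$(dirname -- "$ARTIFACT_PATH")
HASH_NAME=$(basename -- "$HASH_FILE")

# Perform the check
if (cd -- "$ARTIFACT_DIR" && b2sum --quiet -c "$HASH_NAME"); then
  echo "Integrity check PASSED for '$ARTIFACT_PATH'."
  exit 0
else
  echo "Integrity check FAILED for '$ARTIFACT_PATH'!" >&2
  # b2sum already prints detailed error messages when check fails
  exit 1
fi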

Save this as verify-integrity.sh, make it executable with chmod +x verify-integrity.sh, and use it in your pipeline scripts:

# Example usage in a CI/CD script after downloading the artifact and hash file
./verify-integrity.sh path/to/downloaded/my-app.tar.gz
# The script will exit with 0 on success, non-zero on failure

Error handling best practices

When implementing hash verification in your CI/CD pipeline, consider these error handling best practices:

  1. Fail Fast: Configure your pipeline to stop immediately if a hash verification fails. This prevents potentially corrupted or tampered artifacts from being deployed.
  2. Detailed Logging: Log the verification attempt, the expected hash (if easily available), and the outcome. b2sum -c provides useful error messages on failure; ensure these are captured in your CI/CD logs.
  3. Notification System: Integrate with your notification system (e.g., Slack, email) to alert the relevant team immediately when an integrity check fails.
  4. Secure Hash Storage: Ensure the .b2 hash files are stored securely and cannot be easily tampered with. Storing them alongside artifacts in a repository that tracks versions or has immutability features is a good practice. Consider signing the hash files if extra security is needed.
  5. Handle Missing Files: Ensure your scripts gracefully handle cases where either the artifact or the hash file is missing, providing clear error messages.
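Several of these practices can be combined in one small wrapper. The sketch below is one possible shape, not a definitive implementation: check_artifact and notify_team are hypothetical names, the log path is an assumption, and the body of notify_team is a placeholder for a real Slack or email call.

```shell
# notify_team is a hypothetical hook; replace its body with a real Slack or email call
notify_team() {
  echo "ALERT: $1" >&2
}

# check_artifact <hash-file> [log-file]: runs the check from the hash file's directory,
# appends b2sum's own messages to a log, and returns non-zero on any failure.
check_artifact() {
  local hash_file=$1 log_file=${2:-integrity.log} output status=0
  if [ ! -f "$hash_file" ]; then
    notify_team "hash file '$hash_file' is missing"
    return 2
  fi
  # Capture stdout and stderr so the FAILED line and the warning both reach the log
  output=$(cd "$(dirname -- "$hash_file")" && b2sum -c "$(basename -- "$hash_file")" 2>&1) || status=$?
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$output" >> "$log_file"
  if [ "$status" -ne 0 ]; then
    notify_team "integrity check failed for '$hash_file'"
  fi
  return "$status"
}
```

In a pipeline step, calling check_artifact path/to/my-app.tar.gz.b2 || exit 1 stops the job at the first failed check while keeping the details in the log.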

Troubleshooting common issues

  • Hash Mismatch: This is the primary failure mode, indicating the file content has changed since the hash was generated. Causes include:
    • File corruption during transfer or storage.
    • Intentional or unintentional modification of the artifact after hashing.
    • Generating the hash on a different version of the file than the one being checked.
    • Solution: Re-download or retrieve the original artifact and hash file. If the issue persists, investigate potential corruption sources or rebuild the artifact from source.
  • b2sum: command not found: The b2sum utility is not installed or not in the system's PATH within the CI/CD execution environment.
    • Solution: Ensure the coreutils package (or equivalent) is installed in your CI/CD runner environment (e.g., Docker image, VM). See the Installation section.
  • Permission Issues: The CI/CD process might lack the necessary file system permissions to read the artifact or the hash file.
    • Solution: Verify the permissions and ownership of the files and the execution context of the b2sum command.
  • Line Ending Differences: For text files, differences in line endings (CRLF vs. LF) between the environment where the hash was generated and where it's verified can cause mismatches.
    • Solution: Ensure consistent line endings, often by configuring Git (core.autocrlf) or build tools appropriately. Hashing binary archives (like .zip or .tar.gz) usually avoids this issue.
  • Incorrect Hash File Format: The .b2 file should contain the hash output exactly as produced by b2sum (the hash, two spaces, then the filename). Manual editing can corrupt this format.
    • Solution: Regenerate the hash file using the standard b2sum artifact > artifact.b2 command.
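Before regenerating anything, a quick look at the hash file itself can narrow down the cause. The following sketch flags two common problems; diagnose_hash_file is a hypothetical helper, and the pattern assumes the default 512-bit digest in b2sum's plain (untagged) output format.

```shell
# diagnose_hash_file <hash-file>: hypothetical helper that flags two common problems
diagnose_hash_file() {
  local hash_file=$1 problems=0
  # Windows line endings leave a carriage return glued to the end of the filename
  if grep -q "$(printf '\r')\$" "$hash_file"; then
    echo "CRLF line endings found in $hash_file" >&2
    problems=1
  fi
  # Expected line: optional leading backslash (b2sum's marker for an escaped filename),
  # 128 hex digits, a space, a space or '*', then the filename.
  # -v inverts the match, so this is true if any line does NOT fit the pattern.
  if grep -Evq '^\\?[0-9a-f]{128} [ *].+' "$hash_file"; then
    echo "Unexpected line format in $hash_file" >&2
    problems=1
  fi
  return "$problems"
}
```

A non-zero return from diagnose_hash_file my-app.tar.gz.b2 suggests the hash file was edited or converted after it was generated, so the artifact itself may be fine.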

Integration with Transloadit

Transloadit's /file/hash Robot supports multiple hashing algorithms, including BLAKE2b (b2). You can integrate hashing directly into your file processing workflows. Here's an example Assembly using BLAKE2:

{
  "steps": {
    ":original": {
      "robot": "/upload/handle"
    },
    "hashed": {
      "use": ":original",
      "robot": "/file/hash",
      "algorithm": "b2"
    }
  }
}

After the Assembly executes, the BLAKE2b hash value for the processed file will be available in the Assembly result JSON, typically under the results object for the hashed Step, within the file.meta.hash field. This allows you to generate hashes as part of automated upload and processing pipelines.

Conclusion and best practices

Using b2sum in your CI/CD pipeline improves security and reliability by making changes to build artifacts detectable. By generating and verifying BLAKE2 hashes, you can catch accidental corruption before it reaches production, and malicious tampering too, provided the hash file itself is kept where an attacker cannot rewrite it.

Key best practices include:

  • Generate hashes immediately after artifact creation.
  • Store hash files securely alongside or linked to their corresponding artifacts.
  • Automate hash verification as a mandatory step before deployment or artifact consumption.
  • Use BLAKE2b (b2sum) for its speed and security advantages on modern systems.
  • Implement robust error handling and notifications for verification failures.
  • Consider using signed hashes or checksums stored in secure manifests for critical applications.

By implementing these practices, you build a more trustworthy and secure deployment pipeline.

At Transloadit, we leverage robust hashing algorithms like BLAKE2 within our Robot ecosystem as part of our file processing services, helping ensure file integrity for our users.