File archiving is essential in development workflows for backups, deployments, and data transfer. The 'tar' command, a staple tool in Unix-like systems, is commonly used for file archiving. While traditional compression methods like gzip or bzip2 can be slow with large files, modern compression algorithms offer significant performance improvements. This guide explores advanced 'tar' techniques focusing on speed optimization with Zstd and LZ4.

Installing compression tools

Before using Zstd and LZ4, install them on your system:

# Ubuntu/Debian
sudo apt-get install zstd lz4

# CentOS/RHEL (older releases may need the EPEL repository enabled)
sudo yum install zstd lz4

# macOS
brew install zstd lz4

Basic 'tar' commands for file archiving and extraction

Review the fundamental 'tar' commands for archiving and extraction.

Creating an archive

tar -cf archive.tar /path/to/directory

Explanation:

  • -c: Create a new archive
  • -f: Specify the filename of the archive

Extracting an archive

tar -xf archive.tar

Explanation:

  • -x: Extract files from an archive
  • -f: Specify the filename of the archive
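
The two commands above can be tried end to end in a scratch directory. The following is a minimal sketch; the directory and file names are made up for illustration. It also uses -C, which changes directory before archiving or extracting, and -t, which lists an archive without extracting it.

```shell
# Round trip in a throwaway directory (all names here are illustrative).
workdir=$(mktemp -d)
mkdir -p "$workdir/project/docs" "$workdir/restore"
echo "hello" > "$workdir/project/docs/readme.txt"

# -C changes directory first, so the archive stores "project/..."
# rather than an absolute path.
tar -cf "$workdir/archive.tar" -C "$workdir" project

# -t lists the archive contents without extracting anything.
listing=$(tar -tf "$workdir/archive.tar")
echo "$listing"

# Extract into a separate directory and read the file back.
tar -xf "$workdir/archive.tar" -C "$workdir/restore"
restored=$(cat "$workdir/restore/project/docs/readme.txt")
echo "$restored"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Storing relative paths this way makes it easier to extract the archive somewhere other than its original location.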

Compression algorithm comparison

Different compression methods offer various trade-offs:

  • Zstd: Offers a strong balance between speed and compression ratio, with adjustable levels that favor one or the other
  • LZ4: Provides the fastest compression and decompression, albeit with a lower compression ratio
  • Gzip: Delivers good compression but is slower than Zstd or LZ4
  • Bzip2: Achieves high compression but is significantly slower
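
These trade-offs depend heavily on the data, so it is worth measuring on your own files. The sketch below compares compressed sizes on generated sample text and skips any tool that is not installed. The sample data is an assumption chosen for convenience; repetitive text compresses far better than most real workloads.

```shell
# Rough size comparison on generated data (a sketch, not a rigorous benchmark).
workdir=$(mktemp -d)
mkdir -p "$workdir/sample"

# Generate repetitive text; real data will behave differently.
i=0
while [ "$i" -lt 2000 ]; do
  echo "line $i: the quick brown fox jumps over the lazy dog"
  i=$((i + 1))
done > "$workdir/sample/data.txt"

tar -cf "$workdir/sample.tar" -C "$workdir" sample
original_size=$(wc -c < "$workdir/sample.tar" | tr -d ' ')
echo "uncompressed: $original_size bytes"

# Each of these tools accepts -c to write the compressed stream to stdout.
for tool in gzip bzip2 zstd lz4; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: not installed, skipping"
    continue
  fi
  "$tool" -c "$workdir/sample.tar" > "$workdir/sample.tar.$tool"
  size=$(wc -c < "$workdir/sample.tar.$tool" | tr -d ' ')
  echo "$tool: $size bytes"
done

rm -rf "$workdir"
```

To compare speed as well, prefix the compression command with time and run it against a realistically sized archive; a sample this small finishes too quickly to measure.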

Using modern compression with 'tar'

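Recent 'tar' implementations can invoke Zstd and LZ4 directly, but support varies. GNU tar added --zstd in version 1.31 and has no --lz4 option, while bsdtar (the libarchive-based 'tar' shipped with macOS and FreeBSD) accepts both. If your version of 'tar' does not support --zstd or --lz4, use the piped command examples below.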
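
One way to find out what the local 'tar' supports is simply to try it on a tiny file. The sketch below does that for Zstd; the function name tar_supports_zstd and the zstd_mode variable are hypothetical helpers, not part of 'tar'.

```shell
# Probe whether the local tar can write Zstd archives directly (a sketch).
# Returns 0 if `tar --zstd` works, non-zero otherwise.
tar_supports_zstd() {
  probe_dir=$(mktemp -d) || return 1
  echo probe > "$probe_dir/probe.txt"
  tar --zstd -cf "$probe_dir/probe.tar.zst" -C "$probe_dir" probe.txt >/dev/null 2>&1
  probe_status=$?
  rm -rf "$probe_dir"
  return "$probe_status"
}

if tar_supports_zstd; then
  zstd_mode="native"        # tar --zstd works
elif command -v zstd >/dev/null 2>&1; then
  zstd_mode="pipe"          # fall back to: tar -cf - ... | zstd
else
  zstd_mode="unavailable"   # zstd itself is missing
fi
echo "Zstd support: $zstd_mode"
```

A script can branch on the result to choose between the direct and piped forms shown in the following sections.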

Using Zstd compression

For newer tar versions with direct Zstd support:

tar --zstd -cf archive.tar.zst /path/to/directory

For older tar versions:

tar -cf - /path/to/directory | zstd > archive.tar.zst

To extract:

# For newer tar versions
tar --zstd -xf archive.tar.zst

# For older tar versions
zstd -dc archive.tar.zst | tar -xf -
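
The piped form also makes it easy to tune Zstd. The sketch below uses two standard zstd flags: a numeric compression level (-3 is the default, -19 is much slower but produces smaller output) and -T0, which lets zstd use all available cores. The file names are illustrative, and the block does nothing useful if zstd is not installed.

```shell
# Tuning Zstd through a pipe (a sketch; the levels shown are examples).
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/out"
echo "sample content" > "$workdir/data/file.txt"

if command -v zstd >/dev/null 2>&1; then
  # -T0 uses all available cores; -3 is the default level and favors speed.
  tar -cf - -C "$workdir" data | zstd -T0 -3 > "$workdir/fast.tar.zst"

  # -19 spends considerably more CPU time for a smaller archive.
  tar -cf - -C "$workdir" data | zstd -T0 -19 > "$workdir/small.tar.zst"

  # Decompression needs no level flag, whatever level was used to compress.
  zstd -dc "$workdir/small.tar.zst" | tar -xf - -C "$workdir/out"
  result=$(cat "$workdir/out/data/file.txt")
else
  result="zstd not installed"
fi
echo "$result"
rm -rf "$workdir"
```

Lower levels usually suit frequent backups and deployments, while higher levels fit archives that are written once and stored for a long time.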

Using LZ4 compression

For tar implementations with direct LZ4 support (such as bsdtar; GNU tar does not provide a --lz4 option):

tar --lz4 -cf archive.tar.lz4 /path/to/directory

For GNU tar and other versions without --lz4:

tar -cf - /path/to/directory | lz4 > archive.tar.lz4

To extract:

# For tar versions with direct LZ4 support
tar --lz4 -xf archive.tar.lz4

# For tar versions without direct LZ4 support
lz4 -dc archive.tar.lz4 | tar -xf -
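
Where 'tar' has no built-in LZ4 option, --use-compress-program is another route besides the pipe: 'tar' runs the named program as a filter when creating the archive and runs it again with -d when extracting. The sketch below assumes lz4 is installed and uses illustrative file names; it falls back to a message otherwise.

```shell
# LZ4 through an external compressor program (a sketch).
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/out"
echo "sample content" > "$workdir/data/file.txt"

if command -v lz4 >/dev/null 2>&1 &&
   tar --use-compress-program=lz4 -cf "$workdir/archive.tar.lz4" \
       -C "$workdir" data 2>/dev/null; then
  # On extraction, tar invokes the same program with -d to decompress.
  tar --use-compress-program=lz4 -xf "$workdir/archive.tar.lz4" -C "$workdir/out"
  lz4_result=$(cat "$workdir/out/data/file.txt")
else
  lz4_result="lz4 not available"
fi
echo "$lz4_result"
rm -rf "$workdir"
```

GNU tar also accepts -I as a short form of --use-compress-program. The long form is used here because -I has a different meaning in bsdtar.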

Excluding files and directories

Exclude specific files or patterns during archiving:

tar -cf archive.tar \
  --exclude='*.log' \
  --exclude='tmp/*' \
  --exclude='node_modules' \
  /path/to/directory
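
To see what the patterns actually remove, list the resulting archive. The sketch below builds a small illustrative tree (the names are made up) and places the --exclude options before the directory, because recent GNU tar versions treat them as positional and warn when they follow the path.

```shell
# Demonstrate exclusion patterns on a throwaway tree (names are illustrative).
workdir=$(mktemp -d)
mkdir -p "$workdir/app/src" "$workdir/app/tmp" "$workdir/app/node_modules/pkg"
echo "code"  > "$workdir/app/src/main.js"
echo "noise" > "$workdir/app/debug.log"
echo "cache" > "$workdir/app/tmp/cache.bin"
echo "dep"   > "$workdir/app/node_modules/pkg/index.js"

# Exclusions come before the path they apply to.
tar -cf "$workdir/archive.tar" \
  --exclude='*.log' \
  --exclude='tmp/*' \
  --exclude='node_modules' \
  -C "$workdir" app

# List what was actually stored.
exclude_listing=$(tar -tf "$workdir/archive.tar")
echo "$exclude_listing"
rm -rf "$workdir"
```

Note that 'tmp/*' drops the contents of tmp while keeping the empty directory entry, whereas 'node_modules' drops the directory itself along with everything under it.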

Incremental backups

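GNU tar can perform incremental backups using a snapshot file. The snapshot file (e.g., backup.snar) records the state of the directory each time 'tar' runs. The first run, when no snapshot file exists yet, produces a full (Level 0) backup. Later runs against the same snapshot file archive only what has changed since the previous run and then update the snapshot.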

# Level 0 (full) backup
tar --create \
    --file=backup-0.tar \
    --listed-incremental=backup.snar \
    /path/to/directory

# Level 1 (incremental) backup
tar --create \
    --file=backup-1.tar \
    --listed-incremental=backup.snar \
    /path/to/directory
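
Restoring works in the same order: extract the Level 0 archive first, then each incremental archive in sequence. The sketch below runs the whole cycle in a scratch directory. It assumes GNU tar, and the file names are illustrative.

```shell
# Incremental backup and restore round trip (a sketch; assumes GNU tar).
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/restore"
echo "first" > "$workdir/data/a.txt"

# Level 0: the snapshot file does not exist yet, so everything is archived.
tar --create --file="$workdir/backup-0.tar" \
    --listed-incremental="$workdir/backup.snar" \
    -C "$workdir" data

# Change the tree, then take a Level 1 backup against the same snapshot.
sleep 1
echo "second" > "$workdir/data/b.txt"
tar --create --file="$workdir/backup-1.tar" \
    --listed-incremental="$workdir/backup.snar" \
    -C "$workdir" data

# The Level 1 archive holds only the new file (plus directory entries).
level1_listing=$(tar -tf "$workdir/backup-1.tar")
echo "$level1_listing"

# Restore in order. Using /dev/null as the snapshot tells tar to treat the
# archives as incremental without reading or updating a snapshot file.
tar --extract --file="$workdir/backup-0.tar" \
    --listed-incremental=/dev/null -C "$workdir/restore"
tar --extract --file="$workdir/backup-1.tar" \
    --listed-incremental=/dev/null -C "$workdir/restore"

restored_files=$(ls "$workdir/restore/data")
echo "$restored_files"
rm -rf "$workdir"
```

Because every run updates the snapshot file, keep a copy of the Level 0 snapshot if you want later backups to be taken relative to the full backup rather than to the most recent incremental one.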

Selective archiving with 'find'

Safely archive files matching specific criteria using null-terminated inputs to handle filenames with spaces and special characters:

# Archive files modified in the last 7 days
find /path/to/directory -type f -mtime -7 -print0 | \
tar --null -cf archive.tar --files-from=-
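
The same pattern works with any 'find' expression. The sketch below selects files by extension and includes a filename containing a space to show that the null-terminated list survives it. The paths are illustrative, and 'find' runs from inside the scratch directory so the archive stores relative paths.

```shell
# Archive only .txt and .md files, including one with a space in its name.
workdir=$(mktemp -d)
mkdir -p "$workdir/docs"
echo "a" > "$workdir/docs/my report.txt"
echo "b" > "$workdir/docs/notes.md"
echo "c" > "$workdir/docs/image.png"

# Run in a subshell so the caller's working directory is unchanged.
(
  cd "$workdir" &&
  find docs -type f \( -name '*.txt' -o -name '*.md' \) -print0 |
    tar --null -cf selected.tar --files-from=-
)

find_listing=$(tar -tf "$workdir/selected.tar")
echo "$find_listing"
rm -rf "$workdir"
```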

Error handling and verification

Verify archive integrity and handle errors:

# List the contents to confirm the archive is readable
tar -tf archive.tar.zst

# Extract while suppressing timestamp warnings (GNU tar); check the exit status for errors
tar --extract \
    --file=archive.tar.zst \
    --warning=no-timestamp
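
In a script, the useful signal is the exit status of 'tar': it is zero when the archive could be read to the end and non-zero otherwise. The sketch below checks a good archive and a deliberately truncated copy. The file names and the verify_good and verify_bad variables are illustrative.

```shell
# Check tar's exit status on an intact and a damaged archive (a sketch).
workdir=$(mktemp -d)
mkdir -p "$workdir/data"
echo "payload" > "$workdir/data/file.txt"
tar -cf "$workdir/archive.tar" -C "$workdir" data

# An intact archive lists cleanly.
if tar -tf "$workdir/archive.tar" >/dev/null 2>&1; then
  verify_good="ok"
else
  verify_good="corrupt"
fi

# Simulate damage by keeping only the first 100 bytes.
head -c 100 "$workdir/archive.tar" > "$workdir/truncated.tar"
if tar -tf "$workdir/truncated.tar" >/dev/null 2>&1; then
  verify_bad="ok"
else
  verify_bad="corrupt"
fi

echo "intact: $verify_good, truncated: $verify_bad"
rm -rf "$workdir"
```

For compressed archives, zstd -t archive.tar.zst additionally tests the compressed stream itself without writing any output.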

Remote archiving with SSH

Create and extract archives over SSH:

# Archive a remote directory and store the compressed result locally
ssh user@remotehost "tar -cf - /path/to/directory" | \
zstd > archive.tar.zst

# Extract an archive stored on the remote host into the current local directory
ssh user@remotehost "zstd -dc archive.tar.zst" | \
tar -xf -

Splitting large archives

Handle large archives by splitting them into manageable chunks:

# Create and split archive (100 MiB chunks)
tar -cf - /path/to/directory | \
zstd | \
split -b 100M - archive.tar.zst.part-

# Reassemble and extract
cat archive.tar.zst.part-* | \
zstd -d | \
tar -xf -
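
The split-and-reassemble cycle can be checked on a small scale before trusting it with real data. The sketch below uses 64 KiB parts and leaves out compression so that it runs with only 'tar' and 'split'; the zstd stages slot into the pipelines exactly as in the commands above. All file names are illustrative.

```shell
# Split a tar stream into small parts and confirm it reassembles intact.
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/out"

# About 200 KB of sample data.
head -c 200000 /dev/zero > "$workdir/data/blob.bin"

# Write the stream as 64 KiB parts named ...part-aa, ...part-ab, and so on.
tar -cf - -C "$workdir" data | split -b 64k - "$workdir/archive.tar.part-"
part_count=$(ls "$workdir"/archive.tar.part-* | wc -l | tr -d ' ')
echo "parts: $part_count"

# The shell expands the glob in sorted order, which matches the order
# in which split wrote the parts.
cat "$workdir"/archive.tar.part-* | tar -xf - -C "$workdir/out"

# Compare the restored file with the original byte for byte.
if cmp -s "$workdir/data/blob.bin" "$workdir/out/data/blob.bin"; then
  split_result="match"
else
  split_result="mismatch"
fi
echo "$split_result"
rm -rf "$workdir"
```

If parts are transferred separately, every part must arrive before extraction, since a missing or reordered part corrupts the stream from that point on.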

Automating backups

Create a backup script with error handling:

#!/bin/bash
set -euo pipefail

DATE=$(date +%F)
BACKUP_DIR="/path/to/directory"
ARCHIVE_DIR="/path/to/backup"
ARCHIVE_NAME="backup-$DATE.tar.zst"
LOG_FILE="$ARCHIVE_DIR/backup-$DATE.log"

# Create backup, logging errors from both tar and zstd
tar --create \
    --file=- "$BACKUP_DIR" 2>> "$LOG_FILE" | \
zstd > "$ARCHIVE_DIR/$ARCHIVE_NAME" 2>> "$LOG_FILE"

# Verify archive (piped through zstd so it also works with tar versions lacking --zstd)
zstd -dc "$ARCHIVE_DIR/$ARCHIVE_NAME" | tar -tf - >> "$LOG_FILE" 2>&1
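
A script that runs daily will eventually fill the archive directory, so a retention step is a common addition. The sketch below is one possible approach, not part of the script above: RETENTION_DAYS is a hypothetical setting, ARCHIVE_DIR points at a temporary directory for the demonstration, and the backdated file relies on GNU touch.

```shell
# Retention sketch: delete archives older than RETENTION_DAYS.
ARCHIVE_DIR=$(mktemp -d)   # stand-in for the real archive directory
RETENTION_DAYS=7           # hypothetical setting

# Simulate one old and one fresh archive (touch -d is a GNU extension).
touch -d '10 days ago' "$ARCHIVE_DIR/backup-old.tar.zst"
touch "$ARCHIVE_DIR/backup-new.tar.zst"

# -mtime +N matches files last modified more than N days ago.
find "$ARCHIVE_DIR" -type f -name 'backup-*.tar.zst' \
  -mtime +"$RETENTION_DAYS" -exec rm -f {} +

remaining=$(ls "$ARCHIVE_DIR")
echo "$remaining"
rm -rf "$ARCHIVE_DIR"
```

Running the cleanup only after the new archive has been verified avoids deleting the last good backup when a run fails.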

Schedule with cron:

0 0 * * * /path/to/backup.sh

Conclusion

Modern compression algorithms like Zstd and LZ4 significantly improve 'tar' archiving performance. Combining these tools with proper error handling, verification, and automation can help you build robust and efficient backup solutions for your development workflow.

At Transloadit, we leverage efficient file archiving and modern compression methods in our File Compressing service to handle large volumes of data effectively.