Speed up 'tar' archiving with Zstd and LZ4 compression

File archiving is essential in development workflows for backups, deployments, and data transfer. The 'tar' command, a staple tool in Unix-like systems, is commonly used for file archiving. While traditional compression methods like gzip or bzip2 can be slow with large files, modern compression algorithms offer significant performance improvements. This guide explores advanced 'tar' techniques focusing on speed optimization with Zstd and LZ4.
Installing compression tools
Before using Zstd and LZ4, install them on your system:
# Ubuntu/Debian
sudo apt-get install zstd lz4
# CentOS/RHEL (zstd may require the EPEL repository on older releases)
sudo yum install zstd lz4
# macOS
brew install zstd lz4
Basic 'tar' commands for file archiving and extraction
Review the fundamental 'tar' commands for archiving and extraction.
Creating an archive
tar -cf archive.tar /path/to/directory
Explanation:
- -c: Create a new archive
- -f: Specify the filename of the archive
Extracting an archive
tar -xf archive.tar
Explanation:
- -x: Extract files from an archive
- -f: Specify the filename of the archive
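To see both commands working together, here is a minimal round trip in a scratch directory. The directory and file names (project, notes.txt) are made up for illustration; -C tells 'tar' to change directory before archiving or extracting, and -t lists an archive's contents without unpacking it.

```shell
#!/bin/sh
set -eu

# Scratch area; every name below is illustrative.
work=$(mktemp -d)
mkdir -p "$work/project" "$work/restored"
echo "hello" > "$work/project/notes.txt"

# Create: -C switches into $work first, so the archive stores relative paths.
tar -cf "$work/archive.tar" -C "$work" project

# List the contents without extracting anything.
tar -tf "$work/archive.tar"

# Extract into a different directory and read the file back.
tar -xf "$work/archive.tar" -C "$work/restored"
cat "$work/restored/project/notes.txt"
```

Storing relative paths this way avoids the warning 'tar' prints when it strips a leading / from absolute member names.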
Compression algorithm comparison
Different compression methods offer various trade-offs:
- Zstd: Usually the best balance between speed and compression ratio, with adjustable compression levels
- LZ4: Provides the fastest compression and decompression, albeit with a lower compression ratio
- Gzip: Widely supported with good compression, but typically slower than Zstd or LZ4
- Bzip2: Achieves high compression but is significantly slower at both compressing and decompressing
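These trade-offs depend heavily on your data, so it can be worth measuring them yourself. The following is a rough sketch rather than a rigorous benchmark: it compresses the same synthetic sample (a made-up numbers.txt) with whichever of the four tools are installed and records the resulting archive sizes. The sample data and file names are assumptions for illustration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/sample"
# Synthetic, highly compressible data; real-world ratios will differ.
seq 1 200000 > "$work/sample/numbers.txt"

results="$work/results.txt"
: > "$results"

for tool in gzip bzip2 zstd lz4; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool skipped (not installed)" >> "$results"
        continue
    fi
    out="$work/sample.tar.$tool"
    # Each tool reads stdin when no file is given; -c writes to stdout.
    tar -cf - -C "$work" sample | "$tool" -c > "$out"
    size=$(wc -c < "$out" | tr -d ' ')
    echo "$tool $size bytes" >> "$results"
done

cat "$results"
```

To compare speed as well, prefix the pipeline with your shell's time keyword and run it against a representative directory instead of the synthetic sample.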
Using modern compression with 'tar'
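Modern 'tar' versions can invoke Zstd and LZ4 directly, but support varies by implementation and version. GNU tar added --zstd in version 1.31 and does not provide a --lz4 option, while recent bsdtar builds (libarchive, the default 'tar' on macOS) generally accept both. If your version of 'tar' does not support --zstd or --lz4, use the piped command examples below.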
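One way to decide between the direct and piped forms in a script is to look for the option in the local help text. The sketch below is a heuristic, not an official detection method: tar_supports is a hypothetical helper, and some implementations do not list every option in --help, in which case the script simply falls back to the piped form.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: succeeds if the local tar's help text mentions the option.
tar_supports() {
    help_text=$(tar --help 2>&1 || true)
    case "$help_text" in
        *"$1"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# GNU tar runs the external zstd program, so the binary must be present too.
if tar_supports --zstd && command -v zstd >/dev/null 2>&1; then
    echo "direct: tar --zstd -cf archive.tar.zst /path/to/directory"
elif command -v zstd >/dev/null 2>&1; then
    echo "piped:  tar -cf - /path/to/directory | zstd > archive.tar.zst"
else
    echo "zstd is not installed"
fi
```

The same check works for --lz4 by swapping the option and binary names.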
Using zstd compression
For newer tar versions with direct Zstd support:
tar --zstd -cf archive.tar.zst /path/to/directory
For older tar versions:
tar -cf - /path/to/directory | zstd > archive.tar.zst
To extract:
# For newer tar versions
tar --zstd -xf archive.tar.zst
# For older tar versions
zstd -dc archive.tar.zst | tar -xf -
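The piped form also makes it easy to tune Zstd. The sketch below assumes a made-up sample directory and shows two real zstd options: a numeric level (1 is fast, 19 is slow but usually smaller, 3 is the default) and -T0, which asks zstd to use all detected CPU cores. If zstd is missing, the script only prints a message.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/src"
seq 1 200000 > "$work/src/numbers.txt"   # illustrative sample data

if command -v zstd >/dev/null 2>&1; then
    # Level 1 favours speed; -T0 uses all detected CPU cores.
    tar -cf - -C "$work" src | zstd -1 -T0 > "$work/fast.tar.zst"

    # Level 19 is much slower and usually produces a smaller file.
    tar -cf - -C "$work" src | zstd -19 -T0 > "$work/small.tar.zst"

    ls -l "$work"/*.tar.zst

    # Decompression is the same regardless of the level used.
    zstd -dc "$work/small.tar.zst" | tar -tf -
else
    echo "zstd not found; install it first" >&2
fi
```

Recent GNU tar versions can pass the same options through -I (--use-compress-program), for example tar -I 'zstd -19 -T0' -cf archive.tar.zst /path/to/directory, but check that your version accepts a command with arguments.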
Using LZ4 compression
For tar versions with direct LZ4 support (such as bsdtar):
tar --lz4 -cf archive.tar.lz4 /path/to/directory
For tar versions without --lz4 (GNU tar included):
tar -cf - /path/to/directory | lz4 > archive.tar.lz4
To extract:
# For tar versions with --lz4 support
tar --lz4 -xf archive.tar.lz4
# For tar versions without --lz4
lz4 -dc archive.tar.lz4 | tar -xf -
Excluding files and directories
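Exclude specific files or patterns during archiving. Place the --exclude options before the path, since some 'tar' implementations and versions only apply them to paths that follow:
tar -cf archive.tar \
--exclude='*.log' \
--exclude='tmp/*' \
--exclude='node_modules' \
/path/to/directory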
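Before relying on exclude patterns for a real backup, it helps to test them on a throwaway tree and inspect the listing. The sketch below builds a small made-up project (app, debug.log, cache.bin and so on are illustrative names) and applies the same three patterns.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/app/tmp" "$work/app/node_modules/pkg" "$work/app/src"
echo "code"  > "$work/app/src/main.js"
echo "noise" > "$work/app/debug.log"
echo "cache" > "$work/app/tmp/cache.bin"
echo "dep"   > "$work/app/node_modules/pkg/index.js"

# Excludes come first, then -C and the directory to archive.
tar -cf "$work/archive.tar" \
    --exclude='*.log' \
    --exclude='tmp/*' \
    --exclude='node_modules' \
    -C "$work" app

# Inspect what actually went into the archive.
tar -tf "$work/archive.tar"
```

Checking the listing confirms that the log file, the cache file, and the dependency directory were left out while the source file was kept.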
Incremental backups
Perform incremental backups using snapshot files, a GNU tar feature. The snapshot file (e.g., backup.snar) tracks changes between backups. A level 0 backup is a full backup; each subsequent run against the same snapshot file stores only what changed since the previous run.
# Level 0 (full) backup
tar --create \
--file=backup-0.tar \
--listed-incremental=backup.snar \
/path/to/directory
# Level 1 (incremental) backup
tar --create \
--file=backup-1.tar \
--listed-incremental=backup.snar \
/path/to/directory
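Restoring works by extracting the levels in order. The sketch below assumes GNU tar and uses made-up file names; per the GNU tar manual, passing --listed-incremental=/dev/null on extraction tells 'tar' to treat the archive as incremental without needing the original snapshot file.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/data" "$work/restore"
echo "one" > "$work/data/a.txt"

# Level 0 (full) backup
tar --create --file="$work/backup-0.tar" \
    --listed-incremental="$work/backup.snar" -C "$work" data

# Pause so the next file's timestamp is clearly newer than the snapshot.
sleep 1
echo "two" > "$work/data/b.txt"

# Level 1 (incremental) backup: picks up the new file
tar --create --file="$work/backup-1.tar" \
    --listed-incremental="$work/backup.snar" -C "$work" data

# Restore: extract level 0 first, then level 1 on top of it.
for level in 0 1; do
    tar --extract --file="$work/backup-$level.tar" \
        --listed-incremental=/dev/null -C "$work/restore"
done

ls "$work/restore/data"
```

Because each level builds on the previous one, keep every archive in the chain; losing an intermediate level means later levels cannot be applied reliably.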
Selective archiving with 'find'
Safely archive files matching specific criteria using null-terminated inputs to handle filenames with spaces and special characters:
# Archive files modified in the last 7 days
find /path/to/directory -type f -mtime -7 -print0 | \
tar --null -cf archive.tar --files-from=-
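To check that null-terminated input really copes with awkward names, the following sketch archives a made-up directory containing a file name with spaces. Running 'find' from inside the directory is a choice made here so the archive stores relative paths; it is not required by the technique.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/docs"
echo "a" > "$work/docs/plain.txt"
echo "b" > "$work/docs/name with spaces.txt"

# Run find from inside the directory so member names are relative.
(
    cd "$work/docs"
    find . -type f -mtime -7 -print0 | \
        tar --null -cf "$work/recent.tar" --files-from=-
)

tar -tf "$work/recent.tar"
```

Note that --null appears before --files-from, because GNU tar applies it to the file lists that follow.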
Error handling and verification
Verify archive integrity and handle errors:
# List the contents to confirm the archive is readable (non-zero exit status on failure)
tar -tf archive.tar.zst
# Extract, suppressing warnings about timestamps that lie in the future
tar --extract \
--file=archive.tar.zst \
--warning=no-timestamp
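In scripts, the exit status is what matters. The sketch below wraps the check in a hypothetical helper, verify_archive, which lists a plain archive or, for .zst files, first runs zstd -t (zstd's built-in integrity test) and then lists the tar layer. It is exercised here against a deliberately truncated plain archive; the file names are made up.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: returns 0 if the archive can be read end to end.
verify_archive() {
    case "$1" in
        *.zst)
            # Check the compressed stream, then the tar layer inside it.
            zstd -t -q "$1" && zstd -dc "$1" | tar -tf - >/dev/null
            ;;
        *)
            tar -tf "$1" >/dev/null
            ;;
    esac
}

work=$(mktemp -d)
mkdir -p "$work/data"
seq 1 50000 > "$work/data/numbers.txt"
tar -cf "$work/good.tar" -C "$work" data

# Simulate an interrupted transfer by keeping only the first 20480 bytes.
head -c 20480 "$work/good.tar" > "$work/bad.tar"

for f in "$work/good.tar" "$work/bad.tar"; do
    if verify_archive "$f" 2>/dev/null; then
        echo "OK:      $f"
    else
        echo "CORRUPT: $f"
    fi
done
```

A successful listing only shows that the archive is readable; it does not compare the contents against the original files.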
Remote archiving with SSH
Create and extract archives over SSH:
# Archive a remote directory into a local compressed file
ssh user@remotehost "tar -cf - /path/to/directory" | \
zstd > archive.tar.zst
# Stream an archive stored on the remote host and extract it locally
ssh user@remotehost "zstd -dc archive.tar.zst" | \
tar -xf -
Splitting large archives
Handle large archives by splitting them into manageable chunks:
# Create and split archive (100 MiB chunks)
tar -cf - /path/to/directory | \
zstd | \
split -b 100M - archive.tar.zst.part-
# Reassemble and extract
cat archive.tar.zst.part-* | \
zstd -d | \
tar -xf -
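The same pipeline can be rehearsed at a small scale before trusting it with a large backup. This sketch uses tiny 100 KiB parts and a made-up random file so that several parts are produced quickly; it falls back to gzip only so the demo still runs where zstd is not installed, which is an assumption of this example and not part of the technique.

```shell
#!/bin/sh
set -eu

# Prefer zstd; fall back to gzip purely so the demo runs anywhere.
if command -v zstd >/dev/null 2>&1; then comp=zstd; else comp=gzip; fi

work=$(mktemp -d)
mkdir -p "$work/src" "$work/out"
# Random bytes barely compress, so the stream stays large enough to split.
head -c 300000 /dev/urandom > "$work/src/blob.bin"

# Split into 102400-byte (100 KiB) parts.
tar -cf - -C "$work" src | "$comp" -c | \
    split -b 102400 - "$work/archive.part-"
ls -l "$work"/archive.part-*

# The glob expands in sorted order (aa, ab, ...), matching the order written.
cat "$work"/archive.part-* | "$comp" -dc | tar -xf - -C "$work/out"

# Confirm the restored file is byte-for-byte identical.
cmp "$work/src/blob.bin" "$work/out/src/blob.bin" && echo "round trip OK"
```

Reassembly depends on every part being present and in order, so a missing or reordered part will surface as a decompression or extraction error.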
Automating backups
Create a backup script with error handling:
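#!/bin/bash
set -euo pipefail
DATE=$(date +%F)
BACKUP_DIR="/path/to/directory"
ARCHIVE_DIR="/path/to/backup"
ARCHIVE_NAME="backup-$DATE.tar.zst"
LOG_FILE="$ARCHIVE_DIR/backup-$DATE.log"
# Send error output from every command below (tar and zstd alike) to the log
exec 2>> "$LOG_FILE"
# Create backup; with pipefail, a failure in tar or zstd aborts the script
tar --create \
--file=- "$BACKUP_DIR" | \
zstd > "$ARCHIVE_DIR/$ARCHIVE_NAME"
# Verify archive: test the zstd stream, then list the tar contents into the log
zstd -t "$ARCHIVE_DIR/$ARCHIVE_NAME"
zstd -dc "$ARCHIVE_DIR/$ARCHIVE_NAME" | tar -tf - >> "$LOG_FILE"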
Schedule with cron:
0 0 * * * /path/to/backup.sh
Conclusion
Modern compression algorithms like Zstd and LZ4 significantly improve 'tar' archiving performance. Combining these tools with proper error handling, verification, and automation can help you build robust and efficient backup solutions for your development workflow.
At Transloadit, we leverage efficient file archiving and modern compression methods in our File Compressing service to handle large volumes of data effectively.