Speed up 'Tar' archiving with zstd and LZ4 compression
File archiving is essential in development workflows for backups, deployments, and data transfer. The 'tar' command, a staple tool in Unix-like systems, is commonly used for file archiving. However, traditional compression methods like gzip or bzip2 can be slow with large files. In this post, we'll explore advanced 'tar' techniques to improve efficiency, including using modern compression algorithms like Zstd and Lz4, excluding files during archiving, performing incremental backups, archiving over SSH, and automating 'tar' tasks.
Basic 'Tar' commands for file archiving and extraction
Before diving into advanced techniques, let's review the basic 'tar' commands for file archiving and extraction.
Creating an archive
To create an archive of a directory:
tar -cf archive.tar /path/to/directory
Explanation:
-c: Create a new archive.
-f: Specify the filename of the archive.
Extracting an archive
To extract an archive:
tar -xf archive.tar
Explanation:
-x: Extract files from an archive.
-f: Specify the filename of the archive.
Listing contents of an archive
To list the files within an archive:
tar -tf archive.tar
-t: List the contents of an archive.
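As a quick sanity check of the three commands above, the following sketch builds a throwaway directory, archives it, lists the archive, and extracts it somewhere else. The directory and file names are made up for illustration. The -C option (change directory) is used so the archive stores relative paths.

```shell
#!/bin/sh
# Round trip: create, list, and extract an archive in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/project" "$work/restore"
echo "hello" > "$work/project/readme.txt"

# -C switches to $work first, so the archive stores "project/..." paths.
tar -cf "$work/archive.tar" -C "$work" project

# List what was stored.
tar -tf "$work/archive.tar"

# Extract into a different directory and compare with the original.
tar -xf "$work/archive.tar" -C "$work/restore"
cmp "$work/project/readme.txt" "$work/restore/project/readme.txt" && echo "round trip OK"
```

Extracting into a separate directory, as done here, avoids overwriting the files you just archived.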
Advanced compression options with 'Tar'
Traditional compression methods include gzip and bzip2, but newer algorithms offer better performance.
Using gzip compression
tar -czf archive.tar.gz /path/to/directory
-z: Compress the archive with gzip.
Using bzip2 compression
tar -cjf archive.tar.bz2 /path/to/directory
-j: Compress the archive with bzip2.
Using xz compression
tar -cJf archive.tar.xz /path/to/directory
-J: Compress the archive with xz.
Using zstd and LZ4 compression
Modern compression algorithms like zstd and LZ4 compress and decompress much faster than gzip, bzip2, or xz. zstd typically reaches compression ratios comparable to gzip or better, while LZ4 trades some compression ratio for even higher speed.
Using zstd compression
Ensure zstd is installed on your system. The --zstd option requires GNU tar 1.31 or later.
To create a compressed archive with Zstd:
tar --zstd -cf archive.tar.zst /path/to/directory
To extract a Zstd-compressed archive:
tar --zstd -xf archive.tar.zst
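If your tar build lacks --zstd, or you want to pass tuning flags to the compressor, piping the tar stream through the zstd command-line tool gives an equivalent result. The following is a minimal sketch, assuming the zstd CLI is on your PATH; the scratch paths are illustrative. -T0 asks zstd to use all detected CPU cores, and -3 is its default compression level.

```shell
#!/bin/sh
# Compress and extract a tar stream through the zstd CLI (skipped if zstd is absent).
work=$(mktemp -d)
mkdir -p "$work/data" "$work/restore"
echo "sample payload" > "$work/data/file.txt"

if command -v zstd >/dev/null 2>&1; then
  # -T0: use all detected cores; -3: compression level; -q: quiet; -o: output file.
  tar -cf - -C "$work" data | zstd -T0 -3 -q -o "$work/archive.tar.zst"

  # -d: decompress; -c: write to stdout. tar reads the stream from stdin.
  zstd -dc "$work/archive.tar.zst" | tar -xf - -C "$work/restore"
  zstd_status="done"
else
  zstd_status="skipped"
  echo "zstd not installed; skipping" >&2
fi
```

Higher levels such as -19 shrink the archive further at the cost of speed, so it is worth trying a few levels on your own data.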
Using LZ4 compression
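Ensure LZ4 is installed on your system. The --lz4 option is provided by bsdtar (the libarchive tar shipped with macOS and FreeBSD). GNU tar has no --lz4 option, so use -I lz4 (short for --use-compress-program=lz4) there instead.
To create a compressed archive with LZ4 using bsdtar:
tar --lz4 -cf archive.tar.lz4 /path/to/directory
With GNU tar:
tar -I lz4 -cf archive.tar.lz4 /path/to/directory
To extract an LZ4-compressed archive with bsdtar:
tar --lz4 -xf archive.tar.lz4
With GNU tar:
tar -I lz4 -xf archive.tar.lz4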
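Which compressor wins depends on your data, so it helps to measure on a representative directory. The loop below is a rough sketch: it compresses the same tar stream with whichever of gzip, zstd, and lz4 are installed and prints the resulting sizes. The sample data is generated on the spot and only stands in for your own files.

```shell
#!/bin/sh
# Compare archive sizes across the compressors that are available locally.
work=$(mktemp -d)
mkdir -p "$work/data"

# Repetitive text compresses well; replace this with real data for a meaningful test.
i=0
while [ "$i" -lt 2000 ]; do
  echo "line $i: the quick brown fox jumps over the lazy dog"
  i=$((i + 1))
done > "$work/data/sample.txt"

for tool in gzip zstd lz4; do
  if command -v "$tool" >/dev/null 2>&1; then
    # Each tool reads stdin and, with -c, writes the compressed stream to stdout.
    tar -cf - -C "$work" data | "$tool" -c > "$work/archive.tar.$tool"
    size=$(wc -c < "$work/archive.tar.$tool" | tr -d ' ')
    echo "$tool: $size bytes"
  else
    echo "$tool: not installed, skipped"
  fi
done
```

Prefixing the pipeline with time shows the speed side of the trade-off as well.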
Excluding files and directories when creating archives
When creating archives, you may want to exclude certain files or directories.
Excluding files using the --exclude option
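tar -cf archive.tar --exclude='*.log' --exclude='tmp/*' /path/to/directory
This command excludes all .log files and everything in the tmp directory. Put the --exclude options before the path: GNU tar treats them as positional, so they only affect the paths that follow them.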
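To confirm that the exclusions behave as expected, list the archive after creating it. The sketch below does this in a scratch directory; the file names are invented for the demonstration.

```shell
#!/bin/sh
# Show that --exclude patterns keep matching files out of the archive.
work=$(mktemp -d)
mkdir -p "$work/app/tmp"
echo "code"  > "$work/app/main.sh"
echo "noise" > "$work/app/debug.log"
echo "cache" > "$work/app/tmp/scratch.dat"

# The --exclude options come before the path they should apply to.
tar -cf "$work/archive.tar" --exclude='*.log' --exclude='tmp/*' -C "$work" app

# The listing should contain app/main.sh but neither debug.log nor scratch.dat.
tar -tf "$work/archive.tar"
```

Quoting the patterns matters: without the single quotes the shell may expand *.log itself before tar sees it.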
Incremental backups using 'Tar'
Incremental backups allow you to back up only the files that have changed since the last backup.
Creating incremental backups
First, create a full (level 0) backup. The --listed-incremental option, a GNU tar feature, creates the snapshot file that records what was archived:
tar -cf backup-full.tar --listed-incremental=snapshot.file /path/to/directory
For subsequent backups, reuse the same snapshot file with a new archive name. Only files that changed since the previous run are archived, and the snapshot file is updated:
tar -cf backup-incremental-1.tar --listed-incremental=snapshot.file /path/to/directory
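The following is a self-contained sketch of a level 0 backup followed by a level 1 backup and a restore, assuming GNU tar (other tar implementations may not support --listed-incremental). All paths and file names are illustrative. The restore step passes /dev/null as the snapshot file, which is how the GNU tar manual suggests extracting incremental archives.

```shell
#!/bin/sh
# Level 0 and level 1 backups with GNU tar's --listed-incremental.
work=$(mktemp -d)
mkdir -p "$work/data" "$work/restore"
echo "original" > "$work/data/old.txt"
sleep 1  # keep file timestamps clearly older than the first backup

# Level 0: the snapshot file does not exist yet, so everything is archived.
tar -cf "$work/level0.tar" --listed-incremental="$work/snapshot.file" -C "$work" data

sleep 1
echo "added later" > "$work/data/new.txt"

# Level 1: only changes since the snapshot are archived; the snapshot is updated.
tar -cf "$work/level1.tar" --listed-incremental="$work/snapshot.file" -C "$work" data
tar -tf "$work/level1.tar"

# Restore in order, oldest first.
tar -xf "$work/level0.tar" --listed-incremental=/dev/null -C "$work/restore"
tar -xf "$work/level1.tar" --listed-incremental=/dev/null -C "$work/restore"
```

Keep a copy of the snapshot file taken right after the full backup if you want every later incremental to be relative to that full backup rather than to the previous incremental.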
Archiving over ssh: remote backups with 'Tar' and 'ssh'
You can archive files on a remote server via SSH.
Creating an archive over ssh
ssh user@remotehost "tar -cf - /path/to/directory" > archive.tar
Extracting an archive over ssh
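ssh user@remotehost "tar -cf - /path/to/directory" | tar -xf -
This streams the remote directory as a tar archive and unpacks it in the current local directory, without creating an intermediate file on either side.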
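The same streaming idea works in the other direction, pushing a local directory to another machine. In the sketch below, the REMOTE variable is a hypothetical knob added for illustration: it defaults to a local sh -c so the pipeline can be tried without a server, and you would set it to something like ssh user@remotehost for real use. The destination path is also made up.

```shell
#!/bin/sh
# Stream a local directory into a (possibly remote) shell that unpacks it.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
echo "deploy me" > "$work/src/app.conf"

# Hypothetical knob: set REMOTE="ssh user@remotehost" to run the receiver remotely.
REMOTE=${REMOTE:-sh -c}
dest="$work/dest"   # on a real remote host this would be a path on that machine

# The sender writes the archive to stdout; the receiver reads it from stdin.
# $REMOTE is intentionally unquoted so "ssh user@remotehost" splits into words.
tar -cf - -C "$work/src" . | $REMOTE "tar -xf - -C '$dest'"
```

Adding a compressor to both ends of the pipe (for example zstd on the sender and zstd -d on the receiver) can help on slow links.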
Combining 'Tar' with 'find' for selective archiving
You can use find to select files based on criteria and pass them to tar.
Example: archiving files modified in the last 7 days
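find /path/to/directory -type f -mtime -7 -print0 | tar --null -cf archive.tar --files-from=-
Explanation:
-type f: Matches regular files only. Without it, find also prints directories, and tar archives a listed directory with all of its contents, including unchanged files.
-mtime -7: Finds files modified in the last 7 days.
-print0: Prints file names separated by null characters.
--null: Reads input separated by null characters.
--files-from=-: Reads file names from standard input.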
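To see the selection at work, the sketch below creates one fresh file and one backdated file, then archives only what find reports. It uses -type f so that find emits regular files only; a matched directory would otherwise pull in everything beneath it. The file names and the backdated timestamp are arbitrary.

```shell
#!/bin/sh
# Archive only regular files changed in the last 7 days, then inspect the result.
work=$(mktemp -d)
mkdir -p "$work/data"
echo "fresh" > "$work/data/recent.txt"
echo "stale" > "$work/data/ancient.txt"

# Backdate one file so it falls outside the 7-day window.
touch -t 202001010000 "$work/data/ancient.txt"

# Run find from inside $work so the archive stores relative paths.
(
  cd "$work" &&
  find data -type f -mtime -7 -print0 |
    tar --null -cf archive.tar --files-from=-
)

# Only the recently modified file should be listed.
tar -tf "$work/archive.tar"
```

Null-separated names keep the pipeline safe for file names that contain spaces or newlines.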
Splitting large archives into smaller parts
When dealing with large archives, you might want to split them into smaller chunks.
Using split to divide archives
First, create the archive and compress it:
tar -czf archive.tar.gz /path/to/directory
Then split the archive into 100MB parts:
split -b 100M archive.tar.gz archive.part.
To reassemble the parts:
cat archive.part.* > archive.tar.gz
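Before deleting the original, it is worth checking that the reassembled file is byte-for-byte identical. The sketch below does this on a small generated archive; the 4 KB part size is chosen only to keep the demonstration small, and the file names are illustrative.

```shell
#!/bin/sh
# Split a compressed archive, reassemble it, and confirm the bytes are identical.
work=$(mktemp -d)
mkdir -p "$work/data"

# Random bytes do not compress, so the archive stays large enough to split.
head -c 20000 /dev/urandom > "$work/data/blob.bin"

tar -czf "$work/archive.tar.gz" -C "$work" data

# 4 KB parts keep the demo small; use something like 100M for real archives.
split -b 4k "$work/archive.tar.gz" "$work/archive.part."

# The shell expands the glob in sorted order (aa, ab, ac, ...).
cat "$work"/archive.part.* > "$work/rejoined.tar.gz"

cmp "$work/archive.tar.gz" "$work/rejoined.tar.gz" && echo "parts match the original"
```

When the parts travel to another machine, comparing checksums of the original and the rejoined file serves the same purpose as cmp.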
Automating 'Tar' tasks with scripts and cron jobs
Automating 'tar' tasks can streamline your backup and archiving processes.
Creating a backup script
Create a script backup.sh:
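#!/bin/bash
# Stop on errors, unset variables, and failed pipeline stages.
set -euo pipefail
DATE=$(date +%F)                      # ISO date, YYYY-MM-DD
BACKUP_DIR="/path/to/directory"       # directory to back up
ARCHIVE_DIR="/path/to/backup"         # where archives are stored
ARCHIVE_NAME="backup-$DATE.tar.zst"
mkdir -p "$ARCHIVE_DIR"               # make sure the destination exists
tar --zstd -cf "$ARCHIVE_DIR/$ARCHIVE_NAME" "$BACKUP_DIR"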
Make the script executable:
chmod +x backup.sh
Setting up a cron job
Automate the backup script to run daily at midnight:
crontab -e
Add the following line:
0 0 * * * /path/to/backup.sh
Conclusion and best practices
By mastering advanced 'tar' techniques, you can efficiently manage file archiving and backups in your development workflow. Modern compression algorithms like zstd and LZ4 speed up archiving and extraction, while excluding files, incremental backups, and automation streamline your processes.
Always test your archives and scripts to ensure data integrity. Incorporating these techniques will help optimize your workflow and save time.
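One simple way to test an archive is to list it and check tar's exit status, since listing forces tar and the decompressor to read the whole file. The helper name verify_archive below is hypothetical, and the truncated file only simulates a damaged backup; treat this as a sketch rather than a complete integrity check.

```shell
#!/bin/sh
# verify_archive is a hypothetical helper: it succeeds only if tar can read every entry.
verify_archive() {
  # Listing forces tar (and the decompressor) to read the archive end to end.
  tar -tf "$1" > /dev/null 2>&1
}

work=$(mktemp -d)
mkdir -p "$work/data"
head -c 50000 /dev/urandom > "$work/data/blob.bin"
tar -czf "$work/good.tar.gz" -C "$work" data

# Simulate a damaged backup by keeping only the first 1000 bytes.
head -c 1000 "$work/good.tar.gz" > "$work/truncated.tar.gz"

if verify_archive "$work/good.tar.gz"; then good_result=ok; else good_result=corrupt; fi
if verify_archive "$work/truncated.tar.gz"; then bad_result=ok; else bad_result=corrupt; fi
echo "good archive: $good_result, truncated archive: $bad_result"
```

A check like this fits naturally at the end of a backup script, so that a failed verification produces a non-zero exit status that cron can report.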
At Transloadit, we leverage efficient file archiving and modern compression methods in our File Compressing service to handle large volumes of data effectively.