# Accelerating compression with tar and pigz
In the world of software development, efficient file archiving is essential for managing projects and optimizing workflows. The `tar` command is a powerful utility that allows developers to bundle multiple files and directories into a single archive file. By mastering advanced `tar` techniques, we can enhance our file archiving processes, save time, and streamline our development workflows.
## Understanding tar and its importance

The `tar` (tape archive) command is a staple in Unix-like systems for creating and manipulating archive files. It's widely used for backup purposes, software distribution, and combining multiple files into one for easier handling.
### Basic tar commands

- Creating an archive:

  ```bash
  tar -cf archive.tar /path/to/directory
  ```

- Extracting an archive:

  ```bash
  tar -xf archive.tar
  ```

- Listing contents of an archive:

  ```bash
  tar -tf archive.tar
  ```
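Each of these operations also accepts the `-v` flag, which prints every file as it is processed and makes it easy to confirm what ended up in an archive. A minimal sketch, using a hypothetical `project/` directory:

```bash
# Create an archive verbosely, listing each file as it is added
tar -cvf project.tar project/

# Extract verbosely, listing each file as it is written back out
tar -xvf project.tar
```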
## Advanced compression options with tar

Compressing archives reduces storage space and speeds up transfer times. `tar` supports various compression methods, allowing us to balance compression speed against compression ratio.
### Using gzip

The most common compression method with `tar` is `gzip`:

```bash
tar -czf archive.tar.gz /path/to/directory
```

The `-z` option tells `tar` to compress the archive using `gzip`.
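Extraction is the mirror image: swap `-c` for `-x`. The `-C` option, shown below, chooses the directory to extract into; the `/tmp/restore` path is just an example and must exist beforehand:

```bash
# Unpack a gzip-compressed archive into a specific directory
mkdir -p /tmp/restore
tar -xzf archive.tar.gz -C /tmp/restore
```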
### Using bzip2

For better compression at the cost of speed, use `bzip2`:

```bash
tar -cjf archive.tar.bz2 /path/to/directory
```

The `-j` option uses `bzip2` for compression.
### Using xz
For maximum compression ratio:
```bash
tar -cJf archive.tar.xz /path/to/directory
```

The `-J` option uses `xz`, which provides higher compression ratios than `gzip` or `bzip2`, albeit at slower compression speeds.
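`xz` also lets you trade speed for ratio through its preset levels. One way to pass a level through `tar` is the `XZ_OPT` environment variable, which the `xz` tool reads; the levels below are examples rather than recommendations:

```bash
# Highest (and slowest) xz preset while creating the archive
XZ_OPT=-9 tar -cJf archive.tar.xz /path/to/directory

# A faster, lighter preset for quick archiving
XZ_OPT=-2 tar -cJf archive.tar.xz /path/to/directory
```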
## Accelerating compression with pigz

Traditional compression tools like `gzip` utilize a single CPU core, which can be a bottleneck on modern multi-core systems. `pigz` (parallel implementation of gzip) addresses this by using multiple cores for compression.
### Installing pigz

#### On Ubuntu/Debian

```bash
sudo apt-get update
sudo apt-get install pigz
```

#### On macOS (using Homebrew)

```bash
brew install pigz
```

#### On CentOS/RHEL

```bash
sudo yum install pigz
```
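Once installed, you can confirm that `pigz` is on your `PATH` and try it on a single file before wiring it into `tar`. The file name below is hypothetical; `-p` sets the number of threads and `-k` keeps the original file:

```bash
# Check that pigz is available
command -v pigz

# Compress a single file with 4 threads, keeping the original
pigz -p 4 -k large-dump.sql   # produces large-dump.sql.gz
```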
### Using tar with pigz

To use `pigz` with `tar`, specify it as the compression program:

```bash
tar -cf archive.tar.gz -I pigz /path/to/directory
```

This command tells `tar` to use `pigz` for compression, significantly speeding up the process on multi-core systems.
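You can also pass options to `pigz` through `-I` (the short form of `--use-compress-program`); recent GNU `tar` releases accept a quoted command with arguments, while older ones may need a small wrapper script. When extracting, GNU `tar` invokes the program with `-d`, so the same flag works in reverse. A sketch:

```bash
# Limit pigz to 4 threads while creating the archive
tar -cf archive.tar.gz -I 'pigz -p 4' /path/to/directory

# Extract through pigz; gzip decompression is mostly single-threaded,
# but pigz still offloads reading, writing, and checksums to extra threads
tar -xf archive.tar.gz -I pigz
```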
## Excluding files and directories
When creating archives, you might want to exclude certain files or directories that are unnecessary or too large.
### Excluding files by pattern

```bash
tar -czf archive.tar.gz /path/to/directory --exclude='*.log'
```

This command excludes all files ending with `.log`.
### Using an exclude file

Create a file `exclude.txt` containing the patterns to exclude:

```
*.log
node_modules
.git
```

Then use the `--exclude-from` option:

```bash
tar -czf archive.tar.gz /path/to/directory --exclude-from='exclude.txt'
```
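To double-check that the patterns were honoured, list the archive and search for entries that should not be there. A quick sketch based on the patterns in `exclude.txt`:

```bash
# No output means the excludes worked as intended
tar -tzf archive.tar.gz | grep -E '\.log$|node_modules/|\.git/'
```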
## Incremental backups with tar

`tar` can perform incremental backups, archiving only files that have changed since the last backup.

### Creating a full backup

```bash
tar -czf full-backup.tar.gz /path/to/directory --listed-incremental=backup.snar
```

The `--listed-incremental` option uses the snapshot file `backup.snar` to keep track of file changes.
### Performing an incremental backup

```bash
tar -czf incremental-backup.tar.gz /path/to/directory --listed-incremental=backup.snar
```

Only files changed since the last backup (as recorded in `backup.snar`) will be archived.
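Restoring works in the opposite direction: extract the full backup first, then each incremental archive in the order it was created. The GNU `tar` documentation recommends passing `--listed-incremental=/dev/null` during extraction so that deletions recorded in the incremental archives are replayed; a sketch:

```bash
# Restore the full backup first
tar -xzf full-backup.tar.gz --listed-incremental=/dev/null

# Then apply each incremental backup, oldest to newest
tar -xzf incremental-backup.tar.gz --listed-incremental=/dev/null
```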
## Archiving over SSH: remote backups
You can create archives on remote systems or transfer archives over the network using SSH.
### Archiving a remote directory locally
Create an archive of a remote directory and save it locally:
```bash
ssh user@remote_host "tar -cz /path/to/remote/directory" > archive.tar.gz
```
### Archiving a local directory to a remote host
Create an archive of a local directory and save it on a remote host:
```bash
tar -cz /path/to/directory | ssh user@remote_host "cat > /path/to/save/archive.tar.gz"
```
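If you want the files unpacked on the remote host rather than stored as an archive, pipe the stream straight into a remote `tar` process instead; the destination path here is an example:

```bash
# Copy a directory tree to a remote host without creating an intermediate archive
tar -cz /path/to/directory | ssh user@remote_host "tar -xz -C /path/to/destination"
```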
## Combining tar with find for selective archiving

Using `find`, we can selectively include files in our archive based on criteria like modification time or size.
### Example: archiving files modified in the last 7 days

```bash
find /path/to/directory -type f -mtime -7 -print0 | tar -czf archive.tar.gz --null -T -
```

The `--null -T -` options tell `tar` to read file names from standard input, separated by null characters.
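The same pattern works with any `find` predicate. For example, to archive only files larger than a given size (the 100 MB threshold here is arbitrary):

```bash
# Archive only files larger than 100 MB
find /path/to/directory -type f -size +100M -print0 | tar -czf large-files.tar.gz --null -T -
```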
## Splitting large archives into smaller parts
When dealing with very large archives, you might need to split them into smaller chunks for storage or transfer.
### Splitting an archive

```bash
tar -cz /path/to/directory | split -b 500M - archive_part_
```

This command creates compressed archive parts of 500 MB each, named `archive_part_aa`, `archive_part_ab`, and so on.
### Reassembling the archive

```bash
cat archive_part_* | tar -xz
```
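When the parts travel over an unreliable link, it is worth recording checksums before transfer and verifying them before reassembly. A sketch using `sha256sum` (any checksum tool would do):

```bash
# On the sending side: record a checksum for every part
sha256sum archive_part_* > archive_parts.sha256

# On the receiving side: verify the parts, then reassemble and extract
sha256sum -c archive_parts.sha256
cat archive_part_* | tar -xz
```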
## Automating tar tasks in your development workflow
Automating archiving tasks saves time and ensures consistency.
### Bash script example

Create a script `backup.sh`:

```bash
#!/bin/bash
TIMESTAMP=$(date +%F)
BACKUP_DIR="/path/to/backup"
SOURCE_DIR="/path/to/directory"
EXCLUDE_FILE="/path/to/exclude.txt"

# Create a timestamped, compressed backup, skipping excluded patterns
tar -czf "$BACKUP_DIR/backup-$TIMESTAMP.tar.gz" --exclude-from="$EXCLUDE_FILE" "$SOURCE_DIR"
```
Make the script executable:
```bash
chmod +x backup.sh
```
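For unattended use, it helps to make the script fail loudly and clean up old archives. A sketch of a slightly hardened variant; the 30-day retention window is an assumption you would adapt:

```bash
#!/bin/bash
set -euo pipefail   # abort on errors, unset variables, and failed pipeline stages

TIMESTAMP=$(date +%F)
BACKUP_DIR="/path/to/backup"
SOURCE_DIR="/path/to/directory"
EXCLUDE_FILE="/path/to/exclude.txt"

# Create the compressed archive, skipping excluded patterns
tar -czf "$BACKUP_DIR/backup-$TIMESTAMP.tar.gz" \
  --exclude-from="$EXCLUDE_FILE" "$SOURCE_DIR"

# Delete backups older than 30 days
find "$BACKUP_DIR" -name 'backup-*.tar.gz' -mtime +30 -delete
```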
### Scheduling with cron
Schedule the script to run daily at midnight:
```bash
crontab -e
```
Add the following line:
```
0 0 * * * /path/to/backup.sh
```
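Cron discards output unless you capture it, so redirecting the script's output to a log file makes failures easier to spot. The log path is an example:

```
# Run the backup daily at midnight and append output (and errors) to a log file
0 0 * * * /path/to/backup.sh >> /var/log/backup.log 2>&1
```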
## Best practices for efficient file archiving

- Regular Backups: Schedule backups regularly to protect against data loss.
- Exclude Unnecessary Files: Use `--exclude` options to avoid archiving unnecessary files.
- Monitor Backup Processes: Check logs or set up alerts to ensure backups are successful (see the verification sketch below).
- Store Backups Securely: Save backups in secure, redundant locations.
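One lightweight way to verify a backup is to test the archive right after it is written: `gzip -t` checks the compressed stream, and listing the contents exercises the `tar` structure. The file name below is an example:

```bash
# Verify the gzip stream is intact
gzip -t backup-2024-01-01.tar.gz

# Walk the archive's table of contents; a non-zero exit status signals corruption
tar -tzf backup-2024-01-01.tar.gz > /dev/null
```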
## Conclusion

By mastering advanced `tar` techniques, developers can efficiently manage file archiving and backup processes. Utilizing different compression methods, excluding unneeded files, performing incremental backups, archiving over SSH, and automating tasks can significantly optimize your development workflow.
At Transloadit, we value efficient file processing and compression. Our File Compressing service leverages powerful tools to ensure your files are processed quickly and reliably. Give it a try in your next project!