File archiving is essential in development workflows for backups, deployments, and data transfer. The 'tar' command, a staple of Unix-like systems, is the usual tool for the job. However, traditional compression methods like gzip or bzip2 can be slow with large files. In this post, we'll explore advanced 'tar' techniques to improve efficiency: using modern compression algorithms like Zstd and LZ4, excluding files during archiving, performing incremental backups, archiving over SSH, combining 'tar' with 'find', splitting large archives, and automating 'tar' tasks.

Basic 'Tar' commands for file archiving and extraction

Before diving into advanced techniques, let's review the basic 'tar' commands for file archiving and extraction.

Creating an archive

To create an archive of a directory:

tar -cf archive.tar /path/to/directory

Explanation:

  • -c: Create a new archive.
  • -f: Specify the filename of the archive.

Extracting an archive

To extract an archive:

tar -xf archive.tar

Explanation:

  • -x: Extract files from an archive.
  • -f: Specify the filename of the archive.

Listing contents of an archive

To list the files within an archive:

tar -tf archive.tar

Explanation:

  • -t: List the contents of an archive.
  • -f: Specify the filename of the archive.
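
The three basic operations can be tried end to end in a scratch directory. The following is a minimal sketch: the directory and file names are made up for illustration, and -C (change directory before archiving or extracting) is an extra option not covered above.

```shell
#!/bin/sh
set -eu

# Scratch area with a small, hypothetical project tree.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/project/src"
echo "hello" > "$work/project/src/a.txt"
echo "world" > "$work/project/b.txt"

# Create: -C makes tar store paths relative to $work.
tar -cf "$work/project.tar" -C "$work" project

# List the archive contents.
listing=$(tar -tf "$work/project.tar")
echo "$listing"

# Extract into a separate directory.
mkdir "$work/restore"
tar -xf "$work/project.tar" -C "$work/restore"

# Compare the original tree with the extracted copy.
if diff -r "$work/project" "$work/restore/project" >/dev/null; then
  roundtrip=ok
else
  roundtrip=mismatch
fi
echo "round trip: $roundtrip"
```

Using -C keeps absolute paths out of the archive, which makes it easier to extract somewhere else later.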

Advanced compression options with 'Tar'

Traditional compression methods include gzip and bzip2, but newer algorithms offer better performance.

Using gzip compression

tar -czf archive.tar.gz /path/to/directory

  • -z: Compress the archive with gzip.

Using bzip2 compression

tar -cjf archive.tar.bz2 /path/to/directory

  • -j: Compress the archive with bzip2.

Using xz compression

tar -cJf archive.tar.xz /path/to/directory

  • -J: Compress the archive with xz.
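
To see how these flags compare on your own data, a rough sketch like the one below builds the same archive with each compressor that happens to be installed and prints the resulting sizes. The sample data is synthetic and highly repetitive, so the numbers only illustrate the mechanics and say little about real-world ratios.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data"

# Synthetic, repetitive sample data (hypothetical log lines).
i=0
while [ "$i" -lt 2000 ]; do
  echo "line $i of sample log data"
  i=$((i + 1))
done > "$work/data/sample.log"

# Uncompressed baseline.
tar -cf "$work/data.tar" -C "$work" data
echo "none: $(( $(wc -c < "$work/data.tar") )) bytes"

# flag:extension:program triples for the options shown above.
for spec in z:gz:gzip j:bz2:bzip2 J:xz:xz; do
  flag=${spec%%:*}
  rest=${spec#*:}
  ext=${rest%%:*}
  prog=${rest#*:}
  if command -v "$prog" >/dev/null 2>&1; then
    tar "-c${flag}f" "$work/data.tar.$ext" -C "$work" data
    echo "$prog: $(( $(wc -c < "$work/data.tar.$ext") )) bytes"
  else
    echo "$prog: not installed, skipped"
  fi
done
```

Point the loop at a real directory and wrap each tar call in time to compare speed as well as size.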

Using zstd and LZ4 compression

Modern compression algorithms like Zstd and LZ4 compress and decompress much faster than gzip or bzip2. Zstd typically achieves compression ratios comparable to gzip, while LZ4 trades some compression ratio for even higher speed.

Using zstd compression

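Ensure the zstd command-line tool is installed on your system and that your tar supports the --zstd option (GNU tar added it in version 1.31).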

To create a compressed archive with Zstd:

tar --zstd -cf archive.tar.zst /path/to/directory

To extract a Zstd-compressed archive:

tar --zstd -xf archive.tar.zst

Using LZ4 compression

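Ensure the lz4 command-line tool is installed on your system. GNU tar has no dedicated --lz4 flag (bsdtar does), so pass the compressor with -I (--use-compress-program).

To create a compressed archive with LZ4:

tar -I lz4 -cf archive.tar.lz4 /path/to/directory

To extract an LZ4-compressed archive:

tar -I lz4 -xf archive.tar.lz4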
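
If your tar build lacks a built-in flag for a given compressor, piping the archive through the compressor gives the same result. The sketch below assumes the gzip, zstd, and lz4 command-line tools, all of which accept -c (write to stdout) and -d (decompress). It skips any tool that is not installed, and the file names are made up for illustration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data"
echo "sample payload" > "$work/data/file.txt"

results=""
for prog in gzip zstd lz4; do
  if ! command -v "$prog" >/dev/null 2>&1; then
    results="$results $prog=skipped"
    continue
  fi
  archive="$work/data.tar.$prog"

  # Create: tar writes the archive to stdout (-f -), the compressor reads stdin.
  tar -cf - -C "$work" data | "$prog" -c > "$archive"

  # Extract: the compressor decompresses to stdout, tar reads stdin.
  mkdir "$work/out-$prog"
  "$prog" -dc "$archive" | tar -xf - -C "$work/out-$prog"

  if diff -r "$work/data" "$work/out-$prog/data" >/dev/null; then
    results="$results $prog=ok"
  else
    results="$results $prog=mismatch"
  fi
done
echo "results:$results"
```

The pipe form also lets you pass compressor-specific options, such as a compression level, directly to the tool.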

Excluding files and directories when creating archives

When creating archives, you may want to exclude certain files or directories.

Excluding files using the --exclude option

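tar -cf archive.tar --exclude='*.log' --exclude='tmp/*' /path/to/directory

This command excludes all .log files and the contents of any directory named tmp. Place the --exclude options before the path: recent versions of GNU tar treat them as positional, so an --exclude that follows the path may have no effect.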
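
A quick way to check exclusion patterns is to archive a small test tree and list the result. This is a sketch that assumes GNU tar's default pattern matching (patterns may match at any depth); the project layout is hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/project/src" "$work/project/tmp" "$work/project/logs"
echo "int main(void) { return 0; }" > "$work/project/src/main.c"
echo "debug output" > "$work/project/app.log"
echo "old output" > "$work/project/logs/old.log"
echo "scratch" > "$work/project/tmp/cache.bin"

# The --exclude options come before the path they should apply to.
tar -cf "$work/project.tar" \
  --exclude='*.log' --exclude='tmp/*' \
  -C "$work" project

# Inspect what actually ended up in the archive.
listing=$(tar -tf "$work/project.tar")
echo "$listing"
```

Listing the archive before relying on it catches patterns that match too much or too little.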

Incremental backups using 'Tar'

Incremental backups allow you to back up only the files that have changed since the last backup.

Creating incremental backups

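First, create a full (level 0) backup. Passing --listed-incremental makes tar create the snapshot file, which records the state of every archived file:

tar -cf backup-full.tar --listed-incremental=snapshot.file /path/to/directory

For subsequent backups, reuse the same snapshot file and write to a new archive name so that earlier backups are not overwritten. Only files changed since the previous run are archived, and tar updates the snapshot file each time:

tar -cf backup-incremental-1.tar --listed-incremental=snapshot.file /path/to/directory

To restore, extract the full backup first, then each incremental archive in the order it was created. Note that --listed-incremental is a GNU tar feature.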
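
The full cycle (level 0 backup, a change, level 1 backup, restore) can be rehearsed in a scratch directory. This is a sketch that assumes GNU tar. The file names are hypothetical, the sleep only guarantees distinct timestamps, and --listed-incremental=/dev/null on extraction is the form the GNU tar manual uses for restoring incremental archives.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/project" "$work/restore"
echo "v1" > "$work/project/unchanged.txt"
echo "v1" > "$work/project/edited.txt"
cd "$work"

# Level 0: full backup, creates snapshot.file.
tar -cf full.tar --listed-incremental=snapshot.file project

# Change the tree: edit one file and add another.
sleep 1
echo "v2" > project/edited.txt
echo "new" > project/added.txt

# Level 1: only the changes since the snapshot are stored.
tar -cf incr-1.tar --listed-incremental=snapshot.file project
incr_listing=$(tar -tf incr-1.tar)
echo "$incr_listing"

# Restore: full backup first, then the incremental on top.
tar -xf full.tar --listed-incremental=/dev/null -C restore
tar -xf incr-1.tar --listed-incremental=/dev/null -C restore

if diff -r project restore/project >/dev/null; then
  restored=ok
else
  restored=mismatch
fi
echo "restore: $restored"
```

The level 1 listing should show the edited and added files but not unchanged.txt, which is what keeps incremental archives small.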

Archiving over ssh: remote backups with 'Tar' and 'ssh'

You can archive files on a remote server via SSH.

Creating an archive over ssh

ssh user@remotehost "tar -cf - /path/to/directory" > archive.tar

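Extracting an archive over ssh

To unpack a remote directory locally without writing an intermediate archive file, pipe the stream straight into a local tar:

ssh user@remotehost "tar -cf - /path/to/directory" | tar -xf -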
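
The streaming pattern can be rehearsed without a remote host. In the sketch below, run_remote is a hypothetical stand-in that runs the command string in a local shell; swapping its body for ssh user@remotehost "$1" would give the real thing. The directory names are made up.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/remote/site" "$work/local"
echo "<h1>hi</h1>" > "$work/remote/site/index.html"

# Hypothetical stand-in for: ssh user@remotehost "$1"
run_remote() {
  sh -c "$1"
}

# The "remote" tar writes the archive to stdout (-f -);
# the local tar reads it from stdin and unpacks into $work/local.
run_remote "tar -cf - -C '$work/remote' site" | tar -xf - -C "$work/local"

if diff -r "$work/remote/site" "$work/local/site" >/dev/null; then
  streamed=ok
else
  streamed=mismatch
fi
echo "stream: $streamed"
```

Because nothing is written to disk on the sending side, this works even when the source machine has no free space for a temporary archive.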

Combining 'Tar' with 'find' for selective archiving

You can use find to select files based on criteria and pass them to tar.

Example: archiving files modified in the last 7 days

find /path/to/directory -type f -mtime -7 -print0 | tar --null -cf archive.tar --files-from=-

Explanation:

  • -type f: Matches regular files only. Without it, find also prints directories, and tar would archive each one recursively, old files included.
  • -mtime -7: Finds files modified in the last 7 days.
  • -print0: Prints file names separated by null characters.
  • --null: Reads input separated by null characters.
  • --files-from=-: Reads file names from standard input.
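
To confirm the selection behaves as intended, the sketch below backdates one file with touch -t and checks which files land in the archive. The file names and the 2020 timestamp are arbitrary choices for illustration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/project/docs"
echo "fresh" > "$work/project/docs/recent.txt"
echo "stale" > "$work/project/docs/old.txt"

# Backdate one file to 1 January 2020 (POSIX touch -t format: CCYYMMDDhhmm).
touch -t 202001010000 "$work/project/docs/old.txt"

# Use a relative path so the archive does not contain absolute names.
cd "$work"
find project -type f -mtime -7 -print0 |
  tar --null -cf recent.tar --files-from=-

listing=$(tar -tf recent.tar)
echo "$listing"
```

Only project/docs/recent.txt should appear in the listing, since the backdated file falls outside the 7-day window.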

Splitting large archives into smaller parts

When dealing with large archives, you might want to split them into smaller chunks.

Using split to divide archives

First, create the archive and compress it:

tar -czf archive.tar.gz /path/to/directory

Then split the archive into 100MB parts:

split -b 100M archive.tar.gz archive.part.

To reassemble the parts:

cat archive.part.* > archive.tar.gz
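
After reassembling, it is worth confirming that the result is byte-for-byte identical to the original. A small-scale sketch, using 8 KiB parts in place of 100M and cksum for the comparison (the file names are hypothetical):

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data"

# 20 KiB of random data, which barely compresses, so the archive spans several parts.
dd if=/dev/urandom of="$work/data/blob.bin" bs=1024 count=20 2>/dev/null

tar -czf "$work/archive.tar.gz" -C "$work" data

# Split into 8 KiB parts: archive.part.aa, archive.part.ab, ...
split -b 8192 "$work/archive.tar.gz" "$work/archive.part."
parts=$(ls "$work"/archive.part.* | wc -l)
echo "parts: $parts"

# Reassemble under a different name; the shell expands the glob in sorted order.
cat "$work"/archive.part.* > "$work/rejoined.tar.gz"

# Compare checksums of the original and the reassembled archive.
before=$(cksum < "$work/archive.tar.gz")
after=$(cksum < "$work/rejoined.tar.gz")
if [ "$before" = "$after" ]; then
  echo "checksums match"
else
  echo "checksums differ"
fi
```

Recording the checksum alongside the parts lets the receiving side run the same check after transfer.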

Automating 'Tar' tasks with scripts and cron jobs

Automating 'tar' tasks can streamline your backup and archiving processes.

Creating a backup script

Create a script backup.sh:

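#!/bin/bash
# Stop on errors, unset variables, and failed pipeline stages.
set -euo pipefail

DATE=$(date +%F)                     # ISO date, YYYY-MM-DD
BACKUP_DIR="/path/to/directory"      # directory to back up
ARCHIVE_DIR="/path/to/backup"        # where archives are stored
ARCHIVE_NAME="backup-$DATE.tar.zst"

# Make sure the destination exists, then write a Zstd-compressed archive.
mkdir -p "$ARCHIVE_DIR"
tar --zstd -cf "$ARCHIVE_DIR/$ARCHIVE_NAME" "$BACKUP_DIR"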

Make the script executable:

chmod +x backup.sh

Setting up a cron job

Automate the backup script to run daily at midnight:

crontab -e

Add the following line:

0 0 * * * /path/to/backup.sh

Conclusion and best practices

By mastering advanced 'tar' techniques, you can efficiently manage file archiving and backups in your development workflow. Using modern compression algorithms like Zstd and LZ4 speeds up archiving and extraction, while features like excluding files, incremental backups, and automation streamline your processes.

Always test your archives and scripts to ensure data integrity. Incorporating these techniques will help optimize your workflow and save time.
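
One simple way to test an archive is to have tar read it end to end; a truncated or corrupt file makes the listing fail. A minimal sketch, where verify_archive is a hypothetical helper name and the sample files are generated on the spot:

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data"
dd if=/dev/urandom of="$work/data/blob.bin" bs=1024 count=20 2>/dev/null
tar -czf "$work/good.tar.gz" -C "$work" data

# Simulate an interrupted transfer: keep only the first 1000 bytes.
dd if="$work/good.tar.gz" of="$work/truncated.tar.gz" bs=1000 count=1 2>/dev/null

# Hypothetical helper: succeeds only if tar can read the whole archive.
verify_archive() {
  tar -tzf "$1" >/dev/null 2>&1
}

if verify_archive "$work/good.tar.gz"; then good=ok; else good=corrupt; fi
if verify_archive "$work/truncated.tar.gz"; then bad=ok; else bad=corrupt; fi
echo "good.tar.gz: $good"
echo "truncated.tar.gz: $bad"
```

A successful listing shows the archive is readable, not that its contents match the source, so a periodic test restore is still worthwhile.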

At Transloadit, we leverage efficient file archiving and modern compression methods in our File Compressing service to handle large volumes of data effectively.