Automating SFTP file polling with Ruby

Importing files directly from remote SFTP servers can be a repetitive task if handled manually. Instead of running one‑off scripts, you can automate this process by continuously polling a remote directory and downloading new files as they appear. In this post, we’ll walk through building a Ruby script that leverages the Net::SFTP gem for secure, automated file imports.
Prerequisites and setup
To follow along with this guide, ensure that you have the following:
- A recent Ruby release (the Docker image below uses Ruby 3.2)
- Net::SFTP gem (version 4.0 or later):
gem install net-sftp -v '~> 4.0'
- A valid SSH key for passwordless authentication
- Basic familiarity with Ruby and command‑line operations
Also, make sure that your SFTP server has been configured to accept your SSH key and that you have the necessary permissions to access the target directory.
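If you have not set up key-based authentication yet, one common approach is to generate an ed25519 key pair and copy the public key to the server. The key path, user, and host below are placeholders; substitute your own values:
ssh-keygen -t ed25519 -f ~/.ssh/sftp_poller_key -C 'sftp-poller'
ssh-copy-id -i ~/.ssh/sftp_poller_key.pub username@example.com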
Building the file poller
The goal is to create a Ruby script that connects to an SFTP server, scans a designated directory for files, downloads them to a local folder, and then waits before polling again. We will incorporate robust error handling, logging, and production-ready features.
Here's a complete example of the Ruby script with modern best practices:
require 'net/sftp'
require 'logger'
require 'json'
require 'time' # provides Time#iso8601, used by the log formatter below
require 'tempfile'
require 'fileutils'
# Configuration constants
SFTP_HOST = ENV.fetch('SFTP_HOST')
SFTP_USER = ENV.fetch('SFTP_USER')
SFTP_PORT = ENV.fetch('SFTP_PORT', 22).to_i
REMOTE_DIR = ENV.fetch('REMOTE_DIR')
LOCAL_DIR = ENV.fetch('LOCAL_DIR', './downloads')
SSH_KEY = ENV.fetch('SSH_KEY_PATH')
# Initialize structured logger that writes one JSON object per line to STDOUT
logger = Logger.new($stdout)
logger.formatter = proc do |severity, datetime, _progname, msg|
  JSON.dump({
    timestamp: datetime.iso8601,
    severity: severity,
    message: msg,
    service: 'sftp-poller'
  }) + "\n"
end
# Download a remote file to a temporary path first, then move it into place so
# that consumers never see a partially written file.
def download_file(sftp, remote_file, final_path, logger)
  temp_file = Tempfile.new('sftp-download')
  begin
    sftp.download!(remote_file, temp_file.path)
    # The move is only atomic when the temp file and the destination are on the
    # same filesystem; otherwise FileUtils.mv falls back to copy-and-delete.
    FileUtils.mv(temp_file.path, final_path)
    logger.info({ action: 'download_complete', file: remote_file, destination: final_path }.to_json)
    true
  rescue Net::SFTP::StatusException => e
    logger.error({ action: 'download_failed', file: remote_file, error: e.message, code: e.code }.to_json)
    false
  ensure
    temp_file.close
    temp_file.unlink
  end
end
# Run the given block, retrying transient connection failures with exponential
# backoff. Authentication failures are never retried.
def with_retries(max_attempts: 3, base_delay: 1, logger:)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue Net::SSH::AuthenticationFailed => e
    logger.error({ action: 'authentication_failed', error: e.message }.to_json)
    raise
  rescue Errno::ECONNREFUSED, Net::SSH::ConnectionTimeout => e
    if attempt < max_attempts
      delay = base_delay * (2**(attempt - 1)) # 1s, 2s, 4s, ...
      logger.warn({ action: 'retry_attempt', attempt: attempt, delay: delay, error: e.message }.to_json)
      sleep delay
      retry
    end
    logger.error({ action: 'max_retries_reached', error: e.message }.to_json)
    raise
  end
end
# Connect to the server, scan the remote directory, and download every regular
# file. Files that download successfully are removed from the server so they
# are not processed twice.
def poll_sftp(logger)
  with_retries(logger: logger) do
    Net::SFTP.start(SFTP_HOST, SFTP_USER, port: SFTP_PORT, keys: [SSH_KEY]) do |sftp|
      logger.info({ action: 'connection_established', host: SFTP_HOST, directory: REMOTE_DIR }.to_json)
      sftp.dir.foreach(REMOTE_DIR) do |entry|
        next if entry.name.start_with?('.')
        next unless entry.file? # skip subdirectories and other non-regular entries
        remote_file = File.join(REMOTE_DIR, entry.name)
        local_file = File.join(LOCAL_DIR, entry.name)
        if download_file(sftp, remote_file, local_file, logger)
          begin
            sftp.remove!(remote_file)
            logger.info({ action: 'remote_file_removed', file: remote_file }.to_json)
          rescue Net::SFTP::StatusException => e
            logger.error({ action: 'remove_failed', file: remote_file, error: e.message }.to_json)
          end
        end
      end
    end
  end
end
# Ensure the local download directory exists
FileUtils.mkdir_p(LOCAL_DIR)
# Set up signal handling for graceful shutdown
@shutdown = false
Signal.trap('TERM') { @shutdown = true }
Signal.trap('INT') { @shutdown = true }
# Main polling loop with graceful shutdown
until @shutdown
  poll_sftp(logger)
  logger.info({ action: 'polling_wait', delay: 60 }.to_json)
  sleep 60
end
logger.info({ action: 'shutdown_complete' }.to_json)
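To run the script locally, save it as sftp_poller.rb (the filename used throughout the rest of this post) and export the required environment variables before starting it. The values shown here are placeholders:
export SFTP_HOST=sftp.example.com
export SFTP_USER=username
export REMOTE_DIR=/remote/path
export LOCAL_DIR=./downloads
export SSH_KEY_PATH=$HOME/.ssh/sftp_poller_key
ruby sftp_poller.rb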
Understanding the script
This production-ready script implements several important features:
- Environment-based Configuration: Uses environment variables for sensitive configuration, following security best practices.
- Structured Logging: Implements JSON-formatted logging for better integration with log aggregation systems.
- Atomic File Operations: Uses temporary files and atomic moves to ensure file integrity during downloads (see the sketch after this list).
- Robust Error Handling: Implements specific error types and retries with exponential backoff for transient failures.
- Graceful Shutdown: Properly handles termination signals for clean process management.
- File Cleanup: Automatically removes successfully downloaded files from the remote server to prevent duplicate processing.
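One caveat on the atomic move: FileUtils.mv is only truly atomic when the temporary file and the destination live on the same filesystem. A minimal sketch of a variant that stages the download right next to its final destination is shown below; the download_file_atomic name and the .part suffix are illustrative, not part of the script above:
require 'fileutils'
# Hypothetical variant of download_file that stages the download in the target
# directory, so the final rename stays on one filesystem and is atomic.
def download_file_atomic(sftp, remote_file, final_path, logger)
  staging_path = "#{final_path}.part" # illustrative naming convention
  sftp.download!(remote_file, staging_path)
  File.rename(staging_path, final_path)
  logger.info({ action: 'download_complete', file: remote_file, destination: final_path }.to_json)
  true
rescue Net::SFTP::StatusException => e
  FileUtils.rm_f(staging_path)
  logger.error({ action: 'download_failed', file: remote_file, error: e.message, code: e.code }.to_json)
  false
end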
Production deployment
For production environments, consider these deployment options:
Using systemd
Create a systemd service file for reliable process management:
[Unit]
Description=SFTP File Poller
After=network.target
[Service]
Type=simple
User=sftp-user
Environment=SFTP_HOST=example.com
Environment=SFTP_USER=username
Environment=REMOTE_DIR=/remote/path
Environment=LOCAL_DIR=/local/path
Environment=SSH_KEY_PATH=/path/to/key
ExecStart=/usr/bin/ruby /path/to/sftp_poller.rb
Restart=always
RestartSec=60
[Install]
WantedBy=multi-user.target
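Assuming the unit file is saved as /etc/systemd/system/sftp-poller.service (the unit name is up to you), reload systemd and enable the service; journalctl then tails the JSON logs:
sudo systemctl daemon-reload
sudo systemctl enable --now sftp-poller
sudo journalctl -u sftp-poller -f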
Using Docker
Create a Dockerfile for containerized deployment:
FROM ruby:3.2-slim
RUN gem install net-sftp -v '~> 4.0.0'
WORKDIR /app
COPY sftp_poller.rb .
CMD ["ruby", "sftp_poller.rb"]
Security best practices
- SSH Key Management:
  - Rotate SSH keys regularly
  - Use ed25519 keys for better security
  - Store keys securely using environment variables or secure vaults
- Network Security:
  - Restrict SFTP access to specific IP ranges
  - Use strong ciphers and key exchange algorithms
  - Implement connection timeouts (see the sketch after this list)
- File Access:
  - Use minimal permissions for both local and remote files
  - Implement file integrity checks
  - Clean up temporary files properly
- Monitoring:
  - Set up alerts for failed downloads and connection issues
  - Monitor disk space usage
  - Track processing metrics
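Connection timeouts and key-only authentication can be enforced through the options that Net::SFTP.start forwards to Net::SSH. The specific values below are illustrative:
Net::SFTP.start(
  SFTP_HOST, SFTP_USER,
  port: SFTP_PORT,
  keys: [SSH_KEY],
  keys_only: true,       # only use the key supplied above, ignore ssh-agent identities
  non_interactive: true, # fail fast instead of prompting for a password
  timeout: 15            # seconds to wait for the initial TCP connection
) do |sftp|
  # ... same polling logic as above ...
end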
Conclusion
This Ruby script provides a robust foundation for automated SFTP file imports, incorporating modern best practices for security, reliability, and maintainability. The implementation handles common edge cases and provides proper logging for monitoring and debugging.
For those seeking a managed solution, Transloadit offers an SFTP Import Robot that handles these complexities automatically. Learn more about it on the Transloadit website.