Synthesize speech in documents using Ruby
Converting text documents into speech can enhance accessibility and offer new ways for users to engage with your content. In this DevTip, we'll explore how to synthesize speech in documents using Ruby, implementing text-to-speech functionality for document narration.
Essential Ruby libraries for speech synthesis
Ruby offers libraries for implementing text-to-speech functionality. We'll use the espeak gem, which provides a Ruby interface to the open-source eSpeak speech synthesizer.
Setting up the environment
First, let's set up our Ruby environment with the necessary dependencies. We'll use the espeak gem for this example, as it's open-source and works offline.
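Add the gem to your Gemfile (it is published on RubyGems as espeak-ruby and loaded with require 'espeak'):
gem 'espeak-ruby', require: 'espeak'
Or install it directly:
gem install espeak-ruby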
Make sure you have the eSpeak synthesizer installed on your system:
For Ubuntu/Debian:
sudo apt-get install espeak
For macOS:
brew install espeak
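Before running the scripts, it can help to confirm that the synthesizer is actually reachable from Ruby. The following is a minimal sketch that only uses the standard library; `executable_on_path?` is a hypothetical helper name introduced for this example, not part of the gem.

```ruby
# Returns true when an executable called `name` exists in one of the
# directories listed in `path` (defaults to the PATH environment variable).
# `executable_on_path?` is a hypothetical helper, not part of any gem.
def executable_on_path?(name, path: ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    next false if dir.empty?

    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

if executable_on_path?('espeak')
  puts 'eSpeak found'
else
  warn 'eSpeak not found on PATH; install it before running the scripts'
end
```

Failing early with a clear message is usually easier to debug than an empty or missing audio file later on.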
Basic document narration script
Here's a simple script that converts text documents to speech:
require 'espeak'
include ESpeak

text = File.read('input.txt')

# Voice settings are passed as options when the Speech object is created
speech = Speech.new(text,
                    voice: 'en', # Voice/language
                    speed: 120,  # Words per minute
                    pitch: 50)   # Voice pitch
speech.save('output.wav')
This script reads the text from input.txt, synthesizes speech using the eSpeak engine, and saves the output to output.wav. You can adjust the voice, speed, and pitch parameters to modify the output.
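The same three settings exist as options of the espeak command-line tool, which is handy when you want to try values outside Ruby. Here's a rough sketch that builds the equivalent command; `espeak_args` is a hypothetical helper name, and the range check reflects eSpeak's documented pitch range of 0 to 99.

```ruby
require 'shellwords'

# Builds the argument list for the `espeak` command-line tool.
# Flags used: -v (voice), -s (speed in words per minute), -p (pitch, 0 to 99),
# -f (read text from a file) and -w (write a WAV file).
# `espeak_args` is a hypothetical helper name for this sketch.
def espeak_args(input:, output:, voice: 'en', speed: 120, pitch: 50)
  raise ArgumentError, 'pitch must be between 0 and 99' unless (0..99).cover?(pitch)
  raise ArgumentError, 'speed must be positive' unless speed.positive?

  ['espeak', '-v', voice, '-s', speed.to_s, '-p', pitch.to_s, '-f', input, '-w', output]
end

args = espeak_args(input: 'input.txt', output: 'output.wav', voice: 'en', speed: 120, pitch: 50)
puts Shellwords.join(args)

# To synthesize for real, pass the array to Kernel#system so that no shell
# quoting is involved:
# system(*args) or raise 'espeak failed'
```

Passing the arguments as an array rather than one interpolated string keeps file names with spaces or quotes from being misread by the shell.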
Enhanced multi-language support
Let's extend our script to handle multiple languages and provide more configuration options:
require 'espeak'
require 'nokogiri'
include ESpeak

class DocumentNarrator
  SUPPORTED_LANGUAGES = {
    'en' => 'English',
    'fr' => 'French',
    'de' => 'German',
    'es' => 'Spanish'
  }.freeze

  def initialize(config = {})
    # Defaults, overridden by any options the caller passes in
    @config = {
      voice: 'en',
      speed: 120,
      pitch: 50
    }.merge(config)
  end

  def narrate_document(input_path, output_path)
    text = extract_text(input_path)
    speech = Speech.new(text, voice: @config[:voice], speed: @config[:speed], pitch: @config[:pitch])
    speech.save(output_path)
  end

  private

  # Returns the plain text of a .txt or .html/.htm file
  def extract_text(file_path)
    case File.extname(file_path).downcase
    when '.txt'
      File.read(file_path)
    when '.html', '.htm'
      doc = Nokogiri::HTML(File.read(file_path))
      doc.css('script, style').remove # Skip code that should not be read aloud
      doc.xpath('//text()').map(&:text).join(' ').gsub(/\s+/, ' ').strip
    else
      raise "Unsupported file format: #{File.extname(file_path)}"
    end
  end
end

# Usage example with configuration
narrator = DocumentNarrator.new(
  voice: 'es', # Spanish
  speed: 130,
  pitch: 60
)
narrator.narrate_document('document.html', 'narration.wav')
In this script, we've created a DocumentNarrator class that supports multiple languages and can extract text from both plain text and HTML files. The extract_text method handles different file types, and the configuration options allow you to customize the voice settings.
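One way to put the language table to work is to check the requested voice before any audio is generated. This is a small sketch under that assumption; `resolve_voice` is a hypothetical helper, and the table is copied here only so the example runs on its own.

```ruby
# Language table copied from the DocumentNarrator class so that this
# sketch is self-contained.
SUPPORTED_LANGUAGES = {
  'en' => 'English',
  'fr' => 'French',
  'de' => 'German',
  'es' => 'Spanish'
}.freeze

# Hypothetical helper: returns the normalized voice code, or raises when the
# requested voice is not listed in SUPPORTED_LANGUAGES.
def resolve_voice(code)
  key = code.to_s.strip.downcase
  return key if SUPPORTED_LANGUAGES.key?(key)

  raise ArgumentError,
        "Unsupported voice #{code.inspect}; choose one of: #{SUPPORTED_LANGUAGES.keys.join(', ')}"
end

puts resolve_voice('ES')

begin
  resolve_voice('xx')
rescue ArgumentError => e
  puts e.message
end
```

A check like this could be called from the constructor so that a typo in the voice code is reported immediately instead of producing narration in an unexpected language.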
Best practices for speech synthesis
- Text Preprocessing:
  - Normalize text: Clean up unnecessary whitespace and remove special characters.
  - Handle abbreviations and numbers: Expand abbreviations and spell out numbers for clearer pronunciation.
- Audio Quality:
  - Adjust speech rate and pitch: Fine-tuning these settings can result in more natural-sounding audio.
  - Use high-quality voices: If higher quality voices are needed, consider using premium services or more advanced open-source tools.
- Performance Optimization:
  - Process in chunks: For large documents, process text in smaller chunks to manage memory usage.
  - Caching: Implement caching mechanisms if you're converting the same text multiple times.
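Several of these practices can be sketched with the standard library alone. The helpers below (`normalize_text`, `chunk_text`, `cache_key`) and the abbreviation table are hypothetical names and sample data chosen for illustration, not part of the espeak gem.

```ruby
require 'digest'

# Hypothetical abbreviation table; extend it for your own documents.
ABBREVIATIONS = {
  'e.g.' => 'for example',
  'vs.'  => 'versus',
  'Dr.'  => 'Doctor'
}.freeze

# Collapses runs of whitespace and expands the abbreviations listed above.
def normalize_text(text)
  cleaned = text.gsub(/\s+/, ' ').strip
  ABBREVIATIONS.reduce(cleaned) { |acc, (abbr, full)| acc.gsub(abbr, full) }
end

# Splits text into chunks of at most `max_chars` characters, breaking at
# sentence boundaries. A single sentence longer than `max_chars` is kept
# whole rather than cut mid-word.
def chunk_text(text, max_chars: 500)
  sentences = text.split(/(?<=[.!?])\s+/)
  chunks = sentences.each_with_object(['']) do |sentence, acc|
    candidate = acc.last.empty? ? sentence : "#{acc.last} #{sentence}"
    if acc.last.empty? || candidate.length <= max_chars
      acc[-1] = candidate
    else
      acc << sentence
    end
  end
  chunks.reject(&:empty?)
end

# Derives a cache key from the text and the voice settings, so identical
# requests can reuse a previously generated audio file.
def cache_key(text, settings)
  canonical = settings.sort.map { |key, value| "#{key}=#{value}" }.join('&')
  Digest::SHA256.hexdigest("#{canonical}\n#{text}")
end

text = normalize_text("Dr.  Smith   arrived. She spoke, e.g. about speech synthesis.")
chunk_text(text, max_chars: 40).each_with_index do |chunk, index|
  puts "#{index}: #{chunk}"
end
puts cache_key(text, voice: 'en', speed: 120, pitch: 50)
```

Normalizing before chunking matters here: expanding "Dr." first keeps the sentence splitter from treating the abbreviation's period as the end of a sentence. Each chunk could then be narrated separately, with the cache key used as the audio file name.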
Error handling and validation
Implement robust error handling to manage common issues:
class DocumentNarratorError < StandardError; end

class DocumentNarrator
  # ... [previous code] ...

  def narrate_document(input_path, output_path)
    validate_file(input_path)
    validate_output_path(output_path)
    text = extract_text(input_path)
    speech = Speech.new(text, voice: @config[:voice], speed: @config[:speed], pitch: @config[:pitch])
    speech.save(output_path)
  end

  private

  def validate_file(file_path)
    raise DocumentNarratorError, 'File not found' unless File.exist?(file_path)
    raise DocumentNarratorError, 'File is empty' if File.zero?(file_path)
    # Additional validations as needed
  end

  def validate_output_path(path)
    dir = File.dirname(path)
    raise DocumentNarratorError, 'Invalid output directory' unless Dir.exist?(dir)
    # Additional validations as needed
  end

  # ... [rest of the class] ...
end
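On the calling side, a dedicated error class lets you rescue narration problems without swallowing unrelated exceptions. The sketch below repeats the validation checks as stand-alone methods so it runs without the gem; `validate_paths` and `safe_validate` are hypothetical names used only for this illustration.

```ruby
require 'tmpdir'

class DocumentNarratorError < StandardError; end

# Stand-alone version of the validation checks shown above, so the sketch
# does not depend on the espeak gem.
def validate_paths(input_path, output_path)
  raise DocumentNarratorError, "File not found: #{input_path}" unless File.exist?(input_path)
  raise DocumentNarratorError, "File is empty: #{input_path}" if File.zero?(input_path)

  dir = File.dirname(output_path)
  raise DocumentNarratorError, "Invalid output directory: #{dir}" unless Dir.exist?(dir)

  true
end

# Hypothetical wrapper: returns true when the paths are usable, or prints a
# warning and returns false when a DocumentNarratorError is raised.
def safe_validate(input_path, output_path)
  validate_paths(input_path, output_path)
rescue DocumentNarratorError => e
  warn "Narration skipped: #{e.message}"
  false
end

Dir.mktmpdir do |dir|
  input = File.join(dir, 'input.txt')
  File.write(input, 'Hello')

  puts safe_validate(input, File.join(dir, 'narration.wav'))        # usable paths
  puts safe_validate(File.join(dir, 'missing.txt'), 'narration.wav') # missing input
end
```

Including the offending path in the message, as done here, tends to make batch runs over many documents much easier to troubleshoot.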
Conclusion
Implementing speech synthesis in Ruby provides a powerful way to add audio narration capabilities to your document processing workflows. By leveraging the espeak gem and the eSpeak synthesizer, you can create customizable text-to-speech solutions. For production environments requiring high-quality speech synthesis with support for multiple languages and voices, consider using Transloadit's Text to Speech Robot.