Implementing real-time text recognition in Android apps with OpenCV and Tesseract

Implementing real-time text recognition in your Android app can greatly enhance user experience by enabling instant text extraction from documents, signs, or any text-containing objects through the device's camera. In this tutorial, we explore how to implement OCR (Optical Character Recognition) using OpenCV and Tesseract4Android, providing you with a step-by-step guide to get started.
Prerequisites
Before you begin, ensure you have the following:
- Android Studio installed on your development machine
- A device running Android 5.0 (Lollipop) or higher for testing
- Basic knowledge of Android development and Java programming
- Familiarity with image processing concepts
Setting up the Android Studio project
First, create a new Android Studio project:
- Open Android Studio and select New Project.
- Choose Empty Activity and click Next.
- Set your Application Name, e.g., RealTimeOCR.
- Choose Java as the programming language.
- Set the Minimum SDK to API 21: Android 5.0 (Lollipop).
- Click Finish to create the project.
Adding dependencies: OpenCV and Tesseract
Add the required dependencies to your project.
Project-level build.gradle
```groovy
allprojects {
    repositories {
        google()
        mavenCentral()
        maven { url 'https://jitpack.io' }
    }
}
```
App-level build.gradle
```groovy
dependencies {
    implementation 'org.opencv:opencv:4.9.0'
    implementation 'cz.adaptech.tesseract4android:tesseract4android:4.8.0'
}
```
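The Maven artifact bundles OpenCV's native libraries, but they still have to be loaded before any OpenCV class is used. A minimal sketch, assuming the `org.opencv:opencv` artifact above (which provides `OpenCVLoader.initLocal()` in recent releases); you would typically run this early in `onCreate()`:

```java
// Load OpenCV's bundled native libraries before creating any Mat objects.
// Check the boolean result rather than assuming the load succeeded.
if (OpenCVLoader.initLocal()) {
    Log.d("OCR", "OpenCV loaded successfully");
} else {
    Log.e("OCR", "OpenCV initialization failed");
    // Disable OCR features or inform the user here.
}
```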
Copying Tesseract trained data files
Download the Tesseract v4.0.0 trained data files from the official tessdata repository and place them in `src/main/assets/tessdata/`. Then, include the following utility in your project to copy these files to your app's private directory for runtime access:
```java
private void copyTessDataFiles(File dir) {
    try {
        AssetManager assetManager = getAssets();
        String[] fileList = assetManager.list("tessdata");
        if (fileList == null) return;
        for (String fileName : fileList) {
            File file = new File(dir, fileName);
            if (file.exists()) continue;
            // try-with-resources closes both streams even if copying fails
            try (InputStream in = assetManager.open("tessdata/" + fileName);
                 OutputStream out = new FileOutputStream(file)) {
                byte[] buffer = new byte[1024];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            }
        }
    } catch (IOException e) {
        Log.e("OCR", "Error copying tess data files", e);
    }
}
```
Configuring camera access and permissions
Add the necessary permissions to your AndroidManifest.xml:

```xml
<uses-permission android:name="android.permission.CAMERA" />
<uses-feature android:name="android.hardware.camera" android:required="true" />
<uses-feature android:name="android.hardware.camera.autofocus" android:required="false" />
```
Implement runtime permission handling in your MainActivity.java:

```java
public class MainActivity extends AppCompatActivity {
    private static final int PERMISSIONS_REQUEST_CAMERA = 1;
    private CameraManager cameraManager;
    private HandlerThread backgroundThread;
    private Handler backgroundHandler;

    private void requestCameraPermission() {
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
                != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(this,
                    new String[]{Manifest.permission.CAMERA},
                    PERMISSIONS_REQUEST_CAMERA);
        } else {
            startBackgroundThread();
            initializeCamera();
        }
    }

    private void startBackgroundThread() {
        backgroundThread = new HandlerThread("CameraBackground");
        backgroundThread.start();
        backgroundHandler = new Handler(backgroundThread.getLooper());
    }

    private void stopBackgroundThread() {
        if (backgroundThread != null) {
            backgroundThread.quitSafely();
            try {
                backgroundThread.join();
                backgroundThread = null;
                backgroundHandler = null;
            } catch (InterruptedException e) {
                Log.e("MainActivity", "Error stopping background thread", e);
            }
        }
    }

    @Override
    public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions,
                                           @NonNull int[] grantResults) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults);
        if (requestCode == PERMISSIONS_REQUEST_CAMERA) {
            if (grantResults.length > 0 && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                startBackgroundThread();
                initializeCamera();
            } else {
                Toast.makeText(this, "Camera permission required", Toast.LENGTH_LONG).show();
                finish();
            }
        }
    }
}
```
Integrating live camera feed
Create a camera preview layout in activity_main.xml:

```xml
<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <TextureView
        android:id="@+id/texture_view"
        android:layout_width="match_parent"
        android:layout_height="0dp"
        app:layout_constraintTop_toTopOf="parent"
        app:layout_constraintBottom_toTopOf="@id/text_result" />

    <TextView
        android:id="@+id/text_result"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:background="#80000000"
        android:padding="16dp"
        android:textColor="#ffffff"
        android:textSize="16sp"
        app:layout_constraintBottom_toBottomOf="parent" />
</androidx.constraintlayout.widget.ConstraintLayout>
```
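The layout above only displays the preview; frames still need to reach the OCR pipeline. One way to wire this up is a `TextureView.SurfaceTextureListener` that polls the current frame — a sketch, assuming the `processFrame(Bitmap)` method shown in the next section and an `initializeCamera()` helper (referenced earlier but left to the reader) that opens the camera via the Camera2 API. Note that `getBitmap()` copies each frame and is relatively expensive, which is why `processFrame` skips most frames:

```java
// Feed preview frames to OCR by polling the TextureView whenever it updates.
textureView.setSurfaceTextureListener(new TextureView.SurfaceTextureListener() {
    @Override
    public void onSurfaceTextureAvailable(SurfaceTexture surface, int width, int height) {
        initializeCamera(); // open the camera once the preview surface exists
    }

    @Override
    public void onSurfaceTextureSizeChanged(SurfaceTexture surface, int width, int height) { }

    @Override
    public boolean onSurfaceTextureDestroyed(SurfaceTexture surface) {
        return true;
    }

    @Override
    public void onSurfaceTextureUpdated(SurfaceTexture surface) {
        // Called for every new preview frame; getBitmap() copies the frame.
        Bitmap frame = textureView.getBitmap();
        if (frame != null) {
            processFrame(frame);
        }
    }
});
```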
Implementing real-time OCR functionality
Implement the OCR functionality using Tesseract4Android:
```java
public class MainActivity extends AppCompatActivity {
    private TessBaseAPI tessBaseAPI;
    private TextureView textureView;
    private TextView resultText;
    private ExecutorService ocrExecutor;
    private static final int PROCESS_FRAME_INTERVAL = 10;
    private int frameCount = 0;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        textureView = findViewById(R.id.texture_view);
        resultText = findViewById(R.id.text_result);
        ocrExecutor = Executors.newSingleThreadExecutor();
        initializeOCR();
        requestCameraPermission();
    }

    private void initializeOCR() {
        try {
            tessBaseAPI = new TessBaseAPI(new TessBaseAPI.ProgressNotifier() {
                @Override
                public void onProgressValues(TessBaseAPI.ProgressValues progressValues) {
                    Log.d("OCR", "Progress: " + progressValues.getPercent());
                }
            });
            String dataPath = getFilesDir() + "/tessdata/";
            File dir = new File(dataPath);
            if (!dir.exists()) {
                dir.mkdirs();
                copyTessDataFiles(dir);
            }
            // init() expects the parent directory that contains tessdata/
            if (!tessBaseAPI.init(getFilesDir().getAbsolutePath(), "eng")) {
                Log.e("OCR", "Could not initialize Tesseract");
                return;
            }
            tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
        } catch (Exception e) {
            Log.e("OCR", "Error initializing Tesseract", e);
        }
    }

    private void processFrame(Bitmap bitmap) {
        // Only OCR every Nth frame to keep the preview responsive
        if (++frameCount % PROCESS_FRAME_INTERVAL != 0) return;
        ocrExecutor.execute(() -> {
            try {
                Bitmap processedBitmap = preprocessImage(bitmap);
                String ocrResult = performOCR(processedBitmap);
                runOnUiThread(() -> resultText.setText(ocrResult));
                processedBitmap.recycle();
            } catch (Exception e) {
                Log.e("OCR", "Error processing frame", e);
            }
        });
    }

    private Bitmap preprocessImage(Bitmap source) {
        Mat rgba = new Mat();
        Utils.bitmapToMat(source, rgba);
        Mat gray = new Mat();
        Imgproc.cvtColor(rgba, gray, Imgproc.COLOR_RGBA2GRAY);
        Imgproc.threshold(gray, gray, 0, 255, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);
        Bitmap result = Bitmap.createBitmap(gray.cols(), gray.rows(), Bitmap.Config.ARGB_8888);
        Utils.matToBitmap(gray, result);
        rgba.release();
        gray.release();
        return result;
    }

    private String performOCR(Bitmap bitmap) {
        try {
            tessBaseAPI.setImage(bitmap);
            String result = tessBaseAPI.getUTF8Text();
            tessBaseAPI.clear();
            return result;
        } catch (Exception e) {
            Log.e("OCR", "Error performing OCR", e);
            return "Error processing image";
        }
    }

    @Override
    protected void onResume() {
        super.onResume();
        // Restart the background thread that onPause() stopped
        if (backgroundThread == null) {
            startBackgroundThread();
        }
    }

    @Override
    protected void onPause() {
        super.onPause();
        stopBackgroundThread();
    }

    @Override
    protected void onDestroy() {
        super.onDestroy();
        if (tessBaseAPI != null) {
            tessBaseAPI.recycle();
        }
        ocrExecutor.shutdown();
    }
}
```
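The PROCESS_FRAME_INTERVAL counter is a simple throttle: only every Nth frame reaches Tesseract, since OCR is far slower than the camera's frame rate. The idea can be isolated and verified outside Android — a sketch using a hypothetical `FrameThrottle` helper (not part of the app above):

```java
// Mirrors the frameCount logic in processFrame(): only every Nth call
// returns true, so expensive work runs on a fraction of the frames.
public class FrameThrottle {
    private final int interval;
    private int frameCount = 0;

    public FrameThrottle(int interval) {
        this.interval = interval;
    }

    // Returns true when the current frame should be processed.
    public boolean shouldProcess() {
        return ++frameCount % interval == 0;
    }

    public static void main(String[] args) {
        FrameThrottle throttle = new FrameThrottle(10);
        int processed = 0;
        for (int i = 0; i < 100; i++) {
            if (throttle.shouldProcess()) {
                processed++;
            }
        }
        // 100 frames at an interval of 10 -> 10 frames processed
        System.out.println(processed);
    }
}
```

At 30 fps preview with an interval of 10, this yields roughly three OCR passes per second, which is usually enough for a live overlay.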
Optimizing performance and memory usage
Implement these optimizations to improve OCR performance:
```java
private static final class OCROptimizer {
    private static final int MAX_IMAGE_DIMENSION = 1280;

    static Bitmap optimizeImageForOCR(Bitmap source) {
        int width = source.getWidth();
        int height = source.getHeight();
        // Downscale so the longest side is at most MAX_IMAGE_DIMENSION,
        // preserving the aspect ratio
        if (Math.max(width, height) > MAX_IMAGE_DIMENSION) {
            float ratio = (float) MAX_IMAGE_DIMENSION / Math.max(width, height);
            width = Math.round(width * ratio);
            height = Math.round(height * ratio);
        }
        Mat sourceMat = new Mat();
        Utils.bitmapToMat(source, sourceMat);
        // Otsu thresholding requires a single-channel image, so convert first
        Mat gray = new Mat();
        Imgproc.cvtColor(sourceMat, gray, Imgproc.COLOR_RGBA2GRAY);
        Mat resized = new Mat();
        Imgproc.resize(gray, resized, new Size(width, height));
        Mat processed = new Mat();
        Imgproc.GaussianBlur(resized, processed, new Size(3, 3), 0);
        Imgproc.threshold(processed, processed, 0, 255, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);
        Bitmap result = Bitmap.createBitmap(processed.cols(), processed.rows(), Bitmap.Config.ARGB_8888);
        Utils.matToBitmap(processed, result);
        sourceMat.release();
        gray.release();
        resized.release();
        processed.release();
        return result;
    }
}
```
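The dimension clamp at the top of optimizeImageForOCR is plain arithmetic and worth sanity-checking on its own: a 4000x3000 camera frame should come out as 1280x960, while smaller frames pass through untouched. A standalone sketch of the same math (the `ScaleCheck` class is hypothetical, for illustration only):

```java
public class ScaleCheck {
    static final int MAX_IMAGE_DIMENSION = 1280;

    // Returns {width, height} scaled so the longest side is at most
    // MAX_IMAGE_DIMENSION, preserving aspect ratio (same math as above).
    static int[] clampDimensions(int width, int height) {
        if (Math.max(width, height) > MAX_IMAGE_DIMENSION) {
            float ratio = (float) MAX_IMAGE_DIMENSION / Math.max(width, height);
            width = Math.round(width * ratio);
            height = Math.round(height * ratio);
        }
        return new int[]{width, height};
    }

    public static void main(String[] args) {
        int[] scaled = clampDimensions(4000, 3000);
        // A 4000x3000 camera frame scales to 1280x960
        System.out.println(scaled[0] + "x" + scaled[1]);
    }
}
```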
Handling multilingual text recognition
To support recognition in multiple languages, follow these steps:
- Place additional trained data files (e.g., `fra.traineddata` for French, `deu.traineddata` for German) in the `src/main/assets/tessdata/` directory.
- Modify your asset extraction to copy all required files.
- Use the following code to initialize multilingual OCR support:
```java
private void initializeMultilingualOCR() {
    try {
        String dataPath = getFilesDir() + "/tessdata/";
        File dir = new File(dataPath);
        if (!dir.exists()) {
            dir.mkdirs();
            // Assuming all language files are in the assets folder, copy them
            copyTessDataFiles(dir);
        }
        if (!tessBaseAPI.init(getFilesDir().getAbsolutePath(), "eng+fra+deu")) {
            Log.e("OCR", "Could not initialize Tesseract with multilingual data");
            return;
        }
        tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
    } catch (Exception e) {
        Log.e("OCR", "Error initializing multilingual OCR", e);
    }
}
```
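Tesseract expects the languages as a single `+`-joined string, so rather than hard-coding `"eng+fra+deu"` you can build it from whichever traineddata files were actually copied. A small sketch (the `LangString` helper is hypothetical, for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class LangString {
    // Joins language codes into Tesseract's "eng+fra+deu" init format.
    static String buildLanguageString(List<String> codes) {
        return String.join("+", codes);
    }

    public static void main(String[] args) {
        String langs = buildLanguageString(Arrays.asList("eng", "fra", "deu"));
        System.out.println(langs);
    }
}
```

In the app, the code list could come from listing the files in the extracted `tessdata/` directory and stripping the `.traineddata` suffix.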
Testing and debugging common issues
To ensure a robust OCR implementation, consider the following tips when debugging your app:
- Verify that camera permissions are granted and the device's camera is accessible.
- Check that the trained data files are correctly copied to the app's private storage.
- Use Logcat to monitor logs from Tesseract and OpenCV for errors or performance warnings.
- Test under varying lighting conditions and document orientations to evaluate OCR accuracy.
- Trace execution in lifecycle methods using breakpoints or temporary Toast messages to ensure proper thread management.
By systematically testing these components, you can quickly identify and resolve potential issues.
Conclusion
Implementing real-time OCR in your Android app using OpenCV and Tesseract4Android empowers you to extract text from live camera feeds efficiently. By following the steps outlined in this guide—including setting up your project, integrating the necessary dependencies, optimizing performance, and handling multilingual text—you can build robust, on-device text recognition functionality.
For advanced file upload and processing workflows, consider exploring Uppy by Transloadit.
Happy coding!