Implementing real-time text recognition in your Android app can greatly enhance user experience by enabling instant text extraction from documents, signs, or any text-containing objects through the device's camera. In this tutorial, we explore how to implement OCR (Optical Character Recognition) using OpenCV and Tesseract4Android, providing you with a step-by-step guide to get started.

Prerequisites

Before you begin, ensure you have the following:

  • Android Studio installed on your development machine
  • A device running Android 5.0 (Lollipop) or higher for testing
  • Basic knowledge of Android development and Java programming
  • Familiarity with image processing concepts

Setting up the Android Studio project

First, create a new Android Studio project:

  1. Open Android Studio and select New Project.
  2. Choose Empty Activity and click Next.
  3. Set your Application Name, e.g., RealTimeOCR.
  4. Choose Java as the programming language.
  5. Set the Minimum SDK to API 21: Android 5.0 (Lollipop).
  6. Click Finish to create the project.

Adding dependencies: OpenCV and Tesseract

Add the required dependencies to your project.

Project-level build.gradle

allprojects {
    repositories {
        google()
        mavenCentral()
        maven { url 'https://jitpack.io' }
    }
}

App-level build.gradle

dependencies {
    implementation 'org.opencv:opencv-android:4.9.0'
    implementation 'cz.adaptech.tesseract4android:tesseract4android:4.8.0'
}
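The OpenCV artifact ships native libraries that must be loaded before any OpenCV class is used. A minimal check you could add to onCreate(), assuming OpenCV 4.8 or later (which introduced OpenCVLoader.initLocal(); earlier releases used initDebug()):

```java
// Load OpenCV's native libraries; skip all image processing if this fails.
if (OpenCVLoader.initLocal()) {
    Log.d("OCR", "OpenCV loaded successfully");
} else {
    Log.e("OCR", "OpenCV initialization failed");
}
```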

Copying Tesseract trained data files

Download the Tesseract v4.0.0 trained data files from the official tessdata repository (the code below uses English, so you need at least eng.traineddata) and place them in src/main/assets/tessdata/. Then, include the following utility in your project to copy these files to your app's private directory for runtime access:

private void copyTessDataFiles(File dir) {
    try {
        AssetManager assetManager = getAssets();
        String[] fileList = assetManager.list("tessdata");
        if (fileList == null) return;
        for (String fileName : fileList) {
            File file = new File(dir, fileName);
            if (file.exists()) continue;
            // try-with-resources closes both streams even if the copy fails midway
            try (InputStream in = assetManager.open("tessdata/" + fileName);
                 OutputStream out = new FileOutputStream(file)) {
                byte[] buffer = new byte[1024];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            }
        }
    } catch (IOException e) {
        Log.e("OCR", "Error copying tess data files", e);
    }
}

Configuring camera access and permissions

Add the necessary permissions to your AndroidManifest.xml:

<uses-permission android:name="android.permission.CAMERA" />
<uses-feature android:name="android.hardware.camera" android:required="true" />
<uses-feature android:name="android.hardware.camera.autofocus" android:required="false" />

Implement runtime permission handling in your MainActivity.java:

public class MainActivity extends AppCompatActivity {
    private static final int PERMISSIONS_REQUEST_CAMERA = 1;
    private CameraManager cameraManager;
    private HandlerThread backgroundThread;
    private Handler backgroundHandler;

    private void requestCameraPermission() {
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
                != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(this,
                    new String[]{Manifest.permission.CAMERA},
                    PERMISSIONS_REQUEST_CAMERA);
        } else {
            startBackgroundThread();
            initializeCamera();
        }
    }

    private void startBackgroundThread() {
        backgroundThread = new HandlerThread("CameraBackground");
        backgroundThread.start();
        backgroundHandler = new Handler(backgroundThread.getLooper());
    }

    private void stopBackgroundThread() {
        if (backgroundThread != null) {
            backgroundThread.quitSafely();
            try {
                backgroundThread.join();
                backgroundThread = null;
                backgroundHandler = null;
            } catch (InterruptedException e) {
                Log.e("MainActivity", "Error stopping background thread", e);
            }
        }
    }

    @Override
    public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions,
                                           @NonNull int[] grantResults) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults);
        if (requestCode == PERMISSIONS_REQUEST_CAMERA) {
            if (grantResults.length > 0 && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                startBackgroundThread();
                initializeCamera();
            } else {
                Toast.makeText(this, "Camera permission required", Toast.LENGTH_LONG).show();
                finish();
            }
        }
    }
}
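The permission flow above calls initializeCamera(), which is not defined elsewhere in this guide. A minimal Camera2 sketch follows; the fixed buffer size and the choice of the first camera ID are simplifying assumptions (production code should select the back camera via CameraCharacteristics.LENS_FACING and pick a preview size that matches the TextureView's aspect ratio):

```java
private void initializeCamera() {
    cameraManager = (CameraManager) getSystemService(Context.CAMERA_SERVICE);
    try {
        // Assumption: the first camera ID is usually, but not always, the back camera.
        String cameraId = cameraManager.getCameraIdList()[0];
        cameraManager.openCamera(cameraId, new CameraDevice.StateCallback() {
            @Override
            public void onOpened(@NonNull CameraDevice camera) {
                startPreview(camera);
            }

            @Override
            public void onDisconnected(@NonNull CameraDevice camera) {
                camera.close();
            }

            @Override
            public void onError(@NonNull CameraDevice camera, int error) {
                camera.close();
            }
        }, backgroundHandler);
    } catch (CameraAccessException | SecurityException e) {
        Log.e("MainActivity", "Failed to open camera", e);
    }
}

private void startPreview(CameraDevice camera) {
    try {
        SurfaceTexture texture = textureView.getSurfaceTexture();
        // Assumption: the TextureView's surface is already available at this point.
        texture.setDefaultBufferSize(1280, 720);
        Surface surface = new Surface(texture);

        CaptureRequest.Builder builder = camera.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);
        builder.addTarget(surface);

        camera.createCaptureSession(Collections.singletonList(surface),
                new CameraCaptureSession.StateCallback() {
                    @Override
                    public void onConfigured(@NonNull CameraCaptureSession session) {
                        try {
                            session.setRepeatingRequest(builder.build(), null, backgroundHandler);
                        } catch (CameraAccessException e) {
                            Log.e("MainActivity", "Failed to start preview", e);
                        }
                    }

                    @Override
                    public void onConfigureFailed(@NonNull CameraCaptureSession session) {
                        Log.e("MainActivity", "Preview configuration failed");
                    }
                }, backgroundHandler);
    } catch (CameraAccessException e) {
        Log.e("MainActivity", "Failed to start preview", e);
    }
}
```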

Integrating live camera feed

Create a camera preview layout in activity_main.xml:

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <TextureView
        android:id="@+id/texture_view"
        android:layout_width="match_parent"
        android:layout_height="0dp"
        app:layout_constraintTop_toTopOf="parent"
        app:layout_constraintBottom_toTopOf="@id/text_result" />

    <TextView
        android:id="@+id/text_result"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:background="#80000000"
        android:padding="16dp"
        android:textColor="#ffffff"
        android:textSize="16sp"
        app:layout_constraintBottom_toBottomOf="parent" />

</androidx.constraintlayout.widget.ConstraintLayout>
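To feed preview frames into the OCR pipeline, one simple (if not maximally efficient) approach is to copy a bitmap out of the TextureView each time it updates. A sketch, relying on the processFrame() method shown in the next section:

```java
textureView.setSurfaceTextureListener(new TextureView.SurfaceTextureListener() {
    @Override
    public void onSurfaceTextureAvailable(SurfaceTexture surface, int width, int height) {
        // The surface is ready; the camera preview can now be started.
    }

    @Override
    public void onSurfaceTextureSizeChanged(SurfaceTexture surface, int width, int height) { }

    @Override
    public boolean onSurfaceTextureDestroyed(SurfaceTexture surface) {
        return true;
    }

    @Override
    public void onSurfaceTextureUpdated(SurfaceTexture surface) {
        // Called once per preview frame; getBitmap() copies the current frame.
        Bitmap frame = textureView.getBitmap();
        if (frame != null) {
            processFrame(frame);
        }
    }
});
```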

Implementing real-time OCR functionality

Implement the OCR functionality using Tesseract4Android:

public class MainActivity extends AppCompatActivity {
    private TessBaseAPI tessBaseAPI;
    private TextureView textureView;
    private TextView resultText;
    private ExecutorService ocrExecutor;
    private static final int PROCESS_FRAME_INTERVAL = 10;
    private int frameCount = 0;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        textureView = findViewById(R.id.texture_view);
        resultText = findViewById(R.id.text_result);
        ocrExecutor = Executors.newSingleThreadExecutor();

        initializeOCR();
        requestCameraPermission();
    }

    private void initializeOCR() {
        try {
            tessBaseAPI = new TessBaseAPI(new TessBaseAPI.ProgressNotifier() {
                @Override
                public void onProgressValues(TessBaseAPI.ProgressValues progressValues) {
                    Log.d("OCR", "Progress: " + progressValues.getPercent());
                }
            });

            String dataPath = getFilesDir() + "/tessdata/";
            File dir = new File(dataPath);
            if (!dir.exists()) {
                dir.mkdirs();
                copyTessDataFiles(dir);
            }

            if (!tessBaseAPI.init(getFilesDir().getAbsolutePath(), "eng")) {
                Log.e("OCR", "Could not initialize Tesseract");
            }

            tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
        } catch (Exception e) {
            Log.e("OCR", "Error initializing Tesseract", e);
        }
    }

    private void processFrame(Bitmap bitmap) {
        // Only run OCR on every Nth frame; recycle skipped frames immediately
        if (++frameCount % PROCESS_FRAME_INTERVAL != 0) {
            bitmap.recycle();
            return;
        }

        ocrExecutor.execute(() -> {
            try {
                Bitmap processedBitmap = preprocessImage(bitmap);
                bitmap.recycle();
                String ocrResult = performOCR(processedBitmap);
                runOnUiThread(() -> resultText.setText(ocrResult));
                processedBitmap.recycle();
            } catch (Exception e) {
                Log.e("OCR", "Error processing frame", e);
            }
        });
    }

    private Bitmap preprocessImage(Bitmap source) {
        Mat rgba = new Mat();
        Utils.bitmapToMat(source, rgba);

        Mat gray = new Mat();
        Imgproc.cvtColor(rgba, gray, Imgproc.COLOR_RGBA2GRAY);
        Imgproc.threshold(gray, gray, 0, 255, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);

        Bitmap result = Bitmap.createBitmap(gray.cols(), gray.rows(), Bitmap.Config.ARGB_8888);
        Utils.matToBitmap(gray, result);

        rgba.release();
        gray.release();

        return result;
    }

    private String performOCR(Bitmap bitmap) {
        try {
            tessBaseAPI.setImage(bitmap);
            String result = tessBaseAPI.getUTF8Text();
            tessBaseAPI.clear();
            return result;
        } catch (Exception e) {
            Log.e("OCR", "Error performing OCR", e);
            return "Error processing image";
        }
    }

    @Override
    protected void onPause() {
        super.onPause();
        stopBackgroundThread();
    }

    @Override
    protected void onDestroy() {
        super.onDestroy();
        if (tessBaseAPI != null) {
            tessBaseAPI.recycle();
        }
        ocrExecutor.shutdown();
    }
}
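Raw per-frame output can be noisy. Tesseract exposes a mean word confidence (0 to 100) via meanConfidence(), which can be used to suppress low-quality results. A sketch of a stricter performOCR(); the threshold of 60 is an arbitrary starting point to tune for your use case:

```java
private String performOCR(Bitmap bitmap) {
    try {
        tessBaseAPI.setImage(bitmap);
        String result = tessBaseAPI.getUTF8Text();
        int confidence = tessBaseAPI.meanConfidence(); // 0-100, averaged over recognized words
        tessBaseAPI.clear();
        // Drop results Tesseract itself is unsure about
        return confidence >= 60 ? result : "";
    } catch (Exception e) {
        Log.e("OCR", "Error performing OCR", e);
        return "Error processing image";
    }
}
```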

Optimizing performance and memory usage

Implement these optimizations to improve OCR performance:

private static final class OCROptimizer {
    private static final int MAX_IMAGE_DIMENSION = 1280;
    private static final double SCALE_FACTOR = 0.5;

    static Bitmap optimizeImageForOCR(Bitmap source) {
        int width = source.getWidth();
        int height = source.getHeight();

        if (Math.max(width, height) > MAX_IMAGE_DIMENSION) {
            float ratio = (float) MAX_IMAGE_DIMENSION / Math.max(width, height);
            width = Math.round(width * ratio);
            height = Math.round(height * ratio);
        }

        Mat sourceMat = new Mat();
        Utils.bitmapToMat(source, sourceMat);

        // Otsu thresholding requires a single-channel 8-bit image, so convert to grayscale first
        Mat gray = new Mat();
        Imgproc.cvtColor(sourceMat, gray, Imgproc.COLOR_RGBA2GRAY);

        // SCALE_FACTOR further reduces the clamped dimensions to speed up OCR
        Mat resized = new Mat();
        Imgproc.resize(gray, resized, new Size(width * SCALE_FACTOR, height * SCALE_FACTOR));

        Mat processed = new Mat();
        Imgproc.GaussianBlur(resized, processed, new Size(3, 3), 0);
        Imgproc.threshold(processed, processed, 0, 255, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);

        Bitmap result = Bitmap.createBitmap(processed.cols(), processed.rows(), Bitmap.Config.ARGB_8888);
        Utils.matToBitmap(processed, result);

        sourceMat.release();
        gray.release();
        resized.release();
        processed.release();

        return result;
    }
}
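The dimension clamp above is pure arithmetic, so it is easy to verify in isolation. A standalone, plain-Java sketch of the same logic (the class and method names are illustrative, not part of the tutorial's code):

```java
public class DimensionClamp {
    static final int MAX_IMAGE_DIMENSION = 1280;

    // Scales (width, height) so the longer side is at most MAX_IMAGE_DIMENSION,
    // preserving aspect ratio; smaller images pass through unchanged.
    static int[] clamp(int width, int height) {
        int longest = Math.max(width, height);
        if (longest <= MAX_IMAGE_DIMENSION) {
            return new int[]{width, height};
        }
        float ratio = (float) MAX_IMAGE_DIMENSION / longest;
        return new int[]{Math.round(width * ratio), Math.round(height * ratio)};
    }

    public static void main(String[] args) {
        int[] d = clamp(1920, 1080);
        System.out.println(d[0] + "x" + d[1]); // prints 1280x720
    }
}
```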

Handling multilingual text recognition

To support recognition in multiple languages, follow these steps:

  1. Place additional trained data files (e.g., fra.traineddata for French, deu.traineddata for German) in the src/main/assets/tessdata/ directory.
  2. Modify your asset extraction to copy all required files.
  3. Use the following code to initialize multilingual OCR support:

private void initializeMultilingualOCR() {
    try {
        tessBaseAPI = new TessBaseAPI();

        String dataPath = getFilesDir() + "/tessdata/";
        File dir = new File(dataPath);
        if (!dir.exists()) {
            dir.mkdirs();
            // Assuming all language files are in the assets folder, copy them
            copyTessDataFiles(dir);
        }

        if (!tessBaseAPI.init(getFilesDir().getAbsolutePath(), "eng+fra+deu")) {
            Log.e("OCR", "Could not initialize Tesseract with multilingual data");
        }

        tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
    } catch (Exception e) {
        Log.e("OCR", "Error initializing multilingual OCR", e);
    }
}
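Switching languages at runtime requires tearing down and re-initializing the API. A sketch, assuming the requested language's traineddata file was bundled in assets and already copied to storage:

```java
private void switchLanguage(String language) {
    if (tessBaseAPI != null) {
        tessBaseAPI.recycle();
    }
    tessBaseAPI = new TessBaseAPI();
    if (!tessBaseAPI.init(getFilesDir().getAbsolutePath(), language)) {
        Log.e("OCR", "Could not switch to language: " + language);
    }
}
```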

Testing and debugging common issues

To ensure a robust OCR implementation, consider the following tips when debugging your app:

  • Verify that camera permissions are granted and the device's camera is accessible.
  • Check that the trained data files are correctly copied to the app's private storage.
  • Use Logcat to monitor logs from Tesseract and OpenCV for errors or performance warnings.
  • Test under varying lighting conditions and document orientations to evaluate OCR accuracy.
  • Trace execution in lifecycle methods using breakpoints or temporary Toast messages to ensure proper thread management.

By systematically testing these components, you can quickly identify and resolve potential issues.
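For the second point, a quick check at startup can log what is actually on disk; truncated or missing traineddata files are a common cause of silent init failures. A small diagnostic sketch (the helper name is illustrative):

```java
private void logTessData() {
    File tessDir = new File(getFilesDir(), "tessdata");
    File[] files = tessDir.listFiles();
    if (files == null || files.length == 0) {
        Log.w("OCR", "No trained data files found in " + tessDir.getAbsolutePath());
        return;
    }
    for (File f : files) {
        // Log sizes too: a partially copied file can still pass an exists() check
        Log.d("OCR", f.getName() + " (" + f.length() + " bytes)");
    }
}
```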

Conclusion

Implementing real-time OCR in your Android app using OpenCV and Tesseract4Android empowers you to extract text from live camera feeds efficiently. By following the steps outlined in this guide—including setting up your project, integrating the necessary dependencies, optimizing performance, and handling multilingual text—you can build robust, on-device text recognition functionality.

For advanced file upload and processing workflows, consider exploring Uppy by Transloadit.

Happy coding!