Real-time text recognition lets your Android app extract text from documents, signs, and other objects directly through the device's camera. In this tutorial, we'll implement OCR (Optical Character Recognition) step by step using the OpenCV Android SDK and tess-two, an Android wrapper for Tesseract OCR.

Prerequisites

Before we begin, ensure you have the following:

  • Android Studio installed on your development machine.
  • A device running Android 5.0 (Lollipop) or higher for testing.
  • Basic knowledge of Android development and Java programming.
  • Familiarity with image processing concepts.

Setting up the Android Studio project

First, create a new Android Studio project:

  1. Open Android Studio and select New Project.
  2. Choose Empty Views Activity (named Empty Activity in older Android Studio versions) and click Next.
  3. Set your Application Name, e.g., RealTimeOCR.
  4. Choose Java as the programming language.
  5. Set the Minimum SDK to API 21: Android 5.0 (Lollipop).
  6. Click Finish to create the project.

Adding dependencies: OpenCV and Tesseract

We need to add the OpenCV and Tesseract libraries to our project.

Adding OpenCV

OpenCV has published an official Android artifact on Maven Central since version 4.9.0. Add it to your app's build.gradle:

dependencies {
    implementation 'org.opencv:opencv:4.9.0'
}

Adding Tesseract

Add tess-two, an Android wrapper around Tesseract, to the same dependencies block:

dependencies {
    implementation 'com.rmtheis:tess-two:9.1.0'
}

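Download the Tesseract language data files (e.g., eng.traineddata) from the official tessdata repository and place them in src/main/assets/tessdata/. tess-two wraps Tesseract 3.x, so use the data files published for that version (the 3.04.00 release of the repository); the newer files trained for Tesseract 4 and later are not expected to load.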

Configuring camera access and permissions

Add the necessary permissions to your AndroidManifest.xml:

<uses-permission android:name="android.permission.CAMERA" />

Note: On Android 6.0 (API level 23) and higher, the manifest entry alone is not enough. The camera permission must also be requested from the user at runtime.

Implement runtime permission handling in your MainActivity.java:

private static final int PERMISSIONS_REQUEST_CAMERA = 1;

private void requestCameraPermission() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
            != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this,
                new String[]{Manifest.permission.CAMERA},
                PERMISSIONS_REQUEST_CAMERA);
    } else {
        initializeCamera();
    }
}

@Override
public void onRequestPermissionsResult(int requestCode, String[] permissions,
                                       int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    if (requestCode == PERMISSIONS_REQUEST_CAMERA) {
        if (grantResults.length > 0 && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
            initializeCamera();
        } else {
            Toast.makeText(this, "Camera permission required", Toast.LENGTH_LONG).show();
            finish();
        }
    }
}

Call requestCameraPermission() in your onCreate method.

Integrating live camera feed

Create a camera preview layout in activity_main.xml:

<?xml version="1.0" encoding="utf-8"?>
<FrameLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <org.opencv.android.JavaCameraView
        android:id="@+id/camera_view"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:visibility="visible" />

    <TextView
        android:id="@+id/text_result"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_gravity="bottom"
        android:background="#80000000"
        android:padding="16dp"
        android:textColor="#ffffff"
        android:textSize="16sp" />
</FrameLayout>

Implementing real-time OCR functionality

Implement the OCR functionality in your MainActivity.java:

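import android.Manifest;
import android.content.pm.PackageManager;
import android.graphics.Bitmap;
import android.os.Bundle;
import android.util.Log;
import android.view.SurfaceView;
import android.widget.TextView;
import android.widget.Toast;

import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

import com.googlecode.tesseract.android.TessBaseAPI;

import org.opencv.android.CameraBridgeViewBase;
import org.opencv.android.OpenCVLoader;
import org.opencv.android.Utils;
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class MainActivity extends AppCompatActivity implements CameraBridgeViewBase.CvCameraViewListener2 {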
    private CameraBridgeViewBase cameraView;
    private TessBaseAPI tessBaseAPI;
    private TextView resultText;
    private Mat currentFrame;

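    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        cameraView = findViewById(R.id.camera_view);
        resultText = findViewById(R.id.text_result);

        // Load the OpenCV native library before any Mat or Imgproc call.
        // (On OpenCV 4.9+, OpenCVLoader.initLocal() is the non-deprecated equivalent.)
        if (!OpenCVLoader.initDebug()) {
            Log.e("MainActivity", "Unable to load OpenCV");
            finish();
            return;
        }

        initializeOCR();
        requestCameraPermission();
    }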

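    private void initializeOCR() {
        tessBaseAPI = new TessBaseAPI();
        // Tesseract expects <dataPath>/tessdata/<lang>.traineddata
        File dir = new File(getFilesDir(), "tessdata");
        if (!dir.exists()) {
            dir.mkdirs();
        }
        // Copy whenever the file is missing, not only when the directory is new
        if (!new File(dir, "eng.traineddata").exists()) {
            copyTessDataFiles(dir);
        }
        if (!tessBaseAPI.init(getFilesDir().getAbsolutePath(), "eng")) {
            Log.e("MainActivity", "Tesseract initialization failed");
        }
    }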

    private void copyTessDataFiles(File dir) {
        // try-with-resources closes both streams even if the copy fails midway
        try (InputStream in = getAssets().open("tessdata/eng.traineddata");
             OutputStream out = new FileOutputStream(new File(dir, "eng.traineddata"))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            Log.e("MainActivity", "Error copying Tesseract data files", e);
        }
    }

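    private void initializeCamera() {
        cameraView.setVisibility(SurfaceView.VISIBLE);
        cameraView.setCvCameraViewListener(this);
        // OpenCV 4.x does not open the camera until it is told the permission was granted
        cameraView.setCameraPermissionGranted();
        cameraView.enableView();
    }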

    @Override
    public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
        currentFrame = inputFrame.rgba();

        // Preprocess the image for better OCR results: grayscale, then Otsu
        // binarization (the threshold argument 0 is ignored when THRESH_OTSU is set)
        Mat gray = new Mat();
        Imgproc.cvtColor(currentFrame, gray, Imgproc.COLOR_RGBA2GRAY);
        Imgproc.threshold(gray, gray, 0, 255, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);

        // Convert Mat to Bitmap for Tesseract
        Bitmap bitmap = Bitmap.createBitmap(gray.cols(), gray.rows(), Bitmap.Config.ARGB_8888);
        Utils.matToBitmap(gray, bitmap);
        // A Mat holds native memory; release it instead of waiting for the garbage collector
        gray.release();

        // Perform OCR (this blocks the camera thread until Tesseract finishes)
        String result = performOCR(bitmap);
        bitmap.recycle();

        // Update UI on main thread
        runOnUiThread(() -> resultText.setText(result));

        // Return the original frame
        return currentFrame;
    }

    private String performOCR(Bitmap bitmap) {
        tessBaseAPI.setImage(bitmap);
        // Named "text" so it does not shadow the resultText field
        String text = tessBaseAPI.getUTF8Text();
        tessBaseAPI.clear();
        return text;
    }

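    @Override
    protected void onPause() {
        super.onPause();
        if (cameraView != null) {
            cameraView.disableView();
        }
    }

    @Override
    protected void onResume() {
        super.onResume();
        // Restart the preview that onPause() stopped, once the permission has been granted
        if (cameraView != null && ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
                == PackageManager.PERMISSION_GRANTED) {
            cameraView.enableView();
        }
    }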

    @Override
    protected void onDestroy() {
        super.onDestroy();
        if (cameraView != null) {
            cameraView.disableView();
        }
        if (tessBaseAPI != null) {
            tessBaseAPI.end();
        }
    }

    @Override
    public void onCameraViewStarted(int width, int height) {
        // Initialization if needed
    }

    @Override
    public void onCameraViewStopped() {
        // Cleanup if needed
    }
}
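
Because onCameraFrame above calls performOCR synchronously, each preview frame waits for Tesseract to finish. One possible way around this is to hand frames to a background thread and drop any frame that arrives while recognition is still running. The sketch below is plain Java and independent of OpenCV and tess-two; the OcrWorker class and its Recognizer and ResultListener interfaces are hypothetical names introduced here for illustration, not part of either library.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper (not part of OpenCV or tess-two): hands frames to a slow
 * recognizer through an Executor and drops frames that arrive while it is busy.
 */
public class OcrWorker<T> {

    /** Wraps the slow call, for example a method like performOCR. */
    public interface Recognizer<F> {
        String recognize(F frame);
    }

    /** Receives each recognized text. */
    public interface ResultListener {
        void onResult(String text);
    }

    private final Executor executor;
    private final Recognizer<T> recognizer;
    private final ResultListener listener;
    private final AtomicBoolean busy = new AtomicBoolean(false);

    public OcrWorker(Executor executor, Recognizer<T> recognizer, ResultListener listener) {
        this.executor = executor;
        this.recognizer = recognizer;
        this.listener = listener;
    }

    /** Returns true if the frame was accepted, false if it was dropped. */
    public boolean submit(final T frame) {
        // Only one recognition may be in flight at a time
        if (!busy.compareAndSet(false, true)) {
            return false;
        }
        try {
            executor.execute(() -> {
                try {
                    listener.onResult(recognizer.recognize(frame));
                } finally {
                    busy.set(false); // accept frames again, even after a failure
                }
            });
        } catch (RuntimeException e) {
            busy.set(false); // for example, the executor rejected the task
            throw e;
        }
        return true;
    }
}
```

In the activity, such a worker could be created with Executors.newSingleThreadExecutor(), a recognizer that calls performOCR, and a listener that posts the text to the UI thread. The frame passed to submit would likely need to be its own copy (for example the Bitmap created on the camera thread), since OpenCV may reuse the Mat after onCameraFrame returns.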

Optimizing performance and memory usage

To improve OCR performance and reduce resource usage:

  1. Resize Images: Process smaller images to reduce computation time.

    Mat resizedFrame = new Mat();
    Imgproc.resize(currentFrame, resizedFrame, new Size(currentFrame.width() / 2, currentFrame.height() / 2));
    
    // Proceed with OCR on resizedFrame
    
  2. Frame Skipping: Process every nth frame.

    private int frameCount = 0;
    private static final int PROCESS_FRAME_INTERVAL = 10;
    
    @Override
    public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
        currentFrame = inputFrame.rgba();
    
        if (++frameCount % PROCESS_FRAME_INTERVAL == 0) {
            // Preprocess and perform OCR
            Mat gray = new Mat();
            Imgproc.cvtColor(currentFrame, gray, Imgproc.COLOR_RGBA2GRAY);
            // ... rest of OCR processing code
        }
    
        return currentFrame;
    }
    
  3. Region of Interest (ROI): Focus OCR on specific areas.

    Rect roi = new Rect(currentFrame.width() / 4, currentFrame.height() / 4,
                        currentFrame.width() / 2, currentFrame.height() / 2);
    Mat cropped = new Mat(currentFrame, roi);
    
    // Proceed with OCR on cropped region
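
The frame-skipping and region-of-interest arithmetic above can also be kept in a small plain-Java helper, which makes it testable without a device. This is only a sketch: FrameSampling and its methods are hypothetical names, and the returned {x, y, width, height} array is assumed to be passed on to OpenCV's Rect constructor by the caller.

```java
/**
 * Hypothetical helper: decides which frames to process and computes a
 * centered region of interest. Contains no OpenCV or Android code.
 */
public class FrameSampling {
    private final int interval;
    private int frameCount = 0;

    public FrameSampling(int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.interval = interval;
    }

    /** Returns true on every interval-th call; the counter wraps so it never overflows. */
    public boolean shouldProcess() {
        frameCount = (frameCount + 1) % interval;
        return frameCount == 0;
    }

    /**
     * Centered region covering the given fraction (0 < fraction <= 1) of each dimension.
     * Returns {x, y, width, height}.
     */
    public static int[] centeredRoi(int frameWidth, int frameHeight, double fraction) {
        if (fraction <= 0 || fraction > 1) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        int width = Math.max(1, (int) Math.round(frameWidth * fraction));
        int height = Math.max(1, (int) Math.round(frameHeight * fraction));
        int x = (frameWidth - width) / 2;
        int y = (frameHeight - height) / 2;
        return new int[]{x, y, width, height};
    }
}
```

With a fraction of 0.5, centeredRoi yields the same rectangle as the ROI snippet above: a quarter of the way in from each edge and half the frame in each dimension.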
    

Handling multilingual text recognition

To recognize several languages at once, pass their codes to init joined with + (for example eng+fra+deu) and make sure the matching trained data file for each language has been copied to the tessdata folder.

private void initializeOCR() {
    tessBaseAPI = new TessBaseAPI();
    File dir = new File(getFilesDir(), "tessdata");
    if (!dir.exists()) {
        dir.mkdirs();
    }
    // Copy each language file that is not on disk yet, even if the directory already exists
    for (String filename : new String[]{"eng.traineddata", "fra.traineddata", "deu.traineddata"}) {
        if (!new File(dir, filename).exists()) {
            copyTessDataFiles(dir, new String[]{filename});
        }
    }
    tessBaseAPI.init(getFilesDir().getAbsolutePath(), "eng+fra+deu");
}

private void copyTessDataFiles(File dir, String[] files) {
    for (String filename : files) {
        // try-with-resources closes both streams even if the copy fails midway
        try (InputStream in = getAssets().open("tessdata/" + filename);
             OutputStream out = new FileOutputStream(new File(dir, filename))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            // Log the failing file and continue with the remaining languages
            Log.e("MainActivity", "Error copying Tesseract data file " + filename, e);
        }
    }
}
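
In the code above, the list of file names and the eng+fra+deu string have to be kept in sync by hand. One way to avoid that, sketched here in plain Java, is to derive both from a single list of language codes. OcrLanguages is a hypothetical class introduced for illustration; only the + separator and the .traineddata suffix come from the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: one list of language codes drives both file copying and init. */
public class OcrLanguages {
    private final List<String> codes;

    public OcrLanguages(String... codes) {
        if (codes.length == 0) {
            throw new IllegalArgumentException("at least one language code is required");
        }
        this.codes = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(codes)));
    }

    /** Language string in the form passed to init above, e.g. "eng+fra+deu". */
    public String initString() {
        StringBuilder sb = new StringBuilder();
        for (String code : codes) {
            if (sb.length() > 0) {
                sb.append('+');
            }
            sb.append(code);
        }
        return sb.toString();
    }

    /** File names expected in the tessdata folder, e.g. "eng.traineddata". */
    public List<String> fileNames() {
        List<String> names = new ArrayList<>();
        for (String code : codes) {
            names.add(code + ".traineddata");
        }
        return names;
    }
}
```

With this in place, initializeOCR could pass fileNames() to the copy step and initString() to init, so adding a language means changing a single line.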

Testing and debugging common issues

Implement error handling and logging to assist in debugging:

private String performOCR(Bitmap bitmap) {
    try {
        tessBaseAPI.setImage(bitmap);
        // Named "text" so it does not shadow the resultText field
        String text = tessBaseAPI.getUTF8Text();
        tessBaseAPI.clear();
        Log.d("OCR", "Recognized text: " + text);
        return text;
    } catch (Exception e) {
        // Passing the exception to Log.e keeps the stack trace in Logcat
        Log.e("OCR", "Error performing OCR", e);
        return "Error processing image";
    }
}

When recognition returns nothing or initialization fails, check Logcat for these messages first. Common causes include a missing trained data file and a data path that does not contain a tessdata folder.
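
While debugging, it can help to verify the data layout before initializing Tesseract. The following plain-Java sketch assumes the layout used in this tutorial (a tessdata folder directly under the data path, one .traineddata file per language); TessDataCheck and missingFiles are hypothetical names, not part of tess-two.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical debugging aid: lists trained data files that are absent or empty. */
public class TessDataCheck {

    /**
     * @param dataRoot  the directory later passed to init (the parent of "tessdata")
     * @param languages language string such as "eng" or "eng+fra+deu"
     * @return names of the expected files that are missing or have zero length
     */
    public static List<String> missingFiles(File dataRoot, String languages) {
        List<String> missing = new ArrayList<>();
        File tessdata = new File(dataRoot, "tessdata");
        for (String code : languages.split("\\+")) {
            if (code.isEmpty()) {
                continue; // ignore empty segments such as a trailing "+"
            }
            File file = new File(tessdata, code + ".traineddata");
            // A zero-length file usually means an earlier copy was interrupted
            if (!file.isFile() || file.length() == 0) {
                missing.add(file.getName());
            }
        }
        return missing;
    }
}
```

In the activity, this could be called with getFilesDir() and the same language string that is passed to init, logging the returned list when it is not empty.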

Conclusion

By following this guide, you've implemented a functional real-time text recognition system in your Android app using OpenCV and Tesseract OCR Android SDKs. This implementation can be enhanced further with features like text highlighting, language detection, or text-to-speech integration, providing an even richer user experience.
