Implementing real-time text recognition in Android apps with OpenCV and Tesseract
Implementing real-time text recognition in your Android app can greatly enhance user experience by enabling instant text extraction from documents, signs, or any other text-containing objects through the device's camera. In this tutorial, we'll implement OCR (Optical Character Recognition) using OpenCV for camera capture and image preprocessing and Tesseract (via the tess-two library) for recognition, with a step-by-step guide to get you started.
Prerequisites
Before we begin, ensure you have the following:
- Android Studio installed on your development machine.
- A device running Android 5.0 (Lollipop) or higher for testing.
- Basic knowledge of Android development and Java programming.
- Familiarity with image processing concepts.
Setting up the Android Studio project
First, create a new Android Studio project:
- Open Android Studio and select New Project.
- Choose Empty Activity and click Next.
- Set your Application Name, e.g., RealTimeOCR.
- Choose Java as the programming language.
- Set the Minimum SDK to API 21: Android 5.0 (Lollipop).
- Click Finish to create the project.
Adding dependencies: OpenCV and Tesseract
We need to add the OpenCV and Tesseract libraries to our project.
Adding OpenCV
Add the OpenCV dependency to your app's build.gradle:
dependencies {
implementation 'org.opencv:opencv-android:4.8.0'
}
Adding Tesseract
Add the Tesseract dependency:
dependencies {
implementation 'com.rmtheis:tess-two:9.1.0'
}
Download the Tesseract language data files (e.g., eng.traineddata) from the official repository and place them in src/main/assets/tessdata/.
Configuring camera access and permissions
Add the necessary permissions to your AndroidManifest.xml:
<uses-permission android:name="android.permission.CAMERA" />
Note: Starting from Android 6.0 (API level 23), you need to request permissions at runtime.
Implement runtime permission handling in your MainActivity.java:
private static final int PERMISSIONS_REQUEST_CAMERA = 1;
private void requestCameraPermission() {
if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
!= PackageManager.PERMISSION_GRANTED) {
ActivityCompat.requestPermissions(this,
new String[]{Manifest.permission.CAMERA},
PERMISSIONS_REQUEST_CAMERA);
} else {
initializeCamera();
}
}
@Override
public void onRequestPermissionsResult(int requestCode, String[] permissions,
int[] grantResults) {
super.onRequestPermissionsResult(requestCode, permissions, grantResults);
if (requestCode == PERMISSIONS_REQUEST_CAMERA) {
if (grantResults.length > 0 && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
initializeCamera();
} else {
Toast.makeText(this, "Camera permission required", Toast.LENGTH_LONG).show();
finish();
}
}
}
Call requestCameraPermission() in your onCreate method.
Integrating live camera feed
Create a camera preview layout in activity_main.xml:
<?xml version="1.0" encoding="utf-8"?>
<FrameLayout
xmlns:android="http://schemas.android.com/apk/res/android"
android:layout_width="match_parent"
android:layout_height="match_parent">
<org.opencv.android.JavaCameraView
android:id="@+id/camera_view"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:visibility="visible" />
<TextView
android:id="@+id/text_result"
android:layout_width="match_parent"
android:layout_height="wrap_content"
android:layout_gravity="bottom"
android:background="#80000000"
android:padding="16dp"
android:textColor="#ffffff"
android:textSize="16sp" />
</FrameLayout>
Implementing real-time OCR functionality
Implement the OCR functionality in your MainActivity.java:
public class MainActivity extends AppCompatActivity implements CameraBridgeViewBase.CvCameraViewListener2 {
private CameraBridgeViewBase cameraView;
private TessBaseAPI tessBaseAPI;
private TextView resultText;
private Mat currentFrame;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
// Load the OpenCV native library before using any OpenCV classes
if (!OpenCVLoader.initDebug()) {
Log.e("MainActivity", "Failed to load OpenCV native library");
}
cameraView = findViewById(R.id.camera_view);
resultText = findViewById(R.id.text_result);
initializeOCR();
requestCameraPermission();
}
private void initializeOCR() {
tessBaseAPI = new TessBaseAPI();
File dir = new File(getFilesDir(), "tessdata");
File trainedData = new File(dir, "eng.traineddata");
// Copy the language data on first launch (or if it was deleted)
if (!trainedData.exists()) {
dir.mkdirs();
copyTessDataFiles(dir);
}
tessBaseAPI.init(getFilesDir().getAbsolutePath(), "eng");
}
private void copyTessDataFiles(File dir) {
// try-with-resources closes both streams even if the copy fails midway
try (InputStream in = getAssets().open("tessdata/eng.traineddata");
OutputStream out = new FileOutputStream(new File(dir, "eng.traineddata"))) {
byte[] buffer = new byte[1024];
int read;
while ((read = in.read(buffer)) != -1) {
out.write(buffer, 0, read);
}
} catch (IOException e) {
Log.e("MainActivity", "Error copying Tesseract data files", e);
}
}
private void initializeCamera() {
cameraView.setVisibility(SurfaceView.VISIBLE);
cameraView.setCvCameraViewListener(this);
cameraView.enableView();
}
@Override
public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
currentFrame = inputFrame.rgba();
// Preprocess the image for better OCR results
Mat gray = new Mat();
Imgproc.cvtColor(currentFrame, gray, Imgproc.COLOR_RGBA2GRAY);
Imgproc.threshold(gray, gray, 0, 255, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);
// Convert Mat to Bitmap for Tesseract
Bitmap bitmap = Bitmap.createBitmap(gray.cols(), gray.rows(), Bitmap.Config.ARGB_8888);
Utils.matToBitmap(gray, bitmap);
gray.release();
// Perform OCR (note: this runs on the camera thread and will limit the frame rate)
String result = performOCR(bitmap);
// Update UI on the main thread
runOnUiThread(() -> resultText.setText(result));
// Return the original frame for display
return currentFrame;
}
private String performOCR(Bitmap bitmap) {
tessBaseAPI.setImage(bitmap);
String resultText = tessBaseAPI.getUTF8Text();
tessBaseAPI.clear();
return resultText;
}
@Override
protected void onResume() {
super.onResume();
// Re-enable the camera preview when returning to the foreground
if (cameraView != null) {
cameraView.enableView();
}
}
@Override
protected void onPause() {
super.onPause();
if (cameraView != null) {
cameraView.disableView();
}
}
@Override
protected void onDestroy() {
super.onDestroy();
if (cameraView != null) {
cameraView.disableView();
}
if (tessBaseAPI != null) {
tessBaseAPI.end();
}
}
@Override
public void onCameraViewStarted(int width, int height) {
// Initialization if needed
}
@Override
public void onCameraViewStopped() {
// Cleanup if needed
}
}
Optimizing performance and memory usage
To improve OCR performance and reduce resource usage:
- Resize Images: Process smaller images to reduce computation time.
Mat resizedFrame = new Mat();
Imgproc.resize(currentFrame, resizedFrame, new Size(currentFrame.width() / 2, currentFrame.height() / 2));
// Proceed with OCR on resizedFrame
- Frame Skipping: Process every nth frame.
private int frameCount = 0;
private static final int PROCESS_FRAME_INTERVAL = 10;
@Override
public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
currentFrame = inputFrame.rgba();
if (++frameCount % PROCESS_FRAME_INTERVAL == 0) {
// Preprocess and perform OCR
Mat gray = new Mat();
Imgproc.cvtColor(currentFrame, gray, Imgproc.COLOR_RGBA2GRAY);
// ... rest of OCR processing code
}
return currentFrame;
}
- Region of Interest (ROI): Focus OCR on specific areas.
Rect roi = new Rect(currentFrame.width() / 4, currentFrame.height() / 4, currentFrame.width() / 2, currentFrame.height() / 2);
Mat cropped = new Mat(currentFrame, roi);
// Proceed with OCR on the cropped region
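Beyond these, the biggest win is usually moving Tesseract off the camera thread entirely, so frames keep rendering while recognition runs in the background. The following is a minimal sketch of that idea using a single-thread executor; the ocrExecutor and ocrBusy field names are assumptions for illustration, not part of the tutorial code above:

```java
// Assumed fields: a dedicated worker thread plus a flag that drops
// new frames while a recognition job is still running
private final ExecutorService ocrExecutor = Executors.newSingleThreadExecutor();
private final AtomicBoolean ocrBusy = new AtomicBoolean(false);

@Override
public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
    Mat frame = inputFrame.rgba();
    // Only submit a new OCR job if the previous one has finished
    if (ocrBusy.compareAndSet(false, true)) {
        Mat gray = new Mat();
        Imgproc.cvtColor(frame, gray, Imgproc.COLOR_RGBA2GRAY);
        Bitmap bitmap = Bitmap.createBitmap(gray.cols(), gray.rows(), Bitmap.Config.ARGB_8888);
        Utils.matToBitmap(gray, bitmap);
        gray.release();
        ocrExecutor.execute(() -> {
            String result = performOCR(bitmap);
            runOnUiThread(() -> resultText.setText(result));
            ocrBusy.set(false);
        });
    }
    // The preview stays smooth because this method returns immediately
    return frame;
}
```

If you adopt this approach, remember to call ocrExecutor.shutdown() in onDestroy so the worker thread does not outlive the activity.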
Handling multilingual text recognition
To support multiple languages, initialize Tesseract with the desired languages and ensure corresponding trained data files are available.
private void initializeOCR() {
tessBaseAPI = new TessBaseAPI();
File dir = new File(getFilesDir(), "tessdata");
File trainedData = new File(dir, "eng.traineddata");
// Copy the language data on first launch (or if it was deleted)
if (!trainedData.exists()) {
dir.mkdirs();
copyTessDataFiles(dir, new String[]{"eng.traineddata", "fra.traineddata", "deu.traineddata"});
}
tessBaseAPI.init(getFilesDir().getAbsolutePath(), "eng+fra+deu");
}
private void copyTessDataFiles(File dir, String[] files) {
AssetManager assetManager = getAssets();
for (String filename : files) {
// try-with-resources closes both streams even if a copy fails midway
try (InputStream in = assetManager.open("tessdata/" + filename);
OutputStream out = new FileOutputStream(new File(dir, filename))) {
byte[] buffer = new byte[1024];
int read;
while ((read = in.read(buffer)) != -1) {
out.write(buffer, 0, read);
}
} catch (IOException e) {
Log.e("MainActivity", "Error copying " + filename, e);
}
}
}
Testing and debugging common issues
Implement error handling and logging to assist in debugging:
private String performOCR(Bitmap bitmap) {
try {
tessBaseAPI.setImage(bitmap);
String resultText = tessBaseAPI.getUTF8Text();
tessBaseAPI.clear();
Log.d("OCR", "Recognized text: " + resultText);
return resultText;
} catch (Exception e) {
Log.e("OCR", "Error performing OCR: " + e.getMessage());
return "Error processing image";
}
}
Wrap the OCR calls in try-catch blocks and add log statements so that any exceptions thrown during processing show up in Logcat instead of crashing the camera preview.
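When debugging recognition quality rather than crashes, the mean recognition confidence is often more telling than the raw text, since consistently low values point to poor binarization, blur, or a bad camera angle; tess-two exposes it via meanConfidence(). Here is a small sketch (the method name performOCRWithConfidence and the threshold of 60 are illustrative assumptions):

```java
private String performOCRWithConfidence(Bitmap bitmap) {
    tessBaseAPI.setImage(bitmap);
    String text = tessBaseAPI.getUTF8Text();
    // meanConfidence() returns a value from 0 to 100 for the last recognition
    int confidence = tessBaseAPI.meanConfidence();
    Log.d("OCR", "Confidence: " + confidence + ", text: " + text);
    tessBaseAPI.clear();
    // Suppress low-confidence results rather than flashing garbage on screen
    return confidence >= 60 ? text : "";
}
```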
Conclusion
By following this guide, you've implemented a functional real-time text recognition system in your Android app using OpenCV and Tesseract. This implementation can be enhanced further with features like text highlighting, language detection, or text-to-speech integration, providing an even richer user experience.
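As one example of those enhancements, reading recognized text aloud needs only Android's built-in TextToSpeech API. A minimal sketch follows; the field name tts and the helper method names are assumptions for illustration:

```java
private TextToSpeech tts;

// Call once, e.g. from onCreate
private void initTextToSpeech() {
    tts = new TextToSpeech(this, status -> {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.US);
        }
    });
}

// Call with an OCR result; QUEUE_FLUSH interrupts any utterance still playing
private void speak(String text) {
    if (tts != null && !text.isEmpty()) {
        tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "ocr_result");
    }
}
```

If you add this, call tts.shutdown() in onDestroy to release the engine.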
If you're working with media files and need a robust solution for handling file uploads and processing, check out Transloadit, which offers powerful APIs for managing file operations in your applications.