Optical Character Recognition (OCR) lets your Android app extract text from images. Google ML Kit provides an on-device text recognition library that simplifies the implementation. In this guide, we'll explore how to integrate OCR capabilities into your Android application.

Prerequisites

Before starting, ensure you have:

  • Android Studio installed (latest version)
  • An Android device or emulator running Android API level 21 or higher
  • Basic knowledge of Android development and the Kotlin programming language

Setting up the Android project

Create a new Android project in Android Studio:

  1. Open Android Studio and select File > New > New Project
  2. Choose Empty Activity and click Next
  3. Set the Name of the project (e.g., TextRecognitionApp)
  4. Select Kotlin as the Language
  5. Set the Minimum SDK to API 21: Android 5.0 (Lollipop)
  6. Click Finish to create the project

Adding ML Kit dependencies

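Add the following dependencies to your app-level build.gradle file. Only the ML Kit artifact is used by the code in this guide, which relies on the system camera app; the CameraX artifacts are needed only if you later build an in-app camera preview: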

dependencies {
    implementation 'com.google.mlkit:text-recognition:16.0.0'
    implementation 'androidx.camera:camera-core:1.3.1'
    implementation 'androidx.camera:camera-camera2:1.3.1'
    implementation 'androidx.camera:camera-lifecycle:1.3.1'
    implementation 'androidx.camera:camera-view:1.3.1'
}

Note: Always check for the latest versions of these dependencies in the official documentation to ensure compatibility and access to new features.

Sync your project after adding the dependencies.
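
If your project was generated with the Kotlin DSL (build.gradle.kts instead of build.gradle), the same coordinates are written with function-call syntax. A minimal sketch, assuming the same versions as above:

```kotlin
// app/build.gradle.kts (Kotlin DSL form of the same dependency block)
dependencies {
    implementation("com.google.mlkit:text-recognition:16.0.0")
    implementation("androidx.camera:camera-core:1.3.1")
    implementation("androidx.camera:camera-camera2:1.3.1")
    implementation("androidx.camera:camera-lifecycle:1.3.1")
    implementation("androidx.camera:camera-view:1.3.1")
}
```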

Configuring permissions

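Add the following permissions to your AndroidManifest.xml. The code in this guide uses the system camera app and the system picker, so it does not strictly need either permission; they matter if you add an in-app camera or read image files directly. Be aware that on Android 6.0 (API 23) and higher, once CAMERA is declared in the manifest, launching the camera capture intent throws a SecurityException until the user has granted the permission at runtime: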

<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
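
CAMERA is a runtime permission on Android 6.0 (API 23) and higher, so declaring it in the manifest is not enough on its own. The following is a minimal sketch of requesting it with the Activity Result API. The class name CameraPermissionActivity and the onCameraPermissionGranted() placeholder are hypothetical names used only for illustration:

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat

// Hypothetical activity used only to illustrate the permission flow.
class CameraPermissionActivity : AppCompatActivity() {

    // Launchers must be registered before the activity is started,
    // so this one is created as a property initializer.
    private val requestCameraPermission =
        registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
            if (granted) onCameraPermissionGranted()
        }

    // Call this from the capture button instead of opening the camera directly.
    fun ensureCameraPermission() {
        val state = ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
        if (state == PackageManager.PERMISSION_GRANTED) {
            onCameraPermissionGranted()
        } else {
            requestCameraPermission.launch(Manifest.permission.CAMERA)
        }
    }

    // Placeholder: a real app would start its image capture flow here.
    private fun onCameraPermissionGranted() {
    }
}
```

If the user denies the request, the callback receives false and nothing happens in this sketch; a real app would typically explain why the camera is needed.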

Implementing OCR functionality

Create a layout file (activity_main.xml) to include an image preview and text display:

<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    android:padding="16dp">

    <Button
        android:id="@+id/btnCapture"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Capture Image" />

    <Button
        android:id="@+id/btnGallery"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Select from Gallery" />

    <ImageView
        android:id="@+id/imageView"
        android:layout_width="match_parent"
        android:layout_height="200dp"
        android:layout_marginTop="16dp"
        android:scaleType="centerCrop" />

    <ScrollView
        android:layout_width="match_parent"
        android:layout_height="0dp"
        android:layout_weight="1"
        android:layout_marginTop="16dp">

        <TextView
            android:id="@+id/textView"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:textSize="16sp" />
    </ScrollView>
</LinearLayout>

Implement the OCR functionality in your MainActivity.kt:

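import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.net.Uri
import android.os.Bundle
import android.widget.Button
import android.widget.ImageView
import android.widget.TextView
import android.widget.Toast
import androidx.activity.result.ActivityResultLauncher
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.FileProvider
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import java.io.File

class MainActivity : AppCompatActivity() {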

    private lateinit var imageView: ImageView
    private lateinit var textView: TextView

    private lateinit var takePictureLauncher: ActivityResultLauncher<Uri>
    private lateinit var selectPictureLauncher: ActivityResultLauncher<String>
    private lateinit var photoUri: Uri

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        imageView = findViewById(R.id.imageView)
        textView = findViewById(R.id.textView)

        findViewById<Button>(R.id.btnCapture).setOnClickListener {
            dispatchTakePictureIntent()
        }

        findViewById<Button>(R.id.btnGallery).setOnClickListener {
            dispatchSelectPictureIntent()
        }

        setupActivityResultLaunchers()
    }

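    private fun setupActivityResultLaunchers() {
        // TakePicture reports false when the user cancels or the photo could not be saved.
        takePictureLauncher = registerForActivityResult(ActivityResultContracts.TakePicture()) { success ->
            if (success) {
                loadAndProcess(photoUri)
            }
        }

        // GetContent returns null when the user leaves the picker without choosing an image.
        selectPictureLauncher = registerForActivityResult(ActivityResultContracts.GetContent()) { uri ->
            uri?.let { loadAndProcess(it) }
        }
    }

    // Decodes the image behind the Uri, shows it, and runs text recognition on it.
    private fun loadAndProcess(uri: Uri) {
        // use {} closes the stream; decodeStream returns null if the data is not a valid image.
        val bitmap = contentResolver.openInputStream(uri)?.use { BitmapFactory.decodeStream(it) }
        if (bitmap == null) {
            Toast.makeText(this, "Could not load image", Toast.LENGTH_SHORT).show()
            return
        }
        imageView.setImageBitmap(bitmap)
        processImage(bitmap)
    }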

    private fun dispatchTakePictureIntent() {
        val imageFile = File.createTempFile("IMG_", ".jpg", cacheDir)
        photoUri = FileProvider.getUriForFile(this, "${packageName}.fileprovider", imageFile)
        takePictureLauncher.launch(photoUri)
    }

    private fun dispatchSelectPictureIntent() {
        selectPictureLauncher.launch("image/*")
    }

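    // Create the recognizer once and reuse it instead of building a new client per image.
    private val recognizer by lazy {
        TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    }

    private fun processImage(bitmap: Bitmap) {
        // Rotation 0 assumes the bitmap is already upright; BitmapFactory does not apply EXIF orientation.
        val image = InputImage.fromBitmap(bitmap, 0)

        recognizer.process(image)
            .addOnSuccessListener { visionText ->
                // visionText.text holds the full recognized text as a single string.
                textView.text = visionText.text
            }
            .addOnFailureListener { e ->
                Toast.makeText(this, "Text recognition failed: ${e.message}", Toast.LENGTH_SHORT).show()
            }
    }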
}

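Note: This sample keeps error handling minimal. In production, catch the IOException that can occur when creating the temporary file or opening the image stream, and request the CAMERA permission at runtime if you declare it in the manifest.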
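
Besides the flat text property, the recognition result also exposes the text as blocks and lines, each with an optional bounding box. The following is a minimal sketch of walking that structure; describeLines is a hypothetical helper name, not part of ML Kit:

```kotlin
import com.google.mlkit.vision.text.Text

// Hypothetical helper: returns one string per recognized line,
// prefixed with the top-left corner of its bounding box when available.
fun describeLines(visionText: Text): List<String> =
    visionText.textBlocks.flatMap { block ->
        block.lines.map { line ->
            val box = line.boundingBox // may be null
            val position = box?.let { "(${it.left}, ${it.top})" } ?: "(unknown)"
            "$position ${line.text}"
        }
    }
```

A success listener could call describeLines(visionText).joinToString("\n") to display positions next to each line, which may help when debugging layouts such as receipts or forms.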

Add the FileProvider configuration to your AndroidManifest.xml inside the <application> tag:

<provider
    android:name="androidx.core.content.FileProvider"
    android:authorities="${applicationId}.fileprovider"
    android:exported="false"
    android:grantUriPermissions="true">
    <meta-data
        android:name="android.support.FILE_PROVIDER_PATHS"
        android:resource="@xml/file_paths" />
</provider>

Create res/xml/file_paths.xml:

<paths>
    <cache-path name="cache" path="." />
</paths>

Optimizing OCR performance

To improve OCR accuracy and performance:

  1. Ensure good image quality with proper lighting and focus.
  2. Consider image preprocessing techniques like contrast adjustment and noise reduction.
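  3. Use appropriate image resolution; overly high or low resolutions can affect accuracy. ML Kit's guidance is that each character should be at least 16x16 pixels, with little accuracy benefit beyond 24x24 pixels.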
  4. Handle device rotation and different screen sizes gracefully.
  5. Implement comprehensive error handling and retry mechanisms.
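
To act on the resolution advice above, one common approach is to downsample very large photos while decoding them. The sketch below computes a power-of-two sample size that could be assigned to BitmapFactory.Options.inSampleSize. The function name and the target dimensions in the usage note are assumptions for illustration, not values from ML Kit:

```kotlin
// Returns the largest power-of-two sample size that keeps both dimensions
// at or above the requested size. Pure arithmetic, no Android types needed.
fun calculateInSampleSize(width: Int, height: Int, reqWidth: Int, reqHeight: Int): Int {
    require(width > 0 && height > 0 && reqWidth > 0 && reqHeight > 0) {
        "dimensions must be positive"
    }
    var sampleSize = 1
    // Keep doubling while the downsampled image would still meet the requested size.
    while (width / (sampleSize * 2) >= reqWidth && height / (sampleSize * 2) >= reqHeight) {
        sampleSize *= 2
    }
    return sampleSize
}
```

For a 4000x3000 photo and a requested size of 1000x750, this returns 4, so the decoded bitmap would be 1000x750. An image already smaller than the requested size gets a sample size of 1 and is decoded unchanged.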

Testing the application

Test your OCR implementation with various scenarios:

  • Different types of text (printed, handwritten).
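  • Various languages (the default recognizer covers Latin-script languages; Chinese, Devanagari, Japanese, and Korean need their own ML Kit text recognition dependencies).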
  • Diverse lighting conditions.
  • Multiple text orientations.
  • Different image sources (camera, gallery).
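
To compare results across these scenarios, it can help to score the recognized text against a known expected string. The following is a minimal sketch of one possible metric, a character error rate based on edit distance. All function names here are hypothetical, and the metric is a suggestion rather than something ML Kit provides:

```kotlin
// Collapses whitespace so that line-break differences do not count as errors.
fun normalize(text: String): String = text.trim().replace(Regex("\\s+"), " ")

// Edit distance via dynamic programming (insertions, deletions, substitutions).
fun levenshtein(a: String, b: String): Int {
    var previous = IntArray(b.length + 1) { it }
    for (i in 1..a.length) {
        val current = IntArray(b.length + 1)
        current[0] = i
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            current[j] = minOf(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
        }
        previous = current
    }
    return previous[b.length]
}

// Character error rate: edits needed per expected character (0.0 is a perfect match).
fun characterErrorRate(expected: String, recognized: String): Double {
    val reference = normalize(expected)
    require(reference.isNotEmpty()) { "expected text must not be empty" }
    return levenshtein(reference, normalize(recognized)).toDouble() / reference.length
}
```

Running the same set of test images after each change and tracking this number per scenario gives a rough signal of whether a preprocessing tweak helped or hurt.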

Conclusion

By integrating Google ML Kit's OCR capabilities, you can enhance your Android app with powerful text recognition features. This implementation allows users to extract text from both captured images and those selected from the gallery.
