Last updated: February 5, 2025

Implementing OCR in Android apps with Google ML Kit

Tim Koschützki

Co-founder · Berlin, Germany

Optical Character Recognition (OCR) enhances your Android app by enabling automatic text detection and extraction from images. Google ML Kit provides a powerful yet simple OCR library that allows you to integrate text recognition with minimal setup. In this guide, you will learn how to add OCR capabilities to your Android application.

Prerequisites

Before starting, ensure you have:

Android Studio Hedgehog (2023.1.1) or newer
Gradle 8.2 or newer
An Android device or emulator running Android API level 21 (Android 5.0) or higher
Target Android SDK 34 (Android 14)
Basic knowledge of Android development and the Kotlin programming language

Setting up the Android project

Create a new Android project in Android Studio:

Open Android Studio and select File > New > New Project.
Choose Empty Activity and click Next.
Set the Name of the project (e.g., TextRecognitionApp).
Select Kotlin as the Language.
Set the Minimum SDK to API 21: Android 5.0 (Lollipop).
Click Finish to create the project.

Adding ML Kit dependencies

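Add the following to your project-level build.gradle to declare the necessary plugins with explicit versions. The apply false flag makes these versions available to your modules without applying the plugins to the root project itself:

plugins {
    id 'com.android.application' version '8.2.2' apply false
    id 'org.jetbrains.kotlin.android' version '1.9.22' apply false
}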

Then, add these dependencies to your app-level build.gradle file. Note the inclusion of view binding to simplify UI interactions:

android {
    buildFeatures {
        viewBinding true
    }
}

dependencies {
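    // ML Kit Text Recognition v2 (Latin script; the model is bundled into your APK)
    implementation 'com.google.mlkit:text-recognition:16.0.1'

    // CameraX (optional: the MainActivity example below launches the system camera app
    // and does not use CameraX; these are only needed for a custom in-app camera preview)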
    implementation 'androidx.camera:camera-core:1.4.0-alpha04'
    implementation 'androidx.camera:camera-camera2:1.4.0-alpha04'
    implementation 'androidx.camera:camera-lifecycle:1.4.0-alpha04'
    implementation 'androidx.camera:camera-view:1.4.0-alpha04'
}

Sync your project after adding the dependencies.

Configuring permissions

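Declare the camera permission in your AndroidManifest.xml. READ_EXTERNAL_STORAGE is not required for this flow: the system picker used by GetContent grants your app temporary read access to the chosen image, and on Android 13 (API 33) and higher the permission has been replaced by granular media permissions. If you keep it for other storage access, limit it to older versions with maxSdkVersion:

<uses-permission android:name="android.permission.CAMERA" />
<uses-permission
    android:name="android.permission.READ_EXTERNAL_STORAGE"
    android:maxSdkVersion="32" />

Because CAMERA is a runtime (dangerous) permission, it must also be requested while the app is running. When the manifest declares CAMERA but the user has not granted it, launching the camera app through ActivityResultContracts.TakePicture() throws a SecurityException on Android 6.0 (API 23) and higher.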

Implementing OCR functionality

Create a layout file (activity_main.xml) that includes buttons for capturing or selecting an image, an ImageView for preview, and a TextView within a ScrollView to display recognized text:

<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    android:padding="16dp">

    <Button
        android:id="@+id/btnCapture"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Capture Image" />

    <Button
        android:id="@+id/btnGallery"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Select from Gallery" />

    <ImageView
        android:id="@+id/imageView"
        android:layout_width="match_parent"
        android:layout_height="200dp"
        android:layout_marginTop="16dp"
        android:scaleType="centerCrop" />

    <ScrollView
        android:layout_width="match_parent"
        android:layout_height="0dp"
        android:layout_weight="1"
        android:layout_marginTop="16dp">

        <TextView
            android:id="@+id/textView"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:textSize="16sp" />
    </ScrollView>
</LinearLayout>

Implement the OCR functionality in your MainActivity.kt as follows:

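package com.example.textrecognitionapp // replace with your app's package

import android.Manifest
import android.content.pm.PackageManager
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.net.Uri
import android.os.Bundle
import android.widget.Toast
import androidx.activity.result.ActivityResultLauncher
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat
import androidx.core.content.FileProvider
import com.example.textrecognitionapp.databinding.ActivityMainBinding
import com.google.mlkit.common.MlKitException
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.TextRecognizer
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import java.io.File

class MainActivity : AppCompatActivity() {
    private lateinit var binding: ActivityMainBinding
    private lateinit var cameraPermissionLauncher: ActivityResultLauncher<String>
    private lateinit var takePictureLauncher: ActivityResultLauncher<Uri>
    private lateinit var selectPictureLauncher: ActivityResultLauncher<String>
    private lateinit var recognizer: TextRecognizer

    // Uri of the file the camera app writes to; saved so it survives activity recreation
    private var photoUri: Uri? = null

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        binding = ActivityMainBinding.inflate(layoutInflater)
        setContentView(binding.root)

        // Restore the pending photo Uri if the activity was recreated while the camera app was open
        photoUri = savedInstanceState?.getString(KEY_PHOTO_URI)?.let(Uri::parse)

        // Initialize the ML Kit text recognizer (Latin script)
        recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

        // Launchers must be registered before the activity reaches the STARTED state
        setupActivityResultLaunchers()

        binding.btnCapture.setOnClickListener { dispatchTakePictureIntent() }
        binding.btnGallery.setOnClickListener { dispatchSelectPictureIntent() }
    }

    override fun onSaveInstanceState(outState: Bundle) {
        super.onSaveInstanceState(outState)
        outState.putString(KEY_PHOTO_URI, photoUri?.toString())
    }

    private fun setupActivityResultLaunchers() {
        cameraPermissionLauncher = registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
            if (granted) launchCamera() else showMessage("Camera permission is required to take a photo")
        }

        takePictureLauncher = registerForActivityResult(ActivityResultContracts.TakePicture()) { success ->
            val uri = photoUri
            if (success && uri != null) displayAndRecognize(uri)
        }

        selectPictureLauncher = registerForActivityResult(ActivityResultContracts.GetContent()) { uri ->
            uri?.let { displayAndRecognize(it) }
        }
    }

    private fun dispatchTakePictureIntent() {
        // CAMERA is declared in the manifest, so it must be granted before launching the camera app
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA) ==
            PackageManager.PERMISSION_GRANTED
        ) {
            launchCamera()
        } else {
            cameraPermissionLauncher.launch(Manifest.permission.CAMERA)
        }
    }

    private fun launchCamera() {
        // The camera app writes the full-size photo into this FileProvider-backed cache file
        val imageFile = File.createTempFile("IMG_", ".jpg", cacheDir)
        val uri = FileProvider.getUriForFile(this, "${packageName}.fileprovider", imageFile)
        photoUri = uri
        takePictureLauncher.launch(uri)
    }

    private fun dispatchSelectPictureIntent() {
        selectPictureLauncher.launch("image/*")
    }

    // Decodes the image behind the Uri, shows it, and runs text recognition on it
    private fun displayAndRecognize(uri: Uri) {
        val bitmap = contentResolver.openInputStream(uri)?.use { BitmapFactory.decodeStream(it) }
        if (bitmap == null) {
            showMessage("Could not load the selected image")
            return
        }
        binding.imageView.setImageBitmap(bitmap)
        processImage(bitmap)
    }

    private fun processImage(bitmap: Bitmap) {
        // A rotation of 0 assumes the bitmap is already upright
        val image = InputImage.fromBitmap(bitmap, 0)

        recognizer.process(image)
            .addOnSuccessListener { visionText ->
                binding.textView.text = visionText.text
            }
            .addOnFailureListener { e ->
                if (e is MlKitException && e.errorCode == MlKitException.UNAVAILABLE) {
                    // Expected only with the unbundled (Google Play services) model while it downloads
                    showMessage("Text recognition model is still downloading")
                } else {
                    showMessage("Text recognition failed: ${e.message}")
                }
            }
    }

    private fun showMessage(message: String) {
        Toast.makeText(this, message, Toast.LENGTH_SHORT).show()
    }

    override fun onDestroy() {
        super.onDestroy()
        recognizer.close()
    }

    companion object {
        private const val KEY_PHOTO_URI = "photo_uri"
    }
}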

Add the FileProvider configuration to your AndroidManifest.xml within the <application> tag:

<provider
    android:name="androidx.core.content.FileProvider"
    android:authorities="${applicationId}.fileprovider"
    android:exported="false"
    android:grantUriPermissions="true">
    <meta-data
        android:name="android.support.FILE_PROVIDER_PATHS"
        android:resource="@xml/file_paths" />
</provider>

Create the file res/xml/file_paths.xml with the following content:

<paths>
    <cache-path name="cache" path="." />
</paths>

Optimizing OCR performance

To improve OCR accuracy and performance, consider these recommendations:

Image Resolution: Aim for 16–24 pixels per character height and keep the total image size below 1920x1080 pixels.
Image Quality: Ensure proper lighting, focus, and minimal motion blur. Use image preprocessing techniques such as contrast adjustment, noise reduction, and sharpening when necessary.
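Memory Management: Close the recognizer in onDestroy() and drop references to large bitmaps once you no longer need them. Do not call recycle() on a bitmap that is still displayed in an ImageView, because drawing a recycled bitmap throws an exception.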
Image Orientation: Adjust the image rotation based on metadata (e.g., using Exif information) to ensure the text is correctly oriented during recognition.
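Model Download: This applies only to the unbundled variant (com.google.android.gms:play-services-mlkit-text-recognition), which downloads the model through Google Play services. Until the download completes, recognition fails with MlKitException.UNAVAILABLE, so display a loading indicator or retry later. The bundled com.google.mlkit:text-recognition artifact ships the model inside your APK and needs no download.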
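The resolution and orientation tips can be expressed as two small pure-Kotlin helpers. This is a sketch under assumptions: the function names are hypothetical (not part of ML Kit or the Android SDK), the orientation codes are the standard EXIF values that Android also exposes as ExifInterface.ORIENTATION_* constants, and the 1920x1080 cap simply mirrors the recommendation above.

```kotlin
// Hypothetical helpers for preparing an image before OCR; not ML Kit or Android SDK APIs.

/**
 * Maps an EXIF orientation code to the clockwise rotation (in degrees) needed to display
 * the image upright. Only the pure rotations are handled; mirrored variants (2, 4, 5, 7)
 * fall back to 0 in this sketch.
 */
fun exifOrientationToDegrees(orientation: Int): Int = when (orientation) {
    6 -> 90   // ExifInterface.ORIENTATION_ROTATE_90
    3 -> 180  // ExifInterface.ORIENTATION_ROTATE_180
    8 -> 270  // ExifInterface.ORIENTATION_ROTATE_270
    else -> 0 // ORIENTATION_NORMAL, undefined, or mirrored
}

/**
 * Smallest power-of-two inSampleSize that brings the decoded image within the cap.
 * The long and short sides are compared separately, so portrait and landscape
 * photos are treated the same way.
 */
fun calculateInSampleSize(
    width: Int,
    height: Int,
    maxLongSide: Int = 1920,
    maxShortSide: Int = 1080
): Int {
    require(width > 0 && height > 0) { "image dimensions must be positive" }
    val longSide = maxOf(width, height)
    val shortSide = minOf(width, height)
    var sampleSize = 1
    // Keep halving the resolution until both sides fit within the cap
    while (longSide / sampleSize > maxLongSide || shortSide / sampleSize > maxShortSide) {
        sampleSize *= 2
    }
    return sampleSize
}
```

In the app, you could read the orientation with androidx.exifinterface.media.ExifInterface (from the androidx.exifinterface:exifinterface artifact) via getAttributeInt(ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL) and pass the resulting degrees to InputImage.fromBitmap() instead of 0. For the sample size, decode once with BitmapFactory.Options().inJustDecodeBounds = true to read outWidth and outHeight, then open a fresh stream and decode again with inSampleSize set. Downsampling can push small text below the 16-pixel character height, so check the result on representative images.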

Testing the application

Test your OCR implementation under various conditions:

Text Variations:
- Printed text
- Handwritten text
- Different fonts and sizes
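Multiple Languages and Scripts (the default recognizer only reads Latin script; every other script needs its own artifact, for example com.google.mlkit:text-recognition-chinese, and the matching options class, for example ChineseTextRecognizerOptions):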
- Latin script
- Chinese characters
- Devanagari
- Japanese
- Korean
Environmental Factors:
- Varying lighting conditions
- Different text orientations
- Images from the camera and gallery
- Device rotations
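To compare results across these conditions with a number rather than by eye, you can score the recognized text against a known transcription of each test image. The following sketch computes a character error rate (CER) from the Levenshtein edit distance; the function names are hypothetical helpers, not ML Kit APIs, and you supply the reference text yourself.

```kotlin
/** Levenshtein distance: minimum number of single-character insertions, deletions, and substitutions. */
fun editDistance(a: String, b: String): Int {
    // Only two rows of the dynamic-programming table are kept in memory
    var prev = IntArray(b.length + 1) { it }
    var curr = IntArray(b.length + 1)
    for (i in 1..a.length) {
        curr[0] = i
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            curr[j] = minOf(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        }
        val tmp = prev
        prev = curr
        curr = tmp
    }
    return prev[b.length]
}

/** Character error rate of the recognized text against a reference (0.0 means a perfect match). */
fun characterErrorRate(reference: String, recognized: String): Double {
    require(reference.isNotEmpty()) { "reference must not be empty" }
    return editDistance(reference, recognized).toDouble() / reference.length
}
```

For example, characterErrorRate(expected, visionText.text) returns 0.0 for a perfect match. ML Kit's output may break lines differently from your reference, so normalizing whitespace on both strings first keeps layout differences from inflating the score. The comparison works on UTF-16 Chars, which is a simplification for scripts that use combining marks, such as Devanagari.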

Troubleshooting

If you encounter issues with OCR functionality, check the following:

Verify that the image is clear, well-lit, and within the recommended resolution.
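Ensure that the CAMERA permission has been granted at runtime; otherwise, launching the camera app fails with a SecurityException. Selecting an image through the system picker does not require READ_EXTERNAL_STORAGE.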
Confirm that the FileProvider is correctly configured in the AndroidManifest.xml and that the temporary image file is accessible.
If you see a message indicating the text recognition model is still downloading (which only happens with the unbundled Google Play services variant), make sure the device has a stable network connection and allow some time for the model to download.
Use Logcat to inspect any errors, and refer to the ML Kit documentation for further troubleshooting tips.

Conclusion

Integrating Google ML Kit's OCR capabilities enables your Android app to extract text from captured images or those selected from the gallery. This guide covered project setup, dependency management, implementation details, performance optimizations, and troubleshooting tips.

For robust file processing and transformation solutions in your applications, consider using Transloadit.
