Implementing OCR in Android apps with Google ML Kit

Optical Character Recognition (OCR) enhances your Android app by enabling automatic text detection and extraction from images. Google ML Kit provides a powerful yet simple OCR library that allows you to integrate text recognition with minimal setup. In this guide, you will learn how to add OCR capabilities to your Android application.
Prerequisites
Before starting, ensure you have:
- Android Studio Hedgehog (2023.1.1) or newer
- Gradle 8.2 or newer
- An Android device or emulator running Android API level 21 (Android 5.0) or higher
- Target Android SDK 34 (Android 14)
- Basic knowledge of Android development and the Kotlin programming language
Setting up the Android project
Create a new Android project in Android Studio:
- Open Android Studio and select File > New > New Project.
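- Choose Empty Views Activity and click Next. In recent Android Studio versions, including Hedgehog, the Empty Activity template creates a Jetpack Compose project. That template does not use the XML layouts and view binding shown below.
- Set the Name of the project (e.g., TextRecognitionApp).
- Select Kotlin as the Language.
- Set the Minimum SDK to API 21: Android 5.0 (Lollipop).
- Set Build configuration language to Groovy DSL (build.gradle) so the Gradle snippets below apply as written.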
- Click Finish to create the project.
Adding ML Kit dependencies
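Add the following to your project-level build.gradle to declare the plugins with explicit versions. The apply false suffix makes the plugin versions available to your modules without applying the plugins to the root project itself:
plugins {
    id 'com.android.application' version '8.2.2' apply false
    id 'org.jetbrains.kotlin.android' version '1.9.22' apply false
}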
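Then, add the following to your app-level build.gradle file (app/build.gradle). Keep the plugins and dependencies that the project template already generated. View binding is enabled to simplify UI interactions:
android {
    buildFeatures {
        viewBinding true
    }
}
dependencies {
    // ML Kit Text Recognition (Latin script; the model is bundled with the app)
    implementation 'com.google.mlkit:text-recognition:16.0.1'
    // CameraX dependencies (optional: MainActivity below captures photos through the
    // system camera app with ActivityResultContracts.TakePicture and does not use CameraX;
    // 1.4.0-alpha04 is a pre-release, so prefer a stable version for production)
    implementation 'androidx.camera:camera-core:1.4.0-alpha04'
    implementation 'androidx.camera:camera-camera2:1.4.0-alpha04'
    implementation 'androidx.camera:camera-lifecycle:1.4.0-alpha04'
    implementation 'androidx.camera:camera-view:1.4.0-alpha04'
}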
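New projects in recent Android Studio versions default to the Kotlin DSL (build.gradle.kts). If you kept that default, the same configuration can be expressed as sketched below. This assumes the template-generated files otherwise stay unchanged, so only the lines that matter for this guide are shown. The remaining CameraX artifacts follow the same pattern as camera-core.

```kotlin
// Project-level build.gradle.kts
plugins {
    id("com.android.application") version "8.2.2" apply false
    id("org.jetbrains.kotlin.android") version "1.9.22" apply false
}

// App-level app/build.gradle.kts (merge into the file the template generated)
plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
}

android {
    // namespace, compileSdk, defaultConfig, etc. stay as generated
    buildFeatures {
        viewBinding = true
    }
}

dependencies {
    // ML Kit Text Recognition (Latin script; model bundled with the app)
    implementation("com.google.mlkit:text-recognition:16.0.1")
    // Optional CameraX artifacts use the same syntax, for example:
    implementation("androidx.camera:camera-core:1.4.0-alpha04")
}
```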
Sync your project after adding the dependencies.
Configuring permissions
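Add the permissions to your AndroidManifest.xml, as children of <manifest> outside the <application> tag.

Because the app declares CAMERA, it must also request that permission at runtime before launching the camera. When an app declares CAMERA but the user has not granted it, launching the system camera intent throws a SecurityException.

Picking an image with ActivityResultContracts.GetContent needs no storage permission, because the picker grants temporary access to the returned URI. READ_EXTERNAL_STORAGE has no effect on Android 13 (API 33) and higher for apps targeting API 33+, so if you declare it for other features, cap it at API 32:
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission
    android:name="android.permission.READ_EXTERNAL_STORAGE"
    android:maxSdkVersion="32" />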
Implementing OCR functionality
Replace the contents of the generated layout file res/layout/activity_main.xml with a layout containing:
- buttons for capturing or selecting an image,
- an ImageView for preview,
- a TextView inside a ScrollView to display the recognized text.
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    android:padding="16dp">
    <Button
        android:id="@+id/btnCapture"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Capture Image" />
    <Button
        android:id="@+id/btnGallery"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Select from Gallery" />
    <ImageView
        android:id="@+id/imageView"
        android:layout_width="match_parent"
        android:layout_height="200dp"
        android:layout_marginTop="16dp"
        android:scaleType="centerCrop" />
    <ScrollView
        android:layout_width="match_parent"
        android:layout_height="0dp"
        android:layout_weight="1"
        android:layout_marginTop="16dp">
        <TextView
            android:id="@+id/textView"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:textSize="16sp" />
    </ScrollView>
</LinearLayout>
Implement the OCR functionality in your MainActivity.kt as follows:
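package com.example.textrecognitionapp // Assumed default package for TextRecognitionApp; use your own

import android.Manifest
import android.content.pm.PackageManager
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.net.Uri
import android.os.Bundle
import android.widget.Toast
import androidx.activity.result.ActivityResultLauncher
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat
import androidx.core.content.FileProvider
import com.example.textrecognitionapp.databinding.ActivityMainBinding
import com.google.mlkit.common.MlKitException
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.TextRecognizer
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import java.io.File

class MainActivity : AppCompatActivity() {
    private lateinit var binding: ActivityMainBinding
    private lateinit var recognizer: TextRecognizer
    private lateinit var takePictureLauncher: ActivityResultLauncher<Uri>
    private lateinit var selectPictureLauncher: ActivityResultLauncher<String>
    private lateinit var cameraPermissionLauncher: ActivityResultLauncher<String>

    // Nullable and saved across recreation: the system may destroy this activity
    // while the camera app is in the foreground.
    private var photoUri: Uri? = null

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        binding = ActivityMainBinding.inflate(layoutInflater)
        setContentView(binding.root)
        photoUri = savedInstanceState?.getString(KEY_PHOTO_URI)?.let(Uri::parse)

        // Initialize the ML Kit Latin-script text recognizer
        recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

        // Launchers must be registered before the activity reaches the STARTED state
        setupActivityResultLaunchers()

        binding.btnCapture.setOnClickListener { dispatchTakePictureIntent() }
        binding.btnGallery.setOnClickListener { selectPictureLauncher.launch("image/*") }
    }

    override fun onSaveInstanceState(outState: Bundle) {
        super.onSaveInstanceState(outState)
        outState.putString(KEY_PHOTO_URI, photoUri?.toString())
    }

    private fun setupActivityResultLaunchers() {
        takePictureLauncher = registerForActivityResult(ActivityResultContracts.TakePicture()) { success ->
            val uri = photoUri
            if (success && uri != null) loadAndProcess(uri)
        }
        selectPictureLauncher = registerForActivityResult(ActivityResultContracts.GetContent()) { uri ->
            uri?.let(::loadAndProcess)
        }
        cameraPermissionLauncher = registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
            if (granted) launchCamera() else showMessage("Camera permission is required to capture images")
        }
    }

    private fun dispatchTakePictureIntent() {
        // The manifest declares CAMERA, so the camera intent throws a
        // SecurityException unless the permission has been granted at runtime.
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA) ==
            PackageManager.PERMISSION_GRANTED
        ) {
            launchCamera()
        } else {
            cameraPermissionLauncher.launch(Manifest.permission.CAMERA)
        }
    }

    private fun launchCamera() {
        val imageFile = File.createTempFile("IMG_", ".jpg", cacheDir)
        val uri = FileProvider.getUriForFile(this, "${packageName}.fileprovider", imageFile)
        photoUri = uri
        takePictureLauncher.launch(uri)
    }

    private fun loadAndProcess(uri: Uri) {
        // decodeStream returns null if the data cannot be decoded as an image
        val bitmap = contentResolver.openInputStream(uri)?.use { BitmapFactory.decodeStream(it) }
        if (bitmap == null) {
            showMessage("Could not load the selected image")
            return
        }
        binding.imageView.setImageBitmap(bitmap)
        processImage(bitmap)
    }

    private fun processImage(bitmap: Bitmap) {
        // Rotation 0 assumes the bitmap is already upright (EXIF orientation is not applied here)
        val image = InputImage.fromBitmap(bitmap, 0)
        recognizer.process(image)
            .addOnSuccessListener { visionText ->
                binding.textView.text = visionText.text
            }
            .addOnFailureListener { e ->
                if (e is MlKitException && e.errorCode == MlKitException.UNAVAILABLE) {
                    // Expected only with the unbundled (Google Play services) model,
                    // which is downloaded after installation.
                    showMessage("Text recognition model is still downloading")
                } else {
                    showMessage("Text recognition failed: ${e.message}")
                }
            }
    }

    private fun showMessage(message: String) {
        Toast.makeText(this, message, Toast.LENGTH_SHORT).show()
    }

    override fun onDestroy() {
        super.onDestroy()
        recognizer.close()
    }

    private companion object {
        const val KEY_PHOTO_URI = "photo_uri"
    }
}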
Add the FileProvider configuration to your AndroidManifest.xml within the <application> tag:
<provider
    android:name="androidx.core.content.FileProvider"
    android:authorities="${applicationId}.fileprovider"
    android:exported="false"
    android:grantUriPermissions="true">
    <meta-data
        android:name="android.support.FILE_PROVIDER_PATHS"
        android:resource="@xml/file_paths" />
</provider>
Create the file res/xml/file_paths.xml with the following content:
<paths>
    <cache-path name="cache" path="." />
</paths>
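processImage displays visionText.text, which flattens the whole result into one string. The result is also available as a hierarchy: visionText.textBlocks, each block's lines, and each line's elements. Every level has a text and a nullable boundingBox, which helps when you want to keep paragraph breaks or check how large the text appears.

ML Kit's result classes cannot be constructed in a plain JVM test. The sketch below therefore uses hypothetical stand-in data classes (OcrBlock, OcrLine) with the same shape. In the app you might map each line to OcrLine(line.text, line.boundingBox?.height() ?: 0).

The 16-pixel default echoes the resolution guidance in the performance tips below. Treating a line's box height as its character height is a rough approximation.

```kotlin
// Hypothetical stand-ins mirroring the shape of ML Kit's result:
// Text.textBlocks -> TextBlock.lines -> Line.text / Line.boundingBox
data class OcrLine(val text: String, val heightPx: Int)
data class OcrBlock(val lines: List<OcrLine>)

// Preserves the block structure for display: one recognized line per row,
// with a blank row between blocks.
fun formatBlocks(blocks: List<OcrBlock>): String =
    blocks.joinToString(separator = "\n\n") { block ->
        block.lines.joinToString(separator = "\n") { it.text }
    }

// Returns lines whose bounding-box height (a rough proxy for character height)
// is below minHeightPx, e.g. to suggest retaking the photo closer to the text.
fun linesBelowHeight(blocks: List<OcrBlock>, minHeightPx: Int = 16): List<OcrLine> =
    blocks.flatMap { it.lines }.filter { it.heightPx < minHeightPx }
```

For example, given a block with the lines "Invoice 42" (30 px) and "Total 9.99" (14 px) followed by a block with "Thank you" (22 px), formatBlocks keeps the two blocks apart with a blank line. linesBelowHeight flags only "Total 9.99".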
Optimizing OCR performance
To improve OCR accuracy and performance, consider these recommendations:
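- Image Resolution: For Latin script, each character should be at least 16x16 pixels. For Chinese, Japanese, and Korean, aim for at least 24x24 pixels. Characters larger than about 24x24 pixels generally bring no further accuracy gain. Downscale large photos (for example, to fit within 1920x1080 pixels) to reduce latency and memory use.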
- Image Quality: Ensure proper lighting, focus, and minimal motion blur. Use image preprocessing techniques such as contrast adjustment, noise reduction, and sharpening when necessary.
- Memory Management: Close the recognizer in onDestroy() and drop references to large bitmaps you no longer need. Calling Bitmap.recycle() is rarely necessary on modern Android versions. Recycling a bitmap that an ImageView still displays causes a crash.
- Image Orientation: Adjust the image rotation based on metadata (e.g., EXIF information) so that the text is upright during recognition.
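- Model Download: The com.google.mlkit:text-recognition artifact used in this guide bundles the model with your app, so it is available immediately. The unbundled Google Play services variant (com.google.android.gms:play-services-mlkit-text-recognition) downloads the model after installation. If you use that variant, show a loading indicator while recognition fails with MlKitException.UNAVAILABLE.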
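The resolution, preprocessing, and orientation tips above can be sketched as plain Kotlin helpers. They work on dimensions, EXIF values, and pixel arrays, so you can unit-test them off-device. They are not ML Kit API: the function names are hypothetical, and the 1920x1080 bound comes from the resolution tip.

The grayscale contrast stretch is just one simple preprocessing technique among many, and whether it improves recognition depends on your images.

The orientation mapping rests on one assumption. It treats InputImage.fromBitmap's rotationDegrees as the clockwise rotation needed to make the image upright, the convention used by androidx.exifinterface's ExifInterface.getRotationDegrees(). You can call that method directly in the app.

On Android, connect the helpers as follows:
- Read the original size with BitmapFactory.Options.inJustDecodeBounds = true.
- Pass the computed value as inSampleSize.
- Move pixels in and out with Bitmap.getPixels and Bitmap.setPixels.

```kotlin
import kotlin.math.roundToInt

// Smallest power-of-two sample size that keeps the decoded image within the bounds.
// Long and short sides are compared separately so portrait photos are handled too.
fun calculateInSampleSize(width: Int, height: Int, maxLong: Int = 1920, maxShort: Int = 1080): Int {
    require(width > 0 && height > 0) { "Dimensions must be positive" }
    val longSide = maxOf(width, height)
    val shortSide = minOf(width, height)
    var sampleSize = 1
    // Doubling keeps the value a power of two, which is what BitmapFactory honors
    while (longSide / sampleSize > maxLong || shortSide / sampleSize > maxShort) {
        sampleSize *= 2
    }
    return sampleSize
}

// Maps an EXIF orientation tag to the clockwise rotation (degrees) that makes the image upright.
// Mirrored orientations (2, 4, 5, 7) are treated as 0 in this sketch.
fun exifOrientationToDegrees(orientation: Int): Int = when (orientation) {
    6 -> 90   // ORIENTATION_ROTATE_90
    3 -> 180  // ORIENTATION_ROTATE_180
    8 -> 270  // ORIENTATION_ROTATE_270
    else -> 0 // ORIENTATION_NORMAL, undefined, or mirrored
}

// Converts ARGB pixels (as returned by Bitmap.getPixels) to grayscale and linearly
// stretches the luminance range to 0..255. Alpha is preserved; a new array is returned.
fun grayscaleContrastStretch(argb: IntArray): IntArray {
    if (argb.isEmpty()) return argb.copyOf()
    val luma = IntArray(argb.size) { i ->
        val p = argb[i]
        val r = (p shr 16) and 0xFF
        val g = (p shr 8) and 0xFF
        val b = p and 0xFF
        // ITU-R BT.601 luma weights
        (0.299 * r + 0.587 * g + 0.114 * b).roundToInt()
    }
    val min = luma.minOrNull() ?: 0
    val max = luma.maxOrNull() ?: 0
    val range = max - min
    return IntArray(argb.size) { i ->
        // A flat image (range 0) is left as plain grayscale
        val v = if (range == 0) luma[i] else (luma[i] - min) * 255 / range
        (argb[i] and 0xFF000000.toInt()) or (v shl 16) or (v shl 8) or v
    }
}
```

For a 4032x3024 photo, calculateInSampleSize returns 4, which decodes to 1008x756. A coarser image can push small text below the minimum character size, so check the result against the resolution guidance before settling on a bound.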
Testing the application
Test your OCR implementation under various conditions:
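- Text Variations:
  - Printed text
  - Handwritten text
  - Different fonts and sizes
- Multiple Languages and Scripts. TextRecognizerOptions.DEFAULT_OPTIONS recognizes Latin script only. Each other script needs its own library and options class, such as com.google.mlkit:text-recognition-chinese with ChineseTextRecognizerOptions. Test the following:
  - Latin script
  - Chinese characters
  - Devanagari
  - Japanese
  - Korean
- Environmental Factors:
  - Varying lighting conditions
  - Different text orientations
  - Images from the camera and gallery
  - Device rotations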
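You can compare results across these conditions with a number rather than by eye. Photograph samples whose text you know, then score the recognized output with a character error rate (CER): the edit distance between recognized and reference text, divided by the reference length.

The sketch below is a plain Kotlin testing aid of our own, not part of ML Kit, and the function names are hypothetical. Whitespace is normalized first so that line breaks in the recognized text do not count as errors.

```kotlin
// Levenshtein edit distance (insertions, deletions, substitutions), using two rolling rows.
fun editDistance(a: String, b: String): Int {
    var prev = IntArray(b.length + 1) { it }
    var curr = IntArray(b.length + 1)
    for (i in 1..a.length) {
        curr[0] = i
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            curr[j] = minOf(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        }
        val tmp = prev
        prev = curr
        curr = tmp
    }
    return prev[b.length]
}

// Collapses runs of whitespace (including newlines between recognized lines) to single spaces.
fun normalizeWhitespace(s: String): String = s.trim().replace(Regex("\\s+"), " ")

// Character error rate: 0.0 is a perfect match. An empty reference scores 1.0
// if anything was recognized (a convention chosen for this sketch).
fun characterErrorRate(reference: String, recognized: String): Double {
    val ref = normalizeWhitespace(reference)
    val hyp = normalizeWhitespace(recognized)
    if (ref.isEmpty()) return if (hyp.isEmpty()) 0.0 else 1.0
    return editDistance(ref, hyp).toDouble() / ref.length
}
```

Recording the CER for each sample under each lighting, orientation, and source condition makes regressions visible when you change preprocessing or image resolution.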
Troubleshooting
If you encounter issues with OCR functionality, check the following:
- Verify that the image is clear, well-lit, and within the recommended resolution.
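- If you declare the CAMERA permission, make sure the app requests it at runtime and that the user has granted it. Otherwise, launching the camera throws a SecurityException. Images picked with ActivityResultContracts.GetContent need no storage permission.
- Confirm that the FileProvider is correctly configured in the AndroidManifest.xml and that its authority matches the one passed to FileProvider.getUriForFile. Also check that the temporary image file is accessible.
- A message saying the text recognition model is still downloading means you are using the unbundled Google Play services model. Make sure the device has a stable network connection and allow time for the download. Alternatively, switch to the bundled com.google.mlkit:text-recognition artifact, which ships the model with the app.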
- Use Logcat to inspect any errors, and refer to the ML Kit documentation for further troubleshooting tips.
Conclusion
Integrating Google ML Kit's OCR capabilities enables your Android app to extract text from captured images or those selected from the gallery. This guide covered project setup, dependency management, implementation details, performance optimizations, and troubleshooting tips.
For robust file processing and transformation solutions in your applications, consider using Transloadit.