Implementing OCR in Android apps with Google ML Kit
Optical Character Recognition (OCR) lets your Android app extract text from images. Google ML Kit provides an on-device text recognition library that handles detection and recognition for you. This guide shows how to integrate it into an Android application, using images from either the camera or the gallery.
Prerequisites
Before starting, ensure you have:
- Android Studio installed (latest version)
- An Android device or emulator running Android API level 21 or higher
- Basic knowledge of Android development and the Kotlin programming language
Setting up the Android project
Create a new Android project in Android Studio:
- Open Android Studio and select File > New > New Project
- Choose Empty Activity and click Next
- Set the Name of the project (e.g., TextRecognitionApp)
- Select Kotlin as the Language
- Set the Minimum SDK to API 21: Android 5.0 (Lollipop)
- Click Finish to create the project
Adding ML Kit dependencies
Add the following dependencies to your app-level build.gradle file:
dependencies {
    implementation 'com.google.mlkit:text-recognition:16.0.0'
    implementation 'androidx.camera:camera-core:1.3.1'
    implementation 'androidx.camera:camera-camera2:1.3.1'
    implementation 'androidx.camera:camera-lifecycle:1.3.1'
    implementation 'androidx.camera:camera-view:1.3.1'
}
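Note: Check the official documentation for the latest versions of these dependencies. The CameraX artifacts are only needed if you later build an in-app camera preview; the capture flow in this guide uses the system camera app through the Activity Result API.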
Sync your project after adding the dependencies.
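Recent Android Studio templates generate build.gradle.kts (Kotlin DSL) instead of a Groovy build.gradle. If your project uses the Kotlin DSL, the same dependencies could be declared as sketched below. The coordinates and versions are copied from the Groovy block; only the syntax differs.

```kotlin
// app/build.gradle.kts: Kotlin DSL equivalent of the Groovy dependencies block.
dependencies {
    implementation("com.google.mlkit:text-recognition:16.0.0")
    implementation("androidx.camera:camera-core:1.3.1")
    implementation("androidx.camera:camera-camera2:1.3.1")
    implementation("androidx.camera:camera-lifecycle:1.3.1")
    implementation("androidx.camera:camera-view:1.3.1")
}
```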
Configuring permissions
Add the required permissions to your AndroidManifest.xml:
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
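On Android 6.0 (API 23) and higher, declaring CAMERA in the manifest is not enough. It is a runtime permission, and an app that declares it without holding it gets a SecurityException when it launches the system camera intent. A minimal sketch of requesting it with the Activity Result API follows. The activity name and the onCameraReady() placeholder are hypothetical; a real app would fold this logic into its own activity.

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import android.os.Bundle
import android.widget.Toast
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat

// Hypothetical activity name, used only for this sketch.
class CameraPermissionActivity : AppCompatActivity() {

    // Registered as a property so the launcher exists before the activity is STARTED.
    private val requestCameraPermission =
        registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
            if (granted) {
                onCameraReady()
            } else {
                Toast.makeText(this, "Camera permission denied", Toast.LENGTH_SHORT).show()
            }
        }

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        ensureCameraPermission()
    }

    // Continues immediately if the permission is already held, otherwise asks the user.
    private fun ensureCameraPermission() {
        val alreadyGranted = ContextCompat.checkSelfPermission(
            this, Manifest.permission.CAMERA
        ) == PackageManager.PERMISSION_GRANTED

        if (alreadyGranted) {
            onCameraReady()
        } else {
            requestCameraPermission.launch(Manifest.permission.CAMERA)
        }
    }

    // Placeholder (assumption): this is where the capture flow would be started.
    private fun onCameraReady() {
    }
}
```

In practice you would call ensureCameraPermission() from the capture button's click listener rather than from onCreate, so the user is only prompted when they actually try to take a picture.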
Implementing OCR functionality
Create a layout file (activity_main.xml) that includes an image preview and a text display:
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    android:padding="16dp">

    <Button
        android:id="@+id/btnCapture"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Capture Image" />

    <Button
        android:id="@+id/btnGallery"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Select from Gallery" />

    <ImageView
        android:id="@+id/imageView"
        android:layout_width="match_parent"
        android:layout_height="200dp"
        android:layout_marginTop="16dp"
        android:scaleType="centerCrop" />

    <ScrollView
        android:layout_width="match_parent"
        android:layout_height="0dp"
        android:layout_weight="1"
        android:layout_marginTop="16dp">

        <TextView
            android:id="@+id/textView"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:textSize="16sp" />
    </ScrollView>
</LinearLayout>
Implement the OCR functionality in your MainActivity.kt:
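// Keep your own package declaration above these imports.
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.net.Uri
import android.os.Bundle
import android.widget.Button
import android.widget.ImageView
import android.widget.TextView
import android.widget.Toast
import androidx.activity.result.ActivityResultLauncher
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.FileProvider
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import java.io.File
import java.io.IOException

class MainActivity : AppCompatActivity() {

    private lateinit var imageView: ImageView
    private lateinit var textView: TextView
    private lateinit var takePictureLauncher: ActivityResultLauncher<Uri>
    private lateinit var selectPictureLauncher: ActivityResultLauncher<String>
    private lateinit var photoUri: Uri

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        imageView = findViewById(R.id.imageView)
        textView = findViewById(R.id.textView)

        // Launchers must be registered before the activity reaches STARTED.
        setupActivityResultLaunchers()

        findViewById<Button>(R.id.btnCapture).setOnClickListener {
            dispatchTakePictureIntent()
        }
        findViewById<Button>(R.id.btnGallery).setOnClickListener {
            dispatchSelectPictureIntent()
        }
    }

    private fun setupActivityResultLaunchers() {
        takePictureLauncher =
            registerForActivityResult(ActivityResultContracts.TakePicture()) { success ->
                if (success) showAndProcess(photoUri)
            }
        selectPictureLauncher =
            registerForActivityResult(ActivityResultContracts.GetContent()) { uri ->
                uri?.let { showAndProcess(it) }
            }
    }

    // Decodes the image behind the URI, closes the stream, then runs OCR on it.
    private fun showAndProcess(uri: Uri) {
        val bitmap: Bitmap? = try {
            contentResolver.openInputStream(uri)?.use { BitmapFactory.decodeStream(it) }
        } catch (e: IOException) {
            null
        }
        if (bitmap == null) {
            Toast.makeText(this, "Could not load image", Toast.LENGTH_SHORT).show()
            return
        }
        imageView.setImageBitmap(bitmap)
        processImage(bitmap)
    }

    private fun dispatchTakePictureIntent() {
        // The camera app writes the photo into this temporary file in the cache directory.
        val imageFile = File.createTempFile("IMG_", ".jpg", cacheDir)
        photoUri = FileProvider.getUriForFile(this, "${packageName}.fileprovider", imageFile)
        takePictureLauncher.launch(photoUri)
    }

    private fun dispatchSelectPictureIntent() {
        selectPictureLauncher.launch("image/*")
    }

    private fun processImage(bitmap: Bitmap) {
        // A rotation of 0 assumes the bitmap is already upright.
        val image = InputImage.fromBitmap(bitmap, 0)
        val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
        recognizer.process(image)
            .addOnSuccessListener { visionText ->
                textView.text = visionText.text
            }
            .addOnFailureListener { e ->
                Toast.makeText(this, "Text recognition failed: ${e.message}", Toast.LENGTH_SHORT).show()
            }
    }
}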
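Note: Because the manifest declares the CAMERA permission, the app must request it at runtime on Android 6.0 (API 23) and higher before launching the capture; otherwise the system camera intent throws a SecurityException. Also handle I/O failures when creating the temporary file and when reading the returned URI.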
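ML Kit returns more than a flat string: the Text result is organized into blocks and lines, each with its own text and bounding box. The sketch below is a hypothetical standalone helper (recognizeFromUri is not part of the MainActivity code) that builds its input with InputImage.fromFilePath, which reads the file directly from its URI and should pick up the orientation stored in the file instead of a hard-coded rotation of 0. It then walks the result line by line. Treat it as one possible variant under those assumptions, not as the reference implementation.

```kotlin
import android.content.Context
import android.net.Uri
import android.util.Log
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import java.io.IOException

// Hypothetical helper: recognizes text from an image URI and reports it line by line.
fun recognizeFromUri(context: Context, uri: Uri, onResult: (String) -> Unit) {
    val image = try {
        InputImage.fromFilePath(context, uri)
    } catch (e: IOException) {
        Log.e("OCR", "Could not read image", e)
        return
    }

    val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    recognizer.process(image)
        .addOnSuccessListener { result ->
            val output = StringBuilder()
            for (block in result.textBlocks) {
                for (line in block.lines) {
                    // boundingBox may be null when the position is unknown.
                    Log.d("OCR", "line='${line.text}' box=${line.boundingBox}")
                    output.append(line.text).append('\n')
                }
                output.append('\n') // blank line between blocks
            }
            onResult(output.toString().trim())
        }
        .addOnFailureListener { e -> Log.e("OCR", "Text recognition failed", e) }
        // Release the recognizer's resources once this one-off request has finished.
        .addOnCompleteListener { recognizer.close() }
}
```

From an activity this could be called as recognizeFromUri(this, uri) { text -> textView.text = text }. An app that processes many images would more likely keep a single recognizer and close it when the activity is destroyed.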
Add the FileProvider configuration to your AndroidManifest.xml inside the <application> tag:
<provider
    android:name="androidx.core.content.FileProvider"
    android:authorities="${applicationId}.fileprovider"
    android:exported="false"
    android:grantUriPermissions="true">
    <meta-data
        android:name="android.support.FILE_PROVIDER_PATHS"
        android:resource="@xml/file_paths" />
</provider>
Create res/xml/file_paths.xml:
<paths>
    <cache-path name="cache" path="." />
</paths>
Optimizing OCR performance
To improve OCR accuracy and performance:
- Ensure good image quality with proper lighting and focus.
- Consider image preprocessing techniques like contrast adjustment and noise reduction.
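- Use an appropriate image resolution. ML Kit's guidance is that each character should be at least about 16x16 pixels, with little accuracy benefit beyond roughly 24x24 pixels, while very large images mainly add decoding time and memory use.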
- Handle device rotation and different screen sizes gracefully.
- Implement comprehensive error handling and retry mechanisms.
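One way to keep resolution in check is to downsample large photos while decoding them. On Android this is commonly done by reading the image bounds with BitmapFactory.Options.inJustDecodeBounds and then setting inSampleSize. The sketch below shows only the sample-size calculation as plain Kotlin; calculateInSampleSize is a hypothetical helper name and the target size is an assumption you would tune for your own images.

```kotlin
// Hypothetical helper: returns the largest power-of-two sample size that keeps
// the decoded image at least reqWidth x reqHeight pixels.
fun calculateInSampleSize(width: Int, height: Int, reqWidth: Int, reqHeight: Int): Int {
    require(reqWidth > 0 && reqHeight > 0) { "Requested size must be positive" }
    var sampleSize = 1
    // Keep doubling while the next halving step still satisfies both dimensions.
    while (width / (sampleSize * 2) >= reqWidth && height / (sampleSize * 2) >= reqHeight) {
        sampleSize *= 2
    }
    return sampleSize
}

fun main() {
    // A 4000x3000 photo with a 1024x768 target can be halved once (2000x1500).
    println(calculateInSampleSize(4000, 3000, 1024, 768)) // prints 2
}
```

The returned value would be assigned to BitmapFactory.Options.inSampleSize before the real decode. Choose the target size conservatively, since shrinking the image too far makes small text unreadable.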
Testing the application
Test your OCR implementation with various scenarios:
- Different types of text (printed, handwritten).
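- Various languages. The default recognizer used in this guide targets Latin-script text; other scripts (for example Chinese, Devanagari, Japanese, and Korean) need their own ML Kit text recognition artifacts.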
- Diverse lighting conditions.
- Multiple text orientations.
- Different image sources (camera, gallery).
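To compare these scenarios objectively, it can help to score the recognized text against a known reference string. The sketch below computes a character error rate from the Levenshtein edit distance. It is one possible metric, not something ML Kit provides, and the function names are hypothetical.

```kotlin
// Levenshtein edit distance between two strings, using two rolling rows.
fun editDistance(a: String, b: String): Int {
    var prev = IntArray(b.length + 1) { it }
    var curr = IntArray(b.length + 1)
    for (i in 1..a.length) {
        curr[0] = i
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            curr[j] = minOf(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        }
        val tmp = prev
        prev = curr
        curr = tmp
    }
    return prev[b.length]
}

// Character error rate: edits needed to turn the OCR output into the expected
// text, divided by the length of the expected text.
fun characterErrorRate(expected: String, actual: String): Double {
    require(expected.isNotEmpty()) { "Expected text must not be empty" }
    return editDistance(expected, actual).toDouble() / expected.length
}

fun main() {
    println(editDistance("kitten", "sitting")) // prints 3
    println(characterErrorRate("abcd", "abed")) // prints 0.25
}
```

Running the same set of reference images through each lighting condition or orientation and recording the error rate gives a rough way to see which scenarios need more work.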
Conclusion
By integrating Google ML Kit's OCR capabilities, you can enhance your Android app with powerful text recognition features. This implementation allows users to extract text from both captured images and those selected from the gallery.
For robust file processing and transformation solutions in your applications, consider using Transloadit.