11 June 2018

Handwritten Digit Recognition on Android


Machine Learning and Deep Learning algorithms are now everywhere! Every day, when you use your laptop, tablet or phone, you either unconsciously donate data to big companies or consume information prepared by massive data centres and millions of other users.

At some point, a mobile developer may become intrigued and decide to try preparing their own smart applications. After all, you can search your Google Photos by context – look for named objects, specific locations or people. Don’t believe me? Type “vacation” or “work” into Google Photos search and be amazed.

Machine learning is used to solve many kinds of problems, among them: classification – assigning the right label to an item; regression – predicting a value based on previous inputs; and clustering – grouping elements automatically by how closely they relate to each other.

In this article, I would like to focus on the classification problem – especially with regard to the hot topic of image recognition. How about we create an application together that lets you draw a digit on screen with your finger and immediately recognizes it, offline?

Digit recognition in Android

To start our adventure with machine learning, we need 3 things:

  1. Android app with drawing panel
  2. Pre-trained mathematical model that recognizes digits
  3. Machine learning library that allows these distinct worlds to meet.

To do this, we can use TensorFlow. It’s a beautiful tool for experimenting with models, building them, and exporting them to Android, iOS or the cloud.

I’d like to focus on integrating TensorFlow on Android, so we won’t train our own neural network today. Instead, we will use an existing, powerful net in our app. My weapon of choice is the convolutional neural net from this tutorial:

Integrating TensorFlow with an Android project is now super easy. Create a new project, add NDK support, and add the code below to your Gradle build file:

And we are ready to go.

As a theoretical introduction, I’d like to describe what a training set looks like.

We are using MNIST – a popular training and testing set of handwritten digits.

How is each number represented?

It’s a bitmap with a size of 28x28px, with values of 0-255 to represent the blackness of a pixel.

But our model requires a slightly different format. We need to flatten the bitmap into a single vector – a 1D float array of 784 elements (28 × 28). 0.0 represents a white pixel, and 1.0 represents a black pixel. Values in between come from anti-aliased edges. What do we know about our model right now?

Model Representation

To sum up, we need a named input – a float array of 784 elements – and our output is a float array of 10 elements representing the probability of each class. There is no “unknown” class – even when you draw something inappropriate, the classifier will find the closest match. By the way, I tried. It’s “five”.
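The input format summarized above can be sketched in plain Java. This is only an illustration, not code from the sample project: `MnistInput` and `flatten` are made-up names, and the row-by-row ordering of the vector is an assumption about how the model was trained.

```java
/** Hypothetical helper: turns a 28x28 grid of 0-255 values into the model's input vector. */
final class MnistInput {
    static final int SIDE = 28;

    /** Flattens the rows one after another and rescales 0-255 to 0.0-1.0. */
    static float[] flatten(int[][] image) {
        if (image.length != SIDE) {
            throw new IllegalArgumentException("expected " + SIDE + " rows");
        }
        float[] vector = new float[SIDE * SIDE]; // 784 elements
        for (int row = 0; row < SIDE; row++) {
            if (image[row].length != SIDE) {
                throw new IllegalArgumentException("expected " + SIDE + " columns");
            }
            for (int col = 0; col < SIDE; col++) {
                // Assumption: row-major order, so pixel (row, col) lands at row * 28 + col.
                vector[row * SIDE + col] = image[row][col] / 255.0f;
            }
        }
        return vector;
    }
}
```

With this layout, a fully black pixel (255) becomes 1.0, a white one (0) becomes 0.0, and the whole image fits into one array of 784 floats.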

So if we know what the model is expecting, let’s give it something to work with.

Our activity is not complicated because we only need:

  • Classify button
  • Reset button
  • Square canvas to draw on
  • ImageView to preview preprocessed bitmap
Image View

Here is the full code of main_activity.xml:

As you may have noticed, I decided to create a custom Drawing View to encapsulate the drawing logic. We need to extend View and initialize its fields:

Importantly, we need to make it square, matching the phone’s width in portrait mode:

We initialize our canvas and bitmap inside the onSizeChanged callback:

To draw on canvas, we need to set up a Paint object:

And drawing in real time is handled by the Path object:

But how do we detect whether our finger has touched the canvas?
We need to override the onTouchEvent method, where we can detect the gesture type and location:

When the finger touches the screen, the current path is reset, and we ask the path to move to the current touch point. We have to remember the x and y of the gesture, because we’ll need them later:

Once a move event is detected, we calculate the movement delta for x and y. If it’s big enough, we extend the current quadratic Bézier curve that represents our drawing. We also have to keep track of the current location, because we’ll need it once the drawing is done.

When the finger is lifted from the screen, we draw the final version of the line and reset the current path, so that gestures don’t interfere with each other. In this implementation, we don’t need to draw the digit with a single gesture. We can take our time and prepare our masterpiece carefully.
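The three gesture phases described above can be sketched without any Android classes. Everything named here is hypothetical: `PathSink` merely mirrors the `reset`, `moveTo`, `quadTo` and `lineTo` calls of `android.graphics.Path`, `RecordingPath` logs them for inspection, and the tolerance of 4 pixels is an assumed value. Using the midpoint as the curve’s end point is one common smoothing choice, not necessarily the one used in the sample.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the android.graphics.Path calls used while drawing. */
interface PathSink {
    void reset();
    void moveTo(float x, float y);
    void quadTo(float controlX, float controlY, float endX, float endY);
    void lineTo(float x, float y);
}

/** Records every call as text so the logic can be inspected off-device. */
final class RecordingPath implements PathSink {
    final List<String> calls = new ArrayList<>();
    public void reset() { calls.add("reset"); }
    public void moveTo(float x, float y) { calls.add("moveTo " + x + "," + y); }
    public void quadTo(float cx, float cy, float ex, float ey) {
        calls.add("quadTo " + cx + "," + cy + "," + ex + "," + ey);
    }
    public void lineTo(float x, float y) { calls.add("lineTo " + x + "," + y); }
}

/** Sketch of the down / move / up handling for a single stroke. */
final class StrokeTracker {
    static final float TOUCH_TOLERANCE = 4f; // assumed threshold, in pixels
    private final PathSink path;
    private float lastX;
    private float lastY;

    StrokeTracker(PathSink path) { this.path = path; }

    void onDown(float x, float y) {
        path.reset();       // forget the previous gesture
        path.moveTo(x, y);  // start the new stroke at the touch point
        lastX = x;
        lastY = y;
    }

    void onMove(float x, float y) {
        float dx = Math.abs(x - lastX);
        float dy = Math.abs(y - lastY);
        if (dx >= TOUCH_TOLERANCE || dy >= TOUCH_TOLERANCE) {
            // Curve through the last point towards the midpoint of the move.
            path.quadTo(lastX, lastY, (x + lastX) / 2, (y + lastY) / 2);
            lastX = x;
            lastY = y;
        }
    }

    void onUp() {
        path.lineTo(lastX, lastY); // finish the stroke at the last known point
        // On Android, the path would be drawn onto the bitmap-backed Canvas here.
        path.reset();
    }
}
```

In a real View, the calls from onTouchEvent would go to an actual Path and be followed by invalidate() so the stroke is redrawn.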

To reset, we just need to reinitialize the canvas and bitmap by calling the already familiar onSizeChanged method.

Drawing on the canvas will affect our bitmap immediately, which is awesome. Let’s share this bitmap with other classes by making it available via a getter.
Of course, the bitmap is as big as your screen, so we need to scale it down:

Houston, we may have a problem now. We already know what input format our TensorFlow model is expecting, but how is the bitmap stored in memory?

How Android stores bitmaps

Every pixel is represented by a 32-bit integer in ARGB format. This means that we need to convert it to 8-bit grayscale first.

Then we turn it into a fraction instead of a 0-255 value, and multiply it by the alpha channel’s fraction so that anti-aliased borders are also taken into consideration by the model. Finally, let’s invert it, because TensorFlow is expecting 1 when a pixel is black and 0 when a pixel is white.
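Here is a minimal, plain-Java sketch of such a conversion. `PixelConverter` is a hypothetical name, the luma weights (0.299, 0.587, 0.114) are an assumed grayscale formula, and the way alpha is applied is my own reading: the sketch composites each pixel over a white background and then inverts, so fully transparent pixels count as white paper. For fully opaque pixels this gives the same result as the steps described above.

```java
/** Hypothetical converter from ARGB pixels to the 0.0 (white) .. 1.0 (black) model input. */
final class PixelConverter {

    /** Converts one non-premultiplied 32-bit ARGB pixel. */
    static float toModelInput(int argb) {
        int alpha = (argb >>> 24) & 0xFF;
        int red = (argb >> 16) & 0xFF;
        int green = (argb >> 8) & 0xFF;
        int blue = argb & 0xFF;

        // Assumption: standard luma weights for the grayscale value, scaled to 0..1.
        float gray = (0.299f * red + 0.587f * green + 0.114f * blue) / 255f;
        float opacity = alpha / 255f;

        // Composite over white paper, so semi-transparent edges become light gray.
        float onWhite = opacity * gray + (1f - opacity);

        // Invert: the model expects 1.0 for black and 0.0 for white.
        return 1f - onWhite;
    }

    /** Converts a whole pixel array, e.g. the 784 pixels of a 28x28 bitmap. */
    static float[] toModelInput(int[] argbPixels) {
        float[] result = new float[argbPixels.length];
        for (int i = 0; i < argbPixels.length; i++) {
            result[i] = toModelInput(argbPixels[i]);
        }
        return result;
    }
}
```

On Android, the int array would typically come from Bitmap.getPixels called on the scaled-down 28x28 bitmap.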

Once our input data is ready, let’s initialise the classifier object.
We prepare a factory method that initialises the named input, the output and TensorFlowInferenceInterface – our bridge between Android and the C++ world.

We’d better call it from the background thread using a simple executor:
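A minimal sketch of that idea, using only java.util.concurrent, might look like this. `BackgroundLoader` and `load` are hypothetical names, and the single-thread executor is just one reasonable choice for a one-off initialisation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper that runs an expensive factory call off the calling thread. */
final class BackgroundLoader {
    // One worker thread is enough: the classifier is created only once.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Submits the factory and returns a Future that completes when it is done. */
    <T> Future<T> load(Callable<T> factory) {
        return executor.submit(factory);
    }

    /** Lets the worker thread exit once pending work has finished. */
    void shutdown() {
        executor.shutdown();
    }
}
```

In the app, the Callable would wrap the classifier’s factory method, and the result would be handed back to the UI thread before the Classify button is enabled.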

And what are our constants?

As you can see, we can put our pre-trained model into the assets folder, and our bridge will take it from there. It’s also lovely to declare a file with class output labels, so we can rename our digits to whatever we want.
Now we are approaching the climax. The Classifier is ready, the input is transformed, so let’s start the show.

What’s under the hood? Communicating with the model requires just three method calls:

Inside the feed method, we declare the input name, which must match the input name inside the trained model. The second param is an array of our preprocessed pixels. And finally – we need to let TensorFlow know how many elements are expected in the input.
From TensorFlow’s perspective, a vector is a one-dimensional data structure with 784 elements in its only dimension.
The run method makes floats flow through digital neurons. Once that process is finished, we are ready to receive the final results. The fetch method requires an output name and an array of elements. This array represents confidence regarding the recognized digit. The probability that this number is zero is stored as the 0th element of the array, one – the 1st element, et cetera.
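To make the order of the three calls concrete, here is a sketch written against a small interface of my own. `InferenceBridge` is hypothetical and only mirrors the shape of the feed, run and fetch calls on TensorFlowInferenceInterface, so the example runs without the TensorFlow library. The node names "input" and "output" are placeholders. The real ones must match the names in the trained model.

```java
/** Hypothetical interface mirroring the three calls made on the TensorFlow bridge. */
interface InferenceBridge {
    void feed(String inputName, float[] values, long... dims);
    void run(String[] outputNames);
    void fetch(String outputName, float[] destination);
}

/** Sketch of one classification round trip: feed, run, fetch. */
final class DigitModel {
    static final String INPUT_NAME = "input";   // placeholder node name
    static final String OUTPUT_NAME = "output"; // placeholder node name
    static final int PIXELS = 784;              // 28 x 28 flattened
    static final int CLASSES = 10;              // digits 0-9

    static float[] infer(InferenceBridge bridge, float[] pixels) {
        if (pixels.length != PIXELS) {
            throw new IllegalArgumentException("expected " + PIXELS + " pixels");
        }
        float[] scores = new float[CLASSES];
        bridge.feed(INPUT_NAME, pixels, PIXELS);   // one dimension with 784 elements
        bridge.run(new String[] {OUTPUT_NAME});    // execute the graph up to the output node
        bridge.fetch(OUTPUT_NAME, scores);         // copy the 10 confidences out
        return scores;
    }
}
```

After the call, scores[0] holds the confidence for zero, scores[1] for one, and so on.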
Finally, we just need to iterate over the returned array to look for the best result:
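A minimal sketch of that search, assuming the scores arrive as a plain float array (`BestMatch` and `indexOfMax` are made-up names):

```java
/** Hypothetical helper that picks the most likely class from the model output. */
final class BestMatch {

    /** Returns the index of the highest score; on a tie, the first one wins. */
    static int indexOfMax(float[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

Because the array position equals the digit, the returned index can be used directly to look up the matching entry in the labels file.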

And return it to MainActivity, where a sweet Toast is displayed:

And that’s it!
You can download the full sample from my GitHub:

And have fun with it.
Let me know what you think about this article, and share it if you loved it!


Bartosz Kraszewski
Software Developer

Software Engineer specialised in mobile application development. Focused on code quality and standards, experienced in working in fast-paced, product-oriented environments (Silicon Valley startups). Co-founder of Mobile Bialystok, a local group of mobile technology enthusiasts. Also an amateur squash player.