We encounter face detection and image recognition algorithms every day – in mobile phones, cameras, and apps like Facebook or Snapchat.
Given that, it's worth thinking about how those algorithms work and how you can implement them in your own application.
I will try to answer these questions below.
Face detection algorithm
This algorithm looks for characteristic features such as eyes, lips or a nose. Based on the data gathered during image processing, it learns to decide whether the detected features are positioned relative to each other the way they would be in a real face.
An image is processed iteratively, piece by piece, and the code rejects the fragments that don't have the characteristics of a human face.
With every iteration, the algorithm examines more detailed features, strengthening its confidence that the remaining, not-yet-rejected image fragments contain a human face.
Processing the image
Let's take a look at a piece of the image we want to process, marked with a red frame. Within this frame, features are scanned and marked with black and white rectangles. When the algorithm detects a sufficient number of features within a given area, it marks that area and remembers it. You can see those areas highlighted with green frames.
So we now know, in basic terms, how those algorithms work. On the other hand, implementing them from scratch would be very interesting but also time-consuming and error-prone. Because of that, I have used ready-made libraries. OpenCV, originally created at Intel, is the most popular computer vision library in the world.
It is still being developed by the community as an open-source project. It is written in C and C++, but there is a wrapper called Emgu.CV, written in C#, which maps the native API almost one-to-one.
Teaching an algorithm
The first step is to load a set of features – the algorithm needs to know how to recognise faces. We don't have to train it ourselves, because many ready-made training sets (so-called Haar cascades) are available on the Internet in the form of XML files. I have used two of them – the first teaches the algorithm to recognise a face from the front, the second to identify a face from the side.
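Loading those cascades with Emgu.CV might look like the sketch below. The file names are the ones shipped with the standard OpenCV distribution, but the paths themselves are assumptions – adjust them to wherever the XML files live on your disk.

```csharp
using Emgu.CV;

// Two pretrained Haar cascades from the OpenCV distribution:
// one for frontal faces, one for faces seen from the side.
var frontalFaceClassifier = new CascadeClassifier("haarcascade_frontalface_default.xml");
var profileFaceClassifier = new CascadeClassifier("haarcascade_profileface.xml");
```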
To start detecting, we call the DetectMultiScale method on the classifier object which we created and loaded before.
The first argument is the image, and the next parameter is the scaling factor of the frame coloured in red. A value of 1.1 means that with every iteration, the frame grows by 10%. The next argument defines the minimum number of neighbouring detections required for an area to count as a face after grouping. During the iterations, the algorithm marks and remembers (green frames) every area in which it detects a sufficient number of facial features.
When the iteration process is finished, the algorithm groups the selected areas, rejecting the groups that contain fewer areas than the minimum we passed as an argument. So in our case, the algorithm will reject every group with fewer than ten areas.
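Putting the parameters above together, a call might look like this – a minimal sketch assuming a recent Emgu.CV version and a hypothetical image path:

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.Structure;

// The cascade file and image path are assumptions; adjust to your setup.
var frontalFaceClassifier = new CascadeClassifier("haarcascade_frontalface_default.xml");

using (var image = new Image<Gray, byte>("photo.jpg"))
{
    // 1.1 = grow the scanning frame by 10% per iteration;
    // 10  = reject groups with fewer than ten overlapping detections;
    // Size(20, 20) = the smallest face size worth reporting.
    Rectangle[] frontalfaces = frontalFaceClassifier.DetectMultiScale(
        image, 1.1, 10, new Size(20, 20));
}
```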
bool faceFound = frontalfaces.Length > 0;
The DetectMultiScale method returns an array of Rectangle objects indicating the image areas which may contain a face. Naturally, an empty array means that no face has been detected in the image.
Applying face recognition
Now that we know how face detection works, let's learn something about face recognition. First, we have to create an object which will be responsible for it.
var recognizer = new EigenFaceRecognizer(80);
Here we pass the number of components to the constructor. These components are the most significant facial features, and analysing them lets the algorithm compare faces. In this case, it means the algorithm takes the 80 most important components into consideration.
In the next step, we have to train our face recogniser. We need to prepare two arrays – the first contains the face images the algorithm should learn to recognise, and the second holds the identifiers (labels) of those faces.
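A minimal training sketch is shown below. The image paths and labels are hypothetical, and the exact Train overload varies slightly between Emgu.CV versions – wrapping the arrays in VectorOfMat/VectorOfInt is one way that works:

```csharp
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Face;
using Emgu.CV.Util;

// Hypothetical training data: grayscale face crops plus a parallel
// array of integer labels saying whose face each image shows.
Mat[] trainingImages =
{
    CvInvoke.Imread("alice_1.jpg", ImreadModes.Grayscale),
    CvInvoke.Imread("alice_2.jpg", ImreadModes.Grayscale),
    CvInvoke.Imread("bob_1.jpg", ImreadModes.Grayscale),
};
int[] labels = { 0, 0, 1 };  // 0 = Alice, 1 = Bob

var recognizer = new EigenFaceRecognizer(80);  // keep the 80 main components
using (var images = new VectorOfMat(trainingImages))
using (var labelVector = new VectorOfInt(labels))
{
    recognizer.Train(images, labelVector);
}
```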
After that, we call the Predict method on the recogniser, passing the image in which we want to recognise a face as a parameter.
As a result, we receive an object of the PredictionResult class. It contains two fields:
- Label (the face identifier),
- and Distance (a numeric value describing how far the given face is from the best match – the lower it is, the more confident the match).
As you can see, if the Label field is set to -1, the algorithm couldn't match the given face to any of the known faces. If it could, we can check how confident the match is, and we return the label of the face only if that confidence exceeds the threshold we've set – for the Distance field, that means a sufficiently low value.
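The check described above can be sketched as follows. The threshold value of 2000 is an arbitrary example – a sensible cut-off has to be tuned experimentally for your own training set:

```csharp
using System;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Face;

// An assumed distance threshold; lower distance = more confident match.
const double distanceThreshold = 2000;

Mat unknownFace = CvInvoke.Imread("unknown.jpg", ImreadModes.Grayscale);
FaceRecognizer.PredictionResult result = recognizer.Predict(unknownFace);

if (result.Label != -1 && result.Distance < distanceThreshold)
{
    Console.WriteLine($"Recognised face {result.Label} (distance {result.Distance}).");
}
else
{
    Console.WriteLine("Face not recognised.");
}
```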
Face detection in action
In the end, I would like to describe a practical way of using those algorithms in the application.
Think of a situation when you or one of your coworkers walks away from the computer without locking the screen. The surprise after coming back can be enormous, especially if the creativity of your coworkers is above average.
I usually remember to lock the screen, but I didn't want to take the risk, so I wrote a simple application which works in the background. Every now and then it grabs an image from the laptop's webcam and tries to detect the user's face.
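A background watcher of that kind could be sketched like this. Everything here is illustrative: the capture interval, the cascade path and the screen-locking step are assumptions, not the actual application code.

```csharp
using System;
using System.Threading;
using Emgu.CV;
using Emgu.CV.Structure;

// Periodically grab a webcam frame and look for the user's face.
var classifier = new CascadeClassifier("haarcascade_frontalface_default.xml");
using (var capture = new VideoCapture(0))  // default webcam
{
    while (true)
    {
        using (Mat frame = capture.QueryFrame())
        {
            if (frame != null)
            {
                using (var gray = frame.ToImage<Gray, byte>())
                {
                    var faces = classifier.DetectMultiScale(gray, 1.1, 10);
                    if (faces.Length == 0)
                    {
                        // Nobody in front of the screen -- lock it here.
                        // (On Windows this could P/Invoke LockWorkStation.)
                    }
                }
            }
        }
        Thread.Sleep(TimeSpan.FromSeconds(10));  // capture interval: an assumption
    }
}
```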