We encounter face recognition algorithms every day – in mobile phones, cameras, on Facebook or Snapchat. Because of that, it is worth thinking about how those algorithms work and how you can implement them in your own application. I will try to answer these questions below.
The algorithm recognises distinctive features such as the eyes, lips or nose. Based on the data gathered during image processing, it learns to decide whether the individual features it detects are positioned correctly relative to each other. The analysed image is processed iteratively, piece by piece, and fragments that don't have the characteristics of a human face are rejected.
With every iteration it recognises more detailed features, confirming that the remaining, not-yet-rejected image fragments contain a human face.
Let's take a look at the piece of the image that is currently being processed, marked with a red frame. Within this frame, features are scanned; they are marked with black and white rectangles. When the algorithm detects a sufficient number of features within a given area, it marks that area and remembers it. Those areas are marked with green frames.
So we already know, in basic terms, how these algorithms work. Implementing them from scratch would be very interesting, but also time-consuming and error-prone. Because of that, I have used ready-made libraries. OpenCV, originally created by Intel, is the most popular computer vision library in the world.
It is still being developed by the community as an open source project. Its core is written in C and C++, but there is a wrapper called Emgu.CV that maps the API to C# almost one-to-one.
The first step is to load a set of trained attributes – the algorithm needs to know how to recognise faces. We don't have to train it ourselves, because many ready-made training sets are available on the Internet in the form of XML files. I have used two of them: the first teaches the algorithm to recognise a face from the front, the second to identify a face from the side.
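Loading such a set can look like the sketch below. It assumes Emgu.CV's `CascadeClassifier`, and the file names are placeholders for the standard cascade XML files shipped with OpenCV:

```csharp
using Emgu.CV;

// Placeholder file names – any trained cascade XML from the OpenCV
// distribution (its "haarcascades" folder) can be used here.
var frontalFaceClassifier = new CascadeClassifier("haarcascade_frontalface_default.xml");
var profileFaceClassifier = new CascadeClassifier("haarcascade_profileface.xml");
```

Each classifier object wraps one XML file, so detecting frontal and profile faces requires two separate classifiers.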
To start detecting, we call the DetectMultiScale method on the classifier object that we created and loaded earlier.
The first argument is the image, and the next parameter is the scaling factor of the frame coloured in red. A value of 1.1 means that with every iteration the frame grows by 10%. The next argument defines the minimal number of detected areas that, after grouping, are required to count as a face. During subsequent iterations, the algorithm marks and remembers (green frames) every area in which it detects a sufficient number of face features.
When the iteration process is finished, the algorithm groups the selected areas, rejecting groups that contain fewer areas than the minimum we passed as a function argument. So in our case, the algorithm will reject every group with fewer than ten areas.
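Putting it together, a call could look like this minimal sketch (file paths are placeholders, and parameter values match the description above):

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.Structure;

// Placeholder paths – substitute your own cascade file and input image.
var classifier = new CascadeClassifier("haarcascade_frontalface_default.xml");
using var image = new Image<Gray, byte>("photo.jpg");

// 1.1  – the scanning window grows by 10% with every iteration.
// 10   – groups with fewer than ten overlapping detections are rejected.
Rectangle[] frontalfaces = classifier.DetectMultiScale(image, 1.1, 10);
```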
bool faceFound = frontalfaces.Length > 0;
The DetectMultiScale method returns an array of Rectangle objects indicating the image areas that may contain a face. Naturally, an empty array means that no face has been recognised in the image.
Now that we know how face detection works, let's learn something about face recognition. First, we have to create an object that will be responsible for it.
var recognizer = new EigenFaceRecognizer(80);
Here we pass the number of components to the constructor. These components are the crucial face attributes whose analysis lets the algorithm compare faces. In this case it means that the algorithm takes the 80 most important attributes into consideration.
In the next step, we have to train our face recogniser. We need to prepare two arrays: the first contains the face images that the algorithm should learn to recognise, and the second contains the identifiers of those faces.
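A training call could look like the sketch below. It assumes Emgu.CV's Face API (exact signatures vary slightly between versions); the file names and identifiers are made up for illustration, and all training images should be grayscale and of equal size:

```csharp
using Emgu.CV;
using Emgu.CV.Face;
using Emgu.CV.Structure;

// Hypothetical training data: the same id is used for every image
// of the same person.
var trainingImages = new[]
{
    new Image<Gray, byte>("person1_a.jpg"),  // placeholder file names
    new Image<Gray, byte>("person1_b.jpg"),
    new Image<Gray, byte>("person2_a.jpg"),
};
int[] labels = { 1, 1, 2 };

var recognizer = new EigenFaceRecognizer(80);
recognizer.Train(trainingImages, labels);
```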
After that, we call the Predict method on the face recogniser, passing the image in which we want to recognise a face as a parameter.
As a result, we receive an object of the PredictionResult class, which contains two fields: Label (the face identifier) and Distance (a numeric value describing how certain the algorithm is about the match). If the Label field is set to -1, the algorithm couldn't match the given face to any known face. If it could, we check how confident we are that it is a correct hit, and we return the label of that face only if the confidence passes the threshold we chose.
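A minimal sketch of that check, assuming an already trained recogniser and a grayscale face crop resized to the training dimensions (the threshold value is only an example and should be tuned for your data):

```csharp
using Emgu.CV;
using Emgu.CV.Face;
using Emgu.CV.Structure;

static int Recognize(EigenFaceRecognizer recognizer, Image<Gray, byte> faceImage)
{
    var result = recognizer.Predict(faceImage);

    // For eigenfaces a SMALLER distance means a closer match, so the
    // threshold is an upper bound; 2000 is just an example value.
    const double threshold = 2000;
    if (result.Label == -1 || result.Distance > threshold)
        return -1; // unknown face

    return result.Label;
}
```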
Finally, I would like to describe a practical use of these algorithms in an application. Think of a situation when you or one of your coworkers walks away from the computer without locking the screen. The surprise after coming back can be enormous, especially if the creativity of your coworkers is above average.
I usually remember to lock the screen, but I didn't want to take the risk, so I wrote a simple application that works in the background. Every now and then it captures an image from the laptop's webcam and tries to detect the user's face.
Capturing and analysing the image happens every few seconds. If the program doesn't detect the user's face a few times in a row, it automatically locks the screen. The source code of this application is available on my GitHub.
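The core loop of such a watchdog can be sketched as below. This is not the application's actual source – just a minimal Windows-only illustration assuming Emgu.CV for capture and detection and the Win32 `LockWorkStation` call for locking; the file name, interval, and miss limit are arbitrary choices:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using Emgu.CV;
using Emgu.CV.Structure;

class FaceWatchdog
{
    // Win32 API that locks the current session (Windows only).
    [DllImport("user32.dll")]
    private static extern bool LockWorkStation();

    static void Main()
    {
        // Placeholder cascade file; 0 selects the default webcam.
        var classifier = new CascadeClassifier("haarcascade_frontalface_default.xml");
        using var capture = new VideoCapture(0);

        int misses = 0;
        const int maxMisses = 3; // lock after three consecutive failures

        while (true)
        {
            using var frame = capture.QueryFrame();
            using var gray = frame.ToImage<Gray, byte>();
            var faces = classifier.DetectMultiScale(gray, 1.1, 10);

            misses = faces.Length > 0 ? 0 : misses + 1;
            if (misses >= maxMisses)
            {
                LockWorkStation();
                misses = 0;
            }
            Thread.Sleep(TimeSpan.FromSeconds(5));
        }
    }
}
```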