In this post we will review the object detection principle proposed by Paul Viola & Michael Jones. Although it is not a new technique (2001), it remains a great approach for detecting faces (it is embedded in most of today’s cameras) and a perfect example for understanding machine learning principles.
- The famous article by P. Viola and M. Jones: Robust Real-Time Object Detection.
- Some slides of the talk.
- A more complete approach that uses tilted Haar features.
- A post of mine on Face Tracking. Notice that before being able to track faces, we need to locate and “identify” them using a face detection approach.
- Rectangular features based on Haar wavelets (see picture). Features A, B, C and D can have variable size and variable position in the main rectangular frame. As a consequence, there are thousands of different possibilities.
- The sum of the pixels that lie in the black regions ($S_b$) is subtracted from the sum of the pixels that lie in the white regions ($S_w$). If $S_b - S_w > T$ (or $S_w - S_b > T$), we have a feature of interest.
- Integral images are used to compute these sums efficiently: once the integral image is built, the sum over any rectangle takes only four array lookups, whatever its size.
- These features are called “weak features” because just a few of them are not discriminative enough. So,
- Many of these features are required to build a “strong classifier”,
- We have to select the most relevant features in order to increase the recognition rate (see the Adaboost steps below). Combining weak features leads to a strong classifier.
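The feature computations described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper’s code: the function names (`integral_image`, `rect_sum`, `two_rect_feature`, `count_two_rect_features`) are hypothetical, and the sketch assumes that feature A is a horizontal two-rectangle feature with a white left half and a black right half, which may not match the picture exactly.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of img[0..y-1][0..x-1], so rect_sum needs no
    boundary checks.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left pixel (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Assumed type-A feature: white left half, black right half; returns S_w - S_b."""
    half = w // 2
    s_w = rect_sum(ii, x, y, half, h)         # white (left) region
    s_b = rect_sum(ii, x + half, y, half, h)  # black (right) region
    return s_w - s_b


def count_two_rect_features(frame_w: int, frame_h: int) -> int:
    """Number of positions and sizes of this feature inside a frame_w x frame_h frame."""
    count = 0
    for w in range(2, frame_w + 1, 2):  # even widths so both halves match
        for h in range(1, frame_h + 1):
            count += (frame_w - w + 1) * (frame_h - h + 1)
    return count


if __name__ == "__main__":
    img = [[9, 9, 1, 1],
           [9, 9, 1, 1]]
    ii = integral_image(img)
    f = two_rect_feature(ii, 0, 0, 4, 2)  # S_w = 36, S_b = 4
    T = 20                                # arbitrary threshold for the example
    print(f, f > T)                       # 32 True
    print(count_two_rect_features(24, 24))  # 43200
```

The last line counts a single feature type in a 24×24 frame (the base detector size used in the Viola-Jones paper) and already gives 43200 candidates, which is why a selection step is needed.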
Principle of the Object Detector Classification
- Given a set of images (e.g. 1000 images), some of them containing faces (located beforehand) and others chosen at random,
- Create as many A, B, C and D features as possible in the main rectangular frame (many thousands),
- Apply Adaboost to these Haar-like features and the given images to learn a strong classifier.
- The Adaboost algorithm selects the best Haar features among all the possibilities.
- In other words, training selects features that maximize the detection rate on images containing faces and minimize the false positive rate on non-face images.
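The training steps can be illustrated with a small discrete AdaBoost over precomputed feature values, following the weight update described in the Viola-Jones paper: each weak classifier uses a single feature $f$ with a threshold $\theta$ and a polarity $p$ and predicts a face when $p\,f(x) < p\,\theta$. This is a sketch under simplifying assumptions: the toy feature matrix is made up, thresholds are found by brute force rather than with a sorted scan, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# One selected weak classifier: (feature index, threshold, polarity, alpha)
Stump = Tuple[int, float, int, float]


def stump_predict(value: float, theta: float, polarity: int) -> int:
    """Weak classifier: 1 (face) if polarity * value < polarity * theta, else 0."""
    return 1 if polarity * value < polarity * theta else 0


def train_adaboost(X: List[List[float]], y: List[int], rounds: int) -> List[Stump]:
    """X[i][j] is the value of Haar feature j on image i; y[i] is 1 for a face, 0 otherwise."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    # Start with half of the total weight on faces and half on non-faces.
    w = [1.0 / (2 * n_pos) if yi == 1 else 1.0 / (2 * n_neg) for yi in y]
    strong: List[Stump] = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]  # normalize to a distribution
        best = None  # (weighted error, feature, theta, polarity)
        for j in range(len(X[0])):
            for theta in sorted({row[j] for row in X}):
                for polarity in (1, -1):
                    err = sum(wi for wi, row, yi in zip(w, X, y)
                              if stump_predict(row[j], theta, polarity) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, theta, polarity)
        err, j, theta, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero and log(0)
        beta = err / (1.0 - err)
        alpha = math.log(1.0 / beta)
        # Shrink the weights of correctly classified images so the next round
        # concentrates on the mistakes.
        w = [wi * beta if stump_predict(row[j], theta, polarity) == yi else wi
             for wi, row, yi in zip(w, X, y)]
        strong.append((j, theta, polarity, alpha))
    return strong


def strong_classify(features: List[float], strong: List[Stump]) -> int:
    """Face if the weighted vote reaches half of the total weight."""
    score = sum(a * stump_predict(features[j], t, p) for j, t, p, a in strong)
    return 1 if score >= 0.5 * sum(a for _, _, _, a in strong) else 0


if __name__ == "__main__":
    # Toy data: feature 0 separates faces from non-faces, feature 1 is noise.
    X = [[30, 5], [25, 1], [28, 7], [2, 6], [4, 2], [1, 8]]
    y = [1, 1, 1, 0, 0, 0]
    model = train_adaboost(X, y, rounds=3)
    print([strong_classify(x, model) for x in X])  # [1, 1, 1, 0, 0, 0]
```

On real data no single feature separates the classes, so successive rounds pick different features; the reweighting is what pushes each new round toward the images the previous features got wrong.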
Real-Time Object Detection
Once the strong classifier has been learned, the runtime algorithm is straightforward:
- Slide a rectangular frame through the input image (at various positions and scales),
- Compute the selected Haar features over the current frame,
- Decide that the current frame contains “the” object when the weighted sum of the outputs of the weak classifiers built on the selected features is greater than a computed threshold (see the Adaboost paper for further information).
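The runtime loop can be sketched as follows. This is an illustrative sketch only: the classifier format, the `detect` and `window_is_object` helpers, the 24-pixel base frame, the scale list and the area-proportional threshold scaling are assumptions made for the example, and the per-frame variance normalization used by the original detector is omitted. As in the paper, the features are scaled rather than the image, which the integral image makes cheap.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]
# Hypothetical weak classifier: a horizontal two-rectangle feature (x, y, w, h)
# in base-frame coordinates, its threshold, polarity and weight alpha.
Weak = Tuple[Tuple[int, int, int, int], float, int, float]


def integral_image(img: Image) -> Image:
    """Summed-area table padded with a zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def window_is_object(ii: Image, wx: int, wy: int, scale: float,
                     classifier: Sequence[Weak]) -> bool:
    """Evaluate the strong classifier on the frame whose top-left corner is (wx, wy)."""
    score, total = 0.0, 0.0
    for (fx, fy, fw, fh), theta, polarity, alpha in classifier:
        # Scale the feature geometry instead of resizing the image.
        x, y = wx + int(fx * scale), wy + int(fy * scale)
        half, h = int(fw * scale) // 2, int(fh * scale)
        value = rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
        # Assumption: the threshold grows with the feature area (scale squared).
        if polarity * value < polarity * theta * scale * scale:
            score += alpha
        total += alpha
    return score >= 0.5 * total


def detect(img: Image, classifier: Sequence[Weak], base: int = 24,
           scales: Sequence[float] = (1.0, 1.5, 2.0),
           step: int = 2) -> List[Tuple[int, int, int]]:
    """Return (x, y, size) for every frame the strong classifier accepts."""
    ii = integral_image(img)
    H, W = len(img), len(img[0])
    hits = []
    for s in scales:
        size = int(base * s)
        for wy in range(0, H - size + 1, step):
            for wx in range(0, W - size + 1, step):
                if window_is_object(ii, wx, wy, s, classifier):
                    hits.append((wx, wy, size))
    return hits


if __name__ == "__main__":
    # 24 x 48 image: bright band in the first 12 columns, dark elsewhere.
    img = [[10] * 12 + [0] * 36 for _ in range(24)]
    # One hypothetical weak classifier: "left half brighter than right half".
    clf = [((0, 0, 24, 24), 1000.0, -1, 1.0)]
    print(detect(img, clf, scales=(1.0,), step=12))  # [(0, 0, 24)]
```

Because `rect_sum` costs four lookups regardless of the rectangle size, evaluating a scaled feature is as cheap as evaluating the base one, which is what keeps multi-scale scanning close to real time.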
A Flash Sample
This sample is written in Flash (so you don’t need to install any additional plug-in). It detects faces, eyes, and mouths approximately, because I had to turn off some expensive functions to stay close to real time.