In this article, I try to give some technical pointers on how to implement a simple marker detection engine similar to the technology provided by the well-known ARToolKit software.
We are going to focus on the simplest case: a marker containing black and white squares, such as the one presented below.
While digging around the web a few months ago, I found a tremendous little library: ARUCO. It is distributed under a permissive BSD license and provides a basic marker detection implementation. What is interesting is that it demonstrates that OpenCV alone is enough to build augmented reality projects.
Based on the explanations on the ARUCO site and my own experience, here is a summary of the general steps needed to achieve frame-to-frame marker detection:
- Access the camera image: video capture is not simple and is OS dependent. Fortunately, OpenCV provides a video capture module ("CvCapture"), which relies on ffmpeg on one side and on Windows™ DirectShow on the other.
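To make this concrete, here is a minimal capture loop. I use the C++ API (cv::VideoCapture) rather than the legacy CvCapture structure mentioned above, but both wrap the same backends; consider it a sketch rather than production code.

```cpp
#include <opencv2/opencv.hpp>

int main() {
    // Open the default camera (device 0); the backend (ffmpeg, DirectShow, ...)
    // is chosen by OpenCV depending on the platform.
    cv::VideoCapture capture(0);
    if (!capture.isOpened())
        return -1;

    cv::Mat frame;
    while (capture.read(frame)) {       // grab and decode the next frame
        cv::imshow("camera", frame);    // display it
        if (cv::waitKey(30) == 27)      // exit on ESC
            break;
    }
    return 0;
}
```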
- Find the camera calibration intrinsic parameters: a focal length and an optical center should be enough, but one can also estimate distortion coefficients and other terms.
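As an illustration, here is a hedged sketch of how the intrinsics can be estimated with OpenCV's chessboard calibration routines; the helper name, board size and units below are my own choices, not something imposed by ARUCO.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: estimate the camera matrix (focal lengths, optical center) and the
// distortion coefficients from several views of a planar chessboard.
void calibrateFromChessboards(const std::vector<cv::Mat>& views,
                              cv::Size boardSize,            // inner corners, e.g. 9x6
                              float squareSize,              // physical square size
                              cv::Mat& cameraMatrix,
                              cv::Mat& distCoeffs) {
    std::vector<std::vector<cv::Point2f>> imagePoints;
    std::vector<std::vector<cv::Point3f>> objectPoints;

    // Known 3D positions of the chessboard corners in the board's own frame.
    std::vector<cv::Point3f> corners3d;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            corners3d.push_back(cv::Point3f(x * squareSize, y * squareSize, 0.f));

    for (const cv::Mat& view : views) {
        std::vector<cv::Point2f> corners2d;
        if (cv::findChessboardCorners(view, boardSize, corners2d)) {
            imagePoints.push_back(corners2d);
            objectPoints.push_back(corners3d);
        }
    }

    std::vector<cv::Mat> rvecs, tvecs;   // per-view extrinsics, unused here
    cv::calibrateCamera(objectPoints, imagePoints, views[0].size(),
                        cameraMatrix, distCoeffs, rvecs, tvecs);
}
```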
- Provide an algorithm that can detect image edges/borders. Usually we can select one of the following options (depending on the robustness we want to reach):
- Simple threshold method (fast but not robust to lighting variations),
- Canny edge detector (quite expensive in terms of CPU usage, but more robust),
- Adaptive Threshold method (which is an extension of the simple threshold principle using neighboring pixels to find the correct value of the threshold).
- … Others
- These functions are available in OpenCV (cvAdaptiveThreshold, cvCanny, …) and are quite well optimized.
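For reference, here is a small sketch of the three options applied to a grayscale frame, using the C++ equivalents of the functions listed above; the threshold values and block size are arbitrary and would need tuning for real lighting conditions.

```cpp
#include <opencv2/opencv.hpp>

// Sketch: three ways to turn a grayscale frame into a binary/edge image.
void binarize(const cv::Mat& gray) {
    cv::Mat simple, edges, adaptive;

    // 1. Simple global threshold: fast, but sensitive to lighting changes.
    cv::threshold(gray, simple, 128, 255, cv::THRESH_BINARY);

    // 2. Canny edge detector: more CPU, more robust; the two values are the
    //    hysteresis thresholds.
    cv::Canny(gray, edges, 50, 150);

    // 3. Adaptive threshold: the threshold is computed locally from a
    //    neighborhood (here 7x7) around each pixel.
    cv::adaptiveThreshold(gray, adaptive, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                          cv::THRESH_BINARY, 7, 7.0);
}
```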
- Convert border pixels into polygons (OpenCV: cvFindContours), and keep the outer polygons that contain exactly 4 segments (inner polygons should lie inside the black and white marker).
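A possible sketch of this step with the C++ API (cv::findContours followed by cv::approxPolyDP); the helper name, the approximation tolerance and the minimum area used to reject noise are my own arbitrary choices.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: extract 4-sided candidate polygons from a binary image.
std::vector<std::vector<cv::Point>> findQuadCandidates(const cv::Mat& binary) {
    std::vector<std::vector<cv::Point>> contours, quads;
    cv::findContours(binary.clone(), contours,
                     cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);

    for (const auto& contour : contours) {
        std::vector<cv::Point> poly;
        // Approximate the contour by a polygon; the tolerance is a fraction
        // of the contour length (arbitrary choice).
        cv::approxPolyDP(contour, poly, 0.02 * cv::arcLength(contour, true), true);

        // Keep convex quadrilaterals that are large enough to be a marker.
        if (poly.size() == 4 && cv::isContourConvex(poly) &&
            cv::contourArea(poly) > 1000.0)
            quads.push_back(poly);
    }
    return quads;
}
```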
- For each polygon in the list of candidates:
- Warp the current polygon onto a canonical square (perspective transform).
- Verify that there is a black border around the center area of the marker.
- Try to identify the code contained inside the marker. ARUCO uses a Hamming code so that errors can be detected.
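Here is a hedged sketch of these three sub-steps for one candidate. The 7x7 grid (a one-cell black border around a 5x5 code area) follows ARUCO's marker layout as I understand it, the corner ordering is assumed consistent, and the helper name, cell size and thresholds are mine.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: warp a quad candidate to a canonical square and read its cells.
// Assumes a 7x7 grid: a 1-cell black border surrounding a 5x5 code area,
// and that the quad corners are already ordered consistently.
bool readMarker(const cv::Mat& gray, const std::vector<cv::Point>& quad,
                cv::Mat& bits /* 5x5 matrix of 0/1 */) {
    const int cell = 10, size = 7 * cell;   // arbitrary warped resolution

    // 1. Perspective transform from the detected corners to a square.
    std::vector<cv::Point2f> src(quad.begin(), quad.end());
    std::vector<cv::Point2f> dst = { {0, 0}, {size - 1.f, 0},
                                     {size - 1.f, size - 1.f}, {0, size - 1.f} };
    cv::Mat H = cv::getPerspectiveTransform(src, dst);
    cv::Mat warped, binary;
    cv::warpPerspective(gray, warped, H, cv::Size(size, size));
    cv::threshold(warped, binary, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    // 2. & 3. Count white pixels per cell: border cells must be black,
    //         inner cells give the code bits.
    bits = cv::Mat::zeros(5, 5, CV_8U);
    for (int y = 0; y < 7; ++y) {
        for (int x = 0; x < 7; ++x) {
            cv::Mat cellRoi = binary(cv::Rect(x * cell, y * cell, cell, cell));
            bool white = cv::countNonZero(cellRoi) > cell * cell / 2;
            bool isBorder = (x == 0 || y == 0 || x == 6 || y == 6);
            if (isBorder && white) return false;          // border must stay black
            if (!isBorder) bits.at<uchar>(y - 1, x - 1) = white ? 1 : 0;
        }
    }
    return true;  // a Hamming-code check on 'bits' would follow here
}
```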
- Once the marker is identified, it is possible to estimate its position and orientation (also called its pose) relative to the camera reference frame. To do that, there exist multiple possibilities:
- Estimate an exact marker pose by solving a polynomial equation (the classical perspective-three-point, or P3P, problem). This equation uses 3 points and can yield zero, ideally one, or up to four solutions. The 4th point is then used to pick the best solution: its 3D coordinates are projected onto the image plane using the candidate pose and the camera calibration, and the distance between this projected 2D point and the measured coordinates gives an error that we wish to minimize. The problem with this technique is that there is no truly exact solution, because the calibration is always an approximation of the real projective geometry. The resulting pose is therefore a poor approximation, and a “jitter” effect appears in the frame-to-frame process (an example of object tracking with jitter effects can be found on youtube.com).
- Since the first method only gives an exact solution for 3 points, a second possibility is a non-exact solution obtained by minimizing the reprojection error over all 4 points. To do that we can use a standard Gauss-Newton algorithm (or a Levenberg-Marquardt variant of Gauss-Newton, which avoids strange parallax inversions when the four possible exact solutions are close to each other). Notice that, once again, OpenCV provides functions to solve this kind of problem (see cvFindExtrinsicCameraParams2, or solvePnP in the C++ API).
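As a sketch, the whole minimization can be delegated to OpenCV through solvePnP, the C++ counterpart of cvFindExtrinsicCameraParams2; the helper name and the marker size parameter are my own, and the iterative flag selects the Levenberg-Marquardt style refinement mentioned above.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: estimate the marker pose (rotation + translation w.r.t. the camera)
// from its 4 detected corners and the camera calibration.
void estimatePose(const std::vector<cv::Point2f>& corners,  // detected, ordered
                  const cv::Mat& cameraMatrix,
                  const cv::Mat& distCoeffs,
                  float markerSize,                          // physical side length
                  cv::Mat& rvec, cv::Mat& tvec) {
    const float s = markerSize / 2.f;

    // 3D corners of the marker in its own reference frame (z = 0 plane).
    std::vector<cv::Point3f> objectPoints = {
        {-s,  s, 0}, { s,  s, 0}, { s, -s, 0}, {-s, -s, 0}
    };

    // Iterative solver: minimizes the reprojection error over the 4 points
    // with a Levenberg-Marquardt scheme.
    cv::solvePnP(objectPoints, corners, cameraMatrix, distCoeffs,
                 rvec, tvec, false, cv::SOLVEPNP_ITERATIVE);
}
```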
- Last but not least, you have to integrate a rendering module into your library, since displaying virtual objects is mandatory (is it?) for augmented reality applications.
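A full 3D renderer (OpenGL or similar) is beyond the scope of this post, but as a minimal illustration the estimated pose can already be used to overlay simple geometry on the camera frame with cv::projectPoints; the helper name and axis colors are arbitrary.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: draw the marker's 3D axes on top of the camera frame, using the
// pose (rvec, tvec) and calibration obtained in the previous steps.
void drawAxes(cv::Mat& frame, const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
              const cv::Mat& rvec, const cv::Mat& tvec, float length) {
    std::vector<cv::Point3f> axes = {
        {0, 0, 0}, {length, 0, 0}, {0, length, 0}, {0, 0, length}
    };
    std::vector<cv::Point2f> projected;
    cv::projectPoints(axes, rvec, tvec, cameraMatrix, distCoeffs, projected);

    cv::line(frame, projected[0], projected[1], cv::Scalar(0, 0, 255), 2);  // x axis
    cv::line(frame, projected[0], projected[2], cv::Scalar(0, 255, 0), 2);  // y axis
    cv::line(frame, projected[0], projected[3], cv::Scalar(255, 0, 0), 2);  // z axis
}
```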
- How to create markers for ARUCO?
- Some reference publications on the subject: http://www.hitl.washington.edu/artoolkit/publications/