# [math] Gauss-Newton Optimization

In this post, I will remind the reader of the famous Gauss-Newton optimization technique. We will see how Jacobian matrices are used to carry out this kind of optimization and how it applies to common computer vision problems.

# Main goal

I wish to recall the principles of the Gauss-Newton approach to optimizing a function. I am going to explain what Jacobian matrices are and how they can be used to minimize a projective function that projects points from 3D space onto the camera CCD plane. Optimizing this function is very useful for estimating the pose of an object (a rigid transformation represented with six degrees of freedom) relative to the camera's optical-center frame.

# Jacobian Matrix

• It contains the partial derivatives of a general function $F: \mathbb{R}^n \rightarrow \mathbb{R}^m$.
• Common notations: $F’$, $J_F$.
• A very good explanation with examples is given on the related Wikipedia page.
• Given a point $p$ in $\mathbb{R}^n$, if $F$ is differentiable near $p$, we can write, for points $x$ close to $p$: $F(x) = F(p) + F’(p).(x-p) + \epsilon(\|x-p\|)$. The error term $\epsilon$ vanishes faster than the distance between the two points.
• In the simple case $f:\mathbb{R} \rightarrow \mathbb{R}$, this is the familiar Taylor series truncated to first order.
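As a quick sanity check of that expansion, one can approximate $J_F$ numerically and compare $F(x)$ against $F(p) + F’(p).(x-p)$. Here is a minimal NumPy sketch; the function `F` below is an arbitrary example of mine, not the projective function discussed later:

```python
import numpy as np

def F(p):
    # Arbitrary example map F: R^2 -> R^2 (just for illustration)
    return np.array([p[0] * p[1], np.sin(p[0]) + p[1] ** 2])

def numerical_jacobian(f, p, h=1e-6):
    """Approximate J_F(p) column by column with central differences."""
    p = np.asarray(p, dtype=float)
    m = f(p).shape[0]
    J = np.zeros((m, p.size))
    for j in range(p.size):
        e = np.zeros_like(p)
        e[j] = h
        J[:, j] = (f(p + e) - f(p - e)) / (2 * h)
    return J

p = np.array([0.5, 1.0])
x = p + np.array([1e-3, -1e-3])     # a point close to p
J = numerical_jacobian(F, p)
approx = F(p) + J @ (x - p)         # first-order expansion around p
print(np.abs(F(x) - approx).max())  # error is O(||x - p||^2), tiny here
```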

# Function of interest

Here we present the projective function we want to optimize in order to retrieve a pose (representing a rigid transformation). This function projects space points $Q_i$ onto image points $q_i$ using a few parameters:

• $f_x$, $f_y$ (or, almost equivalently, field of view and image aspect ratio), the focal lengths in pixels: these values are computed or approximated beforehand and are treated as constants.
• $c_x$, $c_y$, the image center (in pixels): these values are also computed or approximated beforehand and treated as constants.
• image skew: not taken into account here,
• image distortion: neglected as well,
• the pose $x$, $y$, $z$, $r_x$, $r_y$, $r_z$, which constitutes the actual unknowns of the projective function. All other parameters are constant (including the 3D points $Q_i$).
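In matrix form, the constant parameters above assemble into the usual intrinsic matrix $K$. A small sketch with illustrative values (the numbers are my own, not from the article); the skew entry is simply left at zero:

```python
import numpy as np

fx, fy = 800.0, 800.0   # focal lengths in pixels (illustrative values)
cx, cy = 320.0, 240.0   # image center in pixels (illustrative values)

# Intrinsic matrix K; entry (0, 1) stays 0 because skew is ignored here.
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
print(K)
```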

We wish to optimize the projective function $F$ by picking the best values for the parameter vector $P = (x, y, z, r_x, r_y, r_z)$, such that applying $F$ to the given 3D points $Q_i$ yields image points that are as close as possible to the image points $q_i$ observed (or measured by some image-processing function).

Formally, we wish to find the vector $\hat{P}$ such that $q = F(\hat{P}) - \epsilon$ with $\|\epsilon\|$ minimal.

We know from a previous article that, in our case, $F$ is not linear (because of the sines and cosines in the rotation matrix) and has the form $F_i = K(RQ_i + t)$. That is why we need to optimize the function iteratively, using its derivatives.

Note that here we denote by $F_i$ the projective function applied to the i-th 3D point $Q_i$.

The Jacobian matrix $F_i’ = (u/w, v/w)_i’$ for each given observation is computed as follows:

$$F_i’ = \begin{pmatrix} \frac{\partial{\frac{u}{w}}}{\partial{x}} & \frac{\partial{\frac{u}{w}}}{\partial{y}} & \frac{\partial{\frac{u}{w}}}{\partial{z}} & \frac{\partial{\frac{u}{w}}}{\partial{r_x}} & \frac{\partial{\frac{u}{w}}}{\partial{r_y}} & \frac{\partial{\frac{u}{w}}}{\partial{r_z}} \\ \frac{\partial{\frac{v}{w}}}{\partial{x}} & \frac{\partial{\frac{v}{w}}}{\partial{y}} & \frac{\partial{\frac{v}{w}}}{\partial{z}} & \frac{\partial{\frac{v}{w}}}{\partial{r_x}} & \frac{\partial{\frac{v}{w}}}{\partial{r_y}} & \frac{\partial{\frac{v}{w}}}{\partial{r_z}} \end{pmatrix}$$

with $\frac{\partial{\frac{u}{w}}}{\partial{x}} = \frac{\frac{\partial{u}}{\partial{x}}.w - \frac{\partial{w}}{\partial{x}}.u}{w^2}$, an application of the quotient rule.
These partial derivatives $\frac{\partial{(u,v,w)^T}}{\partial{.}}$ are estimated as follows:
$$\begin{matrix} \frac{\partial{(u,v,w)^T}}{\partial{x}} = K\frac{\partial{t}}{\partial{x}}; & \frac{\partial{(u,v,w)^T}}{\partial{y}} = K\frac{\partial{t}}{\partial{y}}; & \frac{\partial{(u,v,w)^T}}{\partial{z}} = K\frac{\partial{t}}{\partial{z}} \end{matrix}$$
$$\begin{matrix} \frac{\partial{(u,v,w)^T}}{\partial{r_x}} = K\frac{\partial{R.Q}}{\partial{r_x}}; & \frac{\partial{(u,v,w)^T}}{\partial{r_y}} = K\frac{\partial{R.Q}}{\partial{r_y}}; & \frac{\partial{(u,v,w)^T}}{\partial{r_z}} = K\frac{\partial{R.Q}}{\partial{r_z}} \end{matrix}$$
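To make these derivatives concrete, here is a minimal NumPy sketch that builds $F_i$ and estimates its full 2×6 Jacobian by central finite differences. The Euler-angle convention $R = R_z R_y R_x$ and all numeric values are illustrative assumptions of mine, not part of the original derivation:

```python
import numpy as np

def rotation(rx, ry, rz):
    """R = Rz @ Ry @ Rx (one possible Euler convention; adapt to yours)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(P, Q, K):
    """F_i: map a 3D point Q to pixel coordinates (u/w, v/w) under pose P."""
    x, y, z, rx, ry, rz = P
    uvw = K @ (rotation(rx, ry, rz) @ Q + np.array([x, y, z]))
    return uvw[:2] / uvw[2]

def jacobian(P, Q, K, h=1e-6):
    """2x6 Jacobian F_i' w.r.t. the pose, by central finite differences."""
    J = np.zeros((2, 6))
    for j in range(6):
        dP = np.zeros(6)
        dP[j] = h
        J[:, j] = (project(P + dP, Q, K) - project(P - dP, Q, K)) / (2 * h)
    return J

K = np.array([[800.0, 0.0, 320.0],   # fx, skew (= 0), cx -- illustrative
              [0.0, 800.0, 240.0],   # fy, cy
              [0.0, 0.0, 1.0]])
P = np.array([0.1, -0.2, 4.0, 0.05, -0.03, 0.1])  # x, y, z, rx, ry, rz
Q = np.array([0.3, 0.2, 0.5])
print(project(P, Q, K))   # pixel coordinates (u/w, v/w)
print(jacobian(P, Q, K))  # the 2x6 matrix F_i'
```

Note, for instance, that the first column matches the quotient rule above: since $\partial(u,v,w)^T/\partial x = K(1,0,0)^T = (f_x,0,0)^T$, the entry $\partial(u/w)/\partial x$ reduces to $f_x/w$.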

# Gauss-Newton iteration

We start with an initial estimate $P_0$ of the vector $P$. This estimate can be obtained from the estimate computed for a previous frame, or from a direct method (e.g. the P3P or DLT principles).

The reprojection error can be measured as the sum of the squared reprojection errors of the individual 3D points $Q_i$: $E_0 = \sum_i \|F_i(P_0) - q_i\|^2$, where each residual $e_i = F_i(P_0) - q_i$ is a 2-vector.

If we consider a small displacement $\delta$ around $P_0$, we can use the first-order approximation $F_i(P_0+\delta) \approx F_i(P_0) + F_i’(P_0).\delta = F_i(P_0) + J_i.\delta$.

We wish to update $P_0$ to $P_1 = P_0 + \delta$ such that $P_1$ minimizes:
$$\sum_i \|F_i(P_1) - q_i\|^2 \approx \sum_i \|F_i(P_0) + J_i.\delta - q_i\|^2 = \sum_i \|J_i.\delta + e_i\|^2$$
Setting the gradient of this expression with respect to $\delta$ to zero yields the normal equations:
$$\sum_i J_i^T.(J_i.\delta + e_i) = 0 \\ \Big(\sum_i J_i^T.J_i\Big).\delta = -\sum_i J_i^T.e_i$$
We can then invert $\sum_i J_i^T.J_i$, for example using an SVD decomposition, and find the (locally) optimal displacement $\delta = -(\sum_i J_i^T.J_i)^{-1}\sum_i J_i^T.e_i$.

This displacement $\delta$ is the result of one Gauss-Newton iteration. You can repeat the same operations to converge to a local minimum; only a few iterations are typically necessary for our computer vision problem.
To reduce the risk of converging to a poor local minimum, one should try the extension provided by the Levenberg-Marquardt principle.
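Putting everything together, a complete Gauss-Newton loop on a synthetic pose-estimation problem might look like the sketch below. The Euler-angle rotation, the finite-difference Jacobian, and the fixed iteration count are illustrative choices of mine, not prescriptions of the method; the helper functions are repeated so the snippet stands alone:

```python
import numpy as np

def rotation(rx, ry, rz):
    # R = Rz @ Ry @ Rx (illustrative Euler convention)
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def project(P, Q):
    # F_i = K (R Q_i + t), then perspective division
    uvw = K @ (rotation(*P[3:]) @ Q + P[:3])
    return uvw[:2] / uvw[2]

def jacobian(P, Q, h=1e-6):
    # 2x6 Jacobian J_i by central finite differences
    J = np.zeros((2, 6))
    for j in range(6):
        d = np.zeros(6)
        d[j] = h
        J[:, j] = (project(P + d, Q) - project(P - d, Q)) / (2 * h)
    return J

# Synthetic problem: observe known 3D points under a ground-truth pose.
rng = np.random.default_rng(0)
Qs = rng.uniform(-1, 1, size=(12, 3)) + np.array([0.0, 0.0, 5.0])
P_true = np.array([0.2, -0.1, 0.3, 0.1, -0.05, 0.08])
qs = np.array([project(P_true, Q) for Q in Qs])  # observed image points q_i

P = np.zeros(6)          # crude initial estimate P_0
for _ in range(10):      # a handful of Gauss-Newton iterations
    H = np.zeros((6, 6))
    g = np.zeros(6)
    for Q, q in zip(Qs, qs):
        e = project(P, Q) - q   # residual e_i = F_i(P) - q_i
        J = jacobian(P, Q)
        H += J.T @ J            # sum_i J_i^T J_i
        g += J.T @ e            # sum_i J_i^T e_i
    P += np.linalg.solve(H, -g)  # delta = -(sum J^T J)^{-1} sum J^T e

print(np.abs(P - P_true).max())  # should be tiny once converged
```

With noiseless synthetic observations the residuals can reach zero, so the loop recovers `P_true` almost exactly; with real, noisy measurements it converges to a nearby least-squares solution instead.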