In this post, I will remind the reader of the famous Gauss-Newton optimization technique. We will see how Jacobian matrices are needed to carry out this kind of optimization, and how it applies to our common computer vision problems.

# Main goal

I wish to review the principles of the Gauss-Newton approach to **optimize a function**. I am going to explain what Jacobian matrices are and how they can be used to minimize a *projective* function that projects points from 3D space onto the camera CCD plane. Optimizing this function is very useful to **estimate the pose configuration** (a rigid transformation represented with six degrees of freedom) of an object relative to the camera's optical center frame.

# Jacobian Matrix

- The Jacobian matrix contains the partial derivatives of a general function $F: \mathbb{R}^n \rightarrow \mathbb{R}^m$.
- Notations: $F'$, $J_F$.
- A very good explanation with examples is given on the related Wikipedia page.
- Given a point $p$ in $\mathbb{R}^n$, if $F$ is differentiable near $p$, we can write, for points $x$ close to $p$: $F(x) = F(p) + F'(p).(x-p) + \epsilon(\|x-p\|)$. The error term $\epsilon$ shrinks faster than the distance between the two points.
- In the simple case $f:\mathbb{R} \rightarrow \mathbb{R}$, this is the famous Taylor series truncated to first order.
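To make the expansion above concrete, here is a small numerical check in Python with NumPy. The function $F:\mathbb{R}^2 \rightarrow \mathbb{R}^2$ is a made-up example: the first-order approximation built from the Jacobian matches $F(x)$ up to a small error term.

```python
import numpy as np

# Toy function F: R^2 -> R^2 (hypothetical example, not the projective function).
def F(v):
    x, y = v
    return np.array([x * y, np.sin(x) + y**2])

# Its Jacobian, written out by hand from the partial derivatives.
def J(v):
    x, y = v
    return np.array([[y,         x],
                     [np.cos(x), 2 * y]])

p = np.array([1.0, 2.0])
x = p + np.array([1e-4, -2e-4])    # a point close to p

approx = F(p) + J(p) @ (x - p)     # first-order expansion around p
error = np.linalg.norm(F(x) - approx)
print(error)                       # the error shrinks quadratically with ||x - p||
```

Halving the step $x - p$ roughly divides the error by four, which is the signature of a first-order (Taylor) approximation.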

# Function of interest

Here we present the *projective* function we want to optimize in order to retrieve a pose configuration (representing a rigid transformation). This function projects space points $Q_i$ onto image points $q_i$ using a few parameters:

- $f_x$, $f_y$ (or, almost equivalently, field of view and image aspect ratio) representing the image focal lengths: these values are computed or approximated beforehand and are viewed as **constants**,
- $c_x$, $c_y$ locating the image center (in pixels): these values are also computed or approximated beforehand and are viewed as **constants**,
- **image skew**: **not taken into account** here,
- **image distortions**: **neglected** as well,
- **pose configuration** $x$, $y$, $z$, $r_x$, $r_y$, $r_z$: the **real unknowns** of the *projective* function. All other parameters are constant (including the 3D points $Q_i$).

We wish to optimize the projective function $F$ by picking the best values for $P = (x, y, z, r_x, r_y, r_z)$ (also called the *parameter vector*) such that applying $F$ to the given 3D points $Q_i$ yields image points that are as close as possible to the image points $q_i$ observed (or measured by some image processing function).

Formally, we wish to find the vector $\hat{P}$ such that $q = F(\hat{P}) - \epsilon$ with $\|\epsilon\|$ minimal.

We know from a previous article that in our case $F$ is not linear (because of the sines and cosines in the rotation matrix) and has the form $F_i = K(RQ_i + t)$. That is why we need to optimize the function using derivatives.

Note that here, we denote $F_i$, the projective function that corresponds to the i-th 3D point $Q_i$.
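As a sketch, here is what $F_i$ could look like in Python with NumPy. The intrinsic values in $K$ and the Euler-angle convention $R = R_z R_y R_x$ are assumptions made for illustration; the article does not fix a rotation parametrization.

```python
import numpy as np

def rotation(rx, ry, rz):
    # Rotation built from the three angles; the composition order
    # R = Rz @ Ry @ Rx is an assumed convention for this sketch.
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(P, Q, K):
    # F_i(P): project the 3D point Q with pose P = (x, y, z, rx, ry, rz).
    x, y, z, rx, ry, rz = P
    t = np.array([x, y, z])
    u, v, w = K @ (rotation(rx, ry, rz) @ Q + t)   # homogeneous coordinates
    return np.array([u / w, v / w])                # pixel coordinates

# Hypothetical intrinsics: fx, fy, cx, cy are treated as constants.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
P = np.array([0.1, -0.2, 5.0, 0.01, 0.02, 0.03])   # made-up pose
Q = np.array([0.5, 0.5, 1.0])                      # made-up 3D point
q = project(P, Q, K)
```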

The Jacobian matrix $F_i' = (u/w, v/w)_i'$ for each given observation is computed as follows:

$$
F_i' =
\begin{pmatrix}
\frac{\partial{\frac{u}{w}}}{\partial{x}} & \frac{\partial{\frac{u}{w}}}{\partial{y}} & \frac{\partial{\frac{u}{w}}}{\partial{z}} & \frac{\partial{\frac{u}{w}}}{\partial{r_x}} & \frac{\partial{\frac{u}{w}}}{\partial{r_y}} & \frac{\partial{\frac{u}{w}}}{\partial{r_z}} \\
\frac{\partial{\frac{v}{w}}}{\partial{x}} & \frac{\partial{\frac{v}{w}}}{\partial{y}} & \frac{\partial{\frac{v}{w}}}{\partial{z}} & \frac{\partial{\frac{v}{w}}}{\partial{r_x}} & \frac{\partial{\frac{v}{w}}}{\partial{r_y}} & \frac{\partial{\frac{v}{w}}}{\partial{r_z}}
\end{pmatrix}
$$

with $\frac{\partial{\frac{u}{w}}}{\partial{x}} = \frac{\frac{\partial{u}}{\partial{x}}.w - \frac{\partial{w}}{\partial{x}}.u}{w^2}$, the quotient rule applied to $u/w$.
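A quick numerical sanity check of the quotient rule, on a toy one-dimensional example (the functions $u$ and $w$ below are made up):

```python
import numpy as np

# Toy scalar functions u(x), w(x) and their hand-written derivatives.
u  = lambda x: x**2 + 1.0
du = lambda x: 2.0 * x
w  = lambda x: 3.0 * x + 2.0
dw = lambda x: 3.0

x = 1.5
# Quotient rule: d(u/w)/dx = (u'.w - w'.u) / w^2
analytic = (du(x) * w(x) - dw(x) * u(x)) / w(x)**2

# Central finite difference applied to u/w directly
h = 1e-6
numeric = (u(x + h) / w(x + h) - u(x - h) / w(x - h)) / (2 * h)
print(analytic, numeric)   # the two values agree
```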

These partial derivatives $\frac{\partial{(u,v,w)^T}}{\partial{.}}$ are computed as follows:

$$
\begin{matrix}
\frac{\partial{(u,v,w)^T}}{\partial{x}} = K\frac{\partial{t}}{\partial{x}}; &
\frac{\partial{(u,v,w)^T}}{\partial{y}} = K\frac{\partial{t}}{\partial{y}}; &
\frac{\partial{(u,v,w)^T}}{\partial{z}} = K\frac{\partial{t}}{\partial{z}}
\end{matrix}
$$

$$
\begin{matrix}
\frac{\partial{(u,v,w)^T}}{\partial{r_x}} = K\frac{\partial{R.Q}}{\partial{r_x}}; &
\frac{\partial{(u,v,w)^T}}{\partial{r_y}} = K\frac{\partial{R.Q}}{\partial{r_y}}; &
\frac{\partial{(u,v,w)^T}}{\partial{r_z}} = K\frac{\partial{R.Q}}{\partial{r_z}}
\end{matrix}
$$
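These formulas can be verified numerically. The sketch below checks the translation derivatives only: since $(u,v,w)^T = K(RQ + t)$ is linear in $t$, $\frac{\partial{(u,v,w)^T}}{\partial{x}}$ must equal $K$ times the first canonical basis vector, i.e. the first column of $K$. The intrinsics, rotation, and point values are made up for the example.

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])   # hypothetical intrinsics
R = np.eye(3)                           # any fixed rotation works here
Q = np.array([0.5, -0.3, 2.0])          # made-up 3D point
t0 = np.array([0.1, 0.2, 5.0])          # made-up translation

def uvw(t):
    return K @ (R @ Q + t)              # homogeneous image point (u, v, w)

# d(u,v,w)/dx by central finite difference on the x-translation
h = 1e-6
numeric = (uvw(t0 + np.array([h, 0, 0])) - uvw(t0 - np.array([h, 0, 0]))) / (2 * h)

# Analytic formula: K @ (dt/dx) = first column of K
analytic = K @ np.array([1.0, 0.0, 0.0])
print(numeric, analytic)
```

The rotation derivatives $K\frac{\partial{R.Q}}{\partial{r_.}}$ can be checked the same way, by differencing over the angles instead of the translation.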

# Gauss-Newton iteration

We start with an initial estimate $P_0$ of the vector $P$. This estimate can be obtained from the estimation computed for a previous frame, or from a direct estimation (e.g. with the P3P or DLT algorithms).

The reprojection error can be measured as the sum of the reprojection errors of each 3D point $Q_i$: $\sum_i(F_i(P_0) - q_i) = E_0$. In our case $E_0$, the reprojection error, is a 2-vector.

If we consider a small displacement $\delta$ around $P_0$, the first-order expansion gives $F_i(P_0+\delta) \approx F_i(P_0) + F_i'(P_0).\delta = F_i(P_0) + J_i.\delta$ (writing $J_i = F_i'(P_0)$ for brevity).

We wish to update $P_0$ to $P_1=P_0+\delta$ such that $P_1$ minimizes:

$$
\sum_i(F_i(P_1)-q_i) = \sum_i(F_i(P_0)+J_i.\delta - q_i) = E_0 + \sum_iJ_i.\delta
$$

Equivalently, we wish to minimize $\| E_0 + \sum_iJ_i.\delta\|$. $\delta$ is obtained by solving the normal equations, leading to the system:

$$
\sum_iJ_i^T.(E_0 + \sum_iJ_i.\delta) = 0 \\
\sum_iJ_i^T.\sum_iJ_i.\delta = -\sum_iJ_i^T.E_0
$$

We can then invert $\sum_iJ_i^T.\sum_iJ_i$, for example using an SVD decomposition, and find the (locally) optimal displacement $\delta = -(\sum_iJ_i^T.\sum_iJ_i)^{-1}\sum_iJ_i^T.E_0$.

This displacement $\delta$ is the result of one Gauss-Newton iteration. You can repeat the same operations to converge to a local minimum. Only a few iterations are typically necessary for our computer vision problem.
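The whole iteration can be sketched as follows, with a few stated simplifications: per-point Jacobians are built by central finite differences instead of the analytic formulas above, residuals and Jacobians are stacked into one least-squares system (the standard layout) rather than summed, the normal equations are solved with `numpy.linalg.lstsq`, and the Euler-angle rotation convention $R = R_z R_y R_x$ is an assumption. All numerical values are made up for the example.

```python
import numpy as np

def rotation(rx, ry, rz):
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(P, Qs, K):
    # F_i(P) = K (R Q_i + t), returned as n x 2 pixel coordinates.
    R, t = rotation(*P[3:]), P[:3]
    h = (K @ (R @ Qs.T + t[:, None])).T     # one row (u, v, w) per point
    return h[:, :2] / h[:, 2:3]

def gauss_newton(P0, Qs, qs, K, n_iter=15, eps=1e-6):
    P = P0.copy()
    for _ in range(n_iter):
        r = (project(P, Qs, K) - qs).ravel()      # stacked residuals
        J = np.empty((r.size, 6))                 # stacked 2x6 Jacobians
        for j in range(6):
            d = np.zeros(6); d[j] = eps
            J[:, j] = (project(P + d, Qs, K) -
                       project(P - d, Qs, K)).ravel() / (2 * eps)
        # Solve the normal equations J^T J delta = -J^T r (via least squares)
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        P = P + delta
    return P

# Hypothetical setup: recover a known pose from exact observations.
K = np.array([[800.0, 0, 320.0], [0, 800.0, 240.0], [0, 0, 1.0]])
Qs = np.array([[0.0, 0, 1], [1.0, 0, 1], [0, 1.0, 1], [1, 1, 1.5],
               [-1, 0.5, 2.0], [0.5, -1, 1.5]])
P_true = np.array([0.1, -0.05, 5.0, 0.02, -0.01, 0.03])
qs = project(P_true, Qs, K)                # "observed" image points
P_hat = gauss_newton(P_true + 0.1, Qs, qs, K)   # perturbed initial estimate
```

With exact, noise-free observations the residual at the true pose is zero, so the iterations drive $P$ back to `P_true`; real measurements would leave a small final residual instead.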

If you want to avoid converging to a poor local minimum, try the extension provided by the Levenberg-Marquardt algorithm.