Introduction
The Stanford Graphics Lab is building an array of 128 cameras for high-performance
imaging applications [1]. Many applications we are targeting require very precise
calibration of the cameras. Previously, we were using third-party implementations
[2][3] of Zhang's algorithm [4] to calibrate each
camera independently. But this does not let us model all the constraints in our calibration procedure,
and yields model parameters that have some inconsistencies. For example, the calibration process
involves moving a patterned grid (of known geometry) in front of the cameras. The computed motion of the
grid in the world should be the same (up to a rigid transform) with respect to each camera. But calibrating
each camera independently does not let us incorporate such constraints in the optimization. Consequently,
the camera parameters we get fit the observed data well, but are not mutually consistent - a disaster for
multi-view algorithms we wish to implement with our array.
In this document, I describe how I implemented an optimization routine using the MATLAB optimization toolbox [5]. The aim was to get accurate, consistent values for the calibration parameters while keeping the system scalable to 128 cameras. Here, I talk primarily about optimization; motivation for the problem can be found in my project proposal.
The next section describes implementation decisions I had to make, and issues in parametrizing the problem with its constraints. Thereafter, I discuss making the best use of the MATLAB API for the particular optimization problem I wish to solve. I present initial results on a data set acquired from a four-camera prototype (the array is still under construction), and conclude with analysis and thoughts on how these techniques could be applied to other problems in my research.
Parametrizing the Problem
Objective Function
We wish to find the camera parameters that accurately predict the pixel coordinates of a point in 3D
from the point's world coordinates. As input, we have a large number of points whose 3D coordinates in
the world (possibly as a function of some optimization parameters) and whose pixel coordinates in the
camera images are known. Our objective function simply computes the "pixel error": the difference between
the observed images of the 3D points and what is predicted by the current estimate of the model parameters.
The MATLAB function lsqnonlin is the ideal candidate to minimize this in the L2 norm. There is an advantage
to the special structure of a nonlinear least-squares problem over a general minimization (fminunc, fmincon):
the gradient and Hessian of the 2-norm of the objective function can be computed easily from the Jacobian
and the Hessians of the individual residuals [6, pg. 134].
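Concretely (these are the standard least-squares identities, not anything specific to our code): if r(x) is the residual vector and J(x) its Jacobian, then

    f(x) = \frac{1}{2} \| r(x) \|_2^2
    \nabla f(x) = J(x)^T r(x)
    \nabla^2 f(x) = J(x)^T J(x) + \sum_i r_i(x) \, \nabla^2 r_i(x)

so near a solution, where the residuals are small, J^T J already gives a good approximation to the Hessian without any second derivatives.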
The drawback - at least in MATLAB - is that we cannot specify additional constraints but have to solve an
unconstrained problem. Moreover, we cannot eliminate potential outliers during the minimization. We stick
with lsqnonlin, however, because we can encode the important constraints in the parametrization described
below, and because our feature detector conservatively filters out outliers.
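As a minimal sketch of the setup (the helper computePixelError, the data arguments, and the initial guess p0 are illustrative placeholders, not our actual code):

    % Residual vector: observed pixel coordinates minus those predicted from
    % the current parameter vector p; two entries (x and y) per observed point.
    objective = @(p) computePixelError(p, observedPixels, gridPoints);

    % Unconstrained nonlinear least squares: the important constraints are
    % encoded in the parametrization, so the bound arguments are left empty.
    options = optimset('Display', 'iter');
    [p, resnorm] = lsqnonlin(objective, p0, [], [], options);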
Model Parameters
We use the standard camera model of computer vision: a perspective camera with some nonlinear lens
distortion. A detailed description of the parameters may be found in [4]. We take care to model everything
in the calibration process consistently, which enables us to overcome the drawback of the previous approach
cited above. For example, to ensure that the grid motion (in the global world coordinate frame) is the same
with respect to all cameras, we add additional parameters to describe the grid motion and use the same set
of parameters for all cameras.
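As an illustration of this shared parametrization (the exact layout below is a sketch, not our actual packing), every camera's residuals are computed from one common block of grid-motion parameters:

    % p = [12 parameters for each camera, followed by the shared parameters
    %      describing the motion of the calibration grid in the world frame].
    nCameras   = 4;
    camParams  = reshape(p(1 : 12*nCameras), 12, nCameras);   % column k: camera k
    gridMotion = p(12*nCameras + 1 : end);                     % shared by all cameras

    % Every camera's predicted pixel coordinates are computed from the same
    % gridMotion block, so consistency of the grid motion across cameras is
    % enforced by construction rather than by explicit constraints.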
Initialization
We use the output of the previous version of our calibration system as the initial guess.
Termination Conditions
MATLAB has several parameters that control termination of the minimization, including limits on the number
of iterations and function evaluations, thresholds on the step size, and so on. Although we experimented
briefly with varying these, we found that the defaults work very well once the enhancements described below
are in place.
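For reference, these limits are all set through optimset; a typical call might look like the following (the values shown are illustrative, not the ones we settled on):

    options = optimset('MaxIter',     100,  ...   % limit on iterations
                       'MaxFunEvals', 1e4,  ...   % limit on function evaluations
                       'TolX',        1e-8, ...   % threshold on step size
                       'TolFun',      1e-8);      % threshold on change in the residual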
Exploiting the MATLAB API
Here I describe the various stages in the development of our optimization procedure. The data set used to
illustrate our experiments was from a four-camera prototype, with 2940 data points and 78 model parameters.
Since each observation point has two coordinates (x and y), the Jacobian is 5880 x 78.
Details of the experiments and the actual MATLAB output are tabulated in the next
section.
Naive Implementation
As the first implementation, I simply coded up the objective function, and handed it off to lsqnonlin
, without any information on sparsity or partial derivatives. This worked better than I expected: although
the optimization was slow and terminated when it hit the threshold on the number of function evaluations,
the residual had already been reduced significantly by that stage. The main observation from the MATLAB
output was that most of the function evaluations were being used to estimate the Jacobian: there were 79
function evaluations per iteration, of which 78 were used for finite differences (one per model parameter).
This inspired the first enhancement.
Structure of the Jacobian
The pixel error for any one observed 3D point depends only on the camera which observed it. It is
independent of parameters of other cameras, or parameters describing other images seen by the same camera.
Therefore, it has very few non-zero partial derivatives, and the Jacobian is sparse, with well-structured
blocks of non-zero elements. We can increase efficiency by passing the structure of the Jacobian to MATLAB
so it doesn't waste function evaluations trying to estimate partial derivatives known to be zero. This is
accomplished by enabling the JacobPattern option in the options passed to lsqnonlin. The results are
encouraging: the number of function evaluations drops to 19 per iteration, which is consistent with the fact
that there are at most 18 non-zero entries in any row of the Jacobian. With this option the minimization
proceeds until the limit on iterations is reached, and we even see a slightly smaller residual in the MATLAB
output.
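A sketch of how the pattern might be built and supplied (cameraOfResidual and gridColumns are hypothetical helpers standing in for our bookkeeping, and the block layout is only illustrative):

    % Jstr(i,k) is nonzero wherever residual i can depend on parameter k.
    % Each residual touches only its own camera's 12 parameters plus the
    % shared grid-motion parameters for its image, i.e. at most 18 per row.
    Jstr = sparse(M, N);
    for i = 1:M
        cam = cameraOfResidual(i);                   % which camera saw point i
        Jstr(i, 12*(cam-1)+1 : 12*cam) = 1;          % that camera's parameters
        Jstr(i, gridColumns(i))        = 1;          % shared grid-motion parameters
    end
    options = optimset(options, 'JacobPattern', Jstr);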
Analytic Derivatives
Encouraged by this improvement, I decided to go the whole way and provide analytic expressions for the
partial derivatives. While the objective function is relatively straightforward to code up, its partial
derivatives certainly are not! This was by far the most time-consuming portion of the project: it took
about 75% of the total time and accounts for 50% of the total code. The 'DerivativeCheck'
option, which
compares programmer-supplied derivatives to those estimated via finite differencing, was extremely helpful in
debugging, but even with this it often took hours to track down the sources of error. Fortunately, I
managed to end up with a reasonably modular, bug-free implementation. Since the difference between the
analytic derivatives and those estimated by finite differencing was on the order of 10^-5, I was not
surprised that there was no increase in accuracy. The good news is that having programmer-supplied Jacobians
lets us speed up the computation by exploiting its sparsity. This will be important as the size of the
problem increases with more cameras.
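In outline (the helper names are placeholders for our projection code), the objective gains a second output and the check is switched on while debugging:

    % The objective now returns the analytic Jacobian alongside the residuals.
    function [F, J] = pixelErrorWithJacobian(p, observedPixels, gridPoints)
        F = computePixelError(p, observedPixels, gridPoints);   % residual vector
        J = computeJacobian(p, observedPixels, gridPoints);     % analytic dF/dp
    end

    options = optimset(options, 'Jacobian', 'on', ...        % use the returned J
                                'DerivativeCheck', 'on');    % compare against finite differences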
Exploiting Sparsity
As a final improvement, I decided to make MATLAB exploit the sparsity of the Jacobian in matrix multiplication.
The minimization requires the Jacobian only to compute products of the form J*Y, J^T*Y, and J^T*J*Y. MATLAB
accepts a user-specified function for evaluating these products, so the full Jacobian need never be formed
during the minimization. I implemented a sparse representation of the Jacobian and a function to compute the
required products, and passed them to the minimization by enabling the 'JacobMult' option. As expected, this
runs much faster. This is a tremendous win, both in terms of time and space: the full Jacobian is M x N, where
M depends on the number of observations we are fitting a model to, and N is a constant plus 12 times the
number of cameras, whereas the sparse Jacobian has no more than 18M non-zero entries. This is very encouraging,
as the number of non-zeros per row does not increase if we add more cameras, beyond the fact that we will have
more observations. Since flop counts are no longer available in MATLAB, I am not able to provide an accurate
measure of the increase in speed; however, it is very likely this will be a major source of speedup when we
calibrate the entire array of 128 cameras.
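In outline, the multiply function follows the interface lsqnonlin expects for 'JacobMult': it receives whatever data the objective returns in place of the Jacobian (here assumed to be our sparse representation) plus a flag selecting which product to form. A sketch:

    % W = jacMult(Jinfo, Y, flag) forms Jacobian products without ever building
    % the full dense Jacobian. Jinfo is whatever the objective function returns
    % in place of J; here it is assumed to be a MATLAB sparse matrix.
    function W = jacMult(Jinfo, Y, flag)
        if flag > 0
            W = Jinfo * Y;                  % J * Y
        elseif flag < 0
            W = Jinfo' * Y;                 % J^T * Y
        else
            W = Jinfo' * (Jinfo * Y);       % J^T * J * Y
        end
    end

    options = optimset(options, 'JacobMult', @jacMult);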
Experiments and Results
The following table summarizes our results and how they were affected by the improvements described above.
The data set was from a four-camera prototype. Each camera took six images, giving a total of 2940 points.
The time taken per iteration is approximate.
Implementation                  | Normalized Residual Error | Time per Iteration | Termination                              | MATLAB Output
Naive, unoptimized              | 0.3929 pixels             | 7.1 sec            | Exceeded number of function evaluations  | naive.txt
Used Jacobian structure         | 0.3908 pixels             | 2.5 sec            | Exceeded maximum number of iterations    | jpattern.txt
Jacobian evaluated analytically | 0.3905 pixels             | 0.53 sec           | Exceeded maximum number of iterations    | sparsity.txt
Conclusions
The primary objective of this project was to build a system that could perform an accurate, scalable global
calibration, and this has largely been accomplished. In the process, I learnt much about writing useful
nonlinear optimizations and some good enhancements. The accuracy of this camera calibration is reasonable;
I believe most of the residual error is due to imperfect feature detection and localization. Our automated
feature detector, which provides the input data to which we fit, has an accuracy of about 0.3 pixels.
Unfortunately, the completion of the camera array took longer than I anticipated, and so I was not able to
run the optimization on 128 cameras. However, when the construction of the array is finished, the calibration
code will be ready for it.
References
Jean-Yves Bouguet.
Zhengyou Zhang. A Flexible New Technique for Camera Calibration. Microsoft Research Technical Report MSR-TR-98-71.
The MathWorks
Philip Gill, Walter Murray, and Margaret Wright. Academic Press, 1999.