Interactive Modeling with an Indycam
and Complex Domain-Specific Constraints

Lucas Pereira
Olaf Hall-Holt
CS223B - Intro to Computer Vision
348c: Sensing for Graphics
March, 1997



Results:

Here are a few of the models that we generated. You can click on a picture to download an Inventor 2.0 scene-description file. Note that we saved only the geometry, not the instance of the domain, so the model is not editable (beyond basic operations).


Overview:

The goal of this project was to expand the class of real-life objects that can be modeled relatively easily, without expensive equipment such as a laser scanner. For this reason, we chose to limit ourselves to two (fairly) commonly available input devices: an Indycam, and an intelligent user.

Our intent was to develop a proof-of-concept program that would explore the issues, challenges, and decisions involved in writing such a system. As such, our program is missing quite a few basic features that would eventually be desirable, such as the ability to save an instance of a domain rather than just the Inventor file. However, we think we've built a subset large enough to demonstrate that this approach may have some merit, especially for classes of objects that lend themselves well to procedural description.

Our basic idea is to break the modeling process into two distinct phases: a domain-development phase, in which a programmer writes code describing a class of objects (a "domain"), and a modeling phase, in which a user interactively fits an instance of that domain to pictures of a real object.

This project does not really introduce any new vision algorithms. Rather, it provides a framework in which such algorithms can be applied. In fact, the programmer of the domain-development phase has almost unlimited freedom in the choice of actions to support. We hope that virtually any kind of vision algorithm or scanning technique (such as laser scanning) could be incorporated in this framework. Furthermore, since the domain developer can add any type of data structure to the domain, the developer could even include such things as confidence values, standard deviations, and interval arithmetic. The only requirement is that the developer be able to map this data (in whatever form) to a concrete instance written out as an Inventor file.


Background:

Our paper goes into more detail about background work done by others. Paul Debevec's work on image- and geometry-based rendering is particularly relevant; his implementation could be considered an example of one (well-built) domain.


Domains:

The domains are written as C++ classes; we hope this encourages the re-use of helpful code through inheritance. As an example, we developed four domains, among them a simple box and an open book (see the sketch below).
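To make this concrete, here is a minimal sketch of what a domain's class interface might look like. The names and methods below are hypothetical stand-ins, not our actual code; the one hard requirement is the ability to write out the current geometry as an Inventor file.

    #include <string>
    #include <vector>

    // Hypothetical sketch of a domain base class.  A concrete domain
    // (e.g. a box or an open book) stores whatever parameters and
    // auxiliary data it likes; it need only be able to emit its
    // current best-guess geometry as an Inventor file.
    class Domain {
    public:
        virtual ~Domain() {}

        // Domain-specific interaction modes offered to the user
        // (e.g. "adjust how open the book is").
        virtual std::vector<std::string> modeNames() const = 0;

        // React to a user action (a drag, say) performed in the
        // named mode.
        virtual void handleMode(const std::string& mode,
                                float dx, float dy) = 0;

        // Write the current geometry (not the editable domain
        // instance) to an Inventor 2.0 scene-description file.
        virtual bool writeInventor(const std::string& file) const = 0;
    };

A derived class (say, class BookDomain : public Domain) can then reuse helper code from other domains through inheritance.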


User Interface:


The SKETCH, modcam, and Debevec interfaces.

Modcam has a simple user interface based on the examiner viewer in SGI's Open Inventor package. As you can see in the above (middle) snapshot, there are three windows. The top window is used for interacting exclusively with the developing model, while the lower two windows allow interaction with pictures and/or the model. Each window has an associated set of standard widgets, which allow rotation, translation, zoom, and other operations.

The current mode of interaction is shown in the top left of the top window. The default mode is "none", in which all manipulations change the camera. Some modes are domain-independent, like the Texture mode, used to assign textures to faces of the model, and the Edge mode, used to specify edges in the picture(s). Domain-specific modes also exist for actions that only make sense in particular contexts; for example, one mode directly modifies how open an open book should appear. Finally, there are a couple of menu options, such as for saving an Inventor file. We expect a more sophisticated interface could be valuable for more complex domains, but this approach has served us well so far.

The available commands are listed in the modcam user's guide.


Calibration:


We employed the techniques presented in Prof. Tomasi's Introduction to Computer Vision course to calibrate the Indycam. We first took a picture of a series of concentric circles printed on a flat sheet of paper at a known distance from the camera. The circles were exactly .5 cm apart, and we estimated the distance to the camera's focal point to be 23 cm. We examined this picture to ascertain that the aspect ratio was close to 1 (it was within 1%) and to calculate the field of view, which is roughly 42 by 30 degrees. We then found the pixel coordinates of the intersections of the circles with the horizontal centerline, and used a least-squares fit to find the fifth-order polynomial that best maps uniformly spaced pixel coordinates to the coordinates found along the centerline. Assuming that the geometric distortion is radially symmetric, we used this polynomial to "unwarp" subsequent images: given a pixel location in the unwarped image, we mapped it back through the polynomial to the corresponding position along the ray from the image center in the original image, and interpolated between the nearest four pixels to obtain the desired color value.
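In outline, the unwarping pass looks something like the sketch below. This is illustrative code, not our exact implementation: it assumes a grayscale, row-major image, with c[] holding the six coefficients of the fitted fifth-order polynomial, which maps ideal (unwarped) radii to distorted radii.

    #include <cmath>
    #include <vector>

    // Horner evaluation of c[0] + c[1]*r + ... + c[5]*r^5.
    double poly5(const double c[6], double r) {
        double v = c[5];
        for (int i = 4; i >= 0; --i) v = v * r + c[i];
        return v;
    }

    // Interpolate between the four pixels nearest (x, y).
    double bilinear(const std::vector<double>& img,
                    int w, int h, double x, double y) {
        int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
        if (x0 < 0 || y0 < 0 || x0 + 1 >= w || y0 + 1 >= h) return 0.0;
        double fx = x - x0, fy = y - y0;
        double a = img[y0 * w + x0],       b = img[y0 * w + x0 + 1];
        double c = img[(y0 + 1) * w + x0], d = img[(y0 + 1) * w + x0 + 1];
        return (1 - fy) * ((1 - fx) * a + fx * b)
             +      fy  * ((1 - fx) * c + fx * d);
    }

    // For each pixel of the unwarped image, map its radius back
    // through the polynomial and sample the original image along the
    // same ray from the center.  dst must already be sized w*h.
    void unwarp(const std::vector<double>& src, std::vector<double>& dst,
                int w, int h, const double c[6]) {
        double cx = 0.5 * (w - 1), cy = 0.5 * (h - 1);  // image center
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                double dx = x - cx, dy = y - cy;
                double r = std::sqrt(dx * dx + dy * dy);
                double s = (r > 0.0) ? poly5(c, r) / r : 0.0;
                dst[y * w + x] = bilinear(src, w, h, cx + s * dx, cy + s * dy);
            }
    }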


Edge Alignment:

We are experimenting with a variety of techniques to make picking edges easy. We would like the user to be able to draw roughly along an edge, and have a more precise calculation refine the position of the edge to high accuracy. At the moment, our algorithm proceeds in three steps, detailed below.

First, we simplify the problem by assuming that the edge we are looking for is straight, and search for a straight segment that closely approximates the user input. We draw a segment connecting the first and last points of the user's curve, then translate it orthogonally to balance the "weight" of the points on either side (see the sketch below).
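In code, this first step might look like the following sketch. Pt is a simple 2D point; shifting by the mean signed perpendicular offset of the stroke points is one reasonable reading of "balancing the weight".

    #include <cmath>
    #include <vector>

    struct Pt { double x, y; };

    // Connect the first and last stroke points, then translate the
    // segment along its normal by the mean signed perpendicular offset
    // of the stroke points, so the points balance on either side.
    void fitSegment(const std::vector<Pt>& stroke, Pt& a, Pt& b) {
        a = stroke.front();
        b = stroke.back();
        double dx = b.x - a.x, dy = b.y - a.y;
        double len = std::sqrt(dx * dx + dy * dy);
        if (len == 0.0) return;
        double nx = -dy / len, ny = dx / len;          // unit normal
        double mean = 0.0;
        for (size_t i = 0; i < stroke.size(); ++i)
            mean += (stroke[i].x - a.x) * nx + (stroke[i].y - a.y) * ny;
        mean /= (double)stroke.size();
        a.x += mean * nx;  a.y += mean * ny;           // orthogonal shift
        b.x += mean * nx;  b.y += mean * ny;
    }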

Second, we define a cost function on each of the points of our initial image. We use a simplification of the cost function used by Mortensen and Barrett for their Intelligent Scissors program, namely a linear combination of the Laplacian zero-crossing function and the gradient magnitude. The latter is scaled to lie in the range [0,1], and we use weights of .7 and .3 for the linear combination. This defines a cost on each grid point. We then extend these samples to a planar region by bilinear interpolation, and define the cost of a segment to be the line integral of this function over the length of the segment, divided by the length of the segment (see the sketch below).
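Here is a sketch of the cost computation, reusing Pt and bilinear() from the sketches above. fZ and fG are precomputed per-pixel maps (the Laplacian zero-crossing term, and the gradient-magnitude term rescaled to [0,1] and arranged so strong edges cost little); note that pairing .7 with the zero-crossing term is our guess, since the text above does not pin down the assignment.

    // Cost at a (possibly fractional) image point.
    double pointCost(const std::vector<double>& fZ,
                     const std::vector<double>& fG,
                     int w, int h, double x, double y) {
        return 0.7 * bilinear(fZ, w, h, x, y)
             + 0.3 * bilinear(fG, w, h, x, y);
    }

    // Cost of a segment: the line integral of pointCost along the
    // segment divided by its length; with uniform samples this is
    // simply the average sampled cost.
    double segmentCost(const std::vector<double>& fZ,
                       const std::vector<double>& fG,
                       int w, int h, Pt a, Pt b) {
        double dx = b.x - a.x, dy = b.y - a.y;
        double len = std::sqrt(dx * dx + dy * dy);
        if (len == 0.0) return 1.0;          // degenerate: worst cost
        int n = (int)std::ceil(len);         // about one sample per pixel
        double sum = 0.0;
        for (int i = 0; i <= n; ++i) {
            double t = (double)i / n;
            sum += pointCost(fZ, fG, w, h, a.x + t * dx, a.y + t * dy);
        }
        return sum / (n + 1);
    }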

Third, we make local modifications to the endpoints until a local minimum of the cost function is reached. So far, we have implemented an approach that only examines segments whose endpoints lie on the grid. We consider all the neighboring grid points of our initial endpoints, and pick the pair of neighbors that reduces the cost function by the greatest amount. We then iterate, but limit movement to a single general direction after the first iteration; that is, after the first iteration we only allow an endpoint to move toward pixels neighboring its initial direction of movement. A sketch of this search follows.
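This sketch uses segmentCost() from above; the "single general direction" restriction is approximated here by forbidding moves that oppose an endpoint's initial direction, which is a simplification of the neighboring-direction rule.

    // Move the endpoints one pixel at a time until no pair of moves
    // lowers the segment cost.  Each iteration tries all 9x9 pairs of
    // one-pixel moves (including staying put) and keeps the best.
    void refineEndpoints(const std::vector<double>& fZ,
                         const std::vector<double>& fG,
                         int w, int h, Pt& a, Pt& b) {
        static const int d[9][2] = { {0,0}, {1,0}, {-1,0}, {0,1}, {0,-1},
                                     {1,1}, {1,-1}, {-1,1}, {-1,-1} };
        int dirA[2] = {0, 0}, dirB[2] = {0, 0};
        bool first = true;
        for (;;) {
            double best = segmentCost(fZ, fG, w, h, a, b);
            int bi = 0, bj = 0;
            for (int i = 0; i < 9; ++i)
                for (int j = 0; j < 9; ++j) {
                    if (!first &&   // stay near the initial direction
                        (d[i][0]*dirA[0] + d[i][1]*dirA[1] < 0 ||
                         d[j][0]*dirB[0] + d[j][1]*dirB[1] < 0)) continue;
                    Pt a2 = { a.x + d[i][0], a.y + d[i][1] };
                    Pt b2 = { b.x + d[j][0], b.y + d[j][1] };
                    double cost = segmentCost(fZ, fG, w, h, a2, b2);
                    if (cost < best) { best = cost; bi = i; bj = j; }
                }
            if (bi == 0 && bj == 0) break;   // local minimum reached
            if (first) {                      // remember initial directions
                dirA[0] = d[bi][0]; dirA[1] = d[bi][1];
                dirB[0] = d[bj][0]; dirB[1] = d[bj][1];
                first = false;
            }
            a.x += d[bi][0];  a.y += d[bi][1];
            b.x += d[bj][0];  b.y += d[bj][1];
        }
    }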

The above technique for edge location has several drawbacks. First of all, the cost function has many local minima for typical images, so many that the edge can easily "get stuck" on variations in image intensity that are all but invisible to the casual observer. One way to address this problem would be to define our cost function on a pyramid of resampled images, so that strong but more distant edge features can have a larger region of influence. Second, on very strong edges it is not uncommon for this technique to shrink the segment to almost zero length. It would be preferable for the segment to instead have a propensity for growth, so that indicating only a section of an edge would recover the entire edge, as in Debevec's Facade program. For this, we could redefine our cost function to reward greater lengths, but we would need to do so carefully to avoid making segments overlong. Finally, we would like to use gradient descent methods to locate edges to sub-pixel accuracy. At the moment, however, the quality of the images we get from the camera may not justify further enhancement in this direction.


Parameter Optimization:

In order to recover the camera position and more accurately estimate the proportions of our modeled shapes, we followed the work of Debevec et al. to create a non-linear local optimizer. For the moment, our implementation focuses on simple boxes. Given sufficient edge correspondences in the image, we calculate the positions of the projections of four or more corners of the box in the picture. We then define our cost function to be the sum of the squared distances between these points and the corresponding points of our box model when projected using the current camera and object parameters. Using a standard conjugate gradient method (Newton-Raphson and Fletcher-Reeves), we quickly obtained our desired transform matrices. We hope to extend this to handle more complex domains, and at least allow the positioning of edges relative to the current best guess for the camera position. A sketch of the objective appears below.
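The following sketch shows the shape of the objective. It uses a deliberately toy parameterization (focal length, box translation, and box dimensions, with rotation omitted for brevity), so the names and parameter packing are illustrative rather than our actual code; a cost and numerical gradient like these are what get fed to the conjugate gradient loop.

    #include <cmath>
    #include <vector>

    struct Pt2 { double x, y; };
    struct Pt3 { double x, y, z; };

    // Toy parameter packing: p = { f, tx, ty, tz, w, h, d }.

    // Corner i (0..7) of an axis-aligned box translated by (tx,ty,tz).
    Pt3 boxCorner(const std::vector<double>& p, int i) {
        Pt3 c;
        c.x = p[1] + ((i & 1) ? p[4] : 0.0);
        c.y = p[2] + ((i & 2) ? p[5] : 0.0);
        c.z = p[3] + ((i & 4) ? p[6] : 0.0);
        return c;
    }

    // Pinhole projection, camera at the origin looking down -z.
    Pt2 project(const std::vector<double>& p, const Pt3& c) {
        Pt2 q;
        q.x = -p[0] * c.x / c.z;   // assumes c.z != 0 (box in front)
        q.y = -p[0] * c.y / c.z;
        return q;
    }

    // Sum of squared pixel distances between the projected corners
    // and their observed positions; observed[i] corresponds to corner i.
    double reprojectionCost(const std::vector<double>& p,
                            const std::vector<Pt2>& observed) {
        double sum = 0.0;
        for (size_t i = 0; i < observed.size(); ++i) {
            Pt2 q = project(p, boxCorner(p, (int)i));
            double dx = q.x - observed[i].x, dy = q.y - observed[i].y;
            sum += dx * dx + dy * dy;
        }
        return sum;
    }

    // Central-difference gradient of the cost, as used by the
    // conjugate gradient (Fletcher-Reeves) iteration.
    std::vector<double> costGradient(const std::vector<double>& p,
                                     const std::vector<Pt2>& observed,
                                     double eps) {
        std::vector<double> g(p.size());
        for (size_t k = 0; k < p.size(); ++k) {
            std::vector<double> lo = p, hi = p;
            lo[k] -= eps;  hi[k] += eps;
            g[k] = (reprojectionCost(hi, observed) -
                    reprojectionCost(lo, observed)) / (2.0 * eps);
        }
        return g;
    }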


Future Work:

Here are a few items left on our to-do list at the end of the quarter:


Code Organization:


References:



  1. Curless, Brian and Marc Levoy, "A Volumetric Method for Building Complex Models from Range Images," Siggraph 1996, 303-312.

  2. Debevec, Paul E., Camillo J. Taylor, and Jitendra Malik, "Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach," Siggraph 1996, 11-20.

  3. Galyean, Tinsley, "Sculpting: An Interactive Volumetric Modeling Technique," Siggraph 1991, 267-274.

  4. Gortler, Steven J., Radek Grzeszczuk, Richard Szeliski, and Michael F. Cohen, "The Lumigraph," Siggraph 1996, 43-54.

  5. Hanrahan, Pat and Paul Haeberli, "Direct WYSIWYG Painting and Texturing on 3D Shapes", Siggraph 1990, 215-223.

  6. Mortensen, Eric N., and William A. Barrett, "Intelligent Scissors for Image Composition," Siggraph 1995, 191-198.

  7. Sato, Yoichi and Katsushi Ikeuchi, "Reflectance Analysis for 3D Computer Graphics Model Generation," technical report CMU-CS-95-146, 1995.

  8. Shewchuk, Jonathan R., "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain," technical report CMU-CS-94-125, March 1994. (postscript available)

  9. Szeliski, Richard, "Rapid Octree Construction from Image Sequences," CVGIP: Image Understanding 58, 1 (July 1993), 23-32.

  10. Taubin, Gabriel, "A signal processing approach to fair surface design," Siggraph 1995, 351-358.

  11. Tsai, Roger Y., "A Versatile Camera Calibration Technique for High Accuracy 3D Machine Vision Metrology Using Off-the-shelf TV Cameras and Lenses," IEEE Journal of Robotics and Automation, 3(4):323-344, August 1987. (Software available.)

  12. Wang, Sidney W., and Arie E. Kaufman, "Volume Sculpting," 1995 Symposium on Interactive 3D Graphics, 151-156.

  13. Zeleznik, Robert C., Kenneth P. Herndon, and John F. Hughes, "SKETCH: An Interface for Sketching 3D Scenes," Siggraph 1996, 163-170.