Scrutinizing Real-World Geometry: the Next Best View

When the Greeks were modeling the real world, they used the simplest, most beautiful polyhedra they could find. With the advent of commercially available range-scanning devices, not only have geometric representations become more complex, but methods for obtaining geometric data have as well. This paper reviews the development of algorithms that compute the best location for a range-scanning device relative to a real-world object, in order to obtain the next in a series of automatic scans. Section I summarizes the problem and categorizes known approaches at a high level, Section II reviews the various papers individually, and Section III briefly summarizes directions for further study.

I. The Problem and General Background

An example of a range scanning device is a laser triangulation scanner, like the one in our lab, which projects a plane of light onto an object and views the narrow band of the object so illuminated with a digital camera. The information from the camera is processed to automatically recover the positions of several points on the object surface. This procedure can then be repeated from more viewpoints to gather more positional data. In general, many iterations of scanning are required in order to build a reasonably complete model of the visible surface of a real world object. During intermediate stages, however, it is desirable to use the data obtained so far in order to compute what scan should be taken next. This Next Best View problem (NBV) is interesting because it is an intuitively compelling geometry problem with a definite real-world application.
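The recovery step amounts to intersecting each lit pixel's back-projected camera ray with the known laser plane. The following is a minimal sketch of that calculation; the function and parameter names are my own for illustration, not taken from any of the reviewed systems.

```python
import numpy as np

def triangulate_point(ray_origin, ray_dir, plane_point, plane_normal):
    """Intersect a back-projected camera ray with the laser plane.

    Returns the recovered 3D surface point, or None if the ray is
    parallel to the plane or the intersection lies behind the camera.
    """
    denom = np.dot(plane_normal, ray_dir)
    if abs(denom) < 1e-9:              # ray (nearly) parallel to the plane
        return None
    t = np.dot(plane_normal, plane_point - ray_origin) / denom
    if t < 0:                          # intersection behind the camera
        return None
    return ray_origin + t * ray_dir

# Toy example: camera at the origin, laser plane x = 1
point = triangulate_point(
    ray_origin=np.array([0.0, 0.0, 0.0]),
    ray_dir=np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0),
    plane_point=np.array([1.0, 0.0, 0.0]),
    plane_normal=np.array([1.0, 0.0, 0.0]),
)  # recovers approximately (1, 0, 1)
```

Repeating this intersection for every lit pixel in every camera frame is what produces the raw point data that the NBV algorithms below must then reason about.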

NBV is intuitive, in the sense that it is fairly easy to describe, and also fairly easy to propose strategies for solving it. NBV is also hard for several reasons. A good solution has to work well in an applied context, that is, it will have to be quickly computable in the presence of megabytes of raw data. The geometric constraints are complex, since the objects to be scanned are often non-convex. In the midst of these requirements, it is still not at all clear what representation should be used for the available geometric information, nor even how to describe the output of a general algorithm. NBV is also related to other problems in robot sensing that are notoriously difficult due to the size of the search space. A good NBV solution may incorporate computational geometry, computer graphics, and robotics techniques all at once. As such, it is an exciting problem for study.

After a careful literature search, I believe I have found all the papers on NBV in general circulation, which is not many. The approaches published so far can be categorized in a variety of ways. Each paper reviewed here took a different approach to representing geometry information, using octrees [C85], feature based geometry [HK89], superquadrics [WF90], edges found with edge detectors [MB93], a simple occupancy grid [BZW+95], BSP trees among others [TAT95], and a decimated triangle mesh [P96]. Similarly, a wide variety of approaches have been taken for representing the parts of the surface where there is no data yet: facets of octree nodes [C85], logical hypotheses [HK89], a shell of uncertainty [WF90], individual pixels [MB93], and quadrilaterals [BZW+95]. These choices of representations have in some sense determined the algorithms that are then applied. For those that relied on simple computer graphics primitives, brute-force raytracing usually enters at some stage [C85,WF90,BZW+95,P96], whereas those containing hypothesis testing or probability distributions used more artificial intelligence or vision techniques [HK89,MB93]. Also, for the majority of these papers, the solution space is considered to be so large and complex that the most popular approach is to generate possible solutions with regular or semi-random sampling techniques, and then to compare these solutions in order to obtain a favorite. The space used to carry out this final comparison also varies among the different approaches. In most, the domain of integration is over a space that includes a projection of the given geometry [WF90,BZW+95,P96], but others consider the object itself [C85], a separate measure of ambiguity [HK89], or the configuration space of the scanner [MB93].
Different approaches also make use of different types of assumptions about the object and scanner models: about half make no assumptions about the object geometry or topology [C85,BZW+95,P96], and a similar group of papers [C85,WF90,BZW+95,P96] model the scanner as a device that can measure distance along any vector. This is despite references to triangulation-based scanner technology, which requires visibility from two well-separated directions at once. In the most recent paper [P96] it would be trivial to include this two-direction visibility constraint.

The Next Best View problem can also be broken down into a "begin game", in which one wants to quickly obtain an overall shape for an object, a "middle game", where large unseen areas can be scanned, and an "end game", in which difficult concavities are explored. None of the literature so far makes explicit reference to these stages, and instead attempts to address them all simultaneously. Although some papers are not usefully classified this way, three approaches [C85,WF90,BZW+95] can be considered to apply most to the begin game, one [P96] applies best to the middle game, and none (except perhaps [MB93]) apply to the end game. The lack of work on the end game perhaps corresponds to the increased difficulty of finding just the right view when there are many obstacles in the way, especially since computational efficiency is a criterion.

Most of the papers used fair to good prose, a consistent level of detail, and decent figures. The bibliographies also appeared to cover the relevant literature, relative to the time they were published. Few, however, included a critical evaluation of their own work, and similarly few tried to test their algorithms in any kind of demanding environment. The latter is certainly partly due to the high cost of the relevant technology and the sophistication required to efficiently process the data needed before NBV becomes relevant, but that would not preclude attempts to use more complex objects and/or scanner configurations.

II. Review of Individual Papers

The earliest paper [C85] on this topic is short and sweet: it presents two algorithms, and observations on the pros and cons of each of these algorithms, in just two pages, not counting figures. The algorithms are what one might expect in an early paper, namely the most direct approach of ray-tracing a large number of samples, and a fast but inaccurate heuristic. The former is noted to be too slow (30 minutes on a VAX 11/780) even for the size of the author's dataset. The latter algorithm adds up the boundary area of unseen octree nodes which border empty octree nodes, separately in each of the six axial directions, and then picks a vector based on three of these values. This clearly makes no allowance for geometric occlusions, and produces only a vague guess as to how to maximize the amount of new data that will be obtained in the next scan. However, both these approaches include ideas that have been used repeatedly by other authors. This paper could have been improved by making better figures (they are hand-shaded, and not terribly carefully at that) and considering more possibilities for NBV algorithms. The strengths of using an octree model could have been summarized better as well.
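The octree heuristic can be sketched roughly as follows. For simplicity this sketch uses a flat occupancy grid rather than an octree, and the rule for combining the six axial scores into a view vector (summing the three highest-scoring directions) is my own stand-in, since [C85] does not spell out its exact combination rule in a form I can reproduce here.

```python
import numpy as np

EMPTY, UNSEEN, OCCUPIED = 0, 1, 2

def face_counts(grid):
    """For each of the six axial directions d, count the faces of UNSEEN
    cells whose neighbor in direction d is EMPTY (hence visible from d)."""
    counts = {}
    for axis in range(3):
        for sign in (+1, -1):
            sl_cell = [slice(None)] * 3
            sl_nbr = [slice(None)] * 3
            if sign > 0:
                sl_cell[axis], sl_nbr[axis] = slice(0, -1), slice(1, None)
            else:
                sl_cell[axis], sl_nbr[axis] = slice(1, None), slice(0, -1)
            cell = grid[tuple(sl_cell)]
            nbr = grid[tuple(sl_nbr)]
            d = tuple(sign if a == axis else 0 for a in range(3))
            counts[d] = int(np.sum((cell == UNSEEN) & (nbr == EMPTY)))
    return counts

def next_view_direction(grid):
    """Combine the three highest axial scores into one view vector
    (a hypothetical combination rule, not necessarily that of [C85])."""
    counts = face_counts(grid)
    best = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:3]
    v = np.sum([np.array(d, float) * n for d, n in best], axis=0)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Note that, just as the text observes, nothing in this computation accounts for occlusion: an unseen face bordering empty space contributes to a direction's score even if other geometry blocks the view from that direction.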

The second paper [HK89] in the series reviewed here is an example from a larger body of literature on robot sensing, in particular, planning through the use of object models and hypothesis testing. I have included it here because it takes such a different approach from the majority of the NBV papers proper, which may lead to a better understanding of some aspects of the problem. The essential idea of this paper is to try to find a sensor location that will minimize an ambiguity metric in the space of hypotheses about what type of object is in front of the sensor. This is done in part with the help of an aspect graph, which is used to group similar sensor placement operations relative to particular hypothesized objects. Ambiguity is measured using concepts from information theory, which can lead to a more sophisticated approach in defining the value of a new scan. Unfortunately, these types of approaches do not appear to be practical for direct use in NBV due to their differing assumptions about objects and their huge computational complexity.

The next paper [WF90] is related to the above in that it tries to take an intermediate stand on issues of object modeling and ambiguity measurement. The authors begin by assuming that the given object can be segmented into pieces that can in turn be modeled with superquadrics. Since superquadrics involve only two more parameters than ellipsoids, a "shell of uncertainty" can be computed in which a partially seen superquadric is most likely to be found. The ambiguity reduction obtained from a given viewpoint can then be measured as the average thickness of the shell of uncertainty in the viewing region. Unfortunately, this approach seems to already be computationally challenging for a very small number of superquadrics, and the initial assumptions are probably too restrictive in the majority of cases. This paper would have been strengthened by rewriting it to be more concise, adding definitions for terms used, and considering the applicability of ambiguity measures with other geometric primitives.
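For reference, the superquadric inside-outside function underlying the shell construction can be written down directly. This is only a sketch of the primitive itself; the segmentation and fitting machinery of [WF90] is not reproduced, and the names are mine.

```python
import numpy as np

def superquadric_f(p, a, e1, e2):
    """Superquadric inside-outside function, centered at the origin.

    a = (a1, a2, a3) are the scale parameters; e1, e2 the shape exponents
    (the only two parameters beyond those of an ellipsoid).
    F < 1 inside the surface, F = 1 on it, F > 1 outside.
    """
    x, y, z = np.abs(p) / np.asarray(a, dtype=float)
    return (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1) + z ** (2.0 / e1)

# With e1 = e2 = 1 the superquadric degenerates to an ellipsoid:
f_on = superquadric_f(np.array([1.0, 0.0, 0.0]), a=(1, 1, 1), e1=1.0, e2=1.0)  # 1.0
f_in = superquadric_f(np.array([0.2, 0.0, 0.2]), a=(1, 1, 1), e1=1.0, e2=1.0)  # < 1
```

In these terms, the shell of uncertainty corresponds roughly to the set of points where F remains within a confidence band around 1 under the uncertainty in the fitted parameters, and a view is scored by the average width of that band in the viewed region.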

The fourth paper [MB93] begins with a very nice review of the literature, a review mentioned in the sensor planning literature survey [TAT95]. This paper is clearly a work of great dedication by the authors, which includes a clear definition of the problem, lots of nice figures, careful consideration of a variety of special cases in the algorithm, results from a real scanner, and a fairly direct appraisal of some of the drawbacks of their methods. Unfortunately, the easier part of their algorithm is exponential in the number of occluding edges, and relies heavily on heuristic vision techniques. One gets the impression that the general approach is to try to make the computer simulate what a human being might do if faced with a similar situation. This paper does however consider the case of a triangulation scanner outright: NBV is split into two parts, the first of which is to obtain all possible data when the laser is pointed in a particular direction, and the second part is to find the directions in which the laser should be pointed. The advantages and disadvantages of this separation are not explored. This paper could have been strengthened through consideration of less ambitious techniques and perhaps a greater familiarity with ray-tracing and/or aspect graph algorithms.

The fifth paper [BZW+95], by Banta, Zhien, Wang, Zhang, Smith, and Abidi, may unfortunately represent inefficiencies in committee approaches to scholarly writing. Despite a fairly lengthy bibliography, this paper uses the least elegant approach to representing geometry (an occupancy grid) and a correspondingly simple methodology, in which an approximation to the surface curvature of the geometry obtained so far is used to choose three possible viewpoints, which are then evaluated relative to each other by ray tracing the entire model. The level of detail in this paper wanders, and the terminology is imprecise. On the bright side, this paper considers some issues that are not addressed in other papers, in particular, characteristics of the path followed by the scanner, and the rate of information recovery per scan over time. The figures are decent, and the abstract is well written. This paper would be strengthened by making it more concise, perhaps focusing to a greater extent on a particular part of the range scanning process, and adding a conclusion.

The paper by Tarabanis, Allen, and Tsai [TAT95] surveys sensor planning techniques, primarily for feature detection in environments where the overall geometry is already known. Although this problem set differs from NBV, when NBV is recast as a problem in scanning well defined areas of the current geometry, similarities become apparent. This survey goes into some detail describing the HEAVEN, VIO, ICE, SRI, and MVP systems, each with its own particular problem focus and methods. The paper distinguishes between generate-and-test approaches, such as the majority of those seen for NBV, and analytic approaches, in which the complete solution space for a particular constraint is characterized and then synthesized with other solution spaces, or searched using a multi-parameter optimization function. The latter are lionized as being well suited for the case of multiple constraints, including, for example, end effector pose, zoom, and focus of the camera. The geometric calculations, however, are very interesting since they appear to use a variety of techniques for sampling (e.g. an adaptively subdivided tiling of a surrounding sphere) and ray calculation (BSP tree preprocessing, convex hulls of pre-images of point sets). In general the geometry seems to be simpler than that usually considered in NBV, which makes slower, more precise approaches available. This paper was well written, even if it did wander a little in the level of detail, and includes an extensive bibliography in sensor planning.

The most recent paper [P96] on NBV contains the best algorithm yet published. The author, Richard Pito, has actually published several other papers on NBV, of which this last one is a refinement. An intermediate space between the object and the scanner is constructed to hold information about discretely sampled views of the object, which then can be efficiently traversed in order to calculate a measure of the value of a particular scan. The separation between calculating viewing information from the object geometry, and calculating the value of views that will be obtained during a scan, seems very useful. Unfortunately, it takes work for a reader to determine exactly how useful the approach as a whole can be, since the author avoids any critical evaluation of his own algorithm. In addition, no quantitative data is presented to characterize the performance of this algorithm. It appears that each triangle of the decimated mesh sends a ray to each of the sample-point cells on the intermediate surface, performing a ray trace for each pair. In this case, if either the number of triangles or the number of samples grows, the calculation will become prohibitive. Thus it appears that this algorithm is best suited to the middle game, where small features and precise positioning of the scanner are not required. Indeed, the example given at the end of this paper appears rather trivial, since it generates six evenly spaced scans for the input geometry. The text is concise to the point of being vague at points, and several assumptions are not made explicit. Nevertheless, this paper presents the state of the art right now.
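A brute-force caricature of this generate-and-test scoring, stripped of Pito's intermediate positional space and with hypothetical names and thresholds throughout, might look like the following: candidate scanner positions are scored by how much unseen surface area each could image, weighting each patch by foreshortening and rejecting grazing or occluded views.

```python
import numpy as np

def score_views(void_patches, candidate_positions, visible):
    """Score candidate scanner positions by the unseen surface each could image.

    void_patches: (center, normal, area) triples for patches bounding unseen space.
    candidate_positions: sampled 3D scanner locations (generate-and-test).
    visible(center, pos): occlusion test against the partial model, e.g. a ray trace.
    """
    max_grazing = np.cos(np.radians(60))   # reject views that graze the patch
    scores = []
    for pos in candidate_positions:
        total = 0.0
        for center, normal, area in void_patches:
            ray = pos - center
            ray = ray / np.linalg.norm(ray)
            facing = np.dot(normal, ray)
            if facing > max_grazing and visible(center, pos):
                total += area * facing     # weight area by foreshortening
        scores.append(total)
    return scores
```

The cost structure noted above is visible even in this caricature: the inner loop pairs every patch with every candidate position, and each pair triggers an occlusion ray trace, so the work grows with the product of the mesh and sample sizes. Pito's contribution is precisely to precompute and reuse the per-patch visibility information so that candidate scans can be evaluated without repeating that full product.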

III. Further Development

After reading these papers, it appears that many possibilities exist for novel research. Most notably, there does not yet seem to be an algorithm that can handle the end game efficiently, nor one that can handle large amounts of data. These requirements may in fact be the most useful in applications. The search for appropriate representations for the input geometry, and for the areas which remain to be scanned, is apparently wide open, and hybrid models or multi-resolution techniques may lead the way as the size and speed requirements converge. Similarly, many algorithmic avenues remain to be investigated, with new representations or old. More careful sampling, randomized approaches, and avoidance of ray-tracing may be useful. Also, the research to date does not attempt quantitative assessments of performance, much less error analyses on the results. Benchmarks for these algorithms remain to be invented, as do simulations with more complex datasets. In addition, new scanner configurations continue to be invented, which challenge the assumptions of previous models. In the long term, the framework for good solutions to NBV may provide the key to efficient, useful representations of the world around us, perhaps even capturing elements of complexity and beauty from the ancients.

Bibliography

[BZW+95]
Banta, J.E., et al. "A 'best-next-view' algorithm for three-dimensional scene reconstruction using range images." Intelligent Robots and Computer Vision XIV: Algorithms, Techniques, Active Vision, and Materials Handling. Held: Philadelphia, PA, USA, 23-26 Oct. 1995. PROCEEDINGS OF THE SPIE - THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING (1995) vol.2588, p. 418-29.

[C85]
Connolly, C.I. "The Determination of Next Best Views." PROCEEDINGS OF THE 1985 IEEE INT. CONF. ON ROBOTICS AND AUTOMATION (1985) p. 432-435.

[HK89]
Hutchinson, S.A., et al. "Planning sensing strategies in a robot work cell with multi-sensor capabilities." IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION (Dec. 1989) vol.5, no.6, p. 765-83.

[MB93]
Maver, J., et al. "Occlusions as a guide for planning the next view." IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (May 1993) vol.15, no.5, p. 417-33.

[P96]
Pito, R. "A sensor-based solution to the 'next best view' problem." Proceedings of 13th International Conference on Pattern Recognition. Held: Vienna, Austria, 25-29 Aug. 1996. (USA: IEEE Comput. Soc. Press, 1996. p. 941-5 vol.1)

[TAT95]
Tarabanis, K.A., et al. "A survey of sensor planning in computer vision." IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION (Feb. 1995) vol.11, no.1, p. 86-104.

[WF90]
Whaite, P., et al. "From uncertainty to visual exploration." PROCEEDINGS. THIRD INTERNATIONAL CONFERENCE ON COMPUTER VISION. Held: Osaka, Japan, 4-7 Dec. 1990. (USA: IEEE Comput. Soc. Press, 1990. p. 690-7)