CS348b Final Project

Kurt Akeley, 6 June 2002


The understanding of ray tracing that I am developing in this course suggests that it breaks down into two fundamental operations:
  1. Ray Casting. Determining the intersection of a ray with the first object in the scene that it meets. This involves acceleration structures, intersection calculations, transformations, etc. And
  2. Ray Generation. Deciding, based on optics, statistics, probability, sampling theory, surface characteristics, etc. what rays to generate. Some of the calculations required to generate these rays are fairly deterministic (e.g. reflection, refraction), some are regular even if not determined specifically by the geometry (e.g. camera aperture), and some are completely statistical in nature (e.g. diffuse and semi-specular surface parameters).
In general the calculations of Ray Casting (as defined above) are much more deterministic and regular than those of Ray Generation, especially in the case of scenes composed entirely of triangles or height fields. Given the rapid increase in the performance of rendering hardware, and the trend toward implementing special-purpose rendering hardware that supports ray tracing, it is interesting to consider implementing algorithms that emphasize the deterministic operation of Ray Casting, and deemphasize relatively complex forms of statistical Ray Generation.


I identify two goals for this project:
  1. Height field Acceleration. Develop height field representations and ray casting algorithms that are efficient and very high performance.
  2. Surface Simulation. Implement as a height field a microfacet-like surface that can be shaded by simply casting and following reflected rays. (No diffuse or semi-specular surface characteristics.) Attempt to match renderings of this surface with those of the model microfacet-shaded surface.

Technical Challenges

Much of the challenge here falls in the domain of computer engineering: minimizing data storage, identifying algorithm sequences with early-out optimizations, making good use of hardware caches, and trading off setup and execution time effectively. A lesser challenge is tuning a high-resolution geometric surface to match the characteristics of a microfacet shader.

Code Optimizations

For height field optimization:
  1. Optimized triangle intersection. Starting with the algorithm described in "Fast, Minimum Storage Ray/Triangle Intersection", I implemented intersection code that is optimized for a single cell in a height field. Such a cell comprises two triangles that share an edge. This shared edge is used as "edge2" in the algorithm, allowing much duplication in computation to be avoided. Furthermore the intersection code's flow of control is organized as an expanding tree, allowing computation for each triangle to be performed only as needed. (But requiring some duplication of code to implement all the paths.)
  2. Object coordinate intersection. The "walking" of the height field almost has to be done in object coordinates, as is the bounding box test. The triangle intersection code now also operates in object coordinates. No precomputation is done, and only the provided height field values are stored, so setup is very fast. Normals are computed only when an intersection is detected.
  3. Fast above/below rejection. Even the accelerated quad intersection test is expensive, so it makes sense to do trivial rejection of cells when possible. When "MINMAX" is defined, the code compiles to a version that computes entry and exit height values for the ray as it is traced. If the ray's height is increasing, each cell first tests if the ray's entry height is above the maximum for the cell, or the ray's exit height is below the minimum for the cell. Alternately, if the ray's height is decreasing, each cell first tests if the entry ray height is below the minimum for the cell, or the ray's exit height is above the maximum for the cell. Separate code loops are implemented for the increasing height and decreasing height cases, to minimize the code length within the loops. Rather than storing a minimum and maximum value for each cell, which would triple the storage requirement for the height field, two 2-bit indexes are stored for each cell. (Thus the storage increase is 25% for a single byte.) Since the height field is contiguous, the ray must intersect a triangle to go from above to below the height field surface. So an additional optimization could be made to reduce the min/max test to a single test per cell. I have not done this, though.
  4. Tiled storage. Storing large arrays in the obvious 2D manner can result in poor memory coherence when tracing a path. Tiling is a well-known solution to this problem. When "TILED" is defined, the code compiles to a version that does simple bit arithmetic on all cell X and Y addresses, allowing 2D locality of data reference. I made no attempt to align these 2D tiles with cache lines, however.
  5. Height field replication. I also modified the heightfield code and the the API module to allow a single height field to be replicated in X and Y. In RIB the replication factor for a dimension replaces the text string "nonperiodic" in the PatchMesh primitive. Thus extremely large tiled fields can be defined with the data of a single tile. The data size in a dimension that is replicated must be a power of 2, however.


Performance of various code versions
Setup (sec) Intersect (sec) Total (sec) Code (lines)
Original 35 1179 1213.9 75
HW2 16 18 33.9 450
Quad Test 4 10 13.8 750
Min/Max 4 6 10.3 1175
Tiled 4 6 10.3 1200

Surface Model

A simple C program generates a height field that is a hexagonally packed array of hemispheres. One tile of its output, represented as a 1024x1024 height field, looks like this:

The hemispheres cover roughly 90% of the area in this tight packing. If smaller hemispheres are inserted into the interstitial spaces, the coverage can be increased to roughly 95%:

All the subsequent renderings were done using this interstitial packing, though it likely has a very small effect on the results. Because unmodified versions of LRT cannot handle height field replication, I ran the benchmark tests using the following 1024x1024 height field. Note that the single height field defines a 4-tile by 4-tile surface:


I rendered images of the reflection of three small colored spheres of a near-horizontal planar surface. The "traditional" images use a truly flat surface whose material parameter is set to

Surface "shinymetal" "Ka" 0 "Ks" .25 "Kr" 0 "roughness" rough

where rough is varied. To make the lighting work properly, I added point light sources at the centers of the spheres. It is these light sources, rather than the spheres themselves, that actually are "reflected" off the surface.

The microfacet renderings all replace the flat metalic surface with a replicated height field based on the tile described above. The tile replication is 256x256 in each image, resulting in over 100 billion triangles. The surface material is specified as purely reflective

Surface "shinymetal" "Ka" 0 "Ks" 0 "Kr" 1

No lights are included, so the images simply show the reflection of the small colored spheres off the microfacet surface. To match variations in the roughness of the traditional images, the microfacet surfaces are scaled vertically, resulting in highly warped hemispheres.

The first image comparison is at low roughness (high specularity) and is a fairly good match:

Traditional Rendering Microfacet Rendering
(Roughness = 0.01) (Vertical Scale = 0.1)
(4 Samples / Pixel) (256 Samples / Pixel)

The second image comparison is at higher roughness, hence less sharp, and here the match isn't as good. Note that even with 256 samples per pixel the Microfacet rendering is undersampled in terms of shading.
Traditional Rendering Microfacet Rendering
(Roughness = 0.1) (Vertical Scale = 0.3)
(4 Samples / Pixel) (256 Samples / Pixel)


I am happy with the results of tuning and extending the height field capability of LRT. The performance increase of optimized intersection, reduced initialization and storage, and min/max testing are significant, and came with reasonable code complexity. I was surprised that tiling the memory access had no positive effect on performance, but I also didn't investigate this very carefully. It is possible that careful alignment with cache lines would make a difference.

The image matches are close enough to demonstrate the feasibility of modeling surface reflectance as pure geometry, though there is certainly room for improvement here. While it is instructive to see that a straightforward ray tracer can handle huge surface complexity (100's of billions of triangles served) it is more instructive to see just how much complexity is required to brute force shading this way. At 256 samples per pixel the shading is just adequate, compared to 4 samples (and it could be one) for the typical shading approach. It sure pays to do this stuff right!

Thus, regarding the motivation of this project, I'm convinced that brute force geometric rendering (emphasizing Ray Casting over Ray Generation), while perhaps attractive in a simplistic hardware sense, is almost certainly a losing proposition for even the simplest of scenes. Looks like statistics, probability, and all the rest of the theory are here to stay.

LRT Code