The design of most graphics architectures involves quantitative study of one or more aspects of the graphics pipeline. We have carried out several such studies and have developed a set of tools for making such studies easier to perform. We are currently working on generalizing the tools to simulate a broader range of architectures.
To date, we have completed five studies:
A study on texture compression looked at the use of vector quantization techniques to allow texture memory to be used more efficiently and concluded that compression ratios of up to 35:1 were possible with acceptable loss of image quality. Vector quantization is amenable to efficient hardware implementation.
A study on texture caching examined locality of reference with respect to texture accesses and explored the use of caches to exploit this locality and reduce memory bandwidth requirements. The study concluded that the use of a texture cache could reduce texture memory bandwidth requirements by three to fifteen times.
A study on texture prefetching proposed a specialized prefetching architecture for hiding the latency of texture cache misses without introducing additional conflict misses. A cycle-accurate simulation showed that, even in a system with high memory latencies, the architecture was over 97% efficient.
A study on overlap revealed that the impact of overlap in bucket rendering systems was limited in scope, despite the observation that large triangles have large overlap factors. The paper concluded that, if current trends continue, the efficient use of smaller and smaller tile sizes would be possible.
Finally, a study on parallel texture caching examined the efficiency of texture caching in a variety of current and proposed parallel rasterization architectures. The study quantified inefficiencies due to redundant work, inherent parallel load imbalance, insufficient memory bandwidth, and resource contention, and demonstrated that, in general, parallel texture caching works well.
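To make the central quantity of the overlap study concrete, the following is a minimal sketch of how an overlap factor might be counted for one primitive in a bucket (tiled) renderer. It is an illustration under stated assumptions, not the method used in the study: the function name `tiles_overlapped`, the pixel-coordinate convention, and the use of a bounding box in place of exact triangle coverage are all hypothetical choices made here.

```python
def tiles_overlapped(bbox, tile_size):
    """Count the screen tiles touched by an axis-aligned bounding box.

    bbox is (xmin, ymin, xmax, ymax) in integer pixels, max edges exclusive.
    Assumption of this sketch: the bounding box stands in for the triangle,
    which can overestimate the count for thin diagonal triangles.
    """
    xmin, ymin, xmax, ymax = bbox
    # Index of the first and last tile column/row that contains a covered pixel.
    cols = (xmax - 1) // tile_size - xmin // tile_size + 1
    rows = (ymax - 1) // tile_size - ymin // tile_size + 1
    return cols * rows


# A small triangle that happens to straddle a tile corner, and a large one.
small = (30, 30, 34, 34)
large = (0, 0, 100, 100)

small_at_32 = tiles_overlapped(small, 32)
large_at_32 = tiles_overlapped(large, 32)
large_at_16 = tiles_overlapped(large, 16)
```

In this sketch the large box touches more tiles than the small one, and halving the tile size increases its count further, which is the trade-off the study weighs against tile size.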
In addition to the completed studies, we have an ongoing effort to acquire and analyze scenes.
All of our most recent architectural studies have made use of command stream traces gathered from real OpenGL applications. Such a trace captures all of the OpenGL function calls made by an application, allowing the entire graphics portion of the application to be replayed at a later time. Capturing and replaying a trace has two important advantages over rerunning the original application:
Capturing a trace imposes a small per-command overhead on the running application. Depending on the application, this overhead may or may not have an effect on the commands stored in the trace. Whether or not this matters depends on what the trace is used for. For most architectural studies, the effect does not matter.
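The capture-and-replay idea can be sketched in a few lines. This is a hedged illustration only: `TraceRecorder`, `replay`, and `FakeRenderer` are hypothetical names invented for this example, the "API" is a stand-in object rather than OpenGL, and none of it reflects how glstrace or our file formats are actually implemented.

```python
class TraceRecorder:
    """Wraps an API object and logs every method call made through it."""

    def __init__(self, target):
        self._target = target
        self.trace = []  # list of (command_name, args) tuples

    def __getattr__(self, name):
        # Only reached for names not found on the recorder itself.
        method = getattr(self._target, name)

        def recorded(*args):
            self.trace.append((name, args))  # the small per-command overhead
            return method(*args)

        return recorded


def replay(trace, target):
    """Re-issue each recorded command on any object with the same methods."""
    for name, args in trace:
        getattr(target, name)(*args)


class FakeRenderer:
    """Hypothetical stand-in for a graphics API; it only remembers calls."""

    def __init__(self):
        self.calls = []

    def begin(self, mode):
        self.calls.append(("begin", (mode,)))

    def vertex(self, x, y, z):
        self.calls.append(("vertex", (x, y, z)))

    def end(self):
        self.calls.append(("end", ()))


# Capture: the application talks to the recorder instead of the renderer.
live = FakeRenderer()
api = TraceRecorder(live)
api.begin("TRIANGLES")
api.vertex(0.0, 0.0, 0.0)
api.end()

# Replay: the stored commands drive a fresh renderer, without the application.
offline = FakeRenderer()
replay(api.trace, offline)
```

The recorder adds one list append per command, which corresponds to the per-command overhead described above.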
In order to collect our traces, we assembled a set of tools for capturing, replaying, and analyzing command streams from real applications. The tools currently center on two command stream file formats: GLS, a file format for storing streams of OpenGL commands, and FGLS, a similar file format for storing streams of FruGL commands. FruGL is a lightweight graphics API based on OpenGL and used to interface to our research rendering system.
Figure 1 illustrates the tools we use and the flow of scene data between them.
Most traces are gathered by running real OpenGL applications and tracing them with glstrace, a utility written by Phil Lacroute of SGI. The resulting GLS streams are then converted into FGLS streams using glstofgls. FGLS streams may also be gathered by tracing FruGL applications; however, we rarely collect traces in this manner.
Once a command stream is in the FGLS format, it can be replayed using the Argus rendering library, analyzed by a scene analyzer, or run through an architectural simulator.
Figure 1: Scene data processing.
While the tool flow shown in Figure 1 serves us well, particular aspects were cumbersome enough to warrant the development of a second, simpler set of tools.
Our new tracing tools store traces as GLT streams. GLT streams are similar to GLS streams in that both formats store streams of OpenGL commands; however, because of how GLT is implemented, we find GLT streams more efficient and easier to use than GLS streams. Moreover, because we developed GLT from scratch, we were able to port the tracing software from IRIX to Windows NT, allowing us to collect traces on both platforms.
Figure 2 shows the data flow for our second generation of tools.
Using the new tools, traces may be gathered only by tracing OpenGL applications. Once in the GLT format, traces may be replayed using any OpenGL rendering library. In addition, traces may be analyzed using a scene analyzer or studied with an architectural simulator.
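As a rough illustration of what a scene analyzer might compute from a command stream, the sketch below tallies commands and counts vertices per frame. The in-memory trace layout (a list of `(command_name, args)` tuples), the command names, and the use of `swap_buffers` as a frame boundary are assumptions made for this example; they are not the GLT format or the behavior of our analyzer.

```python
from collections import Counter


def analyze(trace):
    """Return (command histogram, vertices per frame) for a command trace.

    trace is an iterable of (command_name, args) tuples. Treating
    "swap_buffers" as the end-of-frame marker is an assumption of this sketch.
    """
    histogram = Counter()
    vertices_per_frame = []
    vertices = 0
    for name, _args in trace:
        histogram[name] += 1
        if name == "vertex":
            vertices += 1
        elif name == "swap_buffers":
            vertices_per_frame.append(vertices)
            vertices = 0
    return histogram, vertices_per_frame


# A hypothetical two-frame trace: one triangle, then a single line segment.
trace = [
    ("begin", ("TRIANGLES",)),
    ("vertex", (0.0, 0.0, 0.0)),
    ("vertex", (1.0, 0.0, 0.0)),
    ("vertex", (0.0, 1.0, 0.0)),
    ("end", ()),
    ("swap_buffers", ()),
    ("begin", ("LINES",)),
    ("vertex", (0.0, 0.0, 0.0)),
    ("vertex", (1.0, 1.0, 0.0)),
    ("end", ()),
    ("swap_buffers", ()),
]

histogram, vertices_per_frame = analyze(trace)
```

Because the analysis reads only the stored commands, it can be rerun on the same trace as often as needed without the original application.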
Figure 2: Second generation scene data processing.
Some parts of our rendering system are still based on FruGL and FGLS. We support these parts of our rendering system using a tool, not shown above, that converts GLT streams into FGLS streams in a manner similar to glstofgls.
[Update: Our GLT tools are now in the public domain, and can be downloaded.]
Texture Caching
Texture Compression
Texture Prefetching
Overlap and Bucket Rendering
FruGL Tracing and FGLS
OpenGL Tracing and GLS
The Argus Rendering Library