Argus Performance Analysis

Performance data are collected on both real machines and Simos, a cycle-accurate machine simulation environment. Simos can simulate not only real systems such as the SGI Challenge but also hypothetical machines.

Execution times measured on real machines provide a good summary of Argus performance. PC sampling and basic-block counting are two tools we often use to give a first-order impression of performance bottlenecks. Although sufficient in many circumstances, these tools often fall short when examining the complicated behavior of Argus, because the implementation uses an internal multithreading library with its own scheduling routine across multiple processes. Using Simos, we divide processor cycles into categories such as operating system vs. user program, computation vs. communication, and algorithm overhead vs. actual rendering work. This approach quantifies the various tasks of parallel rendering and the differences between graphics algorithms.
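As an illustration of this kind of breakdown, the sketch below aggregates cycle counts along the three axes named above. It is a minimal example under stated assumptions: the record layout, the function name breakdown, and all numbers are hypothetical. They are neither Simos output nor measured Argus data.

```python
from collections import Counter

# Hypothetical records: (mode, activity, purpose, cycles).
# Labels and cycle counts are illustrative, not measured Argus data.
SAMPLES = [
    ("user", "computation", "rendering", 600),
    ("user", "computation", "overhead", 150),
    ("user", "communication", "overhead", 100),
    ("os", "communication", "overhead", 150),
]

# Index of each classification axis within a record.
MODE, ACTIVITY, PURPOSE = 0, 1, 2


def breakdown(samples, axis):
    """Return the fraction of total cycles for each label along one axis."""
    totals = Counter()
    for record in samples:
        totals[record[axis]] += record[3]
    grand_total = sum(totals.values())
    return {label: cycles / grand_total for label, cycles in totals.items()}


if __name__ == "__main__":
    for name, axis in (("mode", MODE), ("activity", ACTIVITY), ("purpose", PURPOSE)):
        print(name, breakdown(SAMPLES, axis))
```

Because every cycle carries all three labels, the same samples can be summarized along any axis, and the fractions along each axis sum to one.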

Programming for good memory behavior is extremely time-consuming and machine-specific. However, running Argus without tuning it for a specific system yields performance numbers clouded by poor memory access patterns. We circumvent this problem by simulating a machine with perfect memory. This allows us to measure the theoretical limit of computation for different algorithms.
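One way to read a perfect-memory run is to compare it against a run with a realistic memory system: the difference approximates the cycles lost to memory stalls. The sketch below assumes two cycle counts for the same workload. The function names and the numbers in the usage line are hypothetical, not Argus results.

```python
def memory_stall_fraction(real_cycles, perfect_memory_cycles):
    """Fraction of real-run cycles attributable to the memory system.

    Assumes both counts come from the same workload, and that the
    perfect-memory run is never slower than the realistic one.
    """
    if real_cycles <= 0:
        raise ValueError("real_cycles must be positive")
    if not 0 <= perfect_memory_cycles <= real_cycles:
        raise ValueError("perfect_memory_cycles must lie in [0, real_cycles]")
    return (real_cycles - perfect_memory_cycles) / real_cycles


def max_speedup_from_memory_tuning(real_cycles, perfect_memory_cycles):
    """Upper bound on the speedup that memory tuning alone could deliver."""
    if perfect_memory_cycles <= 0 or real_cycles < perfect_memory_cycles:
        raise ValueError("need 0 < perfect_memory_cycles <= real_cycles")
    return real_cycles / perfect_memory_cycles


if __name__ == "__main__":
    # Illustrative numbers only.
    print(memory_stall_fraction(1000, 750))
    print(max_speedup_from_memory_tuning(1000, 750))
```

The second function gives only a bound: no amount of memory tuning can do better than the perfect-memory run, so the ratio caps the benefit of that effort.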

We predict how Argus will perform on future platforms by running it on simulated hypothetical machines with advanced memory and processor characteristics.

miltchen@graphics.stanford.edu