3D Virtual Sound Project: System Overview

ejhong@cs.stanford.edu

System Overview

There are several significant challenges in implementing a 3D virtual sound system that can spatialize multiple sound sources in real time. Most real-time systems use a set of head-related transfer functions (HRTFs) and convolve them in real time with the input source to produce the perception of spatialized sound. However, these systems ignore the reverberation caused by echoes when a sound is heard in an environment. Systems that do model these effects generally use some form of ray tracing to calculate reflections, but this process is computationally expensive and difficult to perform in real time. This project focused on achieving real-time audio spatialization and therefore did not model complex reverberation effects.
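As a minimal sketch of real-time HRTF convolution, the Python code below filters a mono input block through a left-ear and a right-ear FIR filter (head-related impulse responses). It keeps the last few input samples between calls so that consecutive blocks join without clicks. The class `StreamingFIR`, the function `render_binaural`, and the block-by-block interface are illustrative assumptions, not this project's code, and the taps in the usage lines are placeholders rather than KEMAR data.

```python
from typing import List, Sequence, Tuple


class StreamingFIR:
    """Direct-form FIR filter that keeps its state across consecutive input blocks."""

    def __init__(self, taps: Sequence[float]):
        self.taps = list(taps)
        # The last len(taps) - 1 input samples of the previous block, oldest first.
        self.history = [0.0] * (len(self.taps) - 1)

    def process(self, block: Sequence[float]) -> List[float]:
        m = len(self.taps)
        buf = self.history + list(block)
        out = []
        for n in range(len(block)):
            # y[n] = sum_k h[k] * x[n - k]; the current input x[n] sits at buf[n + m - 1].
            acc = 0.0
            for k in range(m):
                acc += self.taps[k] * buf[n + m - 1 - k]
            out.append(acc)
        # Carry the newest m - 1 inputs over to the next block.
        self.history = buf[len(buf) - (m - 1):]
        return out


def render_binaural(block: Sequence[float], left: StreamingFIR,
                    right: StreamingFIR) -> Tuple[List[float], List[float]]:
    """Filter one mono block through the left-ear and right-ear HRIR filters."""
    return left.process(block), right.process(block)


# Usage with placeholder taps (hypothetical, not real HRTF data):
left_ear = StreamingFIR([1.0, 0.5])
right_ear = StreamingFIR([0.0, 0.8, 0.4])
l_out, r_out = render_binaural([1.0, 0.0, 0.0], left_ear, right_ear)
# l_out == [1.0, 0.5, 0.0], r_out == [0.0, 0.8, 0.4]
```

Direct convolution like this costs one multiply-add per tap per output sample for each ear, so the cost grows with both filter length and the number of sources.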

The system implemented for this project can spatialize several simultaneous sound sources and uses a head-tracking device to obtain the position of the listener's head.
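To pick an HRTF, the tracked head pose has to be converted into a source direction relative to the head. The sketch below assumes, hypothetically, that the tracker reports the head position plus a yaw angle only (pitch and roll are ignored), with world axes x to the right, y forward, and z up. The function name and angle conventions are illustrative, not taken from the project.

```python
import math
from typing import Sequence, Tuple


def head_relative_direction(listener_pos: Sequence[float], listener_yaw_deg: float,
                            source_pos: Sequence[float]) -> Tuple[float, float, float]:
    """Return (azimuth_deg, elevation_deg, distance) of a source relative to the head.

    Assumed conventions: yaw is the head's rotation clockwise from +y seen from above;
    azimuth is 0 straight ahead and increases toward the listener's right, in [0, 360).
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dz = source_pos[2] - listener_pos[2]
    horizontal = math.hypot(dx, dy)
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    world_az = math.degrees(math.atan2(dx, dy))         # clockwise from +y
    azimuth = (world_az - listener_yaw_deg) % 360.0     # rotate into the head frame
    elevation = math.degrees(math.atan2(dz, horizontal))
    return azimuth, elevation, distance
```

The returned distance can also drive the distance gain, so one call per source per block supplies everything needed to select and scale the filters.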

Two methods were used to localize sound sources. The first is real-time convolution of the input sound with the appropriate HRTF. The second is to pre-convolve the sound with a subset of the HRTFs, taken at positions distributed evenly around the listener's head, and to store these filtered versions in an interleaved sound file. At run time, the filtered samples for the four positions closest to the source direction (or three, when one of them is the position directly overhead) are read from the interleaved file and linearly interpolated to produce the output sample.
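The run-time interpolation of the pre-convolved method can be sketched as bilinear weighting in azimuth and elevation. This sketch assumes a hypothetical layout: the pre-filtered positions lie on rings of constant elevation with the same azimuth spacing on every ring, plus one position directly overhead, and each frame of the interleaved file is modeled as a dictionary mapping a (ring index, azimuth index) key to a (left, right) sample pair. Above the top ring, the two upper corners collapse into the single overhead point, which gives the three-point case. The names `interpolation_weights` and `mix_frame` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical key: (ring index, azimuth index); the overhead point is (len(rings), 0).
Key = Tuple[int, int]


def interpolation_weights(az: float, el: float, rings: Sequence[float],
                          az_step: float) -> List[Tuple[Key, float]]:
    """Weights of the grid positions surrounding direction (az, el), in degrees.

    rings: ascending elevations of the azimuth rings, all below 90 degrees.
    az_step: azimuth spacing, assumed identical on every ring.
    """
    n_az = round(360.0 / az_step)
    pos = (az % 360.0) / az_step
    i0 = int(math.floor(pos)) % n_az     # azimuth index just below az
    i1 = (i0 + 1) % n_az                 # next azimuth index, wrapping at 360
    t = pos - math.floor(pos)            # fractional position between i0 and i1
    el = min(max(el, rings[0]), 90.0)    # clamp to the available elevation range
    if el >= rings[-1]:
        # Above the top ring: blend toward the single overhead point (3 points).
        r = len(rings) - 1
        u = (el - rings[r]) / (90.0 - rings[r])
        return [((r, i0), (1 - t) * (1 - u)),
                ((r, i1), t * (1 - u)),
                ((len(rings), 0), u)]
    # Otherwise use the four corners of the surrounding grid cell (bilinear).
    r = max(i for i, e in enumerate(rings) if e <= el)
    u = (el - rings[r]) / (rings[r + 1] - rings[r])
    return [((r, i0), (1 - t) * (1 - u)),
            ((r, i1), t * (1 - u)),
            ((r + 1, i0), (1 - t) * u),
            ((r + 1, i1), t * u)]


def mix_frame(frame: Dict[Key, Tuple[float, float]],
              weights: List[Tuple[Key, float]]) -> Tuple[float, float]:
    """Blend one interleaved frame's pre-filtered (left, right) samples."""
    left = sum(w * frame[k][0] for k, w in weights)
    right = sum(w * frame[k][1] for k, w in weights)
    return left, right
```

Per output sample this needs only a handful of multiply-adds per ear, regardless of the HRTF length, at the cost of a larger file and a coarser set of directions.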

The perception of distance is produced by using the inverse-square law to calculate the gain applied to an input sound from the distance between listener and source. In addition, if a preprocessed sound file containing the reverberation of a sound is provided, it is mixed in with the sound at a constant level. Because the direct sound is attenuated with distance while the reverberation stays at a constant level, the reverberant-to-direct (R/D) ratio increases as the listener moves away from the sound source.
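A minimal sketch of this distance model follows. It takes the gain literally as the inverse-square law relative to a hypothetical reference distance of 1 unit, clamps at a hypothetical minimum distance to avoid division by zero, and uses a hypothetical constant reverberation level. Note that the inverse-square law describes sound intensity; the corresponding fall-off in amplitude is 1/r, so whether a gain of 1/r² or 1/r is appropriate depends on how the gain is applied.

```python
from typing import List, Sequence


def distance_gain(distance: float, ref_distance: float = 1.0,
                  min_distance: float = 0.1) -> float:
    """Inverse-square gain: 1.0 at ref_distance, 1/4 at twice that distance."""
    d = max(distance, min_distance)  # clamp so very close sources do not blow up
    return (ref_distance / d) ** 2


def mix_direct_and_reverb(direct: Sequence[float], reverb: Sequence[float],
                          distance: float, reverb_level: float = 0.3) -> List[float]:
    """Attenuate the direct sound by distance; add the reverb at a constant level."""
    g = distance_gain(distance)
    return [g * d + reverb_level * r for d, r in zip(direct, reverb)]


def rd_ratio(distance: float, reverb_level: float = 0.3) -> float:
    """Reverberant-to-direct gain ratio, which grows as the listener moves away."""
    return reverb_level / distance_gain(distance)
```

Doubling the distance quarters the direct gain and therefore quadruples the R/D ratio in this sketch.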

An event-driven manager specifies the times at which each sound source occurs in the virtual simulation.
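One common way to build such a manager is a priority queue ordered by event time and drained as the simulation clock advances. The sketch below is an assumption about how this could work, not the project's design; `EventManager`, `schedule`, and `run_until` are hypothetical names, and each action would typically start or stop a sound source.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class EventManager:
    """Fires scheduled actions in time order as the simulation clock advances."""

    def __init__(self) -> None:
        self._queue: List[Tuple[float, int, Callable[[], None]]] = []
        # Tie-breaker so events with equal times fire in the order they were scheduled.
        self._counter = itertools.count()

    def schedule(self, time: float, action: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (time, next(self._counter), action))

    def run_until(self, now: float) -> int:
        """Fire every event whose time is <= now; return how many fired."""
        fired = 0
        while self._queue and self._queue[0][0] <= now:
            _, _, action = heapq.heappop(self._queue)
            action()
            fired += 1
        return fired
```

In a real-time loop, `run_until` would be called once per output block with the current simulation time, so event timing is quantized to the block length.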

The system accepts input from line-in ports (so that sound from a CD can be spatialized) as well as from files, and it can output to both line-out ports and files.

The MIT KEMAR HRTF samples used were sampled at 32 kHz and applied as filters 128 samples long. The set consists of 710 HRTFs measured at elevations ranging from -40 to 90 degrees.
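For a sense of scale, these figures imply the following cost for direct time-domain convolution, assuming one multiply-add (MAC) per filter tap per output sample and a separate filter for each ear:

```latex
\begin{aligned}
\text{filter duration} &= \frac{128\ \text{samples}}{32\,000\ \text{samples/s}} = 4\ \text{ms},\\
\text{MACs per second per source} &= 2\ \text{ears} \times 128\ \tfrac{\text{MACs}}{\text{sample}} \times 32\,000\ \tfrac{\text{samples}}{\text{s}} = 8\,192\,000.
\end{aligned}
```

With several simultaneous sources this scales linearly with the number of sources, whereas the pre-convolved method needs only four (or three) weighted additions per ear per output sample.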




Last modified: March 20, 1996 by Eugene Jhong