3D Virtual Sound Project: Implementation

ejhong@cs.stanford.edu

Note

Please note that these sound files are rather large and that you will need headphones to listen to them (the best are the in-ear, Walkman-style kind).

Equipment

The sound system ran on a Silicon Graphics Indigo2 with a 150 MHz R4400 processor and 64 MB of RAM. A Polhemus Fastrak head tracker provided head-position information, which was sent over the network to the sound system. The listener wore earplug-style headphones.

Parameters

The sound inputs used were 16-bit two's-complement, big-endian sound files sampled at 32 kHz. A lower sampling rate would have allowed faster convolution and smaller sound files, but it would also have degraded localization, since higher frequencies provide important localization cues. A filter length of 128 samples was used for the discrete Fourier transforms that perform the convolution.
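
As a small illustration of the input format, a sample stored in this big-endian layout can be converted to a host-order value roughly as follows (a minimal sketch; the function name is made up for illustration):

    /* Hypothetical helper: decode one 16-bit two's-complement big-endian
     * sample (high byte first) into a host int16_t. */
    #include <stdint.h>

    int16_t sample_from_be(const unsigned char *p)
    {
        uint16_t u = (uint16_t)((p[0] << 8) | p[1]);
        return (int16_t)u;   /* reinterpret the bits as two's complement */
    }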

Convolution

Real-time convolution of the input sound source is achieved with an overlap-add method using the SGI fast Fourier transform library (this code is based on the MIT sample program provided with the HRTF samples). If N is the input sequence length, this operation requires 2N log N + N complex-number operations. On the SGI, simultaneous convolution of two sound sources was achieved. Two example convolved sounds are provided: the first is one source moving around and then up and over your head; the second is two sources moving around you.
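
The overlap-add step can be sketched roughly as below. This is a hypothetical, self-contained C version with its own small radix-2 FFT; the actual system used the SGI FFT library, and the constants and function names here are assumptions for illustration only.

    /* Minimal sketch of overlap-add FFT convolution, assuming a 128-tap HRTF
     * filter and mono input.  filt_re/filt_im hold the precomputed FFT_LEN-
     * point transform of the zero-padded HRTF filter. */
    #include <math.h>
    #include <string.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define FILT_LEN   128   /* HRTF filter length (samples)               */
    #define BLOCK_LEN  128   /* input samples processed per block          */
    #define FFT_LEN    256   /* >= FILT_LEN + BLOCK_LEN - 1, power of two  */

    /* In-place radix-2 complex FFT (dir = -1) or inverse FFT (dir = +1). */
    static void fft(double *re, double *im, int n, int dir)
    {
        int i, j, k, m;
        for (i = 1, j = 0; i < n; i++) {          /* bit-reversal permutation */
            for (k = n >> 1; j & k; k >>= 1) j ^= k;
            j |= k;
            if (i < j) {
                double t;
                t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        for (m = 2; m <= n; m <<= 1) {            /* butterfly stages */
            double ang = dir * 2.0 * M_PI / m;
            for (k = 0; k < n; k += m) {
                for (j = 0; j < m / 2; j++) {
                    double wr = cos(ang * j), wi = sin(ang * j);
                    double xr = re[k + j + m / 2] * wr - im[k + j + m / 2] * wi;
                    double xi = re[k + j + m / 2] * wi + im[k + j + m / 2] * wr;
                    re[k + j + m / 2] = re[k + j] - xr;
                    im[k + j + m / 2] = im[k + j] - xi;
                    re[k + j] += xr;
                    im[k + j] += xi;
                }
            }
        }
        if (dir > 0)                              /* scale the inverse transform */
            for (i = 0; i < n; i++) { re[i] /= n; im[i] /= n; }
    }

    /* Convolve one block of input with the filter and overlap-add the tail.
     * `overlap` carries FILT_LEN - 1 samples between calls (zeroed at start). */
    void ola_block(const double *block, const double *filt_re,
                   const double *filt_im, double *out, double *overlap)
    {
        double re[FFT_LEN] = {0}, im[FFT_LEN] = {0};
        int i;

        memcpy(re, block, BLOCK_LEN * sizeof(double));   /* zero-padded block */
        fft(re, im, FFT_LEN, -1);

        for (i = 0; i < FFT_LEN; i++) {                  /* pointwise multiply */
            double tr = re[i] * filt_re[i] - im[i] * filt_im[i];
            double ti = re[i] * filt_im[i] + im[i] * filt_re[i];
            re[i] = tr; im[i] = ti;
        }
        fft(re, im, FFT_LEN, +1);                        /* back to time domain */

        for (i = 0; i < BLOCK_LEN; i++)                  /* add saved tail */
            out[i] = re[i] + (i < FILT_LEN - 1 ? overlap[i] : 0.0);
        for (i = 0; i < FILT_LEN - 1; i++)               /* save new tail  */
            overlap[i] = re[BLOCK_LEN + i];
    }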

Interleaved Sound Files

The interleaved sound files are produced by convolving the input sound with a limited set of the HRTFs and interleaving the samples in a pre-processed sound file. Each input sequence of 128 samples is convolved with every HRTF in the chosen set, and the results are placed consecutively in the pre-processed file. This was implemented for a total of 10 locations in one hemisphere around the head (20 HRTFs, since each position corresponds to a left-right HRTF pair) and also for 24 locations. During playback, a linear interpolation is applied to the four samples closest to the desired elevation and azimuth (or three if the 90 degree elevation sample is used). For the 10 location interleaving I chose the 3 equally spaced samples on elevations -40, 0, and 40 degrees plus the 90 degree sample. For the 24 location interleaving I chose the 5 equally spaced samples on elevations -40, -20, 0, 30, and 70 degrees plus the 90 degree sample. At run time this interpolation requires 6N multiplications and 3N additions. However, it has the disadvantage of having to load four times as much data into memory and of increasing the file size by a factor equal to the number of HRTF samples used. On the SGI, simultaneous processing of four sound sources was achieved. An example is provided of the same sound as the first convolution sample, except rendered from an interleaved sound file with 24 HRTF positions.
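
At playback time the interpolation over the four nearest positions amounts to a bilinear weighting of the corresponding pre-convolved samples, roughly as in the sketch below. The names, frame layout, and weight computation are assumptions for illustration, not the actual code.

    /* Minimal sketch of producing one output sample from an interleaved file.
     * `frame` points at the consecutive samples stored for this output index
     * (one per HRTF position); i00..i11 are the indices of the four positions
     * surrounding the desired azimuth/elevation, and fa/fe are the fractional
     * azimuth/elevation offsets in [0,1]. */
    static float interp_sample(const float *frame,
                               int i00, int i10, int i01, int i11,
                               float fa, float fe)
    {
        /* bilinear weights for the four surrounding HRTF positions */
        float w00 = (1.0f - fa) * (1.0f - fe);
        float w10 = fa * (1.0f - fe);
        float w01 = (1.0f - fa) * fe;
        float w11 = fa * fe;

        return frame[i00] * w00 + frame[i10] * w10 +
               frame[i01] * w01 + frame[i11] * w11;
    }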

Distance

The perception of distance is implemented by scaling the intensity of sounds according to the inverse square law. The user specifies the maximum distance at which a sound may be heard, and the intensity is scaled so that the minimum gain for the system (0.01 here) occurs when the source is at that maximum distance from the listener. Specifying distance in this manner is convenient, since a sound source can be ignored entirely once it is beyond the maximum distance.
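
A minimal sketch of this gain computation, assuming a minimum gain of 0.01 at the maximum distance and a clamp to unity gain at close range (both values are illustrative):

    #define MIN_GAIN 0.01   /* gain reached at the maximum audible distance */

    double distance_gain(double dist, double max_dist)
    {
        double g;
        if (dist > max_dist)
            return 0.0;                   /* source is out of range: skip it */
        if (dist <= 0.0)
            return 1.0;
        g = MIN_GAIN * (max_dist / dist) * (max_dist / dist);  /* 1/d^2 law */
        return g > 1.0 ? 1.0 : g;         /* don't boost very close sources */
    }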

Reverberation

The effect of reverberation can be achieved by convolving an input either with noise shaped by an exponentially decaying amplitude envelope, which models the early reflections and late reverberation of a room, or with an actual room impulse response. This is a costly computation since it requires convolution with a very long filter. The system provides a means for creating pre-processed reverberation files by specifying the length of the impulse response (most significant reverberation occurs within 0.4 seconds of the sound), the decay rate of the reverberation, and the magnitudes and times of the early reflections. This pre-processed reverberation file can then be used directly as a sound input in the system, or it can be added at a constant level to the direct sound while the direct sound is scaled with distance, so that the reverberant-to-direct (R/D) ratio changes with distance from the source.
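
A synthetic reverberation impulse response of this kind could be generated roughly as follows (a sketch under assumed parameter names; the actual file format and parameter values are not shown here):

    /* Sketch of building a synthetic reverberation impulse response:
     * exponentially decaying noise plus a few discrete early reflections. */
    #include <stdlib.h>
    #include <math.h>

    /* Fill ir[0..len-1], e.g. len = 0.4 s * 32000 samples.  `decay` is the
     * amplitude decay rate per sample; refl_time/refl_gain give nrefl early
     * reflections (times in samples). */
    void make_reverb_ir(float *ir, int len, float decay,
                        const int *refl_time, const float *refl_gain, int nrefl)
    {
        int i;
        for (i = 0; i < len; i++) {
            float noise = (float)rand() / RAND_MAX * 2.0f - 1.0f; /* in [-1,1] */
            ir[i] = noise * expf(-decay * i);   /* shaped late reverberation */
        }
        for (i = 0; i < nrefl; i++)             /* add early reflections */
            if (refl_time[i] < len)
                ir[refl_time[i]] += refl_gain[i];
    }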

Event Driven Sound System

The system provides an event-driven interface to specify when sounds occur, and it can be used to provide a repeating sound with a delay. Sounds can also be placed at random positions around the user when repeated (useful, for example, in a forest scene where bird noises may come from varying directions). In one processing round, the system checks for an update in listener position, checks for new events at the current time (which includes starting and stopping sound sources), and then processes all sound sources that are currently playing and within audible range of the user. Moving sound sources are not currently implemented, but adding them only requires a new event type that updates a source position at specified intervals.
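
One processing round might be structured roughly as in the following sketch. The structs and the render_block() stand-in are illustrative only; they are not the actual system's data structures.

    #include <math.h>

    typedef struct {
        double pos[3];
    } Listener;

    typedef struct {
        double pos[3];
        double start_time, stop_time;   /* event times for this source */
        int    playing;
    } Source;

    static double distance(const double a[3], const double b[3])
    {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return sqrt(dx * dx + dy * dy + dz * dz);
    }

    /* Stand-in for the per-source DSP (convolution or interpolation,
     * distance scaling, mixing into the output buffer). */
    static void render_block(const Source *s, const Listener *l)
    {
        (void)s; (void)l;
    }

    void process_round(double now, Listener *l, Source *srcs, int nsrc,
                       double max_dist)
    {
        int i;
        /* 1. listener position would be refreshed from the head tracker here */

        /* 2. check for events due at `now`: start or stop sources */
        for (i = 0; i < nsrc; i++) {
            if (!srcs[i].playing && now >= srcs[i].start_time &&
                now < srcs[i].stop_time)
                srcs[i].playing = 1;
            else if (srcs[i].playing && now >= srcs[i].stop_time)
                srcs[i].playing = 0;
        }

        /* 3. process every source that is playing and within audible range */
        for (i = 0; i < nsrc; i++) {
            if (srcs[i].playing && distance(l->pos, srcs[i].pos) <= max_dist)
                render_block(&srcs[i], l);
        }
    }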




Last modified: March 20, 1996 by Eugene Jhong