
3D Virtual Sound Project: Implementation
ejhong@cs.stanford.edu

Note
These sound files are rather large, and you will need headphones (ideally the kind that go inside your ear, Walkman style) to listen to them.
Equipment
The sound system was run on a Silicon Graphics Indigo 2 with a 150 MHz R4400
and 64 MB of RAM. A Polhemus Fastrack head tracker was used to obtain
head-position information, which was sent over the network to the sound
system. The listener wore earplug-style headphones.
Parameters
The sound inputs were 16-bit two's-complement big-endian sound files
sampled at 32 kHz. A lower sampling rate would have allowed faster
convolution and smaller sound files, but would also have degraded
localization, since higher frequencies provide important localization cues.
The HRTF filters were 128 samples long, and the convolution was performed
with the discrete Fourier transform.
Convolution
Real-time convolution of the input sound source is achieved with an
overlap-add method built on the SGI fast Fourier transform library (this
code is based on the MIT sample program provided with the HRTF samples).
If N is the input sequence length, this operation requires 2N log N + N
complex-number operations. On the SGI, simultaneous convolution of two
sound sources was achieved. The following are examples of convolved sounds:
the first is one source moving around and then up and over your head; the
second is two sources moving around you.
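As a rough illustration of the overlap-add method, here is a minimal sketch
in Python with NumPy (the actual system used the SGI FFT library from C;
the function below is my own, not the project's code). The FFT size is
chosen to hold one input block plus the 128-sample filter tail, the filter
spectrum is computed once, and each block's tail is added into the next
block's output.

    import numpy as np

    def overlap_add_convolve(x, h, block=128):
        """Convolve input x with FIR filter h (e.g. a 128-tap HRTF) by overlap-add."""
        m = len(h)
        n = 1
        while n < block + m - 1:        # FFT size must cover block + m - 1 samples
            n *= 2
        H = np.fft.rfft(h, n)           # filter spectrum, computed once up front
        y = np.zeros(len(x) + m - 1)
        for start in range(0, len(x), block):
            seg = x[start:start + block]
            Y = np.fft.rfft(seg, n) * H          # pointwise multiply in frequency
            chunk = np.fft.irfft(Y, n)           # back to the time domain
            end = min(start + n, len(y))
            y[start:end] += chunk[:end - start]  # overlap-add the block's tail
        return y

To spatialize a source at one position, the same routine is run twice, once
with the left HRTF of the pair and once with the right, producing the two
output channels.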
Interleaved Sound Files
The interleaved sound files are produced by convolving the input sound with
a limited set of the HRTFs and interleaving the results in a pre-processed
sound file: each input sequence of 128 samples is convolved with every HRTF
in the chosen set, and the convolved blocks are placed consecutively in the
pre-processed file. This was implemented for a total of 10 locations in one
hemisphere around the head (20 HRTFs, since each position corresponds to a
left-right HRTF pair) and also for 24 locations. During playback, linear
interpolation is applied across the four stored positions closest to the
desired elevation and azimuth (or three if the 90-degree-elevation sample
is used). For the 10-location interleaving I chose 3 equally spaced samples
on each of the elevations -40, 0, and 40, plus the 90-degree sample. For
the 24-location interleaving I chose 5 equally spaced samples on the
elevations -40, -20, 0, 30, and 70, plus the 90-degree sample. At run time
this operation requires 6N multiplications and 3N additions. However, it
has the disadvantage of loading four times as much data into memory and of
multiplying the file size by the number of HRTF positions used. On the SGI,
simultaneous playback of four sound sources was achieved. The following is
the same sound as the first convolution sample, except that it uses an
interleaved sound file with 24 HRTF positions.
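The exact interleaved file layout is not described above; the following
sketch (Python/NumPy again, with a hypothetical layout and helper names of
my own) assumes each 128-sample input block is stored as one convolved
block per HRTF position, with positions consecutive within a block. The
writer truncates each block's convolution tail for simplicity, which the
real system would instead carry over into the next block.

    import numpy as np

    BLOCK = 128   # samples per input block

    def write_interleaved(x, hrtfs):
        """Pre-process: convolve each input block with every HRTF and interleave.

        Assumed layout: block 0 for positions 0..P-1, then block 1, and so on.
        The convolution tail is truncated here for simplicity."""
        out = []
        for start in range(0, len(x), BLOCK):
            seg = x[start:start + BLOCK]
            for h in hrtfs:                          # one block per HRTF position
                out.append(np.convolve(seg, h)[:BLOCK])
        return np.concatenate(out)

    def play_block(data, block_index, weights, num_positions):
        """Blend pre-convolved blocks with linear-interpolation weights.

        weights maps position index -> weight for the (up to four) stored
        directions closest to the desired azimuth and elevation; the
        weights should sum to 1."""
        out = np.zeros(BLOCK)
        for pos, w in weights.items():
            start = (block_index * num_positions + pos) * BLOCK
            out += w * data[start:start + BLOCK]
        return out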
Distance
The perception of distance is implemented by scaling the intensity of
sounds according to the inverse square law. The user specifies the maximum
distance at which a sound may be heard, and the intensity is scaled so that
the system's minimum gain (0.01 was used) occurs when the distance between
the source and the listener equals that maximum. Specifying distance this
way is convenient, since a sound source can be ignored entirely once it is
beyond the maximum distance.
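In formula form: with minimum gain g_min = 0.01 reached exactly at the
maximum distance d_max, the gain at distance d is
g(d) = g_min * (d_max / d)^2. A sketch (the near-distance clamp is my own
addition, to keep the gain from exceeding 1):

    MIN_GAIN = 0.01   # minimum system gain, reached exactly at the maximum distance

    def distance_gain(d, d_max):
        """Inverse-square distance attenuation as described above (sketch).

        gain(d) = MIN_GAIN * (d_max / d)**2, so gain(d_max) == MIN_GAIN."""
        if d >= d_max:
            return 0.0              # beyond audible range: the source is ignored
        d = max(d, 0.1 * d_max)     # assumed clamp; gain is exactly 1 at 0.1 * d_max
        return MIN_GAIN * (d_max / d) ** 2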
Reverberation
The effect of reverberation can be achieved by convolving an input either
with noise that has been shaped by an exponentially decaying amplitude
envelope, which models the early reflections and late reverberation of a
room, or with an actual room impulse response. This is a costly
computation, since it requires convolution with a very long filter. The
system therefore provides a means of creating pre-processed reverberation
files by specifying the length of the impulse response (most significant
reverberation occurs within 0.4 seconds of the sound), the decay rate of
the reverberation, and the magnitudes and times of the early reflections.
The pre-processed reverberation file can then be used directly as a sound
input to the system, or it can be added at a constant level to the direct
sound while the direct sound is scaled according to distance, changing the
reverberant-to-direct (R/D) ratio with distance from the source.
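A sketch of generating such a pre-processed impulse response (the
parameter names and example values are illustrative, not the system's
actual interface):

    import numpy as np

    def make_reverb_impulse(length_s, decay_rate, reflections, rate=32000, seed=0):
        """Exponentially decaying noise with discrete early reflections.

        reflections: list of (time_s, magnitude) pairs, with time_s < length_s.
        decay_rate: exponential decay constant in 1/seconds (illustrative)."""
        n = int(length_s * rate)
        t = np.arange(n) / rate
        rng = np.random.default_rng(seed)
        ir = rng.standard_normal(n) * np.exp(-decay_rate * t)   # late reverberation
        for when, mag in reflections:
            ir[int(when * rate)] += mag                          # early reflections
        return ir

    # Example: a 0.4 s response at 32 kHz with two early reflections.
    ir = make_reverb_impulse(0.4, 12.0, [(0.011, 0.6), (0.023, 0.4)])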
Event Driven Sound System
The system provides an event-driven interface for specifying when sounds
occur; it can be used to play a repeating sound with a delay, and repeated
sounds can be placed at random positions around the user (useful, for
example, in a forest scene where bird calls should come from varying
directions). In one processing round, the system checks for an update in
listener position, checks for new events at the current time (which
includes starting and stopping sound sources), and then processes all
sound sources that are currently playing and within audible range of the
user. Moving sound sources are not yet implemented, but adding them only
requires a new event type that updates a source's position at specified
intervals.
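A structural sketch of one such processing round (all names and scheduling
details are hypothetical; the block period of 128 samples at 32 kHz, i.e.
4 ms, comes from the parameters above):

    import heapq

    def dist(a, b):
        """Euclidean distance between two 3-D points."""
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    def run(events, sources, listener_pos, stop_time, block_dt=128 / 32000.0):
        """Event loop sketch: events is a heap of (time, action, source_id)."""
        heapq.heapify(events)
        playing, t = set(), 0.0
        while t < stop_time:
            # 1. check for an update in listener position (head tracker poll, stubbed)
            # 2. fire events that are due: start or stop sound sources
            while events and events[0][0] <= t:
                _, action, sid = heapq.heappop(events)
                if action == "start":
                    playing.add(sid)
                else:
                    playing.discard(sid)
            # 3. process every playing source within audible range of the listener
            for sid in playing:
                src = sources[sid]
                if dist(src["pos"], listener_pos) < src["max_distance"]:
                    pass  # convolve or interpolate the next 128-sample block here
            t += block_dt   # one 128-sample block at 32 kHz lasts 4 ms

    # Example: a bird call that starts at t = 0 and stops at t = 5 seconds.
    events = [(0.0, "start", "bird"), (5.0, "stop", "bird")]
    sources = {"bird": {"pos": (2.0, 0.0, 1.0), "max_distance": 10.0}}
    run(events, sources, listener_pos=(0.0, 0.0, 0.0), stop_time=6.0)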
e-mail:
ejhong@cs.stanford.edu
Last modified: March 20, 1996 by Eugene Jhong