Summary

Challenges to the Successful Implementation of 3-D Sound
by Durand R. Begault

Begault begins by explaining that "3-D audio technology" is simply the next step for audio, just as stereophonic and quadraphonic devices were in the past. He explains that at the heart of this technology is a digital filter based on the head-related transfer function (HRTF) and that three main obstacles remain before the technology is complete: (1) eliminating front-back reversals and minimizing localization error; (2) reducing the amount of data needed to represent the most perceptually salient features of HRTF measurements; and (3) resolving conflicts between desired frequency and phase response characteristics and measured HRTFs. He addresses each of these problems in depth, outlining both the advances made so far and the issues that remain. He believes that many of these challenges will be overcome in the near future. This article provides good direction for anyone interested in doing research in 3-D audio.






Modeling the Elevation Characteristics of the Head-Related Impulse Response
by C. P. Brown

This is a Master's thesis from San Jose State University, by a student of Richard Duda's. The thesis concentrates on identifying the elevation-dependent components of the head-related impulse response (HRIR). The HRIR captures the physical effects of the diffraction of sound waves by the torso, shoulders, head, and pinnae. The thesis attempts to reduce the computational cost of current HRIRs by using a simpler data-driven model to approximate the HRIR. The results of the thesis were mixed; however, the explanations and background are clear, which makes it worth reading.
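The following is a minimal sketch (not Brown's model) of the general idea of approximating a measured HRIR with fewer parameters and checking the resulting error; the synthetic HRIR and the simple truncation scheme are placeholders for illustration only.

# Sketch: approximate an HRIR with a shorter FIR filter and measure the error,
# illustrating the trade-off between accuracy and computational cost.
import numpy as np

n = 256
rng = np.random.default_rng(0)
# Placeholder HRIR: a decaying noise burst standing in for a measured response.
hrir = rng.standard_normal(n) * np.exp(-np.arange(n) / 30.0)

def truncate_hrir(h, n_taps):
    """Keep only the first n_taps coefficients (a crude reduced-order model)."""
    approx = np.zeros_like(h)
    approx[:n_taps] = h[:n_taps]
    return approx

for n_taps in (32, 64, 128):
    approx = truncate_hrir(hrir, n_taps)
    nmse = np.sum((hrir - approx) ** 2) / np.sum(hrir ** 2)
    print(f"{n_taps:3d} taps: normalized MSE = {nmse:.4f}")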






A Spatial Feature Extraction and Regularization Model for the Head-Related Transfer Function
by J. Chen, B. D. Van Veen, K. E. Hecox






Technologies for Three Dimensional Sound Presentation Issues in Subjective Evaluation of the Spatial Image
by Elizabeth A. Cohen

In Cohen's own words, this article addresses "some of the current technology available for creating the illusion of three-dimensional sound." Examples of systems designed for headphone and loudspeaker presentation are discussed. In addition, she addresses the topics of idealized pinnae functions, audition environment, reproduction media, image robustness, localization, and spaciousness. All of these topics are presented at an introductory level, without much technical insight. Nevertheless, this article provides insight into who is currently conducting research in this field and what exactly they focus on.






Calculator Program for Head-Related Transfer Function
by D. H. Cooper

This article briefly discusses methods of calculating the HRTF based on two models of the head. The first is simply two sensors a certain width apart, with no acoustical blocking effects of the head; the second is a spherical head model developed initially by Rayleigh. The author uses the spherical-head HRTF to make stereo imaging calculations and compares these with data-driven calculations. He then describes his method for performing such calculations on an HP calculator.
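As an illustration of the two head models, the sketch below computes interaural time differences for a pair of free-field sensors and for a rigid sphere using the standard Woodworth-style approximation; this is not necessarily the exact formulation Cooper programs on the calculator.

# Sketch: ITD for a free-field two-sensor model versus a rigid spherical head.
import numpy as np

def itd_two_sensors(theta_rad, spacing=0.175, c=343.0):
    """ITD for two free-field sensors 'spacing' metres apart (no head between them)."""
    return spacing * np.sin(theta_rad) / c

def itd_spherical_head(theta_rad, radius=0.0875, c=343.0):
    """Woodworth-style ITD approximation for a rigid sphere of the given radius."""
    return radius * (theta_rad + np.sin(theta_rad)) / c

for az_deg in (0, 30, 60, 90):
    th = np.radians(az_deg)
    print(f"{az_deg:3d} deg: sensors {itd_two_sensors(th) * 1e6:6.1f} us, "
          f"sphere {itd_spherical_head(th) * 1e6:6.1f} us")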






Head-Related Two-Channel Stereophony with Loudspeaker Reproduction
by P. Damaske

This is an early paper on crosstalk cancellation, building on the results of Schroeder and Atal. The author attempts to eliminate phantom sound sources that result from the diffraction of sound around the head. He presents two methods that have been used in the past and introduces a new one.

The tester adjusts the delay and attenuation of the right channel so that the test subject hears the noise coming from the left. From the horizontal locations of the resulting phantom sources, a filter is defined. Then, in a recording room, a sentence is repeated from several directions and the signals at the ears of a dummy head are recorded. Test subjects are then asked to identify the position of the talker from the recorded signals after they have been processed with the designed filter.
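The sketch below illustrates the basic adjustment being described, delaying and attenuating one channel of a noise signal to shift the phantom source; the particular delay and gain values are arbitrary and are not Damaske's settings.

# Sketch: delay and attenuate the right channel to lateralize a phantom source.
import numpy as np

fs = 44100
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs)          # one second of noise

def lateralize(signal, delay_samples, gain):
    """Return a stereo pair: left unchanged, right delayed and attenuated."""
    left = signal
    right = gain * np.concatenate([np.zeros(delay_samples), signal])[:len(signal)]
    return np.stack([left, right], axis=0)

stereo = lateralize(noise, delay_samples=20, gain=0.6)   # ~0.45 ms delay
print(stereo.shape)                                       # (2, 44100)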

In conclusion Damaske claims that under ideal listening conditions, which he enforced rigorously during his experiments, all directional information may be reproduced.

This article still includes analog filter specifications, which are worth a look.






Multiprocessor 3D Sound System
by Mohamed El-Sharkawy, Newton Guillen, Waleed Eshmawy, Brad Langhorst, and Harry Gundrum

The authors propose a simple multiprocessor 3-D sound system that allows freedom in loudspeaker placement without restricting the listener's position. Their implementation is based on a combination of a binaural spatializer and a crosstalk canceller.
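As a rough illustration of what a binaural spatializer does (not the authors' implementation), the following sketch convolves a mono signal with a left/right pair of placeholder HRIRs; a real system would use measured HRTF data, for example from a KEMAR manikin.

# Sketch: binaural spatialization by convolving a mono source with an HRIR pair.
import numpy as np

fs = 44100
rng = np.random.default_rng(0)
mono = rng.standard_normal(fs // 2)                      # half a second of noise

# Placeholder HRIRs: decaying noise with a small interaural delay and level
# difference, standing in for measured impulse responses.
n = 128
decay = np.exp(-np.arange(n) / 20.0)
hrir_left = rng.standard_normal(n) * decay
hrir_right = 0.7 * np.concatenate([np.zeros(15), rng.standard_normal(n - 15)]) * decay

left = np.convolve(mono, hrir_left)
right = np.convolve(mono, hrir_right)
binaural = np.stack([left, right], axis=0)
print(binaural.shape)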

A short but thorough introduction to a binaural spatializer and crosstalk cancellation is provided as well as an architectural overview of the designed system. The details of the system are hidden in Guillen's Master's Thesis, which we are going to do our best to get our hands on.

Several volunteers evaluated the performance of the system. Although the tests do not appear to have been performed very rigorously, phenomenal results were reported. These include placing the virtual source 120 degrees behind the listener and changing the virtual position in real time using a graphical user interface.






A Realtime Multichannel Room Simulator
by Bill Gardner

Gardner describes a six-channel real-time audio system that takes a monophonic input signal and renders a reverberant field using minimal processing power and a minimal number of speakers. The described system requires six DSPs and six speakers, and the algorithm used to determine the virtual source is referenced. A model for air absorption is provided, and an in-depth discussion of diffuse reverberation is given. In the end, Gardner uses three different reverberators for different room sizes. A brief summary of his simulation procedures is provided, as well as a short summary of his results.
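For reference, the sketch below implements a generic Schroeder-style reverberator (parallel comb filters followed by series allpasses); Gardner's paper uses its own reverberator designs tuned to different room sizes, which are not reproduced here.

# Sketch: a textbook Schroeder reverberator (parallel combs, then series allpasses).
import numpy as np

def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = np.zeros(len(x))
    for i in range(len(x)):
        fb = y[i - delay] if i >= delay else 0.0
        y[i] = x[i] + g * fb
    return y

def allpass(x, delay, g):
    """Schroeder allpass: y[n] = -g * x[n] + x[n - delay] + g * y[n - delay]."""
    y = np.zeros(len(x))
    for i in range(len(x)):
        xd = x[i - delay] if i >= delay else 0.0
        yd = y[i - delay] if i >= delay else 0.0
        y[i] = -g * x[i] + xd + g * yd
    return y

fs = 44100
impulse = np.zeros(fs)
impulse[0] = 1.0
wet = sum(comb(impulse, d, 0.82) for d in (1557, 1617, 1491, 1422)) / 4.0
wet = allpass(allpass(wet, 225, 0.7), 556, 0.7)
print(wet[:5])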






Head Tracked 3-D Audio Using Loudspeakers
by William Gardner

Gardner attempts to create a virtual acoustic display using conventional loudspeakers rather than headphones. His discussion relies on the use of an accurate head tracker to customize the equalization zone created by the speakers to the listener.

Gardner's two key ideas are: (1) the ear signals corresponding to the target scene are synthesized by appropriately encoding directional cues, a process known as "binaural synthesis," and (2) these signals are delivered to the listener by inverting the transmission paths that exist from the speakers to the listener, a process known as "crosstalk cancellation."
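The core of crosstalk cancellation can be sketched as inverting, at each frequency, the 2x2 matrix of transfer functions from the two speakers to the two ears; the placeholder acoustic paths and the absence of regularization below are simplifying assumptions, not Gardner's design.

# Sketch: crosstalk cancellation as a per-bin 2x2 matrix inversion.
import numpy as np

n_bins = 512
freqs = np.fft.rfftfreq(n_bins, d=1.0 / 44100)

# Placeholder acoustic paths: ipsilateral ~1, contralateral attenuated and delayed.
delay = np.exp(-2j * np.pi * freqs * 0.0002)             # 0.2 ms longer contralateral path
H = np.empty((len(freqs), 2, 2), dtype=complex)
H[:, 0, 0] = H[:, 1, 1] = 1.0                            # speaker to same-side ear
H[:, 0, 1] = H[:, 1, 0] = 0.5 * delay                    # speaker to opposite ear

C = np.linalg.inv(H)                                     # crosstalk canceller per bin

# Verify: the canceller followed by the acoustic paths is (close to) identity.
check = np.einsum('fij,fjk->fik', H, C)
print(np.allclose(check, np.eye(2)))                     # True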

In a very short paragraph, he lays out the implementation of the system, which compensates for head motion using a head tracker. He tests his implementation on only a couple of subjects and reports skewed results. He concludes that steering the localization zone greatly improves localization performance, especially when the listener is horizontally displaced from the ideal listening position.






Fundamental and Technological Limitations of Immersive Audio Systems
by C. Kyriakakis

This paper argues for the need for accurate audio reproduction because "a mismatch between the aurally-perceived and visually-observed positions of a particular sound causes a cognitive dissonance that can limit the desired suspension of belief." The author focuses on the need for this technology in the specific case of integrated media workstation users.

The setup of a typical media workstation is outlined, as well as the problems associated with this setup. Two solutions are introduced: direct-path-dominant design and correct low-frequency response. With the use of crosstalk cancellation, the author seems confident that an accurate sound field can be created at one exact position. This system relies on a head-tracking system, implemented with a standard video camera connected to a PC.






Immersive Audio for Desktop Systems
by C. Kyriakakis and T. Holman

This article traces the history of immersive audio systems up to the present day and then describes the basic limitations of these systems. The authors then propose a new system they have developed to address these issues. Kyriakakis groups the fundamental limitations into two categories: those arising from physical laws and those arising from technological considerations such as computational power. Since computational capability will no doubt increase, addressing the limitations imposed by the laws of acoustics, human auditory perception, and so on remains an area of tremendous research effort.

There is a brief summary of the history of immersive audio, including two-channel stereo, a four channel matrixed quadraphonic system, and multichannel surround sound.

The remainder of the article focuses on spatial audio. Basic physiological and signal processing principles are discussed. The work done to date in headphone rendering is covered, and Kyriakakis summarizes the main drawbacks of these methods. Loudspeaker reproduction with crosstalk cancellation attempts to remedy some of these problems. The authors then outline the requirements and difficulties associated specifically with 3-D audio rendering in a desktop environment.

Improvements to these systems are presented in the form of future research directions that the author has taken. Systems proposed include a desktop audio system with head tracking that follows the listener and adjusts the filter outputs accordingly, and a system that performs listener pinna classification to fit the listener's pinna shape to the best one available.
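As an illustration of the head-tracking idea (not the authors' system), the sketch below simply selects the closest available HRTF azimuth for the tracked listener offset; the 5-degree measurement grid and the nearest-neighbour selection are assumptions.

# Sketch: choose the HRTF filter closest to the tracked listener offset.
import numpy as np

measured_azimuths = np.arange(-90, 91, 5)        # assumed HRTF measurement grid, degrees

def select_hrtf_azimuth(listener_az_deg):
    """Return the grid azimuth closest to the tracked listener offset."""
    idx = np.argmin(np.abs(measured_azimuths - listener_az_deg))
    return measured_azimuths[idx]

for tracked in (-12.3, 0.0, 7.8, 33.4):
    print(tracked, "->", select_hrtf_azimuth(tracked))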

Review 2

This brief conference paper presents a short overview of the unique requirements for creating 3-D audio in a desktop environment. The author outlines a set of design criteria for desktop audio production, including direct-path-dominant design, in which the sound field created around the listener dominates the reflected and reverberant sound, and correct low-frequency response, which capitalizes on the fact that in a desktop environment the speakers and the user are both fixed and therefore do not suffer from the different responses associated with different room locations. While the author acknowledges that, with crosstalk cancellation and proper speaker placement, it is possible to create a spatially accurate sound field, he is quick to note that it is effective for one exact position only. He proposes listener tracking and pinna classification as methods to generalize these systems to multiple users and multiple user positions.






Surrounded by Sound
by C. Kyriakakis, P. Tsakalides, T. Holman

This article offers an overview of recent developments in both the acquisition and rendering of immersive audio. The audio acquisition section describes recent advances in microphone array processing, while the immersive audio rendering section highlights methods for synthesizing a 3-D audio environment based on the HRTF. It also discusses how to move from headphone rendering to rendering over two loudspeakers using crosstalk cancellation. However, the article is written at a very high level, and the details of the implementation are not present.





Unexamined Assumptions in the Commercialization of 3D Audio: Does KEMAR Sleep in a Procrustean Bed?
by William L. Martens

Martens begins this article with the following description: Procrustes was the mythical Greek character who cut his victims or stretched them to fit his bed. He goes on to explain that audio research has become too fixated on representing audio signals using a head-related transfer function. His rhetoric is redundant, but his point is essentially that the following "assumptions" are incorrect:
1.) sound spatialization works better over stereo headphones than stereo loudspeakers
2.) if spatial sound processing is HRTF-based, then it is probably adequate
3.) spatial sound processing is not adequate unless it is HRTF-based
4.) HRTFs add front/back and above/below discrimination to the imagery associated with interaural differences
5.) the gain applied to the HRTF-processed sound is an adequate cue to distance
In conclusion, Martens offers "Three Tell-Tale Experiments." These experiments are quite interesting and worth examining. In fact, we might attempt to implement one of them for the second half of this semester.






A Computational Model of Spatial Hearing
by Keith Dana Martin

This is a Master's thesis from Cornell University. Martin attempts to implement a model that can determine the location of a sound source in the free field by detecting onsets, interaural differences, interaural phase delays, and interaural envelope delays. After testing the model, he believes it to be a good one, except that its estimation of interaural phase and envelope delays is not as accurate as a human's. Both the background and the form of the model are well laid out.
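One component of such a model, estimating the interaural time difference by cross-correlating the two ear signals, can be sketched as follows; this is a standard technique used for illustration and is not Martin's implementation.

# Sketch: ITD estimation from a binaural signal by cross-correlation.
import numpy as np

fs = 44100
rng = np.random.default_rng(0)
true_delay = 20                                   # samples (~0.45 ms)
source = rng.standard_normal(fs // 10)
left = source
right = np.concatenate([np.zeros(true_delay), source])[:len(source)]

max_lag = 40
lags = np.arange(-max_lag, max_lag + 1)
# corr[l] = sum over n of left[n] * right[n + l]; the peak lag is the ITD estimate.
corr = [np.sum(left[max(0, -l):len(left) - max(0, l)] *
               right[max(0, l):len(right) - max(0, -l)]) for l in lags]
print("estimated ITD (samples):", lags[int(np.argmax(corr))])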






The Use of 3-D Audio in a Synthetic Environment: An Aural Rendering for a Distributed Virtual Reality System
by Stephen Travis Pope and Lennart K. Fahlén

Pope and Fahlén claim to build "'3-D audio' systems that are robust, listener-independent, real-time, multi-source and able to give stable sound localization." They begin by outlining human hearing and giving a simple geometric model for simulated effects such as distance, loudness, left/right ratio, and reverb ratio. A brief introduction to virtual worlds follows, including a discussion of the architecture, interconnections, and rendering process.
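A geometric model of this kind can be sketched as follows: from the source and listener positions, derive a distance-based gain, a left/right ratio, and a direct-to-reverberant ratio. The specific formulas are illustrative assumptions, not those of Pope and Fahlén.

# Sketch: simple geometric rendering cues from source and listener positions.
import numpy as np

def geometric_cues(source_xy, listener_xy, listener_heading_rad=0.0):
    dx, dy = np.subtract(source_xy, listener_xy)
    distance = max(np.hypot(dx, dy), 0.1)
    azimuth = np.arctan2(dx, dy) - listener_heading_rad   # 0 = straight ahead
    gain = 1.0 / distance                                  # inverse-distance loudness
    pan = 0.5 * (1.0 + np.sin(azimuth))                    # 0 = full left, 1 = full right
    reverb_ratio = min(1.0, distance / 10.0)               # more reverberant when far away
    return gain, pan, reverb_ratio

print(geometric_cues(source_xy=(3.0, 4.0), listener_xy=(0.0, 0.0)))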

Finally, they focus on the auralizer itself. It is a separate process that maintains a table of all sounds available in the world. When a specific sound is needed, a request for it is issued to the auralizer; these requests must include the coordinates and orientation of the target. The auralizer responds to the requests as needed, rendering the sound according to the geometric model.

Evaluating the final product, the authors rate their real-time output as "reasonably good". They do however propose several alternative algorithms that are to be implemented in the future, and that they expect will perform better. They clarify that the system was designed so that it would be easily expandable in this regard.






Basic Principles of Stereophonic Sound
by William B. Snow

Snow attempts to review the basic principles of stereophonic sound for the engineers in the motion picture and sound recording industries. In 1953 these industries were very interested in this new technology, since it added much needed realism to their products.

He begins by explicitly defining several critical terms and then slowly builds his discussion toward the different system types: monaural, diotic, binaural, monophonic, and stereophonic. He then focuses on stereophonic reproduction, including topics such as operating conditions, angular perception, and depth perception. Next, several miscellaneous topics are covered, including the number of channels, loudspeakers, microphones, and amplifiers. Finally, several issues mainly concerned with the recording of stereophonic audio data are discussed: distortion, channel differences, dubbing, and disk recording.

Although the specifics of this article may be severely outdated, the principles definitely are not. The basic principles of stereophonic sound are laid out and easily understood.






Structured Audio: Creation, Transmission, and Rendering of Parametric Sound Representations
by B. L. Vercoe, W. G. Gardner, E. D. Scheirer

This article presents an alternative method for mathematically modeling the HRTF, in which the HRTF is expressed as a weighted combination of a set of complex-valued eigentransfer functions (EFs). The EFs are an orthogonal set of frequency-dependent functions, while the weights are determined from spatial location through a proposed spatial characteristic function (SCF). The SCF is extracted from HRTF data, presumably taken from a KEMAR or other acoustic source. The authors claim a mean squared error of typically 1% between the extracted-feature HRTF model and the measured HRTF, with increased error at the contralateral ear, possibly due to head shadowing.
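The general idea, expressing a set of HRTFs as weighted combinations of a small number of orthogonal basis functions, can be sketched with an SVD as below; the random matrix stands in for measured data, and the paper's actual derivation of the EFs and SCF differs in detail.

# Sketch: low-rank decomposition of an HRTF set into orthogonal basis functions.
import numpy as np

rng = np.random.default_rng(0)
n_directions, n_freqs = 72, 128
hrtfs = rng.standard_normal((n_directions, n_freqs))      # placeholder HRTF magnitudes

# Orthogonal frequency-dependent basis functions (rows of Vt) and
# direction-dependent weights (U * S), analogous to the EFs and SCF weights.
U, S, Vt = np.linalg.svd(hrtfs, full_matrices=False)

k = 10                                                     # number of basis functions kept
approx = (U[:, :k] * S[:k]) @ Vt[:k, :]
mse = np.mean((hrtfs - approx) ** 2) / np.mean(hrtfs ** 2)
print(f"relative MSE with {k} basis functions: {mse:.3f}")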






Headphone simulation of free-field listening. I: Stimulus synthesis
by F. Wightman, D. J. Kistler

This is a detailed paper describing a 3-D audio rendering experiment done in the Department of Psychology at the University of Wisconsin. The paper describes in detail the methods used for collecting data from real subjects using microphones placed in the subjects' ear canals. The environmental setup is also described in great detail. The article discusses the digital filters constructed from the data and then extensively describes the listening test. The authors then compare their results against other HRTF data in the literature. This is a very useful article to read, especially for anyone interested in collecting his or her own HRTF data.
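The general approach of deriving a headphone simulation filter by spectral division, dividing the free-field measurement by the headphone-to-eardrum measurement so that headphone playback reproduces the free-field ear signals, can be sketched as follows; the synthetic responses and the regularization constant are assumptions, not details from the paper.

# Sketch: regularized spectral division to build a headphone simulation filter.
import numpy as np

rng = np.random.default_rng(0)
n = 512
freefield_ir = rng.standard_normal(n) * np.exp(-np.arange(n) / 40.0)   # placeholder measurement
headphone_ir = rng.standard_normal(n) * np.exp(-np.arange(n) / 60.0)   # placeholder measurement

X_ff = np.fft.rfft(freefield_ir)
X_hp = np.fft.rfft(headphone_ir)

eps = 1e-3                                         # regularization to avoid division by ~0
H_sim = X_ff * np.conj(X_hp) / (np.abs(X_hp) ** 2 + eps)
sim_filter = np.fft.irfft(H_sim, n)
print(sim_filter.shape)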






Headphone simulation of free field listening. II: Psychophysical validation
by Frederic L. Wightman and Doris J. Kistler

Wightman and Kistler believe their "procedures provide potential solution to the technical problems associated with manipulation of pinna cues." They begin by arguing that it is necessary to determine the similarity between simulated and real pinna cues and explain that little testing has been done concerning the difference between real and simulated (headphone) sound sources.

Their rigorously controlled test procedure is explained to the last detail. The possible answers of the test subjects and the experimenter's interaction with them are outlined.

The results of their testing include: the apparent positions of real sources and virtual sources (heard over headphones) are very similar; the number of front-back confusions increases substantially when using headphones; and elevation is often misjudged as lower when wearing headphones.






Localization using nonindividualized head-related transfer functions
by E. M. Wenzel, M. Arruda, D. J. Kistler, F. Wightman

This article, also from the Department of Psychology at the University of Wisconsin, describes an experiment on the localization of spatial audio using nonindividualized HRTFs. The article discusses some common localization errors made with current HRTF systems, especially front-back confusions. The claim is made that "good localizers," that is, listeners normally able to localize sound in space accurately, also perceive a sense of 3-D from audio generated from another person's HRTF data, provided that the HRTF data was derived from another "good localizer." However, with the use of nonindividualized HRTFs, increased front-back confusion was reported. A number of explanations are proposed as to why this occurs.



