Psychoacoustics

Psychoacoustics and Binaural Sound

Let's begin our discussion of psychoacoustics with the term "binaural," which originally referred simply to two-channel audio recording (just as binocular refers to two-eyed vision). When the music distribution industry began marketing "stereo" and "stereophonic" recordings, the word fell into disuse, and eventually re-emerged as a specialized term for recordings made using a dummy head with a pair of mics placed in the dummy's ear canals. Unfortunately, dummy head recordings proved impractical for general use — they sound great on headphones, but they don’t mix to mono very well, and they sound inferior to conventional mic placement when reproduced on speakers.

[Curiously, the word "stereo" originally meant "solid," and had nothing to do with any particular kind of 3D imaging, sonic or visual. See stereoscope on this site for more info.]

The purpose of binaural (dummy-head) recording is to improve the reproduction of spatial information, which conventional multi-mic recording practices generally either fail to capture or severely mangle. The characteristics of a recording that influence our perception of space and overall "realism" are collectively referred to as "psychoacoustics." Stereo was an early attempt to improve the psychoacoustics of consumer recordings by confining sounds from the left to the left speaker and sounds from the right to the right speaker. Binaural recording was a subtler (and, unfortunately, less universal) attempt that works much better than stereo on headphones but much worse than stereo on speakers. Many other psychoacoustic techniques have been explored, and research continues to find new ways to make audio recordings more realistic.

Ultimately, we may dream of full virtual-reality recordings that magically recreate the entire auditory experience of an original event. To do this, we will have to capture subtle differences in the signals received by each ear, and somehow reproduce these subtleties dynamically as the listener moves his head in the virtual sound field. This will probably require headphones and head-position tracking, with suitable ARTF processing in realtime (see below).
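As a rough illustration of what "dynamic" processing would involve, the minimal Python sketch below computes where a source that is fixed in the room sits relative to the listener's facing direction as the head turns, and snaps that angle to the nearest entry of a measured transfer-function set. The function names and the uniform 5-degree measurement grid are hypothetical assumptions; real measurement sets use their own angular spacing, which can vary with elevation.

```python
def head_relative_azimuth(source_az_deg: float, head_yaw_deg: float) -> float:
    """Angle of a room-fixed source relative to where the listener faces, in [0, 360).

    Angles are in degrees, measured clockwise; 0 means straight ahead.
    """
    return (source_az_deg - head_yaw_deg) % 360.0


def nearest_grid_azimuth(az_deg: float, step_deg: float = 5.0) -> float:
    """Snap an azimuth to the closest angle on a hypothetical uniform measurement grid."""
    return (round(az_deg / step_deg) * step_deg) % 360.0


# A source fixed at 90 degrees (the listener's right) while the head turns toward it.
for yaw in (0.0, 30.0, 60.0, 90.0):
    rel = head_relative_azimuth(90.0, yaw)
    print(f"head yaw {yaw:5.1f} -> render at {nearest_grid_azimuth(rel):5.1f} deg")
```

Each time a head tracker reported a new yaw, a renderer of this kind would switch to (or interpolate between) the transfer functions for the new relative angle, so the source stays put in the room while the head moves.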

Binaural (dummy head) recordings are exciting to listen to (on headphones) because they are made in a way that captures more of the "sound space" than any other approach to date. Ordinary stereo positions virtual sound sources mainly by varying the relative loudness of each source on the two speakers (panning), which can simulate only a limited degree of spatiality. In fact, our perception of space is much more sophisticated than that.
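To make the contrast concrete, here is a minimal Python sketch of level-only panning using the common sine/cosine ("constant-power") pan law. This particular law is an assumption for illustration, since mixing consoles use a variety of pan laws. Notice that the only thing that changes with position is the gain of each channel: there is no interaural time delay and no pinna filtering.

```python
import math


def constant_power_pan(sample: float, pan: float) -> tuple[float, float]:
    """Split a mono sample into (left, right) using a sine/cosine pan law.

    pan runs from -1.0 (hard left) through 0.0 (center) to +1.0 (hard right).
    Only the level of each channel changes; no timing or spectral cues are added.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1, 1]")
    theta = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return sample * math.cos(theta), sample * math.sin(theta)


left, right = constant_power_pan(1.0, 0.0)
print(round(left, 4), round(right, 4))  # 0.7071 0.7071
```

Because cos² + sin² = 1, the total power stays constant as a source is panned, which is why a centered source is reduced to about 0.707 (roughly -3 dB) in each speaker.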

Binaural tracks are generally recorded using a realistic sculpted head (often with neck and upper torso) that simulates the size and density of the human body. The pinnae (ear flaps) are faithfully sculpted as well, because they are extremely influential in the way sound is acoustically processed before entering the ear canals. Small microphones are mounted inside the ear canals, ideally at about the location of the ear drum, although many dummy heads mount the microphones at the opening of the canal. The result is uncannily realistic, provided that the listener uses headphones, and also that his or her ear flaps are roughly the same size and shape as the dummy’s. If the listener’s pinnae are quite different from the dummy’s, the psychoacoustic effects can be annoying rather than fulfilling.

Fortunately, other psychoacoustic approaches to capturing the subtleties of spatial information have been developed. One of the most effective spatial recording systems is DSM (Dimensional Stereo Microphones), in which excellent spatial realism can be captured using small condenser mics placed at the temples, somewhat forward of the ears. This position offers several advantages over in-ear binaural recording, including better "stereo" reproduction on speakers, the ability to mix down to mono without destructive phase cancellation, and uncompromised "binaural-like" spatial effects when reproduced on headphones.
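Why mono compatibility is a concern at all can be seen in a simplified model: if two channels carry the same sound at equal level but one is delayed by τ seconds (as happens with spaced microphones), summing them to mono produces a comb filter whose gain is |cos(πfτ)|, with complete cancellation at f = (2k + 1) / (2τ). The Python sketch below ignores head shadowing and level differences between the channels, so it illustrates the cancellation mechanism only; it is not a model of DSM or of any particular dummy head, and the 0.5 ms delay is an arbitrary example value.

```python
import math


def mono_sum_gain_db(freq_hz: float, delay_s: float) -> float:
    """Gain in dB of (x(t) + x(t - delay)) / 2 relative to x(t) alone, at one frequency."""
    mag = abs(math.cos(math.pi * freq_hz * delay_s))
    return float("-inf") if mag == 0.0 else 20.0 * math.log10(mag)


def notch_frequencies(delay_s: float, max_hz: float = 20000.0) -> list[float]:
    """Frequencies up to max_hz that cancel completely: f = (2k + 1) / (2 * delay)."""
    if delay_s <= 0.0:
        return []  # no delay means no comb filtering
    notches = []
    k = 0
    while (f := (2 * k + 1) / (2.0 * delay_s)) <= max_hz:
        notches.append(f)
        k += 1
    return notches


delay = 0.0005  # 0.5 ms, roughly 17 cm of extra path length at 343 m/s
print([round(f) for f in notch_frequencies(delay)][:3])  # [1000, 3000, 5000]
print(round(mono_sum_gain_db(500.0, delay), 2))  # -3.01
```

The shorter the effective delay between the two channels, the higher the first notch moves, which is one reason closely spaced or coincident pairs tend to fold down to mono more gracefully.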

The best-known pioneer of this alternative binaural recording approach is Leonard Lombardo, who manufactures some of the best free-mountable binaural mic pairs available outside the acoustics laboratory. I recommend a visit to his site for tons of interesting material, including some remarkable recordings made with his DSM mics.

Anatomical-Related Transfer Function (ARTF)

When sound from a single source passes across the head, the unequal distances from the source to each ear introduce a tiny time delay, and the acoustic properties of the head and pinnae alter both high-frequency response and loudness. When the entire acoustic environment of the head, neck, and upper torso is taken into account, this complex effect is known as the ARTF (anatomical-related transfer function); when only the effect of the head is considered, it is called the HRTF (head-related transfer function). The auditory nervous system then processes this information to create additional "dimensions" of the auditory experience. When this information is missing, audio reproduction is not experienced as fully "real." (This overlap of physics and subjectivity is the domain of psychoacoustics.) Therefore, for maximum realism in live recording, we need something better than stereo and multi-mic approaches, and less limiting than binaural techniques; so far, the DSM approach seems to be the most practical.
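The time-delay part of this effect can be estimated with a textbook approximation. The Python sketch below uses Woodworth's rigid-sphere formula, ITD ≈ (a / c)(θ + sin θ), where a is the head radius, c the speed of sound, and θ the source azimuth. The 8.75 cm radius is an assumed average, and the model ignores the pinnae, neck, and torso entirely, which is exactly the information a full ARTF adds.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, in air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m; an assumed average head radius


def woodworth_itd(azimuth_deg: float, radius: float = HEAD_RADIUS,
                  c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference in seconds for a distant source.

    Woodworth's rigid-sphere approximation: ITD = (a / c) * (theta + sin(theta)),
    valid from 0 degrees (straight ahead) to 90 degrees (directly to one side).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:3d} deg: {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

Under these assumptions, a source directly to one side reaches the far ear roughly 0.66 ms after the near ear, while a source straight ahead produces no delay at all.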

In addition to live recording, it is also possible to simulate psychoacoustic ARTF effects. Using acoustic space simulation software ("convolvers"), one can process an audio track to precisely simulate the left- and right-ear recordings for a sound originating at any particular location in space. When the convolution processing is done judiciously, something close to DSM recordings can be created entirely in the digital domain.
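At its core, what such a convolver does for a single position is simple: the mono track is convolved once with the left-ear impulse response and once with the right-ear impulse response measured for that direction. The minimal Python sketch below shows that step with direct-form convolution. The two-tap and four-tap impulse responses are invented toy values, not measured data; a real HRIR set would supply the left/right pair for the chosen direction, and practical convolvers typically use FFT-based convolution for speed.

```python
def convolve(signal: list[float], kernel: list[float]) -> list[float]:
    """Direct-form linear convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def binauralize(mono: list[float], hrir_left: list[float],
                hrir_right: list[float]) -> tuple[list[float], list[float]]:
    """Render a mono track at the position for which the HRIR pair was measured."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy, hypothetical impulse responses (NOT measured data): the right ear hears the
# source two samples later and at half the amplitude (about -6 dB), roughly as for
# a source on the listener's left.
hrir_l = [1.0, 0.3]
hrir_r = [0.0, 0.0, 0.5, 0.15]
left, right = binauralize([1.0, 0.0, -1.0], hrir_l, hrir_r)
print(left)   # [1.0, 0.3, -1.0, -0.3]
print(right)  # [0.0, 0.0, 0.5, 0.15, -0.5, -0.15]
```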

Conveniently, MIT has produced a standard set of HRTF position samples that can be loaded into most commercially available convolvers. Using these MIT samples, I have processed some of my eight-channel recordings so that they can be appreciated in two-channel simulated binaural form on headphones. This is a good alternative to setting up a special room in your house with eight amps and speakers, just to listen to some weird electronic music. With luck, the good scientists at MIT may some day produce a set of DSM convolver samples for both spatial realism and stereo compatibility.

Multiple Speakers

Another approach to improving the psychoacoustics of recordings is to use more than two speakers. In this case, instead of trying to recreate the two-ear experience, we are trying to recreate the original multidimensional sound field. Various types of "surround-sound" schemes have become popular in home theater installations, but these systems have a more limited purpose — they seek to recreate a generalized acoustic space within which the movie sounds can be placed. What surround sound tends to avoid is the creation of a new acoustic environment in its own right.

To put this another way, consider the typical 5.1 surround setup. A "center" speaker is used to reproduce most of the film’s dialog, while two larger front left and right speakers handle most of the rest of the soundtrack. A pair of much smaller side/rear speakers is used to simulate ambient sound (reflections from distant objects, rear walls, etc.) and also to provide occasional directional effects (planes flying past, explosions, doors slamming, etc.). Finally, a subwoofer is used for low-frequency effects such as earthquakes, crashes, and the like. The system, in other words, is a collection of specialized subsystems for enhancing the main sound, and does not function as a set of independent sources.

Consider instead a room with a speaker on each wall and a speaker in each corner. These eight speakers are full range (including subwoofers) and of equal size and capacity, so "front" and "center" are no longer meaningful. Instead of trying to simulate the experience of sitting in front of a sound stage with reflected sound from rear walls, we are creating a new, specific experience of sitting among eight separate musicians. This approach is not designed to support movies or any other flat medium (like musicians on a bandstand); it is designed to provide an intrinsically three-dimensional experience.

What happens when we listen to eight-channel recordings in this room? We find ourselves immersed in something completely different from "surround sound." Each speaker can reproduce a portion of a composition without any electronic or speaker-induced interaction with the other parts of the music. What is more, all the psychoacoustic ARTF information is present, because it is not part of the recording at all: it is real. As a result, the "mix," or integration, of the eight sources happens in the listener’s head (or mind, to be more precise) instead of in the pan-pots on the studio mixing board. Just as we can listen to any one of several simultaneous conversations in a room full of people, so can we now shift our attention from one musical component to another, effectively becoming part of the final audio experience.

I call this configuration Ashtangakasha (eight-fold space), and I have installed Ashtangakasha listening rooms in a few locations. The reactions of listeners have been most encouraging: the difference is profound when you compare eight discrete sources (Ashtangakasha) with just two sources sharing eight signals (stereo, with or without surround processing). My own work explores timbre and tonality, emphasising unfamiliar or unrecognizable sound textures, so when these sources are mixed in stereo, the listener cannot distinguish one sound from another at all. The mixed sound becomes the new sound, and the individual textures are integrated before they are heard. With the Ashtangakasha setup, each texture remains discrete, and the listener can shift attention from one to another at will, mixing his own final version of the piece in realtime.

The challenge, of course, is to find some way to reproduce this experience without building an Ashtangakasha installation. The solution is to process each of the eight tracks psychoacoustically with the ARTF for its speaker's position, and then mix the results into a two-channel recording. When heard on headphones, each of the original eight tracks still carries its unique ARTF signature, so the mind can still discriminate each source individually. This preserves much of the experience of being in the eight-channel listening room. Some day, when we finally have dynamic ARTF processing technology, the virtualization will be complete, because the listener will be able to turn his head and hear the discrete sound sources remain stable in the projected space outside his head.
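The eight-to-two processing can be sketched in Python as follows: each track is convolved with the left/right impulse-response pair for its speaker's direction, and the results are summed into a single two-channel mix. The 45-degree spacing assumes a square room heard from the center (a speaker on each wall and in each corner); how the impulse-response pairs are obtained (for example, by looking up each angle in a measured HRTF set) is left open, and the pairs in the demo are placeholders, not measurements.

```python
def convolve(signal: list[float], kernel: list[float]) -> list[float]:
    """Direct-form linear convolution."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def binaural_downmix(tracks, hrir_pairs):
    """Render each track through its (left, right) HRIR pair and sum into one stereo mix."""
    if len(tracks) != len(hrir_pairs):
        raise ValueError("need exactly one HRIR pair per track")
    rendered = [(convolve(t, h_l), convolve(t, h_r))
                for t, (h_l, h_r) in zip(tracks, hrir_pairs)]
    length = max(max(len(out_l), len(out_r)) for out_l, out_r in rendered)
    left, right = [0.0] * length, [0.0] * length
    for out_l, out_r in rendered:
        for n, v in enumerate(out_l):
            left[n] += v
        for n, v in enumerate(out_r):
            right[n] += v
    return left, right


# Eight speakers, one every 45 degrees around the listener (0 = straight ahead).
# In a real render, one HRIR pair would be looked up for each of these angles.
SPEAKER_AZIMUTHS = [i * 45 for i in range(8)]

# Placeholder demo: two one-sample tracks with hypothetical pairs that send
# track 0 only to the left ear and track 1 only to the right ear.
left, right = binaural_downmix([[1.0], [0.5]], [([1.0], [0.0]), ([0.0], [1.0])])
print(left, right)  # [1.0] [0.5]
```

Because each track keeps its own left/right filtering before the sum, the direction-dependent cues survive the mixdown, which is what lets the ear keep the sources apart on headphones.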

This is, in fact, the basis of our entire experience of the external world as a volumetric, three-dimensional environment. We build that subjective model psychoacoustically, using the most primordial of our senses (touch, elaborated as hearing), and then project it into a virtual reality "outside" that is, for all intents and purposes, the real world.

Examples will be posted here, along with links to interesting binaural and spatial recordings on other sites. For the time being, please refer to the City of Nine Gates album, which currently offers a few of its tracks in synthetic DSM form. (Note that psychoacoustic effects benefit significantly from high-quality reproduction; compressed tracks or mediocre headphones can’t provide the full experience.)