EDUCATIONAL INFO

Learning Materials For The Aspiring Music Producer & Engineer

Early Reflections In An Enclosed Environment

When sound arrives at a listening position in an enclosed reflective environment, it is composed of three sonic elements: 1) the direct sound path, 2) early reflections and 3) reverberation.

Early reflections are sounds that arrive at a listening position after being reflected from the surfaces of the enclosed listening environment, such as the walls, ceiling and floor. They arrive later than the direct sound, often between 15ms and 100ms later, and before the onset of full reverberation. The early reflections give the listener audible evidence of the size of the enclosed environment and a sense of the distance and dimension of the overall sound. They play an important role in determining the overall character and sound of the experience.

Reverberation is based on two discrete components: “early reflections” and the “reverberant field”. The “early reflections” are often recreated in a mix by using digital delays, representing the sound reflected the first few times from the walls and, on rare occasions, the ceiling. To regenerate early reflections, the engineer simply allows a certain amount of feedback on the inserted digital delay.

Reverberation is a collection of complex reflections and a diffusion pattern that builds up over time to a dense thickness after the moment you hear the original direct path sound. That’s why there are controls that are different between reverb plug-ins, such as variable parameters like “spread” and “shape” to control how the reverb thickens and builds up before finally decaying.

Many artificial reverb plug-ins treat the early reflections separately and provide a separate group of parameters to adjust them. Regardless of the terminology a given reverb plug-in applies to its parameters, discrete delays can reach a listener in a space and be characterized as separate from the overall reverberation experience. That is what is referred to when speaking of early reflections (ER). On many occasions, I will use two separate plug-ins to create a sense of distance and dimension: one for reverb and the other for early reflections. Once the settings of the reverb plug-in and the early reflection plug-in have been finalized, I then blend both to create an all-inclusive and holistic sense of distance and dimension.

Direct Path Sound

If you were to suspend two individuals two meters above the ground and two meters apart from each other in an open field, you would be able to set up a situation where they could have a conversation with each other, where the only audio heard is via the direct path route.

There would be no floor, ceiling, or walls to reflect the original sound, where each of the two individuals would describe the audio characteristics as being very dry sounding, without any ambience one would normally hear in an enclosed reflective environment.

When the distance between the two individuals increases, the amplitude and high frequency response decrease, due to the “inverse square law” and the absorptive atmospheric conditions.
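As a rough illustration of the amplitude part of this statement, the sketch below applies the inverse square law to the direct sound. It assumes an idealized point source in a free field, and the function name `level_change_db` is only illustrative. The additional high frequency loss caused by the atmosphere is not modelled.

```python
import math


def level_change_db(near_m: float, far_m: float) -> float:
    """Change in direct-sound level (dB) when moving from near_m to far_m.

    Assumption: point source in a free field, so sound pressure falls in
    proportion to 1/distance (inverse square law for intensity).
    """
    if near_m <= 0 or far_m <= 0:
        raise ValueError("distances must be positive")
    return -20.0 * math.log10(far_m / near_m)


# Doubling the distance from 2 m to 4 m lowers the level by about 6 dB.
print(round(level_change_db(2.0, 4.0), 2))  # -6.02
```

Under these assumptions, every doubling of the distance between the two individuals costs about 6 dB of direct sound level.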

When one individual is talking directly on axis into one ear of an individual who is just listening, the high frequency content of the signal would sound exceptionally clear and would be subjectively described by the listener as “emotionally intimate”.

In an enclosed environment the direct path sound is always the loudest portion of the overall audio experience, where early reflections and reverberation are always less in high frequency content and in amplitude. The only exception is if the direct path between the sound source and the listener would be obstructed.

The listener always hears the direct sound first, then the early reflections and finally reverberation.

In an enclosed environment like a performance center, the listener will always look on axis at the sound source; therefore the image of the performer is heard as being dead-center, no matter what the acoustics of environment are.

Early Reflections

Sound radiating first from the surfaces in an enclosed reflective environment are known as “early reflections” (e.g., walls, floor, and ceiling). These reflections contribute in augmenting a sense of distance and dimension in an enclosed reflective environment.

Sophisticated architectural mathematics and physics are employed by studio designers, in their efforts to design and construct excellent sounding recording studios, mixing rooms and live concert venues that embrace effective sounding early reflections.

The length of time difference between the arrival of the direct sound and the arrival of the early reflections will directly correlate to amplitude levels, with amplitude decreasing as the difference in time lengthens.

Figure 1: Direct Sound and Early Reflections

Effective sounding reflections that can enhance a pleasant reverberating listening experience need to arrive to the ear within 15ms-100ms after the arrival time of the direct path sound and must be lower in amplitude with less high frequency content. The length of time between the direct sound and the early reflections influences the amplitude and high frequency content of the reflections, with amplitude and high frequency content decreasing when the time difference increases. Therefore the type and distance of the reflective surfaces from the listening position affect their amplitude and high frequency content.

Reflections arriving to the listener between 15ms (left) and 30ms (right) will be louder and contain more high frequency content than reflections arriving between 60ms (left) and 80ms (right).

Left and right early reflections arriving less than 15ms after the arrival of the direct sound will produce a flanging effect and image location difficulties for locating the original sound source in a stereo situation*(see precedence effect). Also left and right early reflections arriving 15ms after the arrival of the direct sound, but with both reflections arriving with less than 15ms between each other, will produce a flanging effect and image location difficulties for the early reflections. This effect can be easily reproduced if one claps their hands and listens for a flutter echo flange within the sound of the echo; an effect caused by two or more reflections arriving less than 15ms apart from each other in an enclosed reflective environment containing highly reflective parallel walls.

Once the early reflections pass the 100ms mark, they begin to sound detached and discrete from the direct path signal and no longer contribute in influencing a sense of distance and dimension for the overall sound experience.

If the sound source is transient in nature, the early reflections are easier to hear and distinguish from each other, but they could prove distracting if they arrive later than approx. 70ms after the arrival of the direct sound.

Early reflections continue to regenerate and multiply over time until there are so many of them that they are eventually perceived as reverberation. Early reflections originate from room surfaces (e.g. walls, floor and ceiling) and are known as the ER component of a reverb plug-in. The longer the time difference between the arrival of the direct sound and the arrival of the early reflections, the lower the amplitude of the reflections.

NB: The transient nature of the original signal will influence the 15msec to 100msec range for replicated transient reflections. A transient snare drum may begin to sound discrete in the 60msec-80msec range, where a smoother sounding instrument, such as a cello or flute, will not generate reflections that will begin to sound discrete from the original sound source until at least approx. 100msec. The amount of high frequency content at the front of the original sound’s waveform and the tempo of the music are also influencing factors in determining a sense of dimension in an enclosed environment.

Figure 2: Direct Sound and Early and Late Reflections

Reflections arriving later than 100ms are known as late reflections, or more commonly as “discrete reflections”. As the direct path sound decays, the initial sound of the reflections and reverb will often be louder than the decay of the direct sound, thus appearing attached to the decay of the direct sound. Later reflections, arising from “reflected reflections” from one surface to another, are assumed to arrive diffused and equally from all directions, with even amplitude at both ears, and can be described as exponentially decaying sound, in other words reverberation.

Highly absorptive materials will soak up a great deal of the energy of a sound wave. When sound comes into contact with the absorptive materials, the energy is greatly reduced in the reflected portion of the sound, thereby reducing the overall reverberation time, high frequency content and amplitude.

Late reflections will become more highly diffused as the distance between the initial sound source and listener decreases especially when the reflective surfaces are at a reasonable distance. This is where the decay of the direct sound decreases until it is sensed as equal to the onset of the diffused reverb. In an effective enclosed environment the decay of the direct sound should dovetail into the onset of the early reflections and reverberation.

Breakdown of an Audio Signal in an Enclosed Reflective Environment

An audio signal takes numerous different paths in an enclosed reflective environment to reach a fixed listening position.

  1. The direct path signal from the originating source to the listening position.
  2. The early reflections from the left and right walls (maybe ceiling).
  3. The numerous diffused reflections emanating from the direct sound and early reflections contribute to “reverberation”.

The unobstructed direct signal is always the loudest and the most defined in its frequency response, and it is easy to perceive subtle variations in its amplitude. The time it takes for the signal to travel from the source to a listening position is determined by the speed of sound (approximately 343 meters/sec in air at room temperature, or roughly 1 meter every 3ms). If the direct path is perceived as located dead center in a stereo image, then the audio’s arrival time at both ears is identical. If the direct sound is perceived to be coming from the left, the listener will confirm the location of the source, for the audio signal will arrive at the left ear slightly sooner and louder than at the right ear. If a direct sound arrives from the right, the listener will likewise be able to identify the location as coming from the right*.

The first indication of dimension is when reflected sound arrives at a lower level between 15ms and 100ms after the arrival time of the direct original sound. As previously stated, if the reflection arrives sooner than 10ms-15ms, it won’t create dimension but will create imaging and precise localization problems. If it arrives later than 100ms, it will be perceived as detached and as a distinct sound experience.
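The time windows described above can be summarized in a small helper. This is only a sketch: the 15ms and 100ms boundaries are the guideline figures from the text, not hard perceptual limits, and the function name and labels are illustrative.

```python
def classify_reflection(delay_ms: float) -> str:
    """Label a reflection by how long after the direct sound it arrives.

    Boundaries follow the 15ms-100ms guideline in the text; in practice
    they shift with how transient the source material is.
    """
    if delay_ms < 0:
        raise ValueError("a reflection cannot arrive before the direct sound")
    if delay_ms < 15.0:
        return "imaging and localization problems"
    if delay_ms <= 100.0:
        return "early reflection (adds dimension)"
    return "discrete, detached reflection"


for delay in (8, 20, 60, 140):
    print(delay, "ms ->", classify_reflection(delay))
```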

The amplitude, time duration and frequency content of the early reflections and reverb indicate the size of the enclosure and the type of reflective materials in the enclosure.

In listening enclosures, the first two early reflections will typically arrive from the left and right walls. In most circumstances, the two delay times will be slightly different from each other, and the reflections will sound connected to the direct signal if they arrive between 15ms and 100ms after the original direct sound. As previously stated, a reflection always contains less high frequency content and is lower in volume than the direct path signal, with the amplitude and high frequency reduction based on the absorption coefficients of the reflective surfaces of the enclosure and the distance traveled.

In a situation where the listener is situated at a fixed distance directly in front of a sound source, exactly between the left and right walls of a concert hall, the entire audio experience will contain an early left reflection and an early right reflection. Both reflections will create a sense of distance originating from the sound source and will also contribute to the generation of reverb, adding a sense of distance and dimension. Both early reflections will lose high frequency content for the materials used in the walls of a concert hall are designed to absorb the higher frequencies of the early reflections in order to provide accurate imaging and develop a smooth and warm type of reverb in the hall.

If the listener is sitting at a distance from the stage but dead center, theoretically the left and right early reflections should arrive at exactly the same time. However, due to the shape of the human head and the fact that the head is always slightly in motion, the arrival times of the left and right early reflections are never exactly identical at any given moment. There is always a minor difference, which results in a random fluctuation of arrival times between the left and right reflections at the listening position.

In an example where the originating direct path sound arrives at the listener’s ears 5ms after the initial sound burst and two early reflections from the left and right walls arrive extremely close together at about 25ms after the initial sound burst, there will be a difference of 20ms between the arrival of the direct path sound and the two early reflections.

If the listener moves a couple of meters from the center position to the left, the left reflection will be slightly louder than the right reflection where the left reflection arrives to the listener marginally sooner than the right reflection. This will create a situation where the listener will perceive they are close to a reflective surface located to the left of the listening position. There will still be an early reflection arriving from the right wall but it will be slightly later and not as loud as the early reflection coming from the left. Therefore the early reflection coming from the right will only add a sense of dimension and distance to the overall listening experience. The direct sound will still sound like it is heard dead center when the listener is facing directly on axis to the originating direct sound. If the left and right early reflections were swapped with each other, the listening position would be reversed and the listener would conclude that he is situated close to a reflective surface located to their right.
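The geometry behind this can be sketched with the image-source method, in which each side wall reflection is treated as coming from a mirror image of the source behind that wall. The room dimensions, the positions and the 343 meters/sec speed of sound used below are assumptions chosen for illustration, and only the first reflection from each side wall is considered.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly room temperature


def arrival_times_ms(source, listener, room_width_m):
    """Return (direct, left_reflection, right_reflection) arrival times in ms.

    source and listener are (x, y) positions in metres. The left wall is
    the line x = 0 and the right wall is the line x = room_width_m.
    Each first reflection is modelled by mirroring the source in the wall.
    """
    sx, sy = source
    lx, ly = listener

    def travel_ms(px, py):
        return math.hypot(px - lx, py - ly) / SPEED_OF_SOUND_M_S * 1000.0

    direct = travel_ms(sx, sy)
    left = travel_ms(-sx, sy)                       # image behind left wall
    right = travel_ms(2.0 * room_width_m - sx, sy)  # image behind right wall
    return direct, left, right


# Hypothetical 20 m wide hall, source at centre stage, listener 15 m back.
centred = arrival_times_ms((10.0, 0.0), (10.0, 15.0), 20.0)
moved_left = arrival_times_ms((10.0, 0.0), (8.0, 15.0), 20.0)
print("centred:   ", [round(t, 1) for t in centred])
print("moved left:", [round(t, 1) for t in moved_left])
```

In the centred position the two reflections arrive together, about 29ms after the direct sound. With the listener moved two meters to the left, the left reflection arrives sooner than the right one, which matches the description above.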

The variables that will influence the position and distance between the listening position to the sound source and the type of materials the surfaces of the enclosed environment are constructed of are:

These variables can be controlled and manipulated by the engineer anywhere from the original microphone recording placement setup and/or in the final mix where the engineer can alter:

In an excellent sounding listening environment, the correct balance of direct sound, early reflections and reverb will contribute to a favourable listening experience, as in Fig:3. When the listening experience is confusing and seriously compromised, it is usually because the environment is too reflective, so that the reverb and early reflections obscure the direct sound, as in Fig:4.

Figure 3: Audio signal with Excellent Early Reflections and Reverb

Figure 4: Audio Signal with Inferior Early Reflections and Reverb

It is important to understand how the ear determines distance from the sound source, especially as the amplitude of the direct sound diminishes and the early reflections and reverb emerge in relation to the decay of the original direct sound. It is also imperative to note that the amplitude and high frequency content of the early reflections and reverb can never be greater than the amplitude and high frequency content of the direct signal as heard by the human ear in an enclosed environment. (The physics of sound should not be violated when trying to emulate a realistic listening situation).

When left and right delays that are generated in mono to substitute for early reflections are of identical value, and both fall within 15ms-100ms of the arrival of a mono direct sound, listeners perceive the direct sound and both delays as coming from one location, and all audio elements will be heard as mono in a mix, which could be a desirable effect when required. When the direct sound source and the two early reflections are only heard in mono, it is very difficult to perceive an accurate sense of distance and dimension. That sense would be better established if the early reflections could be heard in stereo, with the original sound source remaining in mono or, better yet, in stereo. If the instrument were recorded in stereo with the addition of left and right delays (reflections) of identical value, there would still be a perceivable, albeit limited, sense of distance and dimension.

Figure 5: Direct Path, Early Reflections and Reverberation

Figure 6: Direct Sound, Early Reflections, and Reverberation

In a real life listening situation the right reflection and the left reflection would never be identical in time and amplitude, for it would be impossible for the left and right reflections to arrive at both human ears at exactly the same time, with the same amplitude and with identical frequency content. Therefore, if one wants to create dimension in a stereo environment, liberties need to be taken when offsetting digital delay settings to generate left and right early reflections, in order to create a realistic sense of distance and dimension.

One method is to take a stereo-recorded instrument and add two delays (early reflections) that follow the suggested guideline of 15ms-100ms. If the instrument were a stereo-recorded piano with the left and right delays set to 30ms each, the delays would directly follow the original stereo panning of the piano. This method would create a less than ideal, but still usable, quasi-sense of distance and dimension.

One method of creating a sense of distance and dimension with a localized “mono” direct sound source is to place the original signal in the center position and create two delays at least 15ms and less than 100ms later than the arrival of the original sound source.

Remember that a delay of less than 15ms from the related sound source will produce phasing and poor imaging effects and a delay of more than 100ms will create the illusion of a separate discrete event and will not contribute in creating a sense of distance and dimension.

Pan the original signal to the center, with the delays panned hard left and hard right at a lower level and with some high frequency content rolled off. An important factor when left and right delays are needed to emulate early reflections is that the delays cannot be of the same value and must be at least 15ms apart from each other to prevent phasing and image problems. If both of the delays are of the same time value and panned hard left and hard right, they will collapse into mono and will not effectively create a sense of distance and dimension. Therefore, in creating dimension, set the left delay to 20ms and the right delay to 35ms (or vice versa). I tend to always offset the left and right delays (reflections) by 15ms to achieve a stereo effect.
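The routing described above can be sketched in a few lines of plain Python working on a list of samples. This is a minimal sketch rather than a production effect: the `gain` and `damping` values are arbitrary illustrative choices, the high frequency roll-off is approximated with a simple one-pole low-pass filter, and the function name is hypothetical.

```python
def add_early_reflections(mono, sample_rate, left_ms=20.0, right_ms=35.0,
                          gain=0.5, damping=0.5):
    """Return (left, right) sample lists: the dry signal in the centre plus
    one delayed, quieter, duller copy panned hard to each side.

    gain and damping are illustrative assumptions, not values from the text.
    damping is a one-pole low-pass coefficient (0 = no filtering).
    """
    def delayed_copy(delay_ms):
        offset = round(delay_ms * sample_rate / 1000.0)
        out = [0.0] * (len(mono) + offset)
        state = 0.0
        for i, x in enumerate(mono):
            state = (1.0 - damping) * x + damping * state  # roll off highs
            out[offset + i] = gain * state                 # lower level
        return out

    wet_left, wet_right = delayed_copy(left_ms), delayed_copy(right_ms)
    length = max(len(wet_left), len(wet_right))

    def mix(wet):
        # Dry signal is centred, so it appears equally in both channels.
        return [(mono[i] if i < len(mono) else 0.0) +
                (wet[i] if i < len(wet) else 0.0) for i in range(length)]

    return mix(wet_left), mix(wet_right)


# A single click at 1000 samples per second, so 1 sample equals 1ms.
click = [1.0] + [0.0] * 9
left, right = add_early_reflections(click, sample_rate=1000)
print(left.index(max(left[1:])), right.index(max(right[1:])))  # 20 35
```

With a click as input, the left channel shows its reflection 20ms after the direct sound and the right channel 35ms after it, each quieter than the dry click.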

Theoretically, the right delay (35msec) should be slightly lower in both volume and high frequency content, but for the purposes of creating dimension, it is unnecessary to apply this theoretical principle, because listeners most likely will not be hearing just one instrument in a mix, or be in a listening situation to detect the exact location and frequency content of the delays (early reflections). The dimensional effect created by the 20ms (left) and 35ms (right) delays will greatly over-ride the sound of the time difference of the 15ms between the two delays (reflections).

It is also a good idea to regenerate the delays through the feedback control on the digital delay plug-in, for this is what truly happens in a genuine situation, where the early reflections continue to bounce off surfaces, losing more high frequency content and amplitude with each and every reflection. Also send the returns of the digital delay to the reverb send, for this is also an authentic emulation of a real reflective environment, where the early reflections eventually develop into reverb.
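To illustrate how feedback makes each regenerated reflection arrive later and quieter, the sketch below lists the repeats produced by a single feedback delay. The starting gain, the feedback amount and the cut-off `floor` are assumed values for illustration only, and the progressive loss of high frequency content is not modelled here.

```python
def feedback_repeats(delay_ms, feedback, first_gain=0.5, floor=0.01):
    """List (time_ms, gain) for each repeat of a feedback delay.

    Each pass through the feedback loop multiplies the gain by `feedback`.
    Repeats quieter than `floor` are ignored. All defaults are assumptions.
    """
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be in [0, 1) for a decaying delay")
    if floor <= 0.0 or first_gain <= 0.0:
        raise ValueError("floor and first_gain must be positive")
    repeats = []
    time_ms, gain = delay_ms, first_gain
    while gain >= floor:
        repeats.append((time_ms, gain))
        time_ms += delay_ms
        gain *= feedback
    return repeats


for time_ms, gain in feedback_repeats(20.0, 0.5, first_gain=0.5, floor=0.1):
    print(f"{time_ms:5.0f} ms  gain {gain:.3f}")
```

A low feedback setting keeps the regenerated repeats inside the early reflection window before they fade. Higher settings push audible repeats past 100ms, where they start to behave like discrete delays.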

If the left delay is 30ms and the right delay is 35ms, there will be inaccurate imaging and possible phasing problems between the left and right delays. If the two delays are set to very different times (15ms left and 80ms right), it will create an unrealistic and undesirable listening environment, as if the listener were situated right next to one highly reflective surface.

With an increase in delay times the listening position will sound closer to the sound source than if the delays were shorter. The listening position will be located further from reflective wall surfaces but closer to the sound source. Using a left delay of 45ms and a right delay of 60ms, with both delays at a lower level and with slightly less high frequency content, would create a greater sense of distance than if the delays were set to shorter times and more amplitude.

It is important to set the amplitude of the delays at a level where they will only be perceived as supporting a sense of distance and dimension. If the delays are almost as loud and as bright as the direct sound, the overall sound will be confusing and create the illusion that there might be rhythm discrepancies within a performance (flams). As previously mentioned, even though there is a difference of 15ms in the arrival times of the left and right delays, the dimensional effect will greatly override the 15ms time difference in relation to the fixed listening position, particularly if the direct sound is panned in the middle.

If the sound source is stereo a sense of dimension will also be created. With two delays, and the associated altered frequency response and amplitude settings, a sense of depth and distance is added to the original direct stereo sound to create a dimensional effect.

I should also note the importance of the volume and frequency content at the beginning of the sound envelope of the original sound source, be it a percussive attack or a slow attack. This type of transient must be considered when establishing the time setting of the two delays, for it will determine whether the early reflections (delay times) will sound dimensional or unfortunately discrete and messy.

Furthermore, the frequency content of the delays will determine the absorption coefficients of the reflective surfaces. Therefore if the audio is transient in nature, the engineer could extend the times of the delays (early reflections) but would have to roll off more high frequency content at a lower frequency for the dimensional effect to sound realistic.

These delays (early reflections) alert the psycho-aural response in a way that tells the listener that they perceive the sound at a set distance in an enclosed reflective environment. When the listener hears only the original sound, without reflections or reverb, the psycho-aural response would suggest that the source and the listener are a few meters apart, elevated above the middle of a field. Any sense of a change of distance could then only come from lesser high frequency content and lesser amplitude. In some mixing productions, certain instruments and vocals have no reverb or delay and are meant to sound “in your face” on purpose. This type of effect comes more from a production decision than from trying to maintain realistic perception in a mix.

Reflected audio that sounds dull and dark suggests that the listener is in an enclosed environment that contains reflective surfaces such as wooden walls, which absorb the high frequency content.

Reflected audio will sound brighter if the reflective surfaces are made of something firmer, like fiberglass or concrete.

Sound that reflects off surfaces always sounds less bright than the original sound no matter what type of material the reflective surface is composed of, for every type of surface absorbs some high frequency content and amplitude. The duller the sound of the reflection, the higher the absorption coefficient of the reflective surface materials.
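An absorption coefficient is the fraction of the arriving sound energy that a surface absorbs, so the level lost in a single reflection can be estimated directly from it. The sketch below does that arithmetic; the example coefficients are made-up illustrative numbers, not measured data for any real material.

```python
import math


def reflection_loss_db(absorption_coefficient: float) -> float:
    """Level change (dB) of a sound after one reflection off a surface.

    The coefficient is the fraction of incident energy absorbed
    (0 = perfectly reflective, values near 1 = almost fully absorptive).
    """
    if not 0.0 <= absorption_coefficient < 1.0:
        raise ValueError("coefficient must be in [0, 1)")
    return 10.0 * math.log10(1.0 - absorption_coefficient)


# Hypothetical surface that absorbs little at low and more at high frequencies.
example_surface = {"250 Hz": 0.10, "4 kHz": 0.50}
for band, alpha in example_surface.items():
    print(band, round(reflection_loss_db(alpha), 2), "dB")
```

In this made-up case the high band loses about 3 dB per reflection and the low band under half a dB, which is why a reflection off such a surface sounds duller than the direct sound.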

Discrete delays are easy to localize in the stereo image but will prove to be distracting unless the delays are used to enhance a rhythmic idea for a performance recorded at a fixed tempo. If a song is recorded at 120bpm and if the engineer pans a delay arriving at 195ms to the left side, it will be heard distinctly as if it’s directly and discretely arriving from the left but out of sync with the performance. This will not help create a sense of distance and dimension, for the reflection will sound too detached from the original sound source event. For a rhythm effect such as this to work effectively, the delay would have to be closer to 250ms (eighth note).
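The tempo arithmetic in this example is easy to check. The sketch below converts a tempo into note-length delay times, assuming the quarter note carries the beat; the function name is illustrative.

```python
def note_delay_ms(bpm: float, note_fraction: float = 1 / 4) -> float:
    """Delay time in ms for a note value at a given tempo.

    note_fraction is the note length as a fraction of a whole note
    (1/4 = quarter note, 1/8 = eighth note). Assumes the quarter note
    gets the beat.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    quarter_ms = 60000.0 / bpm
    return quarter_ms * (note_fraction / 0.25)


print(note_delay_ms(120, 1 / 4))   # 500.0
print(note_delay_ms(120, 1 / 8))   # 250.0
print(note_delay_ms(120, 1 / 16))  # 125.0
```

At 120bpm a 195ms delay falls between the sixteenth note (125ms) and the eighth note (250ms), which is why it sounds out of sync with the performance.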

In order to add dimension to the sound of a percussive instrument, such as a snare drum, the delays (reflections) need to be in the vicinity of 15ms–60ms because of the transient nature of the sound of the drum (see audio waveform). If the delays are longer than approximately 60ms, they may conceivably sound totally discrete, because one would now hear the discrepancy between the transients of the original snare drum and the onset of the transient of the generated delay, which would result in a random and confusing sound. A good rule to remember for adding dimension to percussive elements is this: the faster the attack of the sound envelope, the shorter the delays (reflections) need to be to prevent the discrete delay from creating a confusing and muddled sound.

The engineer could extend the times of the delays but would have to cut more high frequency content, at a lower roll-off frequency, for the dimensional effect to sound realistic.

If one wanted to simulate a canyon-like echo sound effect, then feel free to add discrete delays longer than 100msec, but make sure they are duller and at a lower volume than the original sound, and at a time setting that is not a rhythmic factor of the tempo of the music. (In a rhythmic delay situation, the delay will likely land on a half, quarter, eighth, or sixteenth note of the tempo and will be masked by the rhythm of other instruments, which will make it hard to hear the rhythmic delay effect linked to the original sound.)

If the instrument happens to be a piano, guitar or violin, the left and right delays can each be anywhere between 15ms and 100ms, with a 15ms time difference between the left and right delays.

For instruments with fast acting transients, the left and right delays should each be between 15ms and 60ms, again with a 15ms time difference between the left and right delays.
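These guidelines can be turned into a quick sanity check for a pair of delay settings. The sketch below only encodes the ranges stated above (15ms-100ms in general, 15ms-60ms for fast transients, at least 15ms between left and right). The function name and messages are illustrative, and the check does not try to judge pairs that are too far apart.

```python
def check_delay_pair(left_ms, right_ms, fast_transient=False):
    """Return a list of guideline problems for a left/right delay pair.

    An empty list means the pair fits the guideline ranges in the text.
    """
    upper_ms = 60.0 if fast_transient else 100.0
    problems = []
    for side, value in (("left", left_ms), ("right", right_ms)):
        if not 15.0 <= value <= upper_ms:
            problems.append(f"{side} delay {value}ms is outside 15-{upper_ms:.0f}ms")
    if abs(left_ms - right_ms) < 15.0:
        problems.append("left and right delays are less than 15ms apart")
    return problems


print(check_delay_pair(20, 35))                       # []
print(check_delay_pair(30, 35))                       # too close together
print(check_delay_pair(45, 80, fast_transient=True))  # right delay too long
```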

Another guideline to follow: when adding longer delay times, dampen the high frequency content of the delays as the delay times increase. If the delays for one environment are 20ms and 35ms and you want to change the sense of distance and dimension, then modify the delay times to 50ms and 65ms, remove more high frequency content, lower the high frequency roll-off and mix the delays in at an even lower level. With plenty of instrumentation, the listener will barely notice the minor alteration of delay times and the small changes in EQ but will notice a modification in dimension. This technique also creates the illusion that the delays (reflections) have lost more high frequency content because the reflected sound has traveled further than in a setting with shorter delays, resulting in a warmer sound.

The delay’s frequency response used to generate early reflections will dictate the type of material of the reflective surface used in the listening environment (absorption coefficients). The delay times will dictate the size of the hall and how far the reflective surfaces are from the original sound source and the listening position. Mix engineers have the ability to alter the frequency content and amplitude of the delays in a way that suits them to simulate a type of desired listening environment with a preferred sense of distance and dimension.

In the photos, you will notice the famous Massey Hall and the newer Roy Thomson Hall. Most concert halls and performance centers have a variety of reflective surfaces within their environments. In Toronto these two performance centers are used for classical, jazz, soft rock and pop music. Massey Hall is an older structure where the walls are mainly composed of wood and soft plaster. The acoustics of Massey Hall have been described as warm and rich sounding. The other performance center is Roy Thomson Hall, a newer building where the walls are composed of concrete and glass and where the acoustics have been described as bright and at times confusing.

Figure 7: Three Different Listening Positions

Fig:7 shows the layout of a concert hall with three different listening positions, “A-B-C”, all situated at fixed distances from the performance stage. The goal here is to determine which factors contribute to the overall sound experience with the listener seated at the three different fixed distances from the sound source.

If a group of musicians are performing on a stage in a performance center, the listener will hear three different aural experiences when seated at the three different distances from the stage. The quality of the direct sound, early reflections, reverb, high frequency content and amplitude will all be significantly different between one fixed listening position and another. Once an engineer understands why the three listening experiences are different, he will know the fundamental factors and how they differ and relate between one listening location and another. The engineer will then be able to manipulate specific audio processing to create dimension. If the engineer can determine which rules of sound are involved and how they shape the listening experience at each listening position, he will be able to reverse the situation: instead of the listener having to move to different positions to hear different dimensional perspectives, the engineer can keep the listener in one fixed position (the mixing position) and place the musicians at various distances, with different senses of dimension, in the mix.
For one example, the engineer can build a mix where the lead vocalist sounds like the listener is sitting very close to the stage in the “A” position, the guitars and keyboards have the listener situated further back in the “B” position and the drums in the “C” position, therefore creating diverse perspectives for the various instruments in the mix.

In the following descriptions, I will state the factors that contribute to determining the kind of sound experience that is occurring with direct sound, early reflections, reverb, levels and EQ at various distances from the performance stage.

With orchestral recording and live shows, the engineer can utilize ideal placement of microphones to capture the ambience required to replicate the sense of dimension in a final mix. In the mixing stage, the engineer could also sweeten the mix with additional artificial reverb. This is a simple approach that only requires the engineer to follow basic rules of recording and mixing.

The more enhanced and representative approach to sound mixing requires the engineer to create an environment from mono and stereo sources. It’s obvious that a simple stereo reverb will create an environment with a fixed decay time that might offer the option of fixed early reflections. But this unsophisticated approach is very limiting.

If the engineer uses only one reverb plug-in with just one setting and exploits the use of it with all instruments, the music will sound like it is all heard at one fixed location in an enclosed environment.

The final mix will usually sound uninteresting and lifeless.

So when speaking of creating distance and dimension in a mix, the engineer will need to utilize digital delay settings to create the sense of early reflections and equalization to reduce the high frequency content of the delays to emulate the type of material of the reflective surfaces. Additionally the type and operation of the reverb setting will assist in emulating a believable environment.

Generally, any delays will not affect clarity if they are equalized and mixed at appropriate levels. The addition of reverb to these delays emulates and enhances a realistic sounding environment.

The additional reverb with the correct pre-delay settings, decay time and equalization, allows engineers to enhance the accuracy of a realistic sound environment.

Listening Positions in an Enclosed Environment

When engineers record orchestral music for films and compact discs, the “B” listening position is what they strive for in the recording and mixing process. This orchestral listening position is often referred to as “the listener’s sweet spot”. These seats are the most expensive, for they offer an optimum listening position that has a favourable balance of direct path sound, early reflections and reverb. When orchestral music is recorded, the microphone setup relies on close mics, medium distant mics and ambient mics, so the engineer has complete control of mixing for a fixed position between “A” & “C”.

In pop-rock music mixing, the engineer strives to use all the positions (A, B and C) in creating a sense of distance and dimension. The different positions will represent an accurate balance of direct sound, early reflections and reverb for the individual placement of vocals and instruments. The goal in creating dimension is to create the three listening positions from stereo and mono sources.

Using any one position as a starting point, the engineer can then use slight alterations in equalization, delays and related amplitudes to relocate the listener between either of the other listening positions.

The main variations that define each of the three different listening positions are primarily concerned with early reflection times, high frequency content, decay time and all involved amplitudes. To enhance the sense of dimension in some of the positions, the reverb decay time should change slightly between the three positions. I try to keep the differences in reverb decay time between the “A & C” position to less than 1.5sec to maintain a sense of dimensional adhesion. The principal change in reverb will be with its decay time, pre-delay and high frequency content that emulates the properties of the reflective surfaces and its amplitude setting in how it relates to the delays and the direct path sound. The pre-delay for the reverb will significantly influence the listening position’s distance from the sound source. The principal changes in the early reflections will be the set times, high frequency content, delay feedback and amplitude levels.

The “B” Listening Position

NB: The numerical figures assigned to set time values and equalization frequency points in all three locations are not exact and are valued as approximate.

This “B” position (middle) is slightly in front of the exact center of the enclosure, 7-10 meters away from the origin of the fixed positioned sound source. In a typical concert hall, this position is usually the middle of the front row in the first balcony.

In most halls, the “B” listening position is regarded as the optimum listening position, preferred since it represents an effective balance of direct sound, early reflections and reverb for the listener. When engineers record orchestral music for films, performance and CDs, this is the listening position they strive to emulate in the recording and mixing process.

However the “B” position is not the ideal location when a music score calls for a vocalist during the performance or when the tempo is very slow. When a vocalist is performing a solo motif with an orchestra it can at times be incredibly demanding to hear the vocalist clearly amongst the excessive amplitude arising from all the other instruments.

If the orchestra is playing at a largo tempo (slow), silent gaps in the ambience may become prevalent and objectionable for the listener.

The earliest and loudest sound component heard in any listening position arrives via the direct path route. In the “B” position, approx. 70% of the total sound heard will come via the direct sound route. Due to the distance, the high frequency content will be slightly lower in response because of atmospheric conditions, so there may be a need to increase the frequency content above 10khz.

The early reflections will comprise 15% of the total sound heard.

There will be early reflections coming from both the sidewalls and rear walls, so delays have to be created to generate the impression of these early reflections. Remember that all early reflections arriving at the listening position need to fall between 15ms and 100ms to be effective. Another factor is that the set times of the delays (reflections) are based on the difference in arrival time between the direct path sound and the early reflections (delays). In the “B” position both reflections should be between 30ms-60ms. The actual delay times selected represent how far both walls are from the listening position. In a respectable performance venue, the ceiling is too high to contribute early reflections. The floor is usually covered with highly absorptive materials, which significantly mute the sound, so there are no reflections from the floor.
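The relationship between wall distance and delay setting can be sketched in a few lines of Python. This is only an illustration: the speed of sound (343 m/s) and the path lengths used below are assumptions, not measurements from any particular hall.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C


def reflection_delay_ms(direct_path_m, reflected_path_m, speed=SPEED_OF_SOUND_M_S):
    """Delay setting: how much later the reflection arrives than the direct sound."""
    if reflected_path_m < direct_path_m:
        raise ValueError("a reflected path cannot be shorter than the direct path")
    return (reflected_path_m - direct_path_m) / speed * 1000.0


def in_early_reflection_window(delay_ms, low_ms=15.0, high_ms=100.0):
    """True when the delay falls inside the 15ms-100ms early reflection range."""
    return low_ms <= delay_ms <= high_ms


if __name__ == "__main__":
    direct = 8.5  # metres from source to listener (hypothetical "B" position)
    # Hypothetical total path lengths: source -> wall -> listener.
    for wall, path in (("side wall", 20.0), ("rear wall", 26.0)):
        delay = reflection_delay_ms(direct, path)
        print(f"{wall}: {delay:.1f} ms, usable: {in_early_reflection_window(delay)}")
```

With these made-up path lengths both delays land inside the 30ms-60ms range suggested for the “B” position; longer reflected paths (walls further away) push the delay settings later.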

In the delay setup, it is recommended that the audio return of the delays be regenerated pre-fade and allowed to regenerate and decay, to give the effect of multiple reflections coming from both walls.

Although this will regenerate delays outside of the “creating dimension” range, the additional delays (reflections) will be much lower in level than the individual two delays. In a real life situation, the early reflections coming from the walls continue to bounce from other walls to create dimension and eventually develop into highly diffused reverb. These regenerated longer delays (reflections) are low enough in level that they don’t generate dimension as a component of early reflections, but rather contribute in enhancing the richness of the reverb.
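To see why the regenerated repeats fall outside the early reflection range yet stay low in level, here is a rough Python sketch of a feedback delay. The feedback amount (0.3) and the -60 dB cut-off are assumptions chosen purely for illustration; the function name is hypothetical.

```python
import math


def regenerated_repeats(delay_ms, feedback_gain, floor_db=-60.0):
    """List (arrival_ms, level_db) for each repeat of a feedback delay.

    Repeat n arrives at n * delay_ms. Its level is given relative to the
    first repeat and drops by the same number of dB on every pass.
    """
    if not 0.0 < feedback_gain < 1.0:
        raise ValueError("feedback_gain must be between 0 and 1 (exclusive)")
    step_db = 20.0 * math.log10(feedback_gain)  # loss per regeneration
    repeats = []
    n = 1
    while (n - 1) * step_db >= floor_db:
        repeats.append((n * delay_ms, (n - 1) * step_db))
        n += 1
    return repeats


if __name__ == "__main__":
    for arrival_ms, level_db in regenerated_repeats(30.0, 0.3):
        zone = "early reflection range" if arrival_ms <= 100.0 else "blends into reverb"
        print(f"{arrival_ms:6.1f} ms  {level_db:7.1f} dB  {zone}")
```

With a 30ms delay and this feedback setting, the first repeat that lands beyond 100ms is already more than 30 dB below the first reflection, which fits the idea that the later repeats enrich the reverb rather than define dimension.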

There needs to be a high frequency roll off on the delay plug-in return, for the delays are the summation of all the numerous reflections that lose more high frequency content with every regenerated reflection. It is also critical that the delay return be assigned post fade to the related reverb being used for the “B” position. The amount of delay return level that is sent to the reverb determines how “live” the sound is and the choice is an aesthetic one by the engineer.

The Reverb component will make up the remaining 15% of the total sound heard. The decay time of the reverb is likewise an aesthetic choice, but should be long enough in duration to be perceived as realistic ambience in a reflective environment and short enough to keep the mix from sounding harmonically confusing and unclear.
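As a rough starting point for fader levels, the approximate 70% / 15% / 15% split can be turned into relative levels. The sketch below assumes the percentages describe shares of sound energy (hence 10 * log10); that interpretation is an assumption, and the final balance remains an aesthetic choice.

```python
import math

# Approximate shares of the total sound at the "B" position, as given in the text.
B_POSITION_SHARES = {"direct": 0.70, "early_reflections": 0.15, "reverb": 0.15}


def relative_levels_db(shares, reference="direct"):
    """Level of each component relative to the reference component, in dB.

    Assumption: the shares are proportions of energy, so the conversion
    uses 10 * log10 of the ratio.
    """
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("shares should add up to 1.0")
    ref = shares[reference]
    return {name: 10.0 * math.log10(value / ref) for name, value in shares.items()}


if __name__ == "__main__":
    for name, level in relative_levels_db(B_POSITION_SHARES).items():
        print(f"{name}: {level:.1f} dB relative to the direct sound")
```

Under this assumption the early reflections and the reverb each sit a little under 7 dB below the direct sound.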

There needs to be a high frequency roll off on the reverb return, for reverb is the summation of all the numerous reflections that lose more high frequency content with every regenerated reflection.

The roll off frequency selected as the onset of diminishing high frequency content should be approx. between 3khz-5khz. The lower the roll off frequency point, the duller the wall surfaces reflections will be perceived. Most reverb processors reduce high frequency content, as the reverb decays over time. As previously stated, the high frequency content of the reverb will be a lesser amount at 2sec than it will be at 1sec.
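The effect of the roll off point can be estimated numerically. The sketch below assumes a gentle first-order (6 dB per octave) low-pass filter, which is only one possible filter shape; real delay and reverb plug-ins may use steeper or different curves.

```python
import math


def lowpass_loss_db(freq_hz, rolloff_hz, passes=1):
    """Attenuation (negative dB) at freq_hz for a first-order low-pass
    whose -3 dB point is rolloff_hz, applied `passes` times in series."""
    single_pass = -10.0 * math.log10(1.0 + (freq_hz / rolloff_hz) ** 2)
    return passes * single_pass


if __name__ == "__main__":
    # How much 10 kHz content survives for different roll off points.
    for rolloff in (3000.0, 4000.0, 5000.0):
        print(f"roll off {rolloff:.0f} Hz: {lowpass_loss_db(10000.0, rolloff):.1f} dB at 10 kHz")
    # Each regenerated reflection passes through the filter again.
    for n in (1, 2, 3):
        print(f"{n} pass(es) at 4 kHz: {lowpass_loss_db(10000.0, 4000.0, n):.1f} dB at 10 kHz")
```

The lower the roll off point, the greater the loss at 10 kHz, and the loss accumulates with every pass through the filter, which mirrors how repeated reflections grow progressively duller.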

Creating Dimension in the “B” Position

In session template setup, assign 1 mono send (not stereo) to 1 stereo return channel and insert a digital delay. Assign both channel returns hard left and hard right.

Setup Example 1. (approximate)

Amplitudes

Early Reflection-Delay Times

Equalization- Roll Off Point of High Frequency Content

Delay Regeneration

Reverb

For moderate tempos assign a reverb time between 1.5sec-2.0sec.

Reverb time length should be set in regards to the musical density of the song and its tempo. The length of the decay time also depends on the mixer’s aesthetic choice.

Send both delay return channels equally to the reverb (stereo aux track). Pre-delay needs to be 0.0ms for reverb sends from both delay returns.

Both sends from delay returns should be of equal level.

In the “B” position the engineer can also create a mix template that offers the “B” position with or without the additional use of the delays (early reflections).

It is obvious that a greater sense of distance will be established when the delays are used as early reflections, but at times the engineer will not desire to have all the instruments sounding like they are located at the same fixed distance in the “B” position.

At times the engineer will want to make subtle dimensional differences in the “B” position between instruments that share it. A good solution is to give some instruments only a sense of dimension, without the additional use of the delays (early reflections). The engineer should therefore set up two separate reverb plug-ins for the “B” position. One reverb will include the delays to generate a sense of distance and dimension, and the other will be used only to establish a sense of dimension. The reverb that includes the delays will have 0.0 pre-delay, for the pre-delay has already been factored in by the inserted pre-set delay plug-in. The other reverb will need a pre-delay between 30ms-60ms, and it has to be at least as long as the first of the two delays feeding the other reverb. For example: if the delays are 30ms-left and 45ms-right, the pre-delay on the reverb without delays has to be at least 30ms. It is impossible to hear reverb before the early reflections!

Reverberation is indicative of the size of the environment and how reflective the wall surfaces are. If the mix engineer wants to create an environment where all of the walls are further away from the “B” listening position, then he will have to shift all the delay times (early reflections) later, and likewise the pre-delay for the non-delay reverb. In making this alteration to delay times, the engineer should be aware that he is beginning to encroach on the specifications for the “A” position. The pre-delay for the reverb’s send from the delay channel return still remains at 0.0msec.
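The pre-delay rule for the two “B” position reverbs can be written down as a small Python sketch. The function names are hypothetical, and the returned value for the reverb without delays is the minimum allowed pre-delay, not a recommended setting.

```python
def reverb_predelay_ms(delay_times_ms, fed_from_delay_returns):
    """Pre-delay for a "B" position reverb.

    fed_from_delay_returns=True: the reverb is fed by the delay returns,
    which already supply the time offset, so the pre-delay stays at 0.0 ms.
    fed_from_delay_returns=False: the reverb has no delays in front of it,
    so its pre-delay must be at least as long as the earliest delay
    (reverb cannot be heard before the early reflections).
    """
    if not delay_times_ms:
        raise ValueError("at least one delay time is needed")
    return 0.0 if fed_from_delay_returns else float(min(delay_times_ms))


def move_walls_back(delay_times_ms, extra_ms):
    """Shift every early reflection later to suggest walls that are further away."""
    return [t + extra_ms for t in delay_times_ms]


if __name__ == "__main__":
    delays = [30.0, 45.0]  # left and right delay times from the example
    print(reverb_predelay_ms(delays, fed_from_delay_returns=True))
    print(reverb_predelay_ms(delays, fed_from_delay_returns=False))
    print(reverb_predelay_ms(move_walls_back(delays, 10.0), fed_from_delay_returns=False))
```

Moving the walls back by 10ms in this sketch raises the minimum pre-delay of the non-delay reverb by the same 10ms, while the reverb fed from the delay returns stays at zero.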

Another factor that needs to be appreciated is that the high frequency roll-off should be identical on both reverbs used in the “B” position.

This model for the “B” position with delays, for creating one fixed listening location will be effective if the listener wishes to hear all the instruments at the same distance and with similar sense of dimension in identical sounding environments. This situation would work well for jazz, classical and live shows.

The engineer should employ in a mix, a strategy to design different mix templates for hearing various instruments and vocals with different perceptions of distance and sense of dimension for the listener. The engineer can select the type of reflective surfaces (EQ on delays) and how large the environment is, indicated by the delay times and length and EQ of the reverb (RT-60).

In a pop-rock song, a valuable mixing strategy would have the listener hear: 1) the lead vocals as if he were sitting in the “A” position, 2) the guitar and piano situated at the “B” listening position and 3) the drummer performing from the “C” listening position. Because the bass possesses low frequency content, creating dimension is challenging for two reasons: 1) bass frequencies are hard to locate because they are omnidirectional and 2) bass frequency localization will render the mix very indistinct in the low frequency range. However, if the bass were performing as a rhythmic element in the production, then creating localization in the 1khz-2.5khz range with appropriate EQ and compression would work effectively.

If you physically moved the listening position a few meters further back from the “B” location center position, the amplitude and high frequency content of the direct sound will decrease in relation to the amplitude and high frequency content of the reflections and reverb. In an extreme example, if a listener stood at the very rear wall of a concert hall, the amplitude and high frequency content of the direct path, reflections, and reverb would be extremely close together, creating the illusion that the listener would be at a substantial distance from the originating sound source (“C” position) and in a highly reflective environment.
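The drop in direct sound level with distance can be estimated with the inverse distance law. The Python sketch below assumes an idealized point source in a free field, so it applies only to the direct sound; in a real hall the reverberant level stays roughly constant, which is why the components draw closer together towards the rear. The distances and the speed of sound (343 m/s) are illustrative assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C


def direct_level_change_db(from_distance_m, to_distance_m):
    """Change in direct sound level when the listener moves (inverse distance law)."""
    if from_distance_m <= 0 or to_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(from_distance_m / to_distance_m)


def travel_time_ms(distance_m, speed=SPEED_OF_SOUND_M_S):
    """Time the direct sound needs to cover the given distance."""
    return distance_m / speed * 1000.0


if __name__ == "__main__":
    b_position = 8.5  # metres, hypothetical value inside the 7-10 meter range
    for new_distance in (12.0, 17.0):
        change = direct_level_change_db(b_position, new_distance)
        print(f"{b_position} m -> {new_distance} m: {change:.1f} dB, "
              f"arrives after {travel_time_ms(new_distance):.1f} ms")
```

Doubling the distance costs the direct sound about 6 dB under these assumptions, so moving even a few meters back noticeably shifts the balance towards reflections and reverb.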

If the walls are composed of concrete and glass then the reverb (RT-60) will be bright sounding and last longer. For that type of hall at least a 6khz roll off is required in the reverb settings. Since the walls are highly reflective the decay time would be longer, although it also depends on the mixer’s sound aesthetics. Remember that the mixing engineer still has to be aware of the density of the sound of the combined instruments and the various tempos. A slow tempo with only 4-5 instruments will allow for a reverb decay time between 2.0sec-2.5sec, and possibly longer with fewer instruments.

If the production is complex with a quick tempo then a decay time between 1.5sec-2sec should work effectively. The mixing engineer should always make sure that there is clarity with clean harmonic distinction in creating dimension. If clarity is sacrificed and the mix is harmonically messy, it will sound like a piano player making quick chord changes with his foot permanently on the sustain pedal.

If the walls are composed of wood and fabric then the reverb will be duller but warmer sounding and slightly shorter in duration. A roll-off frequency set between 3.0khz and 4.0khz is a good range to work within. As mentioned above, decay times should be based to a degree on aesthetics, density of the music production and tempos. Most classical and jazz mixing engineers strive for a very warm and highly diffused sounding reverb. Rock engineers, with their desire for plenty of midrange, prefer brighter sounding, shorter reverb times.

The “A” Position

NB: The A position will be represented by using the “Lead Vocal” as the original sound source.

The “A” position is located 3 meters from the sound source. If the listener moves forward to the “A” position from the “B” position, the amplitudes and high frequency content of the direct sound, early reflections and reverb will be different, since the distance between the listener and the lead vocal is now shorter than it was in the “B” position. The mix engineer can also take liberty and slightly increase the frequency content in the 12khz region a couple of db to create the impression of intimacy between the vocalist and the listener. How much of an increase is an aesthetic decision by the engineer.

For all intents and purposes, the early reflections do not exist and don’t play a role in creating dimension in the “A” position, since the lead vocal is too close to the listening position for the listener to perceive any influencing effect from early reflections.

An option for the engineer to enhance a lead vocal or soloist’s reverb used in the “A” position is to create a delay that is rhythmically linked to the tempo with some regeneration (delay feedback). Send the delay and its regeneration to the same reverb being used for the lead vocalist. If the engineer chooses to add the additional rhythmic delay at a low level into the mix, then he should take the liberty of making the delay stereo instead of mono. If the tempo were 120 bpm an eighth note delay would be 250msec.

A single 250msec delay would only be a mono return and might sound like a slap back effect if mixed in loudly, which is not the goal of creating effective reverb. Instead of a mono delay at 250msec, create a stereo delay with the left return set to 240msec and the right return set to 260msec (or vice versa). As previously stated, the delays have to be at least 15msec apart to avoid imaging problems. Remember to remove some of the high frequency content and also de-ess the send. This option has little to do with creating distance but is used to enhance the essence of the reverb sound. The goal here is to extend the melodic idea contained in the original vocal sound or solo so that the melody sounds more appealing to the listener. This feature works well in slow vocal ballads.
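The tempo arithmetic behind the 250msec figure, and the split into a stereo pair, can be sketched in Python as follows. The function names and the default 20ms spread are illustrative choices; the only rules taken from the text are the eighth note calculation and the 15msec minimum separation.

```python
def note_delay_ms(bpm, notes_per_beat=2):
    """Delay time for a note value; notes_per_beat=2 gives an eighth note
    when the beat is a quarter note."""
    if bpm <= 0 or notes_per_beat <= 0:
        raise ValueError("bpm and notes_per_beat must be positive")
    return 60000.0 / bpm / notes_per_beat


def stereo_delay_pair(center_ms, spread_ms=20.0, min_spread_ms=15.0):
    """Left/right delay times placed symmetrically around center_ms."""
    if spread_ms < min_spread_ms:
        raise ValueError("the two delays must be at least 15 ms apart")
    half = spread_ms / 2.0
    return center_ms - half, center_ms + half


if __name__ == "__main__":
    eighth = note_delay_ms(120)
    left, right = stereo_delay_pair(eighth)
    print(f"eighth note at 120 bpm: {eighth} ms, stereo pair: {left} ms / {right} ms")
```

At 120 bpm this gives 250ms for the eighth note and 240ms / 260ms for the stereo pair, matching the example; other tempos or note values simply change the inputs.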

The reverb is the only other element, besides the direct path sound, contributing to the overall sound experience. It still only makes up 15% because of the increase in direct path sound amplitude in the “A” listening position. What is different about the reverb is that it will be perceived as more highly diffused, with less high frequency content than the reverb in the “B” position. As previously stated, the reverb time between 2.0sec-3.0sec is less bright than the same reverb between 1.0sec-2.0sec. It will also be sensed as an overall longer sounding event than in the “B” position. The reason for this is that the pre-delay on the reverb send of the “A” position is between 80msec-120msec, which is longer than the pre-delay setting in the “B” position, so the time from the onset of the direct path sound to the end of the decay is longer.

Creating Dimension in the “A” Position

Amplitudes

Early Reflection-Delay Times

Reverb

For moderate tempos assign a reverb time between 2.0sec-2.5sec.

Reverb time length should be set in regards to the musical density in the song and its tempo. The length of the decay time is also an aesthetic choice by the mix engineer.

The basic idea of hearing a performance in the “A” listening position is to situate lead vocals and solos in a mix to create presence and ambience. It additionally enhances the impression of a singer or soloist standing very close in front of you in a beautiful, intimate sounding environment.

Creating Dimension in the “C” Position

The “C” listening position is the furthest from the sound source.

The levels from the direct sound path, early reflections and reverb are much closer together in amplitude and high frequency content than in either of the “A & B” positions.

The early reflections, reverb and the original sound all arrive later and are closer together in amplitude and high frequency content because the entire sound experience has to travel further for the “C” position than for the “A & B” positions. Because these time differences and amplitudes are closer together in value, the perception suggests that the listener is located at an extensive distance from the originating sound source and in an advantageous reflective environment. You will notice that the delay times of the early reflections are the shortest of all the listening positions. At this point it is imperative to remind you that the set delay times (early reflections) are referenced to the time difference between the arrival of sound from the direct path source and the arrival of sound from the early reflections.

Amplitudes

Early Reflections-Delay Times

Equalization- Roll Off Point of High Frequency Content

Delay Regeneration Feedback (pre-fade)

Reverb

A Mixture of Listening Positions

One of the ideas and objectives for mixing is to create a sense of distance and dimension in an enclosed environment with a choice of numerous listening experiences perceived from various fixed locations.

The best way to achieve a sense of dimension in a mix is to assign different instruments and vocals to be heard from the three listening positions. The engineer cannot expect to assign all musical elements to one position and have the listener keep moving forwards and backwards in the theater or at home to appreciate different senses of dimension. The strategy in mixing is to station the listener in one location, a sweet spot as they say, and apply to the mix the techniques of using delays (early reflections), reverb, different EQ and different levels for all of the elements that will contribute to generating a sense of distance and dimension.

Mixing Examples

When I work on orchestral recordings with soloists, I alter the sense of dimension to meet my aesthetic needs in ways that I feel the listener will also enjoy. When I was mixing a film soundtrack that featured Andrea Bocelli with the Berlin Philharmonic, I took plenty of liberties in the mixing process. I placed the orchestra in the “B & C” positions and Andrea in the “A” position. When the listener is hearing the mix on a high quality sound system, they will perceive Andrea positioned three meters in front of them with the orchestra positioned 10-20 meters behind Andrea. The vocal will have a long pre-delay (100msec), the reverb send is de-essed, no delays (early reflections), the voice equalized for presence and the reverb EQ will be rolled off at 2.5khz. The orchestra will have a short pre-delay (40msec), delays for early reflections (40ms & 55ms), very little EQ on the instruments, and the reverb EQ will be rolled off at 3.5khz.

If I am mixing a pop song, I will use the following process: The “A” position will contain the lead vocal and guitar solo. The vocal and solo will have a long pre-delay (80msec) on the reverb send (de-essed), no delays (early reflections), the voice equalized for presence and the reverb EQ will be rolled off at 3.0khz.

For the rhythm guitar and piano I will place them both in the “B” position, with the guitar in the “B” position with early reflections and reverb for creating distance and dimension and the piano in the “B” position for just creating a sense of dimension.

The rhythm guitar will be located in the “B” position for creating a sense of distance and dimension. The reverb pre-delay will be 40ms, delays for early reflections (30ms & 45ms), and the reverb EQ and delays will be rolled off at 4khz.

The piano will be located in the “B” position for creating just a sense of dimension. The reverb pre-delay will be 40ms, with no early reflections, and the reverb EQ will be rolled off at 4khz.

The “C” position will contain drums. The reverb pre-delay will be 20msec, early reflections of 15ms & 30ms, and the reverb EQ rolled off at 5khz.
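The pop mix settings described above can be collected into a small template, which also makes it easy to check the delay times against the 15ms-100ms range and the 15msec minimum separation. This Python sketch only restates the values from the text; the class and field names are hypothetical and do not correspond to any DAW or plug-in format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionSetting:
    position: str
    sources: tuple
    reverb_predelay_ms: float
    early_reflections_ms: tuple  # empty tuple means no delays are used
    reverb_rolloff_khz: float


# Values taken from the pop mix example in the text.
POP_MIX = (
    PositionSetting("A", ("lead vocal", "guitar solo"), 80.0, (), 3.0),
    PositionSetting("B", ("rhythm guitar",), 40.0, (30.0, 45.0), 4.0),
    PositionSetting("B", ("piano",), 40.0, (), 4.0),
    PositionSetting("C", ("drums",), 20.0, (15.0, 30.0), 5.0),
)


def delays_are_valid(delays_ms, low=15.0, high=100.0, min_gap=15.0):
    """Check that delays sit in the early reflection range and are far enough apart."""
    ordered = sorted(delays_ms)
    in_window = all(low <= t <= high for t in ordered)
    gaps_ok = all(later - earlier >= min_gap
                  for earlier, later in zip(ordered, ordered[1:]))
    return in_window and gaps_ok


if __name__ == "__main__":
    for setting in POP_MIX:
        status = "ok" if delays_are_valid(setting.early_reflections_ms) else "check delays"
        print(setting.position, ", ".join(setting.sources), status)
```

Keeping the settings in one place like this makes it straightforward to try alternative delay times for a position and see immediately whether they still respect the two timing rules.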

Stereo Dimension Conclusion

Before we move into surround sound, a conclusion can be made about creating dimension for stereo mixing by emulating reflections through the use of digital delays. When the mix engineer assigns separate delays (early reflections) to vocal(s) or instruments recorded either in mono or stereo, he can then create a sense of distance and dimension for the original musical performance. The two delays to be generated need to be between 15ms–100ms and at least 15msec apart from each other. The delay time settings, amplitude and assigned high frequency roll-off of the delays will imply the type of material the reflective surfaces are constructed of and how far the surfaces are from the listener and the source sound.

The best sounding mixes have the type of perspective where listeners can visualize distance and dimension in a recorded performance. To achieve this, the mix engineer needs to understand how direct sound, reflected sound, and reverb work in combination with one another. In other words, how can the engineer relate and use this knowledge to achieve a desired dimensional perspective in a mix? Sound design, ambience, Foley and dialogue are mostly mono or stereo elements, which are then altered and adapted to create an aural image.

As previously stated, once an engineer comprehends how sound works in a three-dimensional environment he will also have the ability to take mono and stereo elements and generate a surround sound mix.

In some cases the techniques required to create dimension in surround sound are very unconventional and remarkably original. As they say, “If you want to break the rules, you need to know the rules you are breaking.” An outstanding example of being original is the imaginative use of convolution reverb, which can be manipulated to achieve astonishingly believable realism.

With a good understanding of the physics of enclosed environments, as well as the fundamental operational principles of audio processing, it is possible to create the illusion of almost any listening environment that can be imagined.

Precedence Effect (Haas Effect)

Why delays have to be at least 15msec

The auditory system of the ear can clearly localize a sound source in the presence of multiple reflections and reverberation. In fact, the auditory system “combines” both direct and reflected sounds in such a way that they are heard as a “localized event” and the localization of the direct sound has been determined by the “precedence effect”, also known as the “Haas effect” or the law of first waveforms. The precedence effect allows us to localize a sound source in the presence of reverberation, even when the amplitude of the reverberation is greater than the direct sound.

Localization is based on the time difference between the arrival of the sound event at the left and right ear. (Figure 2) Of the various experiments that investigate the precedence effect, the most common exercise positions a listener in front of and between two loudspeakers placed in a triangular setting, in an anechoic or a very dead sounding environment.

One loudspeaker is used to deliver the direct sound while the other loudspeaker delivers a delayed replica of the direct sound, thus simulating a reflection. Such studies indicate the following:

  1. If a direct sound event is generated simultaneously with identical amplitude in both the left and right loudspeakers, then a single sound source (virtual source) will be perceived by the listener at a location point centered exactly between the left and right loudspeakers. (Phantom center mono position)
  2. When the direct sound and the delayed sound are of equal volume but the delay in the right loudspeaker is increased from 0.1msec to 1msec, the perceived location of the sound source starts to move towards the left loudspeaker (direct). This is known as summing localization.
  3. When a delay between 1msec and approx. 15msec of the original sound is delivered from the right loudspeaker, the sound source is perceived as coming directly from the left loudspeaker even though the volume levels are identical between the left and right loudspeakers.
  4. When the delay in the right loudspeaker exceeds 15msec, the direct sound is precisely localized in the left loudspeaker; however, the delayed sound (right loudspeaker) is now also localized as a distinct sound and is perceived as an early reflection of the direct sound, creating a sense of distance and ambience.
  5. If the sound source in the right loudspeaker is delayed between 1msec-15msec from the same sound source in the left loudspeaker and is louder in amplitude (+3db to +6db), the listener will perceive the sound as coming from the left loudspeaker even though the right loudspeaker is louder in amplitude.

The slight difference in milliseconds (1msec-15msec) of an identical sound arriving from both loudspeakers overrides the difference in amplitude between the loudspeakers, and the sound will appear to be discretely localized in the non-delayed loudspeaker.

The experiments show how we are capable of correctly localizing a sound source in the presence of reverberation, provided the reflection arrives within 15msec of the direct sound. The possibilities for using the precedence effect to widen the image of the dedicated center speaker in surround sound mixing will be explored later in the article. If the second-arriving sound is at least 15 dB louder than the first, the precedence effect breaks down.
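The findings listed above can be summarized as a simple decision rule. The Python sketch below is only a rough classification based on the approximate thresholds quoted in this section; real perception changes gradually and depends on the program material. The function name and the returned labels are hypothetical.

```python
def perceived_image(delay_ms, lag_louder_by_db=0.0):
    """Classify what a listener between two loudspeakers is likely to perceive.

    delay_ms: delay of the second (lagging) loudspeaker.
    lag_louder_by_db: how much louder the lagging loudspeaker is.
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must not be negative")
    if lag_louder_by_db >= 15.0:
        return "breakdown"             # precedence effect no longer holds
    if delay_ms == 0:
        # Equal level gives a phantom center; the list above does not cover
        # unequal levels with no delay, so that case is flagged separately.
        return "phantom_center" if lag_louder_by_db == 0 else "not_covered"
    if delay_ms <= 1.0:
        return "summing_localization"  # image drifts toward the leading side
    if delay_ms <= 15.0:
        # The text confirms this for a lagging side up to +6 dB louder;
        # treating +6 dB to +15 dB the same way is an assumption.
        return "leading_speaker"
    return "early_reflection"          # delayed sound is heard as a distinct reflection


if __name__ == "__main__":
    for delay, level in ((0.0, 0.0), (0.5, 0.0), (10.0, 6.0), (30.0, 0.0), (10.0, 15.0)):
        print(f"delay {delay} ms, lag +{level} dB -> {perceived_image(delay, level)}")
```

Read this way, the 15msec threshold is what separates a delay that merely steers localization from one that can act as an early reflection and contribute to a sense of distance.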

Figure 8: The Precedence Effect (Haas Effect)