Learn the Language of Good Sound
Better understand the difference a hi-fi system can make
If you love music, you love good sound. You know it when you hear it. So why bother learning how to describe it?
When you know how to pinpoint what's good or bad about the sound of an audio system, you'll have a much better idea how to fine-tune it. Knowing precisely what to listen for will make it easier for you to distinguish the good components from the ordinary or overhyped ones when you're shopping for new gear. And you'll get more out of the product reviews you read if you can decode the lingo some professional reviewers toss around.
To help expand your hi-fi vocabulary, we are pleased to present the following excerpt from the Introductory Guide to High-Performance Audio Systems by Robert Harley, Editor-in-Chief of The Absolute Sound magazine.
In the book's introduction, Mr. Harley encourages all music lovers to explore the realm of high-end audio, which does not necessarily mean high-priced audio. We couldn't agree more: it doesn't take deep pockets or "golden ears" to obtain and appreciate good sound.
An excerpt from Introductory Guide to High-Performance Audio Systems. © 2007 by Robert Harley. Reprinted with permission. www.hifibooks.com. The author's opinions don't necessarily match those of Crutchfield or any of its employees.
Sonic Descriptions and their Meanings
The biggest problem in critical listening is finding words to express our perceptions and experiences. We hear things in reproduced music that are difficult to identify and put into words. A listening vocabulary is essential not only to conveying to others what we hear, but also to recognizing and understanding our own perceptions. If you can attach a descriptive name to a perception, you can more easily recognize that perception when you experience it again.
By describing in detail the specific sonic characteristics of how electronic components change the sound of music passing through them, I hope to attune you to recognizing those same characteristics when you listen. After reading this next section, listen to two products for yourself and try to hear what I'm describing. It can be any two products—if you have a portable CD player, hook it up to your system and compare it to your home CD player. Even comparing a CD and an MP3 file made from that CD will get you started. The important thing is to start listening analytically. If you don't hear the sonic differences immediately, keep listening. The more you listen, the more sensitive you'll become to those differences.
I notice this first-hand when I occasionally spend time listening critically in my listening room with visiting manufacturers and designers of high-end equipment—many of them highly skilled listeners. While we share many commonalities in determining what sounds good, there is a wide range of perception about what aspects of the presentation are most important.
You should also know that recordings made with audiophile techniques are more revealing of some aspects of reproduced sound than recordings made for mass consumption. For example, a recording of classical music made in a concert hall with very few microphones, a simple signal path, and high-quality recording equipment will likely reveal more about a component's soundstaging performance than a pop recording made in a studio. Similarly, most mass-market recordings have almost no dynamic range so that they sound "good" on a 4" car-stereo speaker. For these reasons, some of the sonic terms described in this chapter apply much more to audiophile-quality recordings than to mass-market ones.
It's also useful to understand the broad terms that describe the audio frequency band. The range of human hearing, which spans ten octaves from about 16Hz (cycles per second) to 20,000Hz, or 20 kilohertz (20kHz), can be divided into the specific regions described below. Note that these divisions are somewhat arbitrary; you can't say specifically that the lower treble begins at 2,000Hz and not 2,500Hz, for example. The table nonetheless provides a rough guideline for understanding the relationship between frequency ranges and their descriptive names.
| Lower Limit (Hz) | Upper Limit (Hz) | Description |
This rough guide will help you understand the following terms and definitions. A full characterization of how a product "sounds" will include aspects of each of the following sonic qualities.
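The "ten octaves" figure quoted above can be checked with a short calculation. This is a minimal sketch: an octave is a doubling of frequency, so the number of octaves between two frequencies is the base-2 logarithm of their ratio. The function name `octaves_between` is illustrative and does not come from the book.

```python
import math

def octaves_between(low_hz: float, high_hz: float) -> float:
    """Return the number of octaves (frequency doublings) from low_hz to high_hz."""
    if low_hz <= 0 or high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return math.log2(high_hz / low_hz)

# The range of hearing quoted in the text: about 16Hz to 20kHz.
print(round(octaves_between(16, 20_000), 2))  # 10.29, i.e. roughly ten octaves

# One doubling of frequency is exactly one octave.
print(octaves_between(1_000, 2_000))  # 1.0
```

Because the scale is logarithmic, the octave from 10kHz to 20kHz covers far more hertz than the octave from 20Hz to 40Hz, yet each counts as a single octave.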
The first aspect of the musical presentation to listen for is the product's overall tonal balance. How well balanced are the bass, midrange, and treble? If it sounds as though there is too much treble, we call the presentation bright. The impression of too little treble produces a dull or rolled-off sound. If the bass overwhelms the rest of the music, we say the presentation is heavy or weighty. If we hear too little bass, we call the presentation thin, lightweight, uptilted, or lean.
A product's tonal balance is a significant—and often overwhelming—aspect of its sonic signature.
The term perspective describes the apparent distance between the listener and the music. Perspective is largely a function of the recording (particularly the distance between the performers and the microphones), but is also affected by components in the playback system. Some products push the presentation forward, toward the listener; others sound more distant, or laid-back. The forward product presents the music in front of the loudspeakers; the laid-back product makes the music appear slightly behind the loudspeakers. Put another way, the forward product sounds as though the musicians have taken a few steps toward you; the laid-back product gives the impression that the musicians have taken a few steps back.
Another way of describing perspective is by row number in a concert hall. Some products seem to "seat" the listener at the front of the hall (in Row D, say). Others give you the impression that you're sitting farther back; say, in Row S. Several other terms describe perspective. Dry generally means lacking reverberation and space, but can also apply to a forward perspective. Other watchwords for a forward presentation are immediate, incisive, vivid, aggressive, and present. Terms associated with a laid-back presentation include lush, easygoing, and gentle.
Products with a forward presentation produce a greater sense of an instrument's presence before you, but can quickly become fatiguing. Conversely, if the presentation is too laid-back, the music is uninvolving and lacking in immediacy.
A laid-back presentation invites the listener in, pulling her gently forward into the music, allowing her the space to explore its subtleties. It's like the difference between having a conversation with someone who is aggressive, gets in your face, and talks too loudly, compared with someone who stands back, speaking quietly and calmly.
In loudspeakers, perspective is often the result of a peak or dip in the midrange (a peak is too much energy, a dip is too little). In fact, the midrange between 1kHz and 3kHz is called the presence region because it provides a sense of presence and immediacy. The harmonics of the human voice span the presence region; thus, the voice is greatly affected by a product's perspective.
Good treble is essential to high-quality music reproduction. In fact, many otherwise excellent audio products fail to satisfy musically because of poor treble performance. The treble characteristics we want to avoid are described by the terms bright, tizzy, forward, aggressive, hard, brittle, edgy, dry, white, bleached, wiry, metallic, sterile, analytical, screechy, and grainy. Treble problems are pervasive; look how many adjectives we use to describe them.
If a product has too much apparent treble, it overstates sounds that are already rich in high frequencies. Examples are overemphasized cymbals, excessive sibilance (s and sh sounds) in vocals, and violins that sound thin. A product with too much apparent treble is called bright. Brightness is a prominence in the treble region, primarily between 3kHz and 6kHz. Brightness can be caused by a rising frequency response in loudspeakers, or by poor electronic design. Many CD players and solid-state amplifiers that measure as having a flat (accurate) frequency response nevertheless add prominence to the treble.
Tizzy describes too much upper treble (6kHz-10kHz), characterized as a whitening of the treble. Tizzy cymbals have an emphasis on the upper harmonics, the sizzle and air that rides over the main cymbal sound. Tizziness gives cymbals more of an ssssss than a sssshhhh sound.
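The frequency ranges quoted so far can be gathered into a small lookup, which may help when reading a frequency-response plot alongside these terms. This is only a sketch. The numbers are the ones given in this excerpt (presence region 1kHz to 3kHz, brightness primarily 3kHz to 6kHz, upper treble 6kHz to 10kHz), while the names `REGIONS` and `region_for` and the choice to treat each range as including its lower edge but not its upper edge are assumptions made for illustration. As noted earlier, such divisions are somewhat arbitrary.

```python
# (lower Hz, upper Hz, label) for the ranges quoted in this excerpt.
# Treating each range as [lower, upper) is an assumption for illustration.
REGIONS = [
    (1_000, 3_000, "presence region"),
    (3_000, 6_000, "brightness region"),
    (6_000, 10_000, "upper treble (tizziness)"),
]

def region_for(freq_hz: float):
    """Return the label of the quoted range containing freq_hz, or None."""
    for lower, upper, label in REGIONS:
        if lower <= freq_hz < upper:
            return label
    return None

for freq in (2_000, 4_500, 8_000, 100):
    print(freq, region_for(freq))
```

A frequency outside the three quoted ranges, such as 100Hz, returns `None` rather than a guess.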
Forward, if applied to treble, is very similar to bright; both describe too much treble. A forward treble, however, also tends to be dry, lacking space and air around it. Many of the terms listed above have virtually identical meanings. Hard, brittle, and metallic all describe an unpleasant treble characteristic that reminds one of metal being struck. In fact, the unique harmonic structure created from the impact of metal on metal is very similar to the distortion introduced by a solid-state power amplifier when it is asked to play louder than it is capable of playing.
A particularly annoying treble characteristic is graininess. Treble grain is a coarseness overlaying treble textures. I notice it most on solo violin, massed violins, flute, and female voice. On flute, treble grain is recognizable as a rough or fuzzy sound that seems to ride on top of the flute's dynamic envelope. (That is, the grain follows the flute's volume.) Grain makes violins sound as though they're being played with hacksaw blades rather than bows—a gross exaggeration, but one that conveys the idea of the coarse texture added by grain.
The most common sources of these problems are, in rough order of descending magnitude: tweeters in loudspeakers, overly reflective listening rooms, digital source components (usually the CD player or digital processor), preamplifiers, power amplifiers, cables, and dirty AC power sources.
So far, I've discussed only problems that emphasize treble. Some products tend to make the treble softer and less prominent than live music. This characteristic is often designed into the product, either to compensate for treble flaws in other components in the system, or to make the product sound more palatable. Deliberately softening the treble is the designer's shortcut; if he can't get the treble right, he just makes it less offensive by softening it.
The following terms, listed in order of increasing magnitude, describe good treble performance: smooth, sweet, soft, silky, gentle, liquid, and lush. When the treble becomes overly smooth, we say it is romantic, rolled-off, or syrupy. A treble described as "smooth, sweet, and silky" is being complimented; "rolled-off and syrupy" suggests that the component goes too far in treble smoothness, and is therefore colored. A rolled-off and syrupy treble may be blessed relief after hearing bright, hard, and grainy treble, but it isn't musically satisfying in the long run. Such a presentation tends to become bland, uninvolving, slow, thick, closed-in, and lacking detail. All these terms describe the effects of a treble presentation that errs too far on the side of smoothness.
The presentation will lack life, air, openness, extension, and a sense of space if the treble is too soft. The music sounds closed-in rather than being big and open. The best treble presentation is one that sounds most like real music. It should have lots of energy—cymbals can, after all, sound quite aggressive in real life—yet not have a synthetic, grainy, or dry character. We don't hear these characteristics in live music; we shouldn't hear them in reproduced music. More important, the treble should sound like an integral part of the music, not a detached noise riding on top of it. If a component has a colored treble presentation, however, it is far less musically objectionable if it errs on the side of smoothness rather than brightness.
J. Gordon Holt, Stereophile magazine founder and the father of observational audio equipment evaluation, once wrote, "If the midrange isn't right, nothing else matters."
The midrange is important for several reasons. First, most of the musical energy is in the midrange, particularly the important lower harmonics of most instruments. Not only does this region contain most of the musical energy, but the human ear is much more sensitive to midrange and lower treble than to bass and upper treble. Specifically, the ear is most sensitive to sounds between about 800Hz and 3kHz, and to small changes in both volume and frequency response within this band. The ear's threshold of hearing—i.e., the softest sound we can hear—is dramatically lower in the midband than at the frequency extremes. We've developed this additional midband acuity probably because the energy of most of the sounds we heard every day for hundreds of thousands of years—the human voice, rustling leaves, the sounds of other animals—is concentrated in the midrange.
Midrange colorations can be extremely annoying. Loudspeakers with peaks and dips in the mids sound very unnatural; the midrange is absolutely the worst place for loudspeaker imperfections. Confining our discussion to loudspeakers for the moment, midrange colorations overlay the music with a common characteristic that emphasizes certain sounds. The male speaking voice is particularly revealing of midrange anomalies, which are often described by comparisons with vowel sounds. A particular coloration may impart an aaww sound; a coloration lower in frequency may emphasize ooohhh sounds; a higher-pitched coloration may sound like eeeee; another coloration might sound hooty.
Some midrange colorations can be likened to the sound of someone speaking through cupped hands. Try reading this sentence while cupping your hands around your mouth. Open and close your hands while listening to how the sound of your voice changes. That's the kind of midrange coloration we sometimes hear from loudspeakers—particularly mass-market ones.
In short, if recordings of male speaking voice sound monotonous, tiring, and resonant, it's probably the result of peaks and dips in the loudspeaker's frequency response. (These colorations are most apparent on male voice when listening to just one loudspeaker.)
Terms to describe poor midrange performance include peaky, colored, chesty, boxy, nasal, congested, honky, and thick. Chesty describes a lower-midrange coloration that makes vocalists sound as though they have colds. Boxy refers to the impression that the sound is coming out of a box instead of existing in open space. Nasal is usually associated with an excess of energy that spans a narrow frequency range, producing a sound similar to talking with your nose pinched. Honky is similar to nasal, but higher in frequency and spanning a wider frequency range.
As described previously under "Perspective," too much midrange energy can make the presentation seem forward and "in your face." A broad dip in the midrange response (too little midrange energy over a wide frequency span) can give an impression of greater distance between you and the presentation.
When choosing loudspeakers, be especially attuned to the midrange colorations described. What is a very minor—even barely noticeable—problem heard during a brief audition can turn into a major irritant over extended listening.
The preceding descriptions apply primarily to midrange problems introduced by loudspeakers. Expanding the discussion to include electronics (preamps and power amps) and source components (LP playback or a digital source) introduces different aspects of midrange performance that we should be aware of.
An important factor in midrange performance is how instrumental textures are reproduced. Texture is the physical impression of the instrument's sound—its fabric rather than its tone. The closest musical term for texture is timbre, defined by Merriam Webster's Collegiate Dictionary, Tenth Edition as "the quality given to a sound by its overtones; the quality of tone distinctive of a particular singing voice or instrument." Sonic artifacts added by electronics often affect instrumental and vocal textures.
The term grainy, introduced in the description of treble problems, also applies to the midrange. In fact, midrange grain can be more objectionable than treble grain. Midrange grain is characterized by a coarseness of instrumental and vocal textures; the instrument's texture is granular rather than smooth.
Midrange textures can also sound hard and brittle. Hard textures are apparent on massed voices; a choir sounds glassy, shiny, and synthetic. This problem gets worse as the choir's volume increases. At low levels, you may not hear these problems. But as the choir swells, the sound becomes hard and irritating. Piano is also very revealing of hard midrange textures, the higher notes sounding annoyingly brittle. When the midrange lacks these unpleasant artifacts, we say the textures are liquid, smooth, sweet, velvety, and lush.
Bass performance is the most misunderstood aspect of reproduced sound, among the general public and hi-fi buffs alike. The popular belief is that the more bass, the better. This is reflected in ads for "subwoofers" that promise "earthshaking bass" and the ability to "rattle pant legs and stun small animals." The ultimate expression of this perversity is boom trucks that have absurd amounts of extraordinarily bad bass reproduction.
But we want to know how the product reproduces music, not earthquakes. What matters to the music lover isn't quantity of bass, but the quality of that bass. We don't just want the physical feeling that bass provides; we want to hear subtlety and nuance. We want to hear precise pitch, lack of coloration, and the sharp attack of plucked acoustic bass. We want to hear every note and nuance in fast, intricate bass playing, not a muddled roar. If Ray Brown, Stanley Clarke, John Patitucci, Dave LaRue, Dave Holland, or Eddie Gomez is working out, we want to hear exactly what they're doing. In fact, if the bass is poorly reproduced, we'd rather not hear much bass at all.
Correct bass reproduction is essential to satisfying musical reproduction. Low frequencies constitute music's tonal foundation and rhythmic anchor. Unfortunately, bass is difficult to reproduce, whether by source components, power amplifiers, or—especially—loudspeakers and rooms.
Perhaps the most prevalent bass problem is lack of pitch definition or articulation. These two terms describe the ability to hear bass as individual notes, each having an attack, a decay, and a specific pitch. You should hear the texture of the bass, whether it's the sonorous resonance of a bowed double bass or the unique character of a Fender Precision. Low frequencies contain a surprising amount of detail when reproduced correctly.
When the bass is reproduced without pitch definition and articulation, the low end degenerates into a dull roar underlying the music. You hear low-frequency content, but it isn't musically related to what's going on above it. You don't hear precise notes, but a blur of sound—the dynamic envelopes of individual instruments are completely lost. In music in which the bass plays an important rhythmic role—rock, electric blues, and some jazz—the bass guitar and kick drum seem to lag behind the rest of the music, putting a drag on the rhythm. Moreover, the kick drum's dynamic envelope (what gives it the sense of sudden impact) is buried in the bass guitar's sound, obscuring its musical contribution. These conditions are made worse by the common mid-fi affliction of too much bass.
Terms descriptive of this kind of bass include muddy, thick, boomy, bloated, tubby, soft, congested, loose, and slow. Terms that describe excellent bass reproduction include taut, quick, clean, articulate, agile, tight, and precise. Good bass has been likened to a trampoline stretched taut; poor bass is a trampoline hanging slackly.
The amount of bass in the musical presentation is very important; if you hear too much, the music is overwhelmed. Excessive bass is a constant reminder that you're listening to reproduced music. This overabundance of bass is described as heavy. If you hear too little bass, the presentation is thin, lean, threadbare, or overdamped.
An overly lean presentation robs music of its rhythm and drive—the full, purring sound of bass guitar is missing, the depth and majesty of double bass or cello are gone, and the orchestra loses its sense of power. Thin bass makes a double bass sound like a cello, a cello like a viola. The rhythmically satisfying weight and impact of bass drum are reduced to shadows of their former power. Instruments' harmonics are emphasized in relation to the fundamentals, giving the impression of well-worn cloth that's lost its supporting structure. A thin or lean presentation lacks warmth and body. As described earlier in this chapter in the discussion of audio sins of commission and omission, an overly lean bass is preferable to boomy bass.
Two terms related to what I've just described about the quantity of bass are extension and depth. Extension is how deep the bass goes: not the bass and upper bass described by lean or weighty, but the very bottom end of the audible spectrum. This is the realm of kick drum and pipe organ. All but the very best systems roll off (reduce in volume) these lowermost frequencies. Fortunately, deep extension isn't a prerequisite to high-quality music reproduction. If the system has good bass down to about 35Hz, you don't feel that much is missing. Pipe-organ enthusiasts, however, will want deeper extension and are willing to pay for it. Reproducing the bottom octave correctly can be very expensive.
Much of music's dynamic power (the ability to convey wide differences between loud and soft) is contained in the bass. Though I'll discuss dynamics later in this section, bass dynamics bear special discussion; they are that important to satisfying music reproduction.
A system or component that has excellent bass dynamics will provide a sense of sudden impact and explosive power. Bass drum will jump out of the presentation with startling power. The dynamic envelope of acoustic or electric bass is accurately conveyed, allowing the music full rhythmic expression. We call these components punchy, and use the terms impact and slam to describe good bass dynamics.
A related aspect is speed, though, as applied to bass, "speed" is somewhat of a misnomer. Low frequencies inherently have slower attacks than higher frequencies, making the term technically incorrect. But the musical difference between "slow" and "fast" bass is profound. A product with fast, tight, punchy bass produces a much greater rhythmic involvement with the music. (This is examined in more detail later.)
Although reproducing the sudden attack of a bass drum is vital, equally important is a system's ability to reproduce a fast decay; i.e., how a note ends. The bass note shouldn't continue after a drum whack has stopped. Many loudspeakers store energy in their mechanical structures and radiate that energy slightly after the note itself. When this happens, the bass has overhang, a condition that makes kick drum, for example, sound bloated and slow. Music in which the drummer used double bass drums is particularly revealing of bass overhang. If the two drums merge into a single sound, overhang is probably to blame. You should hear the attack and decay of each drum as distinct entities. Components that don't adequately convey the sudden dynamic impact of low-frequency instruments rob music of its power and rhythmic drive.
Soundstaging is the apparent physical size of the musical presentation. When you close your eyes in front of a good playback system, you can "see" the instrumentalists and singers before you, often existing within an acoustic space such as a concert hall. The soundstage has the physical properties of width and depth, producing a sense of great size and space in the listening room. Soundstaging overlaps with imaging, or the way instruments appear as objects hanging in three-dimensional space within the recorded acoustic. As mentioned previously in this chapter, a large and well-defined soundstage is most often heard when playing audiophile-grade recordings made in a real acoustic space such as a concert hall or church.
The most obvious descriptions of the soundstage are its physical dimensions—width and depth. You hear the musical presentation as existing beyond the left and right loudspeaker boundaries, and extending farther away from you than the wall behind the loudspeakers.
Of all the ways music reproduction is astounding, soundstaging is without question the most miraculous. Think about it: The two loudspeakers are driven by two-dimensional electrical signals that are nothing more than voltages that vary over time. From those two voltages, a huge, three-dimensional panorama unfolds before you. You don't hear the music as a flat canvas with individual instruments fused together; you hear the first violinist to the left front of the presentation, the oboe farther back and toward the center, the brass behind the basses on the right, and the tambourine behind all the other instruments at the very rear. The sound is made up of individual objects existing within a space, just as you would hear at a live performance. Moreover, you hear the oboe's timbre coming from the oboe's position, the violin's timbre coming from the violin's position, and the hall reverberation surrounding the instruments. The listening room vanishes, replaced by the vast space of the concert hall—all from two voltages.
A soundstage is created in the brain by the time and amplitude differences encoded in the two audio channels. When you hear instrumental images toward the rear right of the soundstage, the ear/brain is synthesizing those aural images by processing the slightly different information in the two signals arriving at your ears. Visual perception works the same way: there is no depth information present on your retinas; your brain extrapolates the appearance of depth from the differences between the two flat images.
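To make the idea of amplitude differences between the two channels concrete, here is a minimal sketch of constant-power panning, a common mixing technique that places a mono source between two loudspeakers by varying the left and right gains. It is offered only as an illustration and is not taken from the book. It covers the amplitude part only and ignores the timing differences that also contribute to the soundstage. The function names and the `pan` convention (-1.0 for hard left, 0.0 for center, +1.0 for hard right) are assumptions.

```python
import math

def constant_power_pan(pan: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    The gains follow a quarter-circle, so left**2 + right**2 stays at 1
    and the total acoustic power is roughly constant across positions.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    angle = (pan + 1.0) * math.pi / 4.0  # 0 (hard left) to pi/2 (hard right)
    return math.cos(angle), math.sin(angle)

def level_difference_db(left_gain: float, right_gain: float) -> float:
    """Level of the right channel relative to the left, in decibels."""
    return 20.0 * math.log10(right_gain / left_gain)

left, right = constant_power_pan(0.0)
print(round(left, 4), round(right, 4))  # equal gains: image centered

left, right = constant_power_pan(0.5)
print(round(level_difference_db(left, right), 1))  # right channel louder by several dB
```

With equal gains the image sits midway between the loudspeakers; as the right channel becomes louder than the left, the image moves toward the right loudspeaker.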
Audio components vary greatly in their abilities to present these spatial aspects of music. Some products shrink soundstage width and shorten the impression of depth. Others reveal the glory of a fully developed soundstage. I find good soundstage performance crucial to satisfying musical reproduction. Unfortunately, many products destroy or degrade the subtle cues that provide soundstaging.
Terms descriptive of poor soundstage width are narrow and constricted—the music, squeezed together between the loudspeakers, does not envelop the listener. A soundstage lacking depth is called flat, shallow, or foreshortened. Ideally, the soundstage should maintain its width over its entire depth. A soundstage that narrows toward the presentation's rear robs the music of its size and space.
The illusion of soundstage depth is aided by resolution of low-level spatial cues such as hall reflections and reverberation. In particular, the reverberation decay after a loud climax followed by a rest helps define the acoustic space. The loud signal is like a flash of light in a dark room; the space is momentarily illuminated, allowing you to see its dimensions and characteristics.
Now that we've covered space and depth, let's discuss how the instrumental images appear within this space. Images should occupy a specific spatial position in the soundstage. The sound of the bassoon, for example, should appear to emanate from a specific point in space, not as a diffuse and borderless image. The same could be said for guitar, piano, sax, or any other instrument in any kind of music. The lead vocal should appear as a tight, compact, definable point in space exactly between the loudspeakers.
Some products, particularly large loudspeakers, distort image size by making every instrument seem larger than life—a classical guitar suddenly sounds ten feet wide. A playback system should reveal somewhat correct image size, from a 60'-wide symphony orchestra to a solo violin. I say "somewhat" because it is impossible to recreate the correct spatial perspectives of such widely divergent sound sources through two loudspeakers spaced about 8' apart. Although image size and placement are characteristics inherent in the recording, they are dramatically affected by components in the playback system.
Terms that describe a clearly defined soundstage are focused, tight, delineated, and sharp. Image specificity also describes tight image focus and pinpoint spatial accuracy. A poorly defined soundstage is described as homogenized, blurred, confused, congested, thick, and lacking focus.
Some products produce a crystal-clear, see-through soundstage that allows the listener to hear all the way to the back of the hall. Such a transparent soundstage has a lifelike immediacy that makes every detail clearly audible. Conversely, an opaque soundstage is thick or murky, with less of an illusion of "seeing" into space. Veiling is often used to describe a lack of transparency.
Finally, superb soundstaging is relatively fragile. You need to sit directly between the loudspeakers, and every component in the playback chain must be of high quality. Soundstaging is easily destroyed by low-quality components, a bad listening room, or poor loudspeaker placement. This isn't to say you have to spend a fortune to get good soundstaging; many very-low-cost products do it well, but it is more of a challenge to find those bargains.
The dynamic range of an audio system isn't how loudly it will play, but rather the difference in level between the softest and loudest sounds that the system can reproduce. It is often specified technically as the difference between the component's noise level and its maximum output level. A symphony orchestra has a dynamic range of about 100 decibels, or dB; a typical rock recording's dynamic range is about 10dB. In other words, comparatively speaking, the rock band is always loud; it has little dynamic range.
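Decibels are logarithmic, so the gap between 100dB and 10dB is much larger than the two numbers suggest. The sketch below converts a level difference in dB into the corresponding ratios using the standard definitions (20·log10 for amplitude such as sound pressure or voltage, 10·log10 for power). The function names are illustrative.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Ratio of amplitudes (sound pressure or voltage) for a level difference in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db: float) -> float:
    """Ratio of powers for a level difference in dB."""
    return 10 ** (db / 10)

# Symphony orchestra, about 100dB of dynamic range.
print(db_to_amplitude_ratio(100))  # 100000.0

# Typical rock recording, about 10dB of dynamic range.
print(round(db_to_amplitude_ratio(10), 2))  # 3.16
```

On these figures, the loudest orchestral passage has about 100,000 times the amplitude of the softest, while the rock recording's loudest moment has only about three times the amplitude of its quietest.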
Dynamics are a very important part of music reproduction. They propel the music forward and involve the listener. Much of music's expression is conveyed by dynamic contrast, from pp (pianissimo) to fff (triple forte), as well as by very small dynamic inflections by the performer.
There are two distinct kinds of dynamics. Macrodynamics refers to the presentation's overall sense of slam, impact, and power: bass-drum whacks and orchestral crescendos, for example. If the system has poor macrodynamics, we say the sound is compressed or squashed. Microdynamics occur on a smaller scale. They don't produce a sense of impact, but are essential to providing realistic dynamic reproduction. Microdynamics describe the fine dynamic structure in music, from the attack of a triangle or other small percussion instruments in the back of the soundstage to the suddenness of a plucked string on an acoustic guitar. Neither sound is very loud in level, but both have dynamic structures that require agility and speed from the playback system.
Products with good dynamics—macro and micro—make the music come alive, allowing a vibrancy and life to emerge. Dynamic changes are an important vehicle of musical expression; the more you hear the musicians' intent, the greater the musical communication between performers and listener. Some otherwise excellent components fail to convey the broad range of dynamic contrast.
These characteristics are associated with transient response, a system's ability to quickly respond to an input signal. A transient is a short-lived sound, such as that made by percussion instruments. Transient response describes an audio system's ability to faithfully reproduce the quickness of transient signals. For example, a drum being struck produces a waveform with a very steep attack (the way the sound begins) and a fast decay (the way a sound stops). If any component in the playback system can't respond as quickly as the waveform changes, a distortion of the music's dynamic envelope occurs, and the steepness is slowed. Audio components described as quick or fast reproduce the suddenness of transient signals.
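The slowing of a steep attack can be illustrated with a toy model. The Python sketch below is not how any real component behaves; it assumes a "slow" component acts like a simple one-pole smoothing filter and counts how many samples the output needs to reach 90% of full level after a sudden onset. The function names and the `alpha` values are hypothetical.

```python
def one_pole_lowpass(signal, alpha):
    """Smooth a signal with y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    alpha near 1 follows the input quickly; alpha near 0 responds slowly.
    """
    out = []
    y = 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def first_index_at_or_above(signal, threshold):
    """Index of the first sample that reaches the threshold, or None."""
    for i, value in enumerate(signal):
        if value >= threshold:
            return i
    return None


ONSET = 5
# A very steep attack: silence, then full level from sample 5 onward.
attack = [0.0] * ONSET + [1.0] * 95

fast = one_pole_lowpass(attack, alpha=0.8)  # "quick" component
slow = one_pole_lowpass(attack, alpha=0.1)  # "slow" component

# Samples elapsed after the onset before the output reaches 90% of full level.
fast_rise = first_index_at_or_above(fast, 0.9) - ONSET
slow_rise = first_index_at_or_above(slow, 0.9) - ONSET

print(f"fast component rise: {fast_rise} samples after onset")
print(f"slow component rise: {slow_rise} samples after onset")
```

In this toy model the slow filter takes many more samples to reach the same level, which is the kind of softened attack the text describes as a distortion of the dynamic envelope.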
But just because a component or system can reproduce loud and soft levels doesn't necessarily mean it has good dynamics. We're looking for more than a wide dynamic range. The system must be capable of expressing fine gradations of dynamics, not just loud and soft. As the music changes in level (which, except in many rock recordings, it's doing most of the time), you should hear loudness changes along a smooth continuum, not as abrupt jumps in levels.
Detail refers to the small or low-level components of the musical presentation. The fine inner structure of an instrument's timbre is one kind of detail. The term is also associated with transient sounds (those with a sudden attack) at any level, such as those made by percussion instruments. A playback system with good resolution of detail will infuse music with the sense that there is simply more music happening.
Assembling a good-sounding music system or choosing between two components can often be a tradeoff between smoothness and the resolution of detail. Many audio components hype detail, giving transient signals an etched character. Etch is an unpleasant hardness of timbre on transients that emphasizes their prominence. Sure, you can hear all the information, but the presentation becomes too aggressive, analytical, and fatiguing: low-level information is brought up and thrust at you, and you feel a sense of relief when the music is turned down or off—not a good sign.
Components that err in the opposite direction don't have this etched and analytical quality, but neither do they resolve all the musical information in the recording. These components are described as overly smooth, or having low resolution. They tend to make music bland by removing parts of the signal needed for realistic reproduction. These kinds of components don't rivet your attention on the music; they are uninvolving and dull. You aren't offended by the presentation, as you are with an analytical system, but something is missing that you need for musical satisfaction.
It is a rare product indeed that presents a full measure of musical detail without sounding etched. The best products will reveal all the low-level cues that make music interesting and riveting, but not in a way that results in listening fatigue, that sense of tiredness after a long listening session. The music playback system must walk the very fine line between resolution of real musical information and sounding analytical.
Finally, we get to the most important aspect of a system's presentation—musicality. Unlike the previous characteristics, musicality isn't any specific quality that you can listen for, but the overall musical satisfaction the system provides. Your sensitivity to musicality is destroyed when you focus on a certain aspect of the presentation; i.e., when you listen critically. Instead, musicality is the gestalt, the whole of your reaction to the reproduced sound. We also use the term involvement to describe this oneness with the music. A sure indication that a component or a system has musicality is when you sit down for an analytical listening session and minutes later find yourself immersed in the music and abandoning the critical listening session. This has happened many times to me as a reviewer, and is a good measure of the product's fundamental "rightness." Ultimately, musicality—not dissecting the sound—is what high-end audio is all about.
Robert Harley is the Editor-in-Chief of The Absolute Sound magazine and author of The Complete Guide to High-End Audio, Introductory Guide to High-Performance Audio Systems, and Home Theater for Everyone. His books have sold more than 150,000 copies in four languages.