The Practice of Mastering - 1: History
In all musical styles, most music destined for commercial distribution now undergoes a final process of verification and transformation before pressing, called mastering. An unavoidable step for people working in the audio industry, this process is largely unknown to the wider public and is often misunderstood, even looked down on, by people working on the fringes of commercial circuits. Faced with such disparate positions, it seemed important to us to provide a general context, accompanied by technical explanations and practical clarifications on the subject. Focused concretely on the connections between mastering and electroacoustics, this study is based first and foremost on documentary research into the professional practice of mastering. The author's experience in both domains has also contributed, as have two informal surveys conducted in the autumn of 2002: the first asked organizations that release electroacoustic music about their use of mastering, and the second was addressed to member composers of the Canadian Electroacoustic Community (CEC).
|Definition of Mastering|
The author of this paper is easily frustrated by any discussion that lacks at least the potential to lead to a practical result, which is the case with most quibbles over preliminary definitions. The dryness of the definition below is intended to discourage any such initiative. A progressive elaboration of the concept of mastering in the reader's mind is one of the goals of this study.
"Mastering is the set of activities in the audio chain between the final production of the music on an intermediary format and its transfer to a distribution format."
In presenting mastering, one is obliged, before anything else, to discuss the equipment required and the standards used. We are now going through an extremely confused period in the history of the audio industry, marked by a proliferation of formats and by the concomitant abandonment of a number of normative practices. Mainly responsible for this situation are several large-scale commercial wars currently in progress, waged with the help of mental manipulation and misinformation. In this context, a study that aims only to describe current tendencies and practice can draw on just three types of usable reference:
- the marketing propaganda direct from the manufacturers;
- "trend reports" published by specialized audio industry magazines. When these are not directly sponsored, they are drawn up from the only information available to their authors, which is, once again, the documents generously distributed to the press by manufacturers. There are many and varied occasions for such distribution: the launch of new products and new formats, trade fairs and other events, variously named but always organized with the same goal of presentation and sale. Add to this the "state of affairs reports" regularly published by the same companies;
- information, impressions and opinions from users, producers and consumers in the immediate entourage of the "researcher".
Now that any illusion of scientificity has been healthily dispensed with, the reader will understand that the mania (often a pretentious one) for specifying sources at the outset would be entirely futile here. In this study, there is no real source: the least piece of information must be decoded, reinterpreted and, as often as possible, confronted with the symmetrically opposite version put forward by the competition.
To offer a synthesis of all of this would be strictly impossible without adopting an ideological position, a genuine preliminary vision on the part of the author. It is this vision which, in all honesty, is truly presented here, supported by a selection of facts. Let us state from the outset, to spare at least some effort of interpretation, that the point of view presented here is that of a sympathizer of the Left, who feels a cold contempt towards the current methods and ideologies of the industry, which he considers cynical and irresponsible. But he also has an ear that rebels on hearing the experimental and militant sonic constructions of certain comrades. Irritated by their sparse sound, puny and thin, he sees in the verbose explanations of content and the enticing displays of creativity only tedious demonstrations of incompetence.
The informative content (or a stylistic illusion of it) will thus be maintained or abandoned according to the author's capacity for writing and dissimulation. Under cover of caprice, he will serve up a mixture of real information and unwarranted extrapolations intended to support his ideals.
And, of course, without any references.
|Section 1 : History of Commercial Mastering, from Vinyl to DVD|
Objective of this section: to give the reader a precise idea of the general context of mastering, past and present. Thus:
- details of the steps in audio production situated before and after mastering are described only insofar as they clarify this principal purpose;
- current arguments justifying the intervention of the mastering engineer are only touched on here, since they are the main subject of the following section.
|The analog period|
Mastering, as an activity distinct from recording, first appeared in 1948 with the introduction of the first commercial tape recorder, which was immediately adopted as an alternative to the previous process. Before that date, all recordings were made by cutting directly to disc in real time, with no possibility of editing. The first mastering engineers (the exact title was transcription engineer) were novices in the profession, given the thankless task of transforming the master tapes coming from the recording studios into a product capable of surviving the vinyl cutting process. After a few years, they could be promoted to more creative and prestigious positions as sound engineers specializing in recording or, very gradually after 1955 when multitrack recording became possible, in mixing.
|The problems with vinyl|
The main concern has been, then as now, to preserve as much as possible of the sound quality of the master tape, navigating among the numerous pitfalls arising from the nature of the medium itself and trying to work around a rather drastic set of limitations. The groove, roughly the thickness of a hair, carries pitch information laterally and amplitude information vertically:
- bass frequencies therefore affect the total width of the groove, to the detriment of the length of the programme;
- while the thickness of vinyl affects the dynamic range available.
When stereo cutting appeared in 1957, the problem became more complicated: if the information in the two channels is out of phase, especially in the bass frequencies, the stylus is confronted with a groove that widens and narrows, which it obviously cannot read unless the height of the groove, coinciding miraculously (and in inverse proportion) with the lateral topology, permits it.
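As a rough illustration of why out-of-phase bass is a problem, the sketch below assumes the standard 45/45 stereo convention, in which the in-phase (sum) part of the signal moves the stylus laterally and the out-of-phase (difference) part moves it vertically. The function names and the 1/2 scaling are illustrative choices, not taken from any cutting system's specification.

```python
import math

def groove_motion(left, right):
    """Split one stereo sample pair into the two stylus motions of a groove.

    Assumption: the 45/45 convention, where the sum (L + R) drives lateral
    motion and the difference (L - R) drives vertical motion. The 1/2
    scaling is arbitrary and only keeps values in the same range as L and R.
    """
    lateral = (left + right) / 2
    vertical = (left - right) / 2
    return lateral, vertical

def peak_motion(phase_offset, freq_hz=40.0, rate_hz=48000, amplitude=1.0):
    """Peak lateral and vertical excursion for a bass tone whose right
    channel is shifted by phase_offset radians relative to the left."""
    peak_lat = peak_vert = 0.0
    for n in range(rate_hz // 10):  # 0.1 s, i.e. four full cycles at 40 Hz
        t = n / rate_hz
        left = amplitude * math.sin(2 * math.pi * freq_hz * t)
        right = amplitude * math.sin(2 * math.pi * freq_hz * t + phase_offset)
        lat, vert = groove_motion(left, right)
        peak_lat = max(peak_lat, abs(lat))
        peak_vert = max(peak_vert, abs(vert))
    return peak_lat, peak_vert

print(peak_motion(0.0))      # in phase: lateral peak close to 1, vertical about 0
print(peak_motion(math.pi))  # out of phase: lateral about 0, vertical peak close to 1
```

With identical channels the vertical component vanishes; with the right channel inverted, all the motion becomes vertical, which in a V-shaped groove translates into the widening and narrowing described above. The same arithmetic explains why mono summation cancels out-of-phase material.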
The other major problem concerns the very high frequencies. These are inevitably limited by the maximum speed of the cutting stylus, and become entirely impossible to cut once the acceleration demanded of the stylus becomes too great, in other words once the signal swings too abruptly at high frequency and high amplitude. Transients, partly controlled by smoothing circuits, are a clear example, but the phenomenon equally affects certain consonants (s, ch and z sounds, etc.) and a whole assortment of sounds produced by, or resembling those of, instruments like the hi-hat or the cabasa.
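A simple way to see why high frequencies are the hard case: treat the stylus excursion as a single sinusoid of peak amplitude A and frequency f (an idealization, since real programme material mixes many components). Differentiating once gives the stylus velocity, twice its acceleration:

```latex
\begin{aligned}
x(t) &= A \sin(2\pi f t) \\
v(t) &= \frac{dx}{dt} = 2\pi f A \cos(2\pi f t), \qquad |v|_{\max} = 2\pi f A \\
a(t) &= \frac{d^{2}x}{dt^{2}} = -(2\pi f)^{2} A \sin(2\pi f t), \qquad |a|_{\max} = (2\pi f)^{2} A
\end{aligned}
```

At a fixed excursion, peak velocity grows in proportion to f and peak acceleration in proportion to f², so a 10 kHz component demands 100 times the acceleration of a 1 kHz component of the same amplitude. This is why sibilants, cymbals and sharp transients reach the mechanical limits first.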
The set of these physical limitations has inevitable repercussions for the quality of reproduction of the audio content:
- no stereo signal is possible beyond a certain threshold in the bass frequencies;
- filtering is obligatory as soon as percussion appears with strong high frequency content: cymbals, maracas, etc;
- control over sibilants is obligatory, by a de-esser in the best case, and if not, again by passive filtering;
- absolute limits affect the speed of transients;
- there is a compromise between maximum frequency range and the length of the programme;
- there is a compromise between the maximum dynamic range and the costs associated with the thickness of the vinyl; (1)
- there is a compromise between the longevity of the finished LP and the costs associated with the quality of the vinyl.
The current "retro" movement favoring vinyl has led to a new credulity about the virtues of the medium. However, one should know that while the power of cutting systems rose from ten watts in the 1950s to a maximum of almost 500 watts around 1975, allowing better reproduction of transients and sibilants, the quality of the audio delivered for cutting improved in parallel, notably with new formulations of magnetic tape offering better high-frequency performance. The net effect is that the number of compromises that must be made in order to cut vinyl has stayed the same. In addition, the last cutting system from the Neumann factories, around 1990, presented only cosmetic differences from the models produced 15 years earlier, which leaves to the imagination what the claims of recent "progress" in this area can mean.
(1) In the interests of completeness, and understanding that this is only indirectly related to mastering, we should add to this list the problems that arise during playback of the microgroove:
- wow and flutter;
- weakness of stereo separation;
- motor noises, directly transmitted to the needle, and/or induced because of deficiencies in electrical insulation;
- surface noise, etc.
|The introduction of aesthetics in mastering|
While the first mastering studios all belonged to the major record companies, the end of the sixties saw the emergence of independent studios which, in an effort to build up a clientele, began to offer to improve the sound of master tapes. This practice, called custom mastering at the time, came progressively to be perceived as a prestigious specialty, an aura that the small number of practitioners (barely 150 in the US in 1978) tended to reinforce.
It was no longer simply a matter of dealing with the limitations of analog cutting, but also of intervening in certain aspects of the frequency and dynamic envelope of the product in order to obtain a "cleaner" sound, better separation between the instruments, a wider stereo field, more bite and life in the percussion, etc. The reputations of certain big names in mastering began to be made in this era, based first of all on precise and refined choices of playback equipment, loudspeakers and even the acoustic design of the studios. Certain combinations proved more effective than others, and from then on a clear boundary separated recording studios from mastering studios.
In the same way, the profile of skills required to become a mastering engineer became more and more specific, and training longer and longer: it was not only a question of knowing the equipment in depth, but also of being able to instantly identify the problematic frequencies in each mix and intuit the amount of compression to apply; in short, doing whatever is needed so that at the end of the process, the listener perceives an appreciable improvement in the sound compared with the unretouched mix.
We will return, in the course of the next section, to the arguments invoked in favour of such intervention, in addition to the methods generally used to arrive at the desired results. Let it suffice here to say that, a few years later, the practice was universalized to such an extent that it became utterly unthinkable for a commercial audio production, even if only of medium scale, to not pass through this "optimizing" stage.
|The beginning of digital audio|
The success of custom mastering partly explains why the arrival of the CD in 1982, far from causing the decline of the mastering studio, since there was no longer any "cutting" to do (2), instead led to its development. Freed from the compromises imposed by the limitations of vinyl, engineers were able to go "further", not hesitating, for example, to boost the extreme high and low frequencies as needed, or to expand the dynamic range, steps exactly opposite to those that would have been necessary only shortly before in analog cutting. From then on, any intervention in the signal was dictated only by the desire to optimize the sound, which placed even more direct responsibility in the hands of the mastering engineer. Other parallel phenomena, such as the decline of AM radio and the spread of stereo playback systems, notably in cars, would soon allow engineers to ignore the limitations imposed by mono summation as well. Out-of-phase components could be left in the signal, and high frequencies could even be deliberately put out of phase to widen the stereo image.
The second explanation for the continuing importance of the mastering studio after the arrival of the CD is that, up until the end of the 1990s, it remained an obligatory step between the mixing studio and the manufacture of a CD. For a long time the pressing plants accepted only specialized digital media for pressing, backed by a complex error-verification procedure. These media, such as the 3/4" U-matic video cassette generated by the Sony 1630 system, or the 8 mm Exabyte cassette used by the DDP system, require the purchase and maintenance of machinery as expensive as it is fragile. To our knowledge, no recording studio has ever embarked on this adventure.
(2) The glass master, the equivalent of the analog stamper, is also made at the pressing plant, but directly from a digital signal on tape or cassette. For a vinyl record, the original is a purely mechanical medium, a master disc called a lacquer, which must be cut at the mastering studio.
|Extensions to the mastering role|
Mastering studios thus came to be entrusted with decisions of increasingly serious consequence, associated with increasingly complex tasks, including editing. For example, the mixer is asked to provide several master versions of the same piece, each with minor differences in the level of a critical instrument or group of instruments or, even more often, of the voice. It then falls to the mastering engineer to select the version that offers the most latitude for his work or even, as the need arises, to reconstruct the piece section by section by combining these versions. The installation of dual-path A/B consoles in studios also allows him to preset a second series of treatments in advance and engage it manually at the right moment. One can thus flip back and forth in real time between one set of treatments A, intended for example for the verses, and a set B for the choruses.
A significant step in this extension of the mastering engineer's role, even if ephemeral and tied exclusively to a specific musical style, was the dance mix fad, a task that has since passed, in modified form, to the DJ. A rhythmic pop song is delivered to the mastering studio together with its official mix and a series of excerpts, submixes of rhythm sections, solo voices, etc. The mastering engineer constructs an extended version of the song, destined among others for nightclubs, adding supplementary effects and even sounds from other sources as required.
This sophistication of his role puts the mastering engineer in a situation where he must weigh the benefits of an additional pass on the tape against the drawbacks of multiplying generations. One example: usually equipped with a 3- or 4-band parametric equalizer, he may be tempted to apply a preliminary equalization while copying the whole mix to an intermediate tape, in order to free up the equalizer and gain a new set of three or four frequencies on which to intervene. But the drawbacks of such a step must be taken into account: increased background noise and distortion, alteration of the frequency envelope, etc. (Digital copying, as mastering engineers would discover soon enough, is not really transparent either, although it affects different aspects of the signal.) In the same way, and with the same drawbacks, one might want to run the mix twice through compressors, each time with different settings.
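To give an order of magnitude for the cost of an extra analog generation, here is a minimal sketch under a simplifying assumption not taken from the text: each copy adds its own uncorrelated noise at the same level, so noise powers simply add. The function name and the 70 dB per-copy figure in the example are hypothetical.

```python
import math

def noise_after_generations(snr_db_per_copy, generations):
    """Overall signal-to-noise ratio (dB) after several copy generations.

    Simplifying assumption: every generation adds independent noise of the
    same power, so total noise power grows linearly with the generation count.
    """
    noise_power_one = 10 ** (-snr_db_per_copy / 10)  # relative to signal power 1.0
    total_noise_power = generations * noise_power_one
    return -10 * math.log10(total_noise_power)

# Hypothetical 70 dB per copy: prints 70.0, 67.0 and 64.0 for 1, 2 and 4 generations
for n in (1, 2, 4):
    print(n, round(noise_after_generations(70.0, n), 1))
```

Under this assumption, every doubling of the number of generations costs about 3 dB of signal-to-noise ratio. Distortion and frequency-response changes accumulate as well, but are not modelled here.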
Certain artists and producers with larger budgets escape this dilemma by putting in place a work process that lets them reap the maximum benefit from mastering without subjecting their product to the further degradation caused by multiple copies. At a time when mixing consoles could not yet store and recall settings, the solution consisted in booking the mixing studio for longer than the mixing itself required and, most importantly, without interruption. After each mix, the console was left as is, with all its settings intact, and the tape was auditioned in the mastering studio. There, the engineer would make a few trial adjustments and then draw up, with the producer, a list of changes to make to the mixing console's configuration, intended to:
- avoid having to make equalizations that would lead to any kind of compromise. Example: the mastering engineer, having determined that the perceived excess at a particular frequency comes mostly from one track, would ask the mixer to equalize or lower just that track, thus avoiding a global correction at this frequency;
- conversely, allow equalizations to be made without prejudice to the whole. Example: the mix lacks fullness and would truly benefit, during mastering, from a boost in the bass frequencies, were it not that track XYZ would then take on awkward proportions; again, an isolated track must be equalized or lowered, not because of what is heard in the mixing studio, but because of what one intends to do during mastering.
The mixer makes the required changes and produces a new master, which is brought back for mastering. This back-and-forth is repeated many times, until it is judged that the mix is perfectly shaped for a purely aesthetic mastering, and that the only changes remaining are those that can only be made in mastering. If one adds to the cost of this operation the waiting time during which the artist and/or producer take the acetates home (an acetate, sometimes called a ref, is a single-sided lacquer reference disc of limited lifespan) to evaluate the completed step in a familiar listening environment, one gets a good idea of the extravagant investments at stake. What, then, of the case where the producer, judging that the sound of these acetates is too distant from that of an industrially manufactured product to give a faithful enough idea of the final result, commissions and listens to test pressings before releasing the mixing studio?
|CD-R, master for pressing|
At the other extreme from these producers, for whom the search for a certain audio perfection seems to know no financial bounds, we find a number of artists for whom the additional budget that mastering inevitably entails is too steep a price. For such groups, the possibility of bypassing the mastering studio only really presented itself at the end of the 1990s, when pressing plants began to accept CD-Rs. But the pressing process that this medium entails, still used today only in productions with extremely limited budgets (electroacoustics makes great use of it), is in fact severely limiting in terms of quality, without anyone involved even seeming to be aware of it:
- it is generally believed that any CD-R can be burned, without harmful results, at the maximum speed offered by the burner. This is only true for CD-Rs of the Yellow Book standard, used for data, which undergo bit-by-bit verification. The older audio CD standard, called Red Book, is much more tolerant of errors. The number of errors, along with the amount of jitter, increases with burning speed;
- even at 1x, the number of errors is very high. The Red Book tolerance for an audio CD-R allows a block error rate (BLER) of up to 3%, which represents an impressive 220 errors per second. By comparison, a high-quality CD generally yields between 20 and 30 errors per second;
- finally, pressing plants encourage the CD-R because it lets them produce a glass master directly at 2x or 4x, a significant time saving that is, of course, not reflected in the bill sent to the client. By comparison, the Sony 1630 only allows this transfer at 1x, unless a copy is made to an intermediate format. Only a glass master cut at 1x can be said to bear a reasonable resemblance to the original master.
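The two figures quoted in this list are consistent with each other if the 3% is read as a block error rate measured against all the frames read each second. The sketch below assumes the usual Red Book structure at 1x (75 sectors per second, 98 frames per sector); the constant and function names are our own.

```python
# Assumed Red Book figures for audio CD playback at 1x.
SECTORS_PER_SECOND = 75
FRAMES_PER_SECTOR = 98
FRAMES_PER_SECOND = SECTORS_PER_SECOND * FRAMES_PER_SECTOR  # frames read each second

# Cross-check: 44,100 stereo 16-bit samples per second make 176,400 bytes,
# carried as 24 audio bytes per frame.
assert 44100 * 2 * 2 // 24 == FRAMES_PER_SECOND

def bler_percent(errored_blocks_per_second):
    """Block errors per second expressed as a percentage of all frames read."""
    return 100 * errored_blocks_per_second / FRAMES_PER_SECOND

print(FRAMES_PER_SECOND)            # 7350
print(round(bler_percent(220), 2))  # 2.99: the 220 errors/s ceiling is the "3%"
print(round(bler_percent(25), 2))   # 0.34: a good disc at 20 to 30 errors/s
```

Read this way, a disc at the permitted ceiling carries roughly seven to ten times more block errors than a good pressing.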
Nor do the weaknesses of CD-R suffice to explain the continued success of the mastering studio, all the more so since they are unknown even to its supporters: the industry's general confidence in the mastering studio to optimize the quality of the audio product has simply been maintained. We will see that the arrival of DVD-Audio and SACD, the new high-density audio formats, will further reinforce this dependence.
|Digital Audio Workstations|
For a long time, mastering studios were exclusively stereo environments. The arrival on the market of top-quality Digital Audio Workstations, or DAWs (SonicStudio from Sonic Solutions, Dyaxis from Studer, SADiE and, very recently, Nuendo from Steinberg), greatly extended their scope for intervention by giving them access to the multitrack sources of a stereo mix. In practice, the new working methods take two forms:
- the studio receives on a data CD-ROM a set of synchronized stereo tracks, each containing an entirely premixed subgroup of instruments, also called stems. The original mix is reconstructed exactly simply by syncing each of these subgroups to a single reference point in time and reproducing each at unity gain. The engineer may then treat each instrument group separately, each intervention now being less prejudicial to the whole while permitting more in-depth work;
- the same principle is pushed even further: the studio obtains on CD-ROM a proprietary file containing all of the tracks from the original mix, again in a premixed form. This process necessitates that the two studios each have access to a compatible system. The ProTools packages, despite being systematically decried by mastering engineers for the mediocre quality of their treatments and the audio degradation caused by even the smallest calculation, are often used in this type of collaboration because of their high level of penetration in mid-range recording studios. In such cases, before any intervention, the mix is of course immediately exported to a system more in line with the norms of quality practiced in professional mastering.
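The stem method relies on the fact that a mix bus is, to a first approximation, a simple sum. The sketch below rebuilds a stereo mix by aligning each stem to a common reference point and adding it at unity gain; the data layout (lists of (left, right) sample pairs with sample offsets) and the function name are hypothetical. The reconstruction is only exact if nothing non-linear, such as a bus compressor, was applied to the full mix after the stems were printed.

```python
def reconstruct_mix(stems, offsets):
    """Rebuild a stereo mix by summing stems at unity gain.

    stems:   list of stems, each a list of (left, right) sample pairs
    offsets: start of each stem, in samples, relative to the common
             reference point (a hypothetical representation)
    """
    length = max(off + len(stem) for stem, off in zip(stems, offsets))
    mix = [[0.0, 0.0] for _ in range(length)]
    for stem, off in zip(stems, offsets):
        for i, (left, right) in enumerate(stem):
            mix[off + i][0] += left   # unity gain: samples are added unscaled
            mix[off + i][1] += right
    return [tuple(frame) for frame in mix]

drums = [(0.5, 0.5), (0.25, 0.25)]
vocals = [(0.125, 0.25)]  # starts one sample after the reference point
print(reconstruct_mix([drums, vocals], [0, 1]))  # [(0.5, 0.5), (0.375, 0.5)]
```

Because each stem arrives separately, the engineer can process one of them before the sum instead of correcting the finished mix as a whole.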
It is important to make clear that the mastering engineer never attempts here to do the mixer's work. He merely takes advantage of the possibility of intervening more precisely and in a manner less prejudicial to the mix as a whole. In the past, situations would arise where, for example, equalizing an overly piercing voice could dull a well-mixed guitar lying close to the voice in the frequency domain; now one can intervene on the voice alone, preserving a larger proportion of the original mix.
|The Level Wars|
Parallel to all these developments, a new tendency emerged that would very negatively affect the reputation of mastering in certain domains, again notably in electroacoustic music. This attitude persists even today, and we will set ourselves the task of demystifying its roots.
From the middle of the 1970s, before the end of the vinyl period, the most profitable sector of the audio industry, particularly rock, pop, disco, etc. (the phenomenon would soon spread to country-and-western and jazz-rock), launched into an escalation that would often be called the level wars. It had been observed that listeners looking for a radio station to tune in to tended to stop on the stations with the highest volume level. When questioned, they responded that these seemed to "sound better". This discovery was another step in the systematization of the quest for profit, an addition to the 'scientific' toolkit for controlling the reactions of the music consumer. In this climate of collective exuberance, no one bothered to ask what might happen in the mind of that same consumer after a few seconds of exposure to this uninterrupted wash of decibels.
Nevertheless, there is no lack of observations on the subject, especially concerning the progressive fatigue of the ear under high volume and high frequencies, and the sense of intrusion produced by sustained compression. We know that a higher volume level lets the sound stand out from ambient noise, allowing better perception of detail in the music, but in systems of limited dynamic range (radio transmissions, vinyl records) there is little room for maneuver if we want to preserve reasonable headroom for adequate reproduction of transients.
Rather than taking these killjoy reservations into account, the industry adopted the theory (in vogue even today, and mainly advanced by supporters of vintage technologies) that certain types of distortion fascinate the ear more than the same programme reproduced with a purer sound. The amplification stage situated just after the radio tuner (and soon, as we shall see, the cartridges of record players) is driven by an average signal equal to or above its intended limit. It thus obligingly produces the expected distortion, and the listener, titillated, heads to the retailer to acquire the record. In this scenario, saturated with clumsy mercantilism, it matters little that the listener tires of the product sooner or later, since consumers' purchasing decisions have been found, after many studies on the subject, to be essentially impulsive. No one sees any problem in taking from the headroom the decibels needed to crush both transients and the commercial competition, and equipment manufacturers have hastened to develop ever more transparent compressors/limiters, that is to say, ever more capable of raising the average volume level to the absolute ceiling.
This principle was soon applied to vinyl cutting, and mastering engineers quickly found themselves urged by producers to apply the most absurd levels of compression. The situation degenerated to the point where studio time was allocated exclusively to the quest for every possible means of gaining level, in disregard of all the optimization techniques that had led to the success of mastering. Some engineers even became specialists, exploiting the semblance of correction that frenzied compression, by radically flattening amplitude, seems to apply to problems in the frequency envelope. This is how it has become common, for some, to associate overcompression with mastering. It must however be said that the vast majority of mastering engineers consider the level wars to be a practice highly prejudicial to sound quality, and that they have continued to practise their profession free of this constraint in productions of classical music, jazz, world music, new age, etc., styles that have never followed this trend.
Another exception worth highlighting is sound mixing for cinema, a domain in which norms for both the absolute ceiling and the average level were quickly adopted. The current movement promoting a return to balanced levels in pop-related music production is, moreover, inspired by these procedures, as we will see.
|Towards the absolute limit|
At the dawn of digital, some believed that the CD, with its extended dynamic range, its ability to reproduce high and low frequencies at the limits of human hearing and, above all, its virtually instantaneous transients, would render the level wars superfluous and pointless, so high were the expectations of musical exhilaration from this medium. Alas, the miracle did not happen, for several reasons, among them:
- the technological immaturity of the first converters and digital production tools;
- sound engineers' misunderstanding of the inherent weaknesses of the new medium, among others in its lack of fidelity in reproducing high frequencies;
- the fact that a large proportion of the analog equipment used in recording, processing and mixing still suffered at the time from inherent deficiencies (noise, distortion, severely restricted dynamic range, coloration, etc.), all made more perceptible by the relative transparency of the new medium.
In the end, first generation CDs were generally afflicted with an annoyingly constrained and metallic sound that succeeded in making people forget the advantages of the new medium. The level wars thus continued, now nourished by a new generation of digital compressors/limiters capable, by applying a short delay to the incoming signal, of 'anticipating' the amplitude peaks- the look-ahead function- and thus crushing ultra-fast transients that analog circuits would still let through.
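The look-ahead principle can be sketched as follows. This is a deliberately crude illustration with a hypothetical function name: the gain for each output sample is computed from the largest peak in a short window extending into the "future", which is equivalent to delaying the audio relative to the level detector. Gain reduction therefore starts before the transient arrives, and no sample can exceed the ceiling (up to rounding). Real limiters add attack and release smoothing of the gain, omitted here, to avoid the audible distortion that such abrupt gain changes would cause.

```python
def lookahead_limit(samples, ceiling=0.9, lookahead=4):
    """Crude look-ahead peak limiter (illustrative sketch only).

    For output sample i, the detector examines samples i .. i+lookahead,
    i.e. it "sees" the signal `lookahead` samples before it reaches the
    output. Since the window always contains sample i itself, every output
    sample stays within the ceiling. No attack/release smoothing is applied.
    """
    out = []
    for i, x in enumerate(samples):
        window = samples[i:i + lookahead + 1]
        peak = max(abs(s) for s in window)
        gain = 1.0 if peak <= ceiling else ceiling / peak
        out.append(x * gain)
    return out

signal = [0.1, 0.2, -1.5, 0.3, 0.8, 1.2, 0.0]
limited = lookahead_limit(signal, ceiling=0.9, lookahead=4)
# The first samples are already attenuated because the -1.5 peak lies ahead;
# the peak itself comes out at the ceiling.
print([round(v, 3) for v in limited])
```

An analog limiter without look-ahead can only react once the peak has already started, which is why, as the text notes, it lets ultra-fast transients through.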
Result: compared with the levels of 1980, the headroom on CD has shrunk by an average of 17 dB, leaving only 3 dB in 2000. The waveforms on the following pages, taken from a variety of productions from 1956 to 2002, make this phenomenon visible:
A few remarks about these waveforms. Since the vertical scale is linear, the 50% mark lies only 6 dB below the absolute maximum, which gives the impression that productions recorded with appropriate headroom are somehow 'under-using' the available space. In reality, it is the recordings closest to the limit that make the least use of the available dynamic range, since they use a range of only 5-6 dB; technically, they could be compared to 'low-density' products. The first page (in blue) gathers reference waveforms, that is, those from productions that in principle have been little or not at all affected by the level wars:
- St Thomas reaches a maximum of 0.0 dB, which is surprising for a jazz recording of this quality, with a fluid and breathing sound. This level is however only attained at one single peak, found at the end of the first third;
- the weakness of the maxima in the 1st Movement of the Beethoven symphony and in La Colombe should not surprise, given that these are excerpts from longer works, in which subsequent sections reach higher levels.
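The dB figures in these remarks follow from the standard conversion between linear amplitude and decibels relative to digital full scale (dBFS), 20·log10 of the amplitude ratio. A short sketch, with function names of our own:

```python
import math

def linear_to_dbfs(fraction_of_full_scale):
    """Level in dB relative to digital full scale (0 dBFS)."""
    return 20 * math.log10(fraction_of_full_scale)

def dbfs_to_linear(db):
    """Fraction of full scale corresponding to a level in dBFS."""
    return 10 ** (db / 20)

print(round(linear_to_dbfs(0.5), 2))  # -6.02: the 50% line on a linear waveform plot
print(round(dbfs_to_linear(-3), 3))   # 0.708: a level 3 dB below full scale sits at about 71%
print(round(dbfs_to_linear(-20), 3))  # 0.1: a level 20 dB below full scale sits at 10%
```

On a linear plot, a level 20 dB below full scale (the 3 dB of remaining headroom plus the 17 dB reduction, on the text's own figures) occupies only a tenth of the vertical scale, which is why well-preserved dynamics can look deceptively "small".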
Even though the pop productions on the previous pages were chosen somewhat at random, they illustrate the escalation of levels quite faithfully:
- 1968/1986: Glass Onion and Woodstock are CD remasters dating from the beginning of the 1990s, which explains why their average level should be higher than that of other waveforms from the same period;
- 1987/1994: Slave to the Rhythm is a Trevor Horn production, a producer with the reputation of an audiophile. Entirely recorded and mixed digitally on a Synclavier II system- extremely rare for the time- the CD has long remained a reference for audio professionals;
- 1995/2002: in order to obtain a listening level subjectively comparable to that of the preceding productions, the author of this study had to reduce the output level of his D/A converter by 9 dB for this period. Despite this, and despite the relative variety of styles represented, listening proved uniformly unpleasant, indeed, at the risk of sounding affected, agonizing;
- within the boundaries of pop music, Shakira and Peter Gabriel could be considered to belong to strongly opposed tendencies, in recording techniques as much as in modes of distribution and intended audience. The similarity of their levels is thus all the more troubling…
- notice the evolution of the waveforms of the three Prince songs, from Kiss, among the most dynamic of its time, to the other two, among the most flattened of their period. The waveform of So Far, So Pleased resembles a square wave: here we reach the absolute limit of what can be put onto a CD.
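The observation that a heavily limited waveform comes to resemble a square wave can be made measurable with the crest factor, the ratio between peak and RMS level. The crest factor is not used in the text above; it is introduced here as a common, simple indicator. In the sketch below, a sine wave has a crest factor of about 3 dB, while a square wave, the limiting case, has 0 dB: its average level is as high as its peak. The 48 kHz rate and 100 Hz test tone are arbitrary choices for illustration.

```python
import math

def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB; lower values mean a flatter, more 'square' waveform."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

n = 48_000  # one second at a hypothetical 48 kHz sample rate
sine = [math.sin(2 * math.pi * 100 * i / n) for i in range(n)]  # 100 Hz test tone
square = [1.0 if s >= 0 else -1.0 for s in sine]               # hard-clipped version

print(round(crest_factor_db(sine), 2))    # about 3.01 dB
print(round(crest_factor_db(square), 2))  # 0.0 dB
```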
|Will the pendulum swing back?|
In the end, music producers, in their desire to catch within a few seconds the attention of DJs and VJs, radio and television program directors, and retail sales managers, in short the decision makers and key people in broadcasting and distribution circuits, have imposed the following situation on CD buyers, their true clients:
- passing innocently from one CD to another without keeping an eye on the volume knob, the consumer may be subjected to a jump in average amplitude on the order of 10 to 12 dB! (A reminder: each 6 dB step doubles the signal amplitude, and a jump of about 10 dB is roughly perceived as a doubling of loudness.)
- some low-end CD players systematically produce audible distortion at every volume setting when playing some of the most recent pop CDs;
- even with players capable of sustaining prolonged high levels, the combination of genuinely low dynamic resolution and near-permanent over-compression is physically tiring for the ear.
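The 10 to 12 dB jump mentioned above can be put into figures. The sketch below separates two quantities that are easily confused: the amplitude ratio, which doubles for every 6 dB, and perceived loudness, for which a widely used rule of thumb (an approximation, assumed here rather than taken from the text) is a doubling for roughly every 10 dB.

```python
def amplitude_ratio(delta_db: float) -> float:
    """Ratio of signal amplitudes (sound pressure) for a level change in dB."""
    return 10 ** (delta_db / 20)

def loudness_ratio(delta_db: float) -> float:
    """Approximate perceived-loudness ratio, assuming +10 dB sounds about twice as loud."""
    return 2 ** (delta_db / 10)

for d in (6, 10, 12):
    print(f"+{d} dB: amplitude x{amplitude_ratio(d):.2f}, loudness about x{loudness_ratio(d):.2f}")
# +12 dB: amplitude x3.98, loudness about x2.30
```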
All of these annoyances tend to discourage listening sessions among friends. In such conditions, it is not surprising that CD sales are in free fall, all the more so since recent observations tend to exonerate the Internet, having found an association between a product's appearance in free distribution in mp3 format and an increase in its CD sales. The audio industry may therefore well have to ask itself whether reasons other than the Internet explain its current disappointments. Some decision makers have started to suspect the effects of the level wars, among other signs of a general relaxation of vigilance over sound quality.
The arrival of new audio formats, high-density and multichannel, could have been seen as an opportunity to start again on a new foundation, more respectful of the sonic integrity of audio productions and of consumers' ears. Alas! In the eyes of the decision makers, this arrival presents entirely different opportunities.
|The new formats|
Another sort of conflict, called the "format war", has added to the causes of the audio industry's decline. Driven, like the level wars, by the lure of financial gain, and similarly pursued in contempt of consumers' interests, this battle was started not by content producers but by the manufacturers of playback equipment. Examples of such power struggles abound in the recent past: Mac vs. Windows, VHS vs. Beta, DBX vs. Dolby, etc., but never before have they reached the scope, the level of absurdity, and the paralyzing power that characterize their impact on the current audio situation.
Once again the goal is shortsighted, but this time the consequences, even in the short term, are disastrous for everyone, including the manufacturers in every camp. Nevertheless, the prospect of imposing a new format covered by a series of patents, and then reaping the dividends for life, even on units made by the competition, seems to be overriding every other form of reason or motivation.
Where a single high-quality multichannel format would logically have sufficed as a successor to CD, we instead find, since 2000, two main contenders for the title of the new high-density audio medium, DVD-Audio and SACD, and two other systems claiming to become the new multichannel standard, Surround and Ambisonic. And this is only an extreme simplification of the real audio situation, leaving aside the various possible sub-combinations of the systems named above, as well as the maze of secondary issues also in play: data compression, security encryption, number and placement of loudspeakers, density of the digital information, etc. If most audio professionals are lost here, consumers, losing all confidence in the industry, are taking a prudent path: they are slowing their CD purchases, another non-Internet reason for the plunge in sales, but they are also holding back on investments in new home audio equipment. Compared to the speed with which standards like CD-Audio and DVD-Video penetrated living rooms, the new audio formats, mired in conflicts and contradictions, have now been stagnating for almost a decade.
|High Density Audio: generalities|
Even if CD rapidly overcame its initial shortcomings in sound quality, notably thanks to oversampling and increasingly sophisticated D/A converters, it has never managed to convince audiophiles, who criticize it above all for its coldness. Despite this, no new analog medium has been proposed to replace CD at the consumer level. Commercial studios, which rapidly abandoned 16-bit DAT as a master format in favor of half-inch analog tape at 30 inches per second, with or without Dolby SR, have, as soon as it became possible, rallied massively around high-density digital, which now threatens to replace all other multitrack and master formats. Despite all of the ethical and prescriptive shortcomings of the manufacturers, a consensus appears to be emerging, among producers and consumers alike, in favor of better digital audio.
This has not stopped those who love 'debates' from questioning the real usefulness of high-density audio formats. It may be worth reviewing the arguments that have been put forward:
- sampling above 44.1 kHz is useless, since human hearing is limited to 20 kHz;
- why go beyond 16 bits of quantization, since the meaningful dynamic range in the vast majority of listening contexts does not even attain the 96 dB currently available?
These are shocking contentions, focusing solely on the extreme and marginal characteristics of high density. And even so, they are only partially true. Thus:
- tweeters for which the frequency response curve is identical up to 18 kHz, but different beyond this, are easily differentiated by most listeners;
- even if it were recognized that the ear does not perceive ultra-high frequencies, other parts of the body, particularly certain parts of the bone structure, respond to them;
- a sensation of open and spacious sound is universally reported when listening to material containing ultra-high frequencies;
- most consumer headphones easily reproduce 110 dB;
- thousands of people pack into nightclubs and raves in which sound systems play back at 120 dB and beyond.
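For reference, the theoretical dynamic range of linear PCM grows by about 6 dB per bit. The sketch below computes the ratio between full scale and one quantization step, 20·log10(2^N). This is a simplification: it ignores dither, and it omits the extra 1.76 dB often quoted when a full-scale sine is compared with quantization noise. It lets the 96 dB figure for 16 bits be set against the 110 dB and 120 dB playback levels cited above.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantization step, in dB: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 20, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
# 16 bits: 96.3 dB, 20 bits: 120.4 dB, 24 bits: 144.5 dB
```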
Of course, the main reason for sampling at 96 kHz, rather than at 48 kHz for example, is that it uses twice as many samples to 'describe' the audio content at all frequencies. The digitization is thus finer and more realistic. For a 48 Hz wave, going from 1000 to 2000 samples per cycle may seem superfluous, but what about a 12,000 Hz sine wave, which at 48 kHz is described by only four samples per cycle? When the wave is reconstructed, do we still have a sine wave? The same reasoning applies to the number of bits. Whatever the initial level, a subtle increase or decrease will be described more faithfully by the larger number of 'steps' offered by 24-bit recording. The waveforms below demonstrate this idea:
Let us make clear straight away that these waveforms are simply an illustration meant to help visualize the digitization process. They do not claim to reflect exactly what happens in reproduction: in CD players, a variety of corrective mechanisms, notably low-pass filters, are inserted in the circuit to 'smooth out' the distortion produced in high frequencies by sampling rates that are too low, eliminating the higher harmonics of the square waves and transforming them back into sine waves. But these filters remain in place even if the original wave really was a square wave! As can be seen, all of these mechanisms are mere crutches that generate problems of their own. In the final tally, the original information is simply absent from these truncated recordings, and it is left to an electronic circuit to approximate the original wave.
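The counts used in the 96 kHz versus 48 kHz argument are simple ratios, and the 'steps' of the bit-depth argument are powers of two. The sketch below only reproduces these numbers; it counts samples and quantization levels and does not model the reconstruction filters discussed above.

```python
def samples_per_cycle(sample_rate_hz: float, frequency_hz: float) -> float:
    """Number of samples describing one cycle of a wave at the given frequency."""
    return sample_rate_hz / frequency_hz

def quantization_levels(bits: int) -> int:
    """Number of distinct amplitude 'steps' available with linear PCM."""
    return 2 ** bits

print(samples_per_cycle(48_000, 48))      # 1000.0
print(samples_per_cycle(96_000, 48))      # 2000.0
print(samples_per_cycle(48_000, 12_000))  # 4.0
print(samples_per_cycle(96_000, 12_000))  # 8.0
print(quantization_levels(16), quantization_levels(24))  # 65536 16777216
```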
|DVD-Audio: Characteristics|
Technically possible since the middle of the 1990s, but paralyzed by numerous commercial negotiations within the consortium sponsoring it, DVD-Audio was not officially introduced until the end of 2000. The clumsy publicity given to this delay and to the content of these negotiations strongly irritated the public, which greeted the format on its arrival with only marginal interest. In this climate, even the medium's extreme versatility has been misperceived, turning it into a source of confusion.
From the outset, it was meant to be a high-definition, multichannel medium, capable of offering 6 channels of linear audio at 24 bits/96 kHz. However:
- this would require a bit rate of 13.8 Mb/s, exceeding the 9.6 Mb/s finally fixed for the medium. The solution adopted was lossless (non-destructive) data compression, comparable in principle to the zip format used in computing, called MLP (for Meridian Lossless Packing), which reconstructs the original signal "bit-for-bit" while offering a compression ratio of 1.85:1;
- but… the DVD-A standard also permits lossy (destructive) data compression based on psychoacoustic principles, such as Dolby Digital, also called AC-3, and DTS!
- a linear signal at 24 bits/192 kHz is also possible, but only in stereo, using all of the available space on a DVD-A;
- finally, within the limits of disc space and bandwidth, virtually any combination is possible, from the number of channels (from 2 to 6) to the very structure of each individual channel: 16, 20 or 24 bits, 44.1 to 192 kHz, linear encoding, MLP, AC-3, DTS, etc.
- the producer may also attach still images to each audio track: information on the performers, song lyrics, etc. And, adding to the confusion that already reigns in many minds about the difference between DVD-A and DVD-Video, a dedicated video zone has also been provided, albeit of limited capacity.
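The bit-rate figures above can be checked with simple arithmetic: linear PCM needs channels × bits × sample rate bits per second. The sketch below compares a few configurations with the 9.6 Mb/s ceiling quoted in the text and applies the quoted 1.85:1 MLP ratio; in practice MLP's ratio varies with the material, so that constant is only the nominal figure from the text.

```python
DVD_A_MAX_MBPS = 9.6  # maximum bit rate fixed for DVD-Audio, as stated in the text
MLP_RATIO = 1.85      # nominal MLP compression ratio quoted in the text; varies with material

def lpcm_mbps(channels: int, bits: int, rate_hz: int) -> float:
    """Uncompressed linear PCM bit rate in megabits per second."""
    return channels * bits * rate_hz / 1e6

six_ch = lpcm_mbps(6, 24, 96_000)
stereo_192 = lpcm_mbps(2, 24, 192_000)

print(f"6 x 24/96 LPCM: {six_ch:.3f} Mb/s, fits: {six_ch <= DVD_A_MAX_MBPS}")
print(f"  with MLP at {MLP_RATIO}:1 -> {six_ch / MLP_RATIO:.2f} Mb/s, "
      f"fits: {six_ch / MLP_RATIO <= DVD_A_MAX_MBPS}")
print(f"2 x 24/192 LPCM: {stereo_192:.3f} Mb/s, fits: {stereo_192 <= DVD_A_MAX_MBPS}")
```

With these figures, six channels at 24/96 need about 13.8 Mb/s, too much without compression but about 7.5 Mb/s with MLP, while stereo at 24/192 needs about 9.2 Mb/s and fits uncompressed.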
Most mastering engineers have already been converting all the digital tapes they receive in 44.1 or 48 kHz formats to 24 bit/88.2 kHz or 24 bit/96 kHz before any processing, having simply understood that this immediately improves the sound quality. Most of the digital processors they use will work at these rates. Receiving masters, or, as discussed before, stems or multitrack premixes, already recorded in these formats is obviously a further improvement. Since very few professional transfer media can yet support such formats, the current tendency is to transfer all of the material onto several data CD-ROMs, a cheap method that seems perfectly acceptable in terms of reliability and signal integrity.
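Why several CD-ROMs are needed follows from the data rate of high-density files. The sketch below assumes a 74-minute Mode 1 CD-ROM (333,000 sectors of 2048 user bytes) and raw PCM with no file headers or file-system overhead, so real capacities are slightly lower; the helper names are hypothetical.

```python
import math

# Assumed capacity: 74-minute disc, Mode 1, 333,000 sectors of 2048 user bytes each
CDROM_74_MIN_BYTES = 333_000 * 2048

def bytes_per_second(rate_hz: int, bits: int, channels: int) -> int:
    """Raw PCM data rate, ignoring file headers."""
    return rate_hz * (bits // 8) * channels

def discs_needed(duration_s: float, rate_hz: int, bits: int, channels: int) -> int:
    """Number of data CD-ROMs required for a programme of the given duration."""
    total = duration_s * bytes_per_second(rate_hz, bits, channels)
    return math.ceil(total / CDROM_74_MIN_BYTES)

bps = bytes_per_second(96_000, 24, 2)
print(bps)                                      # 576000 bytes per second
print(round(CDROM_74_MIN_BYTES / bps / 60, 1))  # about 19.7 minutes of 24/96 stereo per disc
print(discs_needed(60 * 60, 96_000, 24, 2))     # 4 discs for one hour
```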
We must make clear here that, as far as DVD-Audio is concerned, the mastering engineer can no longer be considered the final stage before the manufacturing plant. In effect, as with DVD-Videos containing only an audio programme, which we will discuss in detail later, navigating among the various tracks implies (not obligatorily, but the consumer expects it) the use of a television screen, which lets the user see the content and choose among the various options. This entails graphic design work, a certain amount of programming, and the integration of all these elements with the audio in an authoring program. A handful of mastering studios have set up separate departments to offer their clients these additional services, but most are still content to pass the optimized audio on to specialized companies. A similar situation affects SACD.
With impeccable timing, Sony and Philips, multinationals experienced in this kind of exercise since they were partly responsible for the lamentable VHS/Beta episode, waited until the setbacks of DVD-Audio had brought interest in high-density audio to its lowest point before further disgusting the public by introducing their own format. It is based on an extension of the oversampling technique called DSD (for Direct Stream Digital), which, thanks to 1-bit sampling at 2.8224 MHz, offers a frequency response ranging from DC to 100 kHz and a dynamic range of 120 dB. That said, the untimeliness of Super Audio CD is dismaying:
- it is based on a technology completely at odds with the LPCM (for Linear Pulse Code Modulation) used in every digital system to date. No current recording system, program, plug-in, etc. can be adapted to it, and the dedicated recording and processing tools are very rare and exorbitantly priced;
- even if it is widely considered, in terms of sound quality, equal or marginally superior to LPCM at 24 bit/96 kHz and 24 bit/192 kHz, SACD, like CD-Audio before it but unlike its high-density LPCM competitors, is a closed format, offering few possibilities for incorporating future improvements;
- unlike DVD-Audio players, for which a FireWire digital output is planned, offering at least the possibility of D/A conversion by a specialized external device, all SACD players but one are equipped only with analog outputs: an aberrant contradiction of the format's audiophile vocation;
- and of course, an SACD player cannot read DVD-A, and vice versa. As for playing an SACD on a CD-Audio player, the benefit is illusory: at a much higher price than a CD, we get exactly the same performance, since all that happens is that an entirely separate layer, conforming to the Red Book standard, is read from the SACD!
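The DSD principle mentioned above, a 1-bit stream at a very high rate, can be illustrated with the simplest possible sigma-delta modulator. The sketch below is a textbook first-order modulator, not Sony and Philips' actual encoder (commercial DSD encoders use higher-order, noise-shaping designs); it shows how the density of +1 values in the bitstream encodes the signal level. The 2.8224 MHz rate equals 64 × 44.1 kHz.

```python
DSD_RATE_HZ = 64 * 44_100  # 2,822,400 Hz, i.e. 2.8224 MHz

def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator: maps values in [-1, 1] to a stream of +1/-1."""
    integrator = 0.0
    previous = 0.0  # last output value fed back to the integrator
    bits = []
    for x in samples:
        integrator += x - previous  # accumulate the error between input and last output
        previous = 1.0 if integrator >= 0.0 else -1.0
        bits.append(previous)
    return bits

stream = sigma_delta_1bit([0.5] * 1000)  # a constant input level of 0.5
print(DSD_RATE_HZ)                        # 2822400
print(sum(stream) / len(stream))          # close to 0.5: the density of +1 values encodes the level
```

Any gain change or equalization on such a stream means working on the density of bits rather than on sample values, which is one way to see why DSD processing tools are hard to build.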
Even if, to the weary eyes and ears of the public, the conflict looks like just one more race for monopoly between two equivalent formats, the mass-media war between DVD-Audio and SACD continues, regularly fed by news that one music multinational or another has been won over to one camp or the other. On both sides, sales of discs and players are insignificant, and many observers already consider the introduction of high-density audio a failure.
Torn between fantasies of conquest and commercial realism, the Sony/Philips tandem announced at the beginning of October 2002, at the 113th AES convention, that a million SACD players had already been sold, neglecting to mention that these were in fact DVD-Video players capable of reading the SACD format. Indeed, during the previous year, many dedicated SACD players from both companies had been withdrawn from the market, with no announcement of future replacement models.
|Multichannel Sound: Generalities|
With more than 30 million Home Cinema systems already installed, multichannel sound, despite once again an overabundance of sub-formats and variants, seems to have settled into a consensual form: most installed systems are a simplification of the Surround 5.1 standard. The playback sources are DVD-Video and game consoles, which leads directly to the following observation: music has so far largely underused multichannel sound! While the misadventures of high-definition audio are largely responsible for this delay, we will see that other causes, stylistic ones among them, can be identified.
We will briefly discuss the only (and hardly threatening) competitor to Surround, the Ambisonic system, which is, ironically, a system designed above all for reproducing music, before returning to various aspects of Surround, including its mastering.
|The Ambisonic system|
The experts responsible for the acoustic design of recording studios, mastering rooms, and all other rooms intended for optimal audio reproduction have known it for a long time: outside a small area located midway between the speakers and at a precise angle to them, the famous sweet spot, the stereo image loses its coherence. The problem is accentuated with conventional Surround, which reduces the small margin for forward and backward movement that stereo still allowed.
Offering the advantage of a greatly extended optimal listening area, the Ambisonic system proposes, through a complete, integrated chain of processes from recording to final listening, a full three-dimensional reconstruction of the sound field at the time of recording. Encoded in only four channels, the information can be reproduced on as many channels as the consumer chooses, and the consumer is also free to choose the placement of each loudspeaker. The more loudspeakers, the more faithfully the original ambience is reproduced. Ambisonic encoding is carried out either at recording time, using special microphones, or with specialized equipment wired to the outputs of a multitrack console.
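The idea of four transmitted channels decoded to any number of loudspeakers can be sketched with first-order Ambisonics. The text does not give the equations; the sketch below uses the traditional first-order B-format (W, X, Y, Z, with W scaled by 1/√2, an assumed convention) and a simple 'basic' decode to a regular horizontal ring of speakers. Real decoders add frequency-dependent shelf filters and handle irregular layouts, and this is not the G-format fold-down described in the next paragraph.

```python
import math

def encode_bformat(sample, azimuth_rad, elevation_rad=0.0):
    """First-order B-format (W, X, Y, Z), with W scaled by 1/sqrt(2) (traditional convention)."""
    w = sample / math.sqrt(2)
    x = sample * math.cos(azimuth_rad) * math.cos(elevation_rad)
    y = sample * math.sin(azimuth_rad) * math.cos(elevation_rad)
    z = sample * math.sin(elevation_rad)
    return w, x, y, z

def decode_horizontal_ring(bformat, n_speakers):
    """Basic decode to a regular horizontal ring of n_speakers (n >= 3); Z is ignored."""
    w, x, y, _ = bformat
    feeds = []
    for i in range(n_speakers):
        phi = 2 * math.pi * i / n_speakers  # azimuth of speaker i
        gain = (math.sqrt(2) * w + 2 * (x * math.cos(phi) + y * math.sin(phi))) / n_speakers
        feeds.append(gain)
    return feeds

# The same four channels feed 4 or 8 speakers; the source is straight ahead (azimuth 0).
b = encode_bformat(1.0, azimuth_rad=0.0)
print([round(g, 3) for g in decode_horizontal_ring(b, 4)])  # [0.75, 0.25, -0.25, 0.25]
print([round(g, 3) for g in decode_horizontal_ring(b, 8)])
```

With this decode, the speaker feeds always sum to the original signal and the speaker closest to the source receives the strongest feed, whatever the number of speakers in the ring.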
The drawback of this system is that it requires a dedicated, costly decoder. The solution currently on offer, the G-format, is to "freeze" on DVD-A a fold-down of three-dimensional Ambisonic sound to 6 channels, distributed according to the Surround 5.1 standard. Which brings the consumer back to the limits and constraints of this standard…
Hastily derived from cinema, the various forms of the Surround system (5.1, 7.1, 10.2) are no more intelligible to the consumer than the other tinkerings of the contemporary audio industry. The famous ".1", for example, which designates the LFE (for Low Frequency Effects) channel, is most often confused in general usage with the subwoofer(s), that is to say, speakers intended to extend the frequency response of the other channels. One must also know that the "home cinema" 5.1 standard differs from the 5.1 defined by the DVD-A consortium. The details are unimportant: consumers have not significantly increased the share of their income they are willing to spend on sound reproduction systems, so the budget that would have bought 2 channels must now be split among 5. The resulting loss of quality might be compensated by the additional excitement that extra channels provide. But again, for this to happen, the placement of all these speakers must correspond at least roughly to the specifications of the standard, something that was already very rare with stereo alone…
|Mastering in Surround|
Recording studios have long been reluctant to convert their facilities for Surround production. The confusion and uncertainty tied to the format wars add to the thorny problem of the acoustic design of studios that must accommodate so many loudspeakers: phase problems, reflections, clutter, etc. Mastering studios are even more circumspect here, and have hardly begun to consider it. A common compromise seems to be to keep concentrating most resources on the main stereo pair, even if that means adding a separate complete multichannel system of much lower quality. Equalizations are done, in pairs of channels, on the main system, and the multichannel system is used only for minor balance adjustments among the channels. On the production side, however, it would seem that a multichannel mix is, in the end, a much easier and more satisfying operation than a stereo one. It is no longer necessary to resort to complex equalizations whose only goal was often to let a large number of sources coexist on two loudspeakers. Spreading out the channels thus allows fewer treatments, which means fewer errors induced by monitoring systems, and therefore a more open sound (this idea is clarified in 2.1.3). The question remains: how degraded will these advantages be by the time they reach consumers?
It is not only because of its relative novelty that SACD, as we have seen, has very few digital processing tools. It is above all because any processing of a signal in this format, other than recording and editing without transformation (even of gain), currently requires some degree of conversion to an LPCM-type encoding. Considering also the inverse operation of final re-encoding to a DSD stream, we can be sure that the very relative quality advantages of this format have vanished by the end of the process. In reality, then, the format only serves its purpose for recordings of performances designed to be reproduced as is: only a portion of classical and jazz productions, to which we could add the highly hypothetical case of capturing, at the output of a console, an impeccable electroacoustic performance whose source would be a top-of-the-line analog synthesis system… in short, a minuscule portion of the music produced. And the Ambisonic process is even more restrictive in this regard: the system only serves its purpose when the recording is made exclusively with microphones, and even then the room used must have acoustics worthy of consideration.
Now, over the past thirty years, a remarkable evolution in music production equipment has triggered a process of relative democratization of production: we are thinking here of analog multitracks, near-field monitors and sampling, then of the computerization of digital audio, notably the proliferation of plug-ins, and of Internet distribution. The current transition to high-density and multichannel audio via DVD and Surround belongs, in a certain way, to this movement, a movement that has allowed a large number of styles and points of view, until now deprived of any dissemination, to become known and to express themselves in an unprecedented explosion of creativity. We have finally emerged from the alienating dichotomy between "serious" and "commercial" music, which granted the right to exist only to modes of expression "already in the book".
The regressive aspects of this tendency also deserve mention: a marked decline in concern for sound quality, technical carelessness, in short a general complacency, which has been rewarded by the disaffection of a consumer public driven to disgust by an unrestrained, slapdash, and excessive musical offering. Needless to say, the democratization of the means of production is far from the only cause of this debacle.
Will SACD and Ambisonic systems, then, counter this tendency toward degradation with a powerful reminder of a minimal standard of sound quality? Far from it! A quick glance at their technical requirements and limitations makes it clear that they are aimed only at a very restricted circle of consumers, whose musical horizons are limited strictly to the straightforward recording-of-a-performance type. These productions, often largely of archival interest, are in reality reserved for the only social groups intensively, and expensively, trained to appreciate them. How can we be sure that this new purism, which seems somewhat in line with the 'style' of neo-conservative ideology, is purely technical and musical? Is it not also necessarily social, even ethnic? It would be surprising, for example, if anyone were to consider releasing an Ambisonic remastering of the back catalogues of Om Kalsoum, or Trini Lopez…
Handicapped by an austere and elitist image, corresponding to no real demand, and commercially non-viable, these systems cannot even claim to play a role of social regulation, since the only musical styles for which they aspire to become the privileged format stopped fulfilling this function long ago. Why, then, have they been introduced? What motivates the sums of money invested in their useless promotion? Only two explanations come to the mind of the author of this study, and they are thin:
- their promoters are disconnected from reality, they get aesthetic pleasure from absurd situations, or they are simply idiots;
- certain people do not feel secure unless they are isolated from the masses, in all aspects of their existence, by a complete set of unassailable mechanisms. Strategists of rare subtlety, experts in predictive marketing, have identified the beginning of a demographic renewal of this particular psychological profile…
And before the intellectual standing of this document sinks any further, damaging its author's reputation even more, let us move on to the next section.
About the author: Dominique Bassal