Shannon, Beethoven, and the Compact Disc
By Kees Schouhamer Immink
Abstract
An audio compact disc (CD) holds up to 74 minutes, 33 seconds of sound, just enough for a complete mono recording of Ludwig van Beethoven's Ninth Symphony (‘Alle Menschen werden Brüder’) at probably the slowest pace it has ever been played, during the Bayreuther Festspiele in 1951 and conducted by Wilhelm Furtwängler. Each second of music requires about 1.5 million bits, which are represented as tiny pits and lands ranging from 0.9 to 3.3 micrometers in length. More than 19 billion channel bits are recorded as a spiral track of alternating pits and lands over a distance of 5.38 kilometers (3.34 miles), which are scanned at walking speed, 4.27 km per hour.
December 2007 marked 25 years since Philips and Sony introduced the CD. In this jubilee article I will discuss the crucial technical decisions that determined the technical success or failure of the new medium.
Shaking the tree
In 1973, I started my work on servo systems and electronics for the videodisc in the Optics group of Philips Research in Eindhoven. The videodisc is a 30 cm diameter optical disc that can store up to 60 minutes of analog FM-modulated video and sound. It is like a DVD, but much larger, heavier, and less reliable. The launch of the videodisc in 1975 was a technical success, but a monumental marketing failure, since consumers showed no interest at all. After two years, Philips decided to throw in the towel and withdrew the product from the market. While my colleagues and I were working on the videodisc, two Philips engineers were asked to develop an audio-only disc based on optical videodisc technology. The two engineers were recruited from the audio department, since my research director believed a sound-only disc was a trivial matter given a video and sound videodisc, and he refused to waste costly researchers’ time. In retrospect, given the long-forgotten videodisc and the CD’s great success, this seems a remarkable decision. The audio engineers started by experimenting with an analog approach using wide-band frequency modulation as in FM radio. Their experiments revealed that the analog solution was scarcely more immune to dirt and scratches than a conventional analog LP. Three years later they decided to look for a digital solution. In 1976, Philips demonstrated the first prototypes of a digital disc using laser videodisc technology. A year later, Sony completed a prototype with a 30 cm diameter disc, the same as the videodisc, and 60 minutes playing time [2].
The Sony/Philips alliance
In October 1979, a crucial high-level decision was made to join forces in the development of a world audio disc standard. Philips and Sony, although competitors in many areas, shared a long history of cooperation, for instance in the joint establishment of the compact cassette standard in the 1960s. In marketing the final products, however, both firms would compete against each other again. Philips brought its expertise and the huge videodisc patent portfolio to the alliance, and Sony contributed its expertise in digital audio technology. In addition, both firms had a significant presence in the music industry via CBS/Sony, a joint venture between CBS Inc. and Sony Japan Records Inc. dating from the late 1960s, and Polygram, a 50% subsidiary of Philips [4]. Within a few weeks, a joint task force of experts was formed. As the only electronics engineer within the ‘Optics’ research group, I participated and dealt with servos, coding, and electronics at large. In 1979 and 1980, a number of meetings, alternating between Tokyo and Eindhoven, were held. The first meeting, in August 1979 in Eindhoven, and the second meeting, in October 1979 in Tokyo, provided an opportunity for the engineers to get to know each other and to learn each other’s main strengths. Both companies had shown prototypes and it was decided to take the best of both worlds. During the third technical meeting on December 20, 1979, both partners wrote down their list of preferred main specifications for the audio disc. Although there are many other specifications, such as the dimensions of the pits, disc thickness, and diameter of the inner hole, these are too technical to be discussed here.
Item                  Philips       Sony
Sampling rate (kHz)   44.0 - 44.5   44.1
Quantization          14 bit        16 bit
Playing time (min)    60            60
Diameter (mm)         115           100
EC Code               t.b.d.        t.b.d.
Channel Code          M3            t.b.d.
t.b.d. = to be discussed
As can be seen from the list, a lot of work had to be done as the partners agreed only on one item, namely the one-hour playing time. The other target parameters, sampling rate, quantization, and notably disc diameter look very similar, but were worlds apart.
Shannon-Nyquist sampling theorem
The Shannon-Nyquist sampling theorem dictates that in order to achieve lossless sampling, the signal should be sampled with a frequency at least twice the signal’s bandwidth. So for a bandwidth of 20 kHz a sampling frequency of at least 40 kHz is required. A large number of people, especially young people, are perfectly capable of hearing sounds at frequencies well above 20 kHz. That is, in theory, all we can say. In 1978, each and every piece of digital audio equipment used its own ‘well-chosen’ sampling frequency ranging from 32 to 50 kHz. Modern digital audio equipment accepts many different sampling rates, but the CD task force opted for only one frequency, namely 44.1 kHz. This sampling frequency was chosen mainly for logistical reasons, as will be discussed later, once we have explained the state of the art of digital audio recording in 1979. Towards the end of the 1970s, ‘PCM adaptors’ were developed in Japan, which used ordinary analog video tape recorders as a means of storing digital audio data, since these were the only widely available recording devices with sufficient bandwidth. The best commonly available video recording format at the time was the 3/4" U-Matic. The presence of the PCM video-based adaptors explains the choice of sampling frequency for the CD, as the number of video lines, frame rate, and bits per line end up dictating the sampling frequency one can achieve for storing stereo audio. The sampling frequencies of 44.1 and 44.056 kHz were the direct result of a need for compatibility with the NTSC and PAL video formats. Essentially, since there were no other reliable recording products available at that time that offered other options in sampling rates, the Sony/Philips task force could only choose between 44.1 or 44.056 kHz and 16 bits resolution (or less). During the fourth meeting held in Tokyo from March 18-19, 1980, Philips accepted (and thus followed Sony’s original proposal) the 16-bit resolution and the 44.1 kHz sampling rate.
44.1 kHz as opposed to 44.056 kHz was chosen for the simple reason that it was easier to remember. Philips dropped their wish to use 14 bits resolution: they had no technical rationale, as the wish for 14 bits was in fact based only on the availability of their 14-bit digital-analog converter. In summary, Compact Disc sound quality followed the sound quality of Sony’s PCM-1600 adaptor since logistically speaking there was no other choice. Thus, quite remarkably, in recording practice, an audio CD starts life as a PCM master tape, recorded on a U-Matic videotape cassette, where the audio data is converted to digital information superimposed within a standard television signal. The industry standard hardware to do this was the Sony PCM-1600, the first commercial video-based 16-bit recorder, followed by the PCM-1610 and PCM-1630 adaptors. Until the 1990s, only video cassettes could be used as a means for exchanging digital sound from the studios to the CD mastering houses. Later, Exabyte computer tapes, CD-Rs and memory sticks have been used as a transport vehicle.
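The way video parameters pin down the two candidate sampling rates can be checked with a little arithmetic. The sketch below is only an illustration: the figures of 3 samples per channel per video line, 294 usable lines per field at 50 fields per second, and 245 usable lines per field at 60 (or 60/1.001) fields per second are commonly quoted values for video-based PCM adaptors and are assumptions here, not numbers taken from the task force minutes.

```python
from fractions import Fraction

# Assumed: each video line carries 3 audio samples per channel.
SAMPLES_PER_LINE = 3

def sampling_rate(lines_per_field, fields_per_second):
    """Audio sampling rate (Hz) implied by the video line structure."""
    return SAMPLES_PER_LINE * lines_per_field * Fraction(fields_per_second)

# Assumed line counts: 294 usable lines per field at 50 fields/s,
# 245 usable lines per field at 60 fields/s.
rate_50_fields = sampling_rate(294, 50)
rate_60_fields = sampling_rate(245, 60)

# Color NTSC runs slightly slower: 60/1.001 fields per second.
rate_color_ntsc = sampling_rate(245, Fraction(60000, 1001))

# Minimum rate required by the sampling theorem for a 20 kHz bandwidth.
nyquist_minimum = 2 * 20_000

print(rate_50_fields, rate_60_fields, float(rate_color_ntsc), nyquist_minimum)
```

Under these assumptions both the 50-field and the 60-field systems land on exactly 44,100 Hz, while the color NTSC field rate gives roughly 44,056 Hz, both comfortably above the 40 kHz minimum.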
Coding systems
Coding techniques form the basis of modern digital transmission and storage systems. There had been previous practical applications of coding, especially in space communications, but the Compact Disc was the first mass-market electronics product equipped with fully-fledged error correction and channel coding systems. To gain an idea of the types of errors, random versus burst errors, burst length distribution and so on, we made discs that contained known coded sequences. Burst error length distributions were measured for virgin, scratched, or dusty discs. The error measurement was relatively simple, but scratching or fingerprinting a disc in such a way that it can still be played is far from easy. How do you get a disc with the right kind of sticky dust? During playing, most of the dust fell off the disc into the player, and the optics engineers responsible for the player were obviously far from happy with our dust experiments. The experimental discs we used were handmade, and not pressed like commercial mass-produced polycarbonate discs. In retrospect, I think that the channel characterization was a far from adequate instrument for the design of the error correction control (ECC). There were two competing ECC proposals to be studied. Experiments in Tokyo and Eindhoven (Japanese dust was not the same as Dutch dust) were conducted to verify the performance of the two proposed ECCs. Sony proposed a byte-oriented, rate 3/4, Cross-Interleaved Reed-Solomon code (CIRC) [6]. Vries of Philips designed an interleaved convolutional, rate 2/3, code having a basic unit of information of 3-bit characters [9]. CIRC uses two short RS codes, namely (32, 28, 5) and (28, 24, 5) RS codes using a Ramsey-type of interleaver. If a major burst error occurs and the ECC is overloaded, it is possible to obtain an approximation of an audio sample by interpolating the neighboring audio samples, so concealing uncorrectable samples in the audio signal.
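Two of the statements above lend themselves to a quick check: that the (32, 28) and (28, 24) codes together give the overall rate 3/4, and that a flagged sample can be approximated from its neighbors. The following is a minimal sketch, not the decoder used in any player; the helper names code_rate and conceal are hypothetical, and plain linear interpolation is assumed as the concealment rule.

```python
from fractions import Fraction

def code_rate(n, k):
    """Rate of an (n, k) block code: k information symbols out of n."""
    return Fraction(k, n)

# Overall rate of the two concatenated RS codes named in the text.
circ_rate = code_rate(32, 28) * code_rate(28, 24)

def conceal(samples, bad):
    """Replace samples flagged as uncorrectable (indices in `bad`)
    by linear interpolation between the nearest good neighbors."""
    good = [i for i in range(len(samples)) if i not in bad]
    out = list(samples)
    for i in sorted(bad):
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None and right is None:
            out[i] = 0                      # nothing to lean on: mute
        elif left is None:
            out[i] = samples[right]         # hold the next good value
        elif right is None:
            out[i] = samples[left]          # hold the previous good value
        else:
            t = (i - left) / (right - left)
            out[i] = round(samples[left] + t * (samples[right] - samples[left]))
    return out

print(circ_rate)
print(conceal([0, 100, 9999, 300, 400], {2}))
```

In the example call, the corrupted middle sample is replaced by the average of its two neighbors, so a click is traded for a small, usually inaudible, approximation error.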
CIRC has various nice features to make error concealment possible, so extending the player's operation range [10]. CIRC showed a much higher performance and code rate (and thus playing time), although it was extremely complicated to cast into silicon at the time. Sony used a 16 kByte RAM for data interleaving, which, then, cost around $50, and added significantly to the sales price of the player. During the fifth meeting in Eindhoven, May 1980, the partners agreed on the CIRC error correction code since our experiments had shown its great resilience against mixtures of random and burst errors. The fully correctable burst length is about 4,000 bits (around 2.5 mm of missing data on the disc). The length of errors that can be concealed is about 12,000 bits (around 7.5 mm). The largest error burst we ever measured during the many long days of disc channel characterization was 0.1 mm. We also had to decide on the channel code. This is a vital component as it has a great impact on both the playing time and the quality of ‘disc handling’ or ‘playability’. Servo systems follow the track of alternating pits and lands in three dimensions, namely radial, focal, and rotational speed. Everyday handling damage, such as dust, fingerprints, and tiny scratches, not only affects retrieved data, but also disrupts the servo functions. In worst cases, the servos may skip tracks or get stuck, and error correction systems become utterly worthless. A well-designed channel code will make it possible to remove the major barriers related to these playability issues. Both partners proposed some form of (d, k) runlength-limited (RLL) codes, where d is the minimum number and k is the maximum number of zeros between consecutive ones. RLL codes had been widely used in magnetic disk and tape drives, but their application to optical recording was a new and challenging task. The various proposals differed in code rate, runlength parameters d and k, and the so-called spectral content.
The spectral content has a direct bearing on the playability, and we had to learn how to trade playability versus the code rate (and thus playing time). In their prototype, Philips used the proprietary M3 channel code, a rate 1/2, d=1, k=5 code, with a well-suppressed spectral content [1]. M3 is a variation on the M2 code, which was developed in the 1970s by Ampex Inc. for their digital video tape recorder [5]. Sony started with a rate 1/3, d=5, RLL code, but since our experiments showed it did not work well, they changed horses halfway, and proposed a proprietary rate 1/2, d=2, k=7 code, a type of code that was used in an IBM magnetic disk drive. Neither Sony code had spectral suppression, and the engineers had opposing views on how the servo issue could be solved. Synchronization of signals with unknown speed read with constant linear velocity (disc rotational speed varies with the radius) was another issue. Little was known, and every idea had to be tried on the testbed, and this took time. As a result, at the May 1980 meeting, the choice of the channel code remained open, and ‘more study was needed’. Before continuing with the coding cliffhanger, we take a musical break.
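The (d, k) constraint defined above for runlength-limited codes is easy to state in code. The sketch below only checks whether a given bit sequence obeys the constraint; it is not an encoder for M3 or for any of the Sony proposals. The function names are hypothetical, and zeros before the first one or after the last one are ignored, which is a simplifying assumption.

```python
def zero_runs(bits):
    """Lengths of the runs of zeros between consecutive ones."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    return [b - a - 1 for a, b in zip(ones, ones[1:])]

def satisfies_dk(bits, d, k):
    """True if every run of zeros between two ones has length in [d, k].
    Leading and trailing zeros are not checked (simplifying assumption)."""
    return all(d <= run <= k for run in zero_runs(bits))

sequence = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]

print(zero_runs(sequence))            # runs of 2 and 7 zeros
print(satisfies_dk(sequence, 2, 7))   # fits a d=2, k=7 constraint
print(satisfies_dk(sequence, 1, 5))   # violates k=5: one run is too long
```

A larger d spreads the transitions further apart, while a smaller k guarantees that transitions keep arriving often enough for clock recovery; the trade between the two is what the competing proposals argued about.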
Playing time and Beethoven’s Ninth by Wilhelm Furtwängler
Playing time and disc diameter are probably the parameters most visible for consumers. Clearly, these two are related: a 5% increase in disc diameter yields 10% more disc area, and thus an increase in playing time of 10%. Philips’ top management made the proposal regarding the 115 mm disc diameter. They argued 'The Compact Audio Cassette was a great success', and, 'we don't think CD should be much larger'. The diagonal of the Compact Audio Cassette, very popular at that time and also developed by Philips, is 115 mm. The Philips prototype audio disc and player were based on this idea, and the Philips team of engineers restated this view in the list of preferred main parameters. Sony, no doubt with portable players in mind, initially preferred a 100 mm disc. During the May 1980 meeting something very remarkable happened. The minutes of the May 1980 meeting in Eindhoven read literally: disc diameter: 120 mm, playing time: 75 minutes, track pitch: 1.45 µm, can be achieved with the Philips M3 channel code. However, the negative points are: large numerical aperture needed which entails smaller (production) margins, and the Philips’ M3 code might infringe on Ampex M2. Both disc diameter and playing time differ significantly from the preferred values listed during the Tokyo meeting in December 1979. So what happened during the six months? The minutes of the meetings do not give any clue as to why the changes to playing time and disc diameter were made. According to the Philips’ website with the ‘official’ history: "The playing time was determined posthumously by Beethoven". The wife of Sony's vice-president, Norio Ohga, decided that she wanted the composer's Ninth Symphony to fit on a CD. It was, Sony’s website explains, Mrs. Ohga's favorite piece of music. The Philips’ website proceeds: “The performance by the Berlin Philharmonic, conducted by Herbert von Karajan, lasted for 66 minutes.
Just to be quite sure, a check was made with Philips’ subsidiary, Polygram, to ascertain what other recordings there were. The longest known performance lasted 74 minutes. This was a mono recording made during the Bayreuther Festspiele in 1951 and conducted by Wilhelm Furtwängler. This therefore became the maximum playing time of a CD. A diameter of 120 mm was required for this playing time”. Everyday practice is less romantic than the pen of a public relations guru. At that time, Philips’ subsidiary Polygram, one of the world's largest distributors of music, had set up a CD disc plant in Hanover, Germany. This could produce large quantities of CDs with, of course, a diameter of 115 mm. Sony did not have such a facility yet. If Sony had agreed on the 115 mm disc, Philips would have had a significant competitive edge in the music market. Sony was aware of that, did not like it, and something had to be done. The result is known.
Channel code continued, EFM
Popular literature, as exemplified in Philips’ website mentioned above, states that the disc diameter is a direct result of the requested playing time, and that the extra playing time for Furtwängler’s Ninth subsequently required the change from a 115 mm to a 120 mm disc (no one mentions Sony’s 100 mm disc diameter). It suggests that there are no other factors affecting playing time. Note that in May 1980, when disc diameter and playing time were agreed, the channel code, a major factor affecting playing time, was not yet settled. In the minutes of the May 1980 meeting, it was remarked that the above (diameter, playing time, and track pitch) could be achieved with Philips' M3 channel code. In the meantime, but not mentioned in the minutes of the May meeting, the author was experimenting with a new channel code, later coined EFM [3]. EFM, a rate 8/17, d=2, code made it possible to achieve a 30 percent higher information density than the Philips' M3. EFM also showed a good resilience against disc handling damage such as fingerprints, dust, and scratches. Note that a 30 percent efficiency improvement is highly attractive, since, for example, the increase from 115 to 120 mm only offers a mere 10 percent increase in playing time. A month later, in June 1980, we could not choose the channel code, and again more study and experiments were needed. Although experiments had shown the greater information density that could be obtained with EFM, it was at first merely rejected by Sony. At the end of the discussion, which at times was heated, the Sony people were specifically opposing the complexity of the EFM decoder, which then required 256 gates. My remark that the CIRC decoder needed at least half a million gates and that the extra 256 gates for EFM were irrelevant was jeered at. Then suddenly, during the meeting, we received a phone call from the presidents of Sony and Philips, who were meeting in Tokyo.
We were running out of time, they said, and one week for an extra, final, meeting in Tokyo was all the lads could get. On June 19, 1980 in Tokyo, Sony agreed to EFM. The 30 percent extra information density offered by EFM could have been used to reduce the diameter to 115 mm or even Sony’s original target diameter of 100 mm, with, of course, the demanded 74 minutes and 33 seconds for playing Mrs. Ohga’s favorite Ninth. However, such a change was not considered to be politically feasible, as the powers that be had decided on 120 mm. The option to increase the playing time to 97 minutes was not even considered. We decided to improve the production margins of player and disc by lowering the information density by 30 percent: the disc diameter remained 120 mm, the track pitch was increased from 1.45 to 1.6 µm, and the user bit length was increased from 0.5 to 0.6 µm. By increasing the bit size in two dimensions, in a similar vein to large letters being easier to read, the disc was easier to read, and could be introduced without too many technical complications. The maximum playing time of the CD was 74 minutes and 33 seconds, but in practice the maximum playing time was determined by the playing time of the U-Matic video recorder, which was 72 minutes. Therefore, rather sadly, Mrs. Ohga’s favorite Ninth by Furtwängler could not be recorded in full on a single CD till 1988 (EMI 7698012), when alternative digital transport media became available. On a slightly different note, Jimi Hendrix's Electric Ladyland featuring a playing time of 75 minutes was originally released as a 2 CD set in the early 1980s, but has been on a single CD since 1997.
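The trade-offs in this section come down to a few lines of arithmetic, sketched below using only the figures quoted in the text. Two simplifications are assumed: playing time is taken to scale with the square of the disc diameter (the unrecorded inner area is ignored), and the 30 percent EFM gain is applied directly to the 75 minutes from the May 1980 minutes.

```python
def area_gain(old_diameter_mm, new_diameter_mm):
    """Ratio of disc areas; assumes playing time scales with full disc area."""
    return (new_diameter_mm / old_diameter_mm) ** 2

# A 5 percent larger diameter gives about 10 percent more area.
gain_5_percent = area_gain(100, 105)

# Going from 115 mm to 120 mm gives a bit under 10 percent.
gain_115_to_120 = area_gain(115, 120)

# EFM's 30 percent density gain applied to 75 minutes of playing time.
efm_gain = 1.30
playing_time_with_efm = 75 * efm_gain

# Relaxing track pitch (1.45 -> 1.6 um) and user bit length (0.5 -> 0.6 um)
# enlarges the area occupied by one user bit.
area_per_bit_before = 1.45 * 0.5
area_per_bit_after = 1.6 * 0.6
relaxation = area_per_bit_after / area_per_bit_before

print(round(gain_5_percent, 4), round(gain_115_to_120, 3))
print(playing_time_with_efm, round(relaxation, 3))
```

Under these assumptions the area per user bit grows by roughly a third, which is of the same order as the gain EFM had delivered: the coding improvement was spent on production margins rather than on a smaller disc or a longer playing time.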
The inventor of the CD
The Sony/Philips task force stood on the shoulders of the Philips’ engineers who created the laser videodisc technology in the 1970s. Given the videodisc technology, the task force made choices regarding various mechanical parameters such as disc diameter, pit dimensions, and audio parameters such as sampling rate and resolution. In addition, two basic patents were filed related to error correction, CIRC, and channel code, EFM. CIRC, the Reed-Solomon ECC format, was completely engineered and developed by Sony engineers, and EFM was completely created and developed by the author. Let us take a look at numbers. The size of the task force varied per meeting, and the average number of attendees listed on the official minutes of the joint meetings is twelve. If the persons carrying hierarchical responsibility of the CD project are excluded (many chiefs, hardly any Indians) then we find a very small group of engineers who carried the technical responsibility of the Compact Disc ‘Red Book’ standard. Philips' corporate public relations department, see The Inventor of the CD on Philips' website [7], states that the CD was "too complex to be invented by a single individual", and the "Compact Disc was invented collectively by a large group of people working as a team". It persuades us to believe that progress is the product of institutions, not individuals. Evidently, there were battalions of very capable engineers, who further developed and marketed the CD, and success in the market depended on many other innovations. For example, the solid-state physicists, who developed an inexpensive and reliable laser diode, a primary enabling technology, made CD possible in practice. Credit should also be given to the persons who designed the transparent Compact Disc storage case, the 'jewel box', which made a clever contribution to the visual appeal of the CD. Philips and Sony agreed in a memorandum dated June 1980 that their contributions to channel and error correction codes are equal.
Sony’s website, however, with their 'official' history is entitled 'Our contributions are equal' [8]. The website proceeds, “We avoid such comments as, ‘We developed this part and that part’ and to emphasize that the disc's development was a joint effort by saying, ‘Our contributions are equal’. The leaders of the task force convinced the engineers to put their companies before individual achievements”. The myth building even went so far that the patent applications for both CIRC and EFM were filed with joint Sony/Philips inventors. Philips receives the lion’s share of the patent royalty income, which is far from equally shared between the two partners.
Everything else is gaslight
A favorite expression of audiophiles –particularly during the early period, when they were comparing both vinyl LP and CD versions of the same recordings– was: "It is as though a veil has been lifted from the music". Or, in the words of the famous Austrian conductor Herbert von Karajan, when he first heard CD audio: "Everything else is gaslight". Von Karajan was fond of the gaslight metaphor: he first conducted Der Rosenkavalier in 1956 with the soprano Elisabeth Schwarzkopf. Later, when he revived the opera in 1983 with Anna Tomowa, he referred to his 1956 cast as "gaslight", which rather upset Schwarzkopf. Philips and Sony settled the introduction of the new product to be on November 1, 1982. The moment the ink of the “Red Book”, detailing the CD specifications, was dry, the race started, and hundreds of developers in Japan and the Netherlands were on their way. Early January 1982 it became clear that Philips was running behind, the electronics was seriously delayed, and they asked Sony to postpone the introduction. Sony rejected the delay, but agreed upon a two-step launch. Sony would first market their CD players and discs in Japan, where Philips had no market share, and half a year later, March 1983, the worldwide introduction would take place by Philips and Sony. Philips Polygram could supply discs for the Japanese market. This gave Philips some breathing space for the players, but not enough, as in order to make the new deadline, the first generation of Philips CD players was equipped with Sony electronics. The first CD players cost over $2000, but just two years later it was possible to buy them for under $350. Five years after the introduction, sales of CD were higher than vinyl LPs. Yet this was no great achievement, as in 1980 sales of vinyl records had been declining for many years although the music industry was all but dead. A few years later, the Compact Disc had completely replaced the vinyl LP and cassette tape. 
Compact Disc technology was ideal for use as a low-cost, mass-data storage medium, and the CD-ROM and record-once and re-writable media, CD-R and CD-RW, respectively, were developed. Hundreds of millions of players and more than two hundred billion CD audio discs were sold.
About the author:
Dr. Kees A. Schouhamer Immink worked from 1968 till 1998 at Philips Research Labs, Eindhoven. In 1998, he founded Turing Machines Inc, where he currently serves as its CEO and president. Since 1994, he has been an adjunct professor at the Institute for Experimental Mathematics, Essen-Duisburg University, Germany, and a visiting professor at the Data Storage Institute in Singapore.
Further reading
[1] M.G. Carasso, W.J. Kleuters, and J.J Mons, Method of coding data bits on a recording medium (M3 Code), US Patent 4,410,877, 1983.
[2] T. Doi, T. Itoh, and H. Ogawa, A Long-Play Digital Audio Disk System, AES Preprint 1442, Brussels, Belgium, March 1979.
[3] K.A.S. Immink and H. Ogawa, Method for Encoding Binary Data (EFM), US Patent 4,501,000, 1985.
[4] T. Kretschmer and K. Muehlfeld, Co-opetition in Standard-Setting: The Case of the Compact Disc, //papers.ssrn.com/sol3/papers.cfm?abstract_id=618484
[5] J.W. Miller, DC Free encoding for data transmission (M2 Code), US Patent 4,234,897, 1980.
[6] K. Odaka, Y. Sako, I. Iwamoto, T. Doi, and L. Vries, Error correctable data transmission method (CIRC), US Patent 4,413,340, 1983.
[7] The inventor of the CD, Philips’ historical website: //www.research.philips.com/newscenter/dossier/optrec/index.html
[8] Our contributions are equal, Sony’s historical website: www.sony.net/Fun/SH/1-20/h2.html
[9] L.B. Vries, The Error Control System of Philips Compact Disc, AES Preprint 1548, New York, Nov. 1979.
[10] K.A.S. Immink, ''Reed-Solomon Codes and the Compact Disc'' in S.B. Wicker and V.K. Bhargava, Eds., Reed-Solomon Codes and Their Applications, IEEE Press, 1994.
IEEE Information Theory Society Newsletter, December 2007