The crucial flaw with Audio CDs

I finally decided to get rid of the many boxes of CDs that I had been hauling from place to place. But not before I archive them for my personal collection. Yes, I recognize that many of these albums are available these days via the so-called “hi-res” music services that stream lossless formats. But I’ve already purchased the license to own this music, and I also have many albums that are not available on these sites [thanks to the independent record labels that have sent them to me as promos]. Plus, in case you didn’t know this already, I am a bit of a [digital] hoarder, slightly obsessive about the details, and very much on the technical side. Just read my article on How to fall in love with digital music. So, of course, I’m going to go through the trouble of “ripping” the CDs and storing them myself on a triple-backed-up NAS. I want to have an absolutely pristine and perfect copy before I move on. And herein lies the problem. What is a “flawless rip”? What is this talk about a “bit-perfect” copy? In this write-up, I hope to explain a bit about the actual process of “ripping” and the rabbit hole that I’ve gone down in search of an answer. If you’re an audiophile or just a music geek, I think you will enjoy this one! [Please ignore the images: I generated them with AI to break up a wall of text.]

First of all, let’s do a quick review of the Compact Disc as a medium. Introduced in 1982, a CD can store up to 80 minutes of digital audio in Pulse Code Modulation (PCM) format, sampled at 44.1 kHz with 16 bits per sample on each of two stereo channels. If I have already lost you, I recommend you catch up on my article On Digital Audio, Codecs, DACs and Bluetooth. Lossless formats (such as WAV, ALAC, AIFF, and yes, FLAC) all preserve that same PCM data, either uncompressed or losslessly compressed. So what could be the problem with copying the data off of the CD onto your hard drive? It should just be perfect because it’s digital, right? Wrong! Wait, what? “You’re speaking absolute nonsense, Mike. You’re just like those salesmen in those expensive hi-fi shops! It’s all digital! It has to be perfect! A zero and a one cannot be misinterpreted. Otherwise, none of the software we have ever installed from CDs would work!”
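To put numbers on the format described above, here is a minimal Python sketch of the arithmetic. The constant and function names are purely illustrative, and the result is the raw audio payload only, ignoring error correction and other overhead on the disc.

```python
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BYTES_PER_SAMPLE = 2      # 16 bits
CHANNELS = 2              # stereo


def pcm_bytes_per_second():
    """Raw PCM data rate of CD audio in bytes per second."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def pcm_size_bytes(minutes):
    """Raw PCM size of `minutes` of CD audio, in bytes."""
    return pcm_bytes_per_second() * 60 * minutes


print(pcm_bytes_per_second())      # 176400 bytes per second
print(pcm_bytes_per_second() * 8)  # 1411200 bits per second
print(pcm_size_bytes(80))          # 846720000 bytes for a full 80-minute disc
```

That is roughly 807 MiB of audio for a full disc, which is the amount of data a rip has to get exactly right.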

Let’s take a quick sidestep first. Before we get to the digital portion of our recording, we must accept the fact that reading a CD is still a physical, analogue process. We must “extract” those zeros and ones from the disc using a laser before a DAC can convert those bits into actual audio. And a CD is not perfect. There are scratches, dust and dirt. A laser can be misaligned, resulting in interruptions in reading. There is even a phenomenon known as CD rot, where the reflective metal layer begins to corrode, causing the data to become unreadable. “Yeah, but all those things aside, there is plenty of error correction built into the CD so that small scratches can be repaired!” And herein lies another issue! The Audio CD has less error correction than a Data CD! That’s right! The format for storing audio, as defined by the Red Book standard, differs from the data CD specification (which uses the Yellow Book standard, devised three years later)! And it has less built-in redundancy because audio CDs are optimized for continuous playback of music rather than “bit-perfect” data transfer. In fact, you can “rip” the same CD multiple times, and if your hardware is substandard and is not optimized for “bit-perfect” ripping, the checksums of your final digital files can be different! “Wait, whaaaaaaaat?”

Audio CDs store digital information in a continuous stream [versus a data CD, which actually uses a real file system]. For error correction, Audio CDs use a technique called Cross-Interleaved Reed-Solomon Coding (CIRC) to detect and correct errors in the data on the fly without interrupting the playback. The audio data on a CD is divided into frames, each of which contains 24 bytes of audio data and [only] 8 bytes of error correction code. The error correction code contains redundant information that can be used to detect and correct errors (if you want to get super nerdy on how this is done, you’d best read up on the “Polynomial Codes over Certain Finite Fields” paper). When reading an audio CD, the CD player reads the audio data frame by frame and performs forward error correction on each frame. “So what’s the problem, then? This sounds pretty redundant to me.”
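The frame layout above is easier to picture with the arithmetic spelled out. This is only a sketch: the 24 and 8 byte figures come from the paragraph above, while the 98 frames per sector is an assumption brought in from the standard CD layout, and all names are illustrative.

```python
FRAME_AUDIO_BYTES = 24        # audio payload per frame (from the text)
FRAME_PARITY_BYTES = 8        # error correction bytes per frame (from the text)
BYTES_PER_STEREO_SAMPLE = 4   # 2 channels x 16 bits
SAMPLE_RATE_HZ = 44_100
FRAMES_PER_SECTOR = 98        # assumption: standard CD sector grouping

# How many left/right sample pairs fit into one frame.
samples_per_frame = FRAME_AUDIO_BYTES // BYTES_PER_STEREO_SAMPLE

# How many frames the player must decode for each second of music.
frames_per_second = SAMPLE_RATE_HZ // samples_per_frame

# Sectors are the unit a ripping drive actually reads.
sectors_per_second = frames_per_second // FRAMES_PER_SECTOR
sector_audio_bytes = FRAME_AUDIO_BYTES * FRAMES_PER_SECTOR

# Share of each frame's 32 bytes that is parity rather than music.
parity_share = FRAME_PARITY_BYTES / (FRAME_AUDIO_BYTES + FRAME_PARITY_BYTES)

print(samples_per_frame)    # 6
print(frames_per_second)    # 7350
print(sectors_per_second)   # 75
print(sector_audio_bytes)   # 2352
print(parity_share)         # 0.25
```

So a quarter of every frame is parity, and that budget has to cover thousands of frames every second.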

Well… not always. Data CDs have more space available for error correction than audio CDs, because every data sector carries an extra layer of error detection and correction codes on top of CIRC! As a rough figure, a Data CD can detect and repair errors spanning approximately 792 bytes of data, while an Audio CD can detect and correct up to about 220 bytes. If error correction fails on a Data CD, the read fails and the data is lost, just as you’d imagine. But on an Audio CD, if the error cannot be corrected, the player may interpolate the missing data to minimize the impact on the playback. Interpolation estimates the missing values in a sequence from the known data points around them [such as the surrounding samples in our PCM stream]. This is where the player literally makes up new information for an audio stream, and you won’t even notice the difference! This wouldn’t work with pure data! And that’s the hidden flaw with [reading] audio CDs!
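To make the “making up new information” part concrete, here is a minimal sketch of error concealment by linear interpolation. It is an illustration under simplifying assumptions, not how any particular player or drive implements it: the function name `conceal` and the `bad` flags are hypothetical, and real hardware works on interleaved data and may hold or mute samples instead.

```python
def conceal(samples, bad):
    """Fill samples flagged as unreadable by interpolating between good neighbours.

    samples: list of integer PCM values for one channel
    bad:     list of booleans, True where error correction failed
    """
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if not bad[i]:
            i += 1
            continue
        start = i
        while i < n and bad[i]:
            i += 1
        end = i  # index of the first good sample after the gap, or n
        left = out[start - 1] if start > 0 else None
        right = out[end] if end < n else None
        if left is None and right is None:
            left = right = 0      # nothing known at all: mute
        elif left is None:
            left = right          # gap at the very start: hold the next value
        elif right is None:
            right = left          # gap at the very end: hold the last value
        gap = end - start + 1
        for k in range(start, end):
            t = (k - start + 1) / gap
            out[k] = round(left + (right - left) * t)
    return out


original = [0, 100, 180, 300, 400]            # what was mastered onto the disc
as_read = [0, 100, 0, 300, 400]               # third sample could not be recovered
flags = [False, False, True, False, False]

repaired = conceal(as_read, flags)
print(repaired)              # [0, 100, 200, 300, 400]
print(repaired == original)  # False
```

The repaired value of 200 sits neatly between its neighbours and would be inaudible, but it is not the 180 that was on the disc, so the result is no longer a bit-perfect copy.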

So while the Compact Disc stores music encoded in a lossless PCM format for perfect reproduction of audio, the act of reading, interpreting, playing, and, yes, “ripping” that data is absolutely not perfect! It certainly does not provide an appropriate level of redundancy for corrupted data, and it can even create new data points on the fly! This is why ripping the same disc multiple times can produce a different final result (with a different checksum on the WAV). This is why bit-perfect hardware and software exist! They really try to pull those bits out as best as they can (with multiple passes over the same sectors). And even they can’t really tell if the final rip was perfect! If you want to learn more and go even deeper down this rabbit hole, check out the AccurateRip site, which hosts a database of checksums for audio extraction and talks about the “Audio CD Error Detection Hole” in its secure ripping guide.
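If you want to check two rips of the same track yourself, a simple approach is to hash only the audio samples and compare the results. The sketch below uses Python’s standard `wave` and `hashlib` modules. Note that SHA-256 is just a stand-in here, since AccurateRip uses its own checksum scheme, and the helper names (`write_wav`, `pcm_sha256`, `rips_match`) are hypothetical. The tiny WAV files are generated on the spot only so the example runs on its own.

```python
import hashlib
import os
import struct
import tempfile
import wave


def write_wav(path, samples):
    """Write interleaved 16-bit stereo samples as a CD-style WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(44_100)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


def pcm_sha256(path):
    """Hash only the PCM frames, so differences in WAV headers are ignored."""
    with wave.open(path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    return hashlib.sha256(frames).hexdigest()


def rips_match(path_a, path_b):
    """True if both files contain exactly the same audio samples."""
    return pcm_sha256(path_a) == pcm_sha256(path_b)


with tempfile.TemporaryDirectory() as tmp:
    rip1 = os.path.join(tmp, "rip1.wav")
    rip2 = os.path.join(tmp, "rip2.wav")
    rip3 = os.path.join(tmp, "rip3.wav")

    clean = [0, 0, 1000, -1000, 2000, -2000, 3000, -3000]
    concealed = list(clean)
    concealed[4] = 2001  # a single sample altered by error concealment

    write_wav(rip1, clean)
    write_wav(rip2, clean)
    write_wav(rip3, concealed)

    same = rips_match(rip1, rip2)
    differs = not rips_match(rip1, rip3)

print(same)     # True
print(differs)  # True
```

Keep in mind that two matching rips from the same drive only prove the drive was consistent, not that it was correct, which is why comparing against other people’s results is the stronger test.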

So does any of this matter? Can you hear the difference? Unless there is a significant skip which cannot be repaired, the answer is probably “no”. I can’t claim that anyone will be able to spot a delta in an interpolated value on something sampled 44,100 times a second! Really. But is the Audio CD perfect? Is the act of playing and copying data lossless? No, it’s not. And if you’re a techie geek [just like me], then perhaps you will look into making those bit-perfect rips for your precious collection. To do that, you may need to get a quality CD drive (with C2 error pointers, cache invalidation, over-reading, and drive offset features) and use software that also adopts AccurateRip. Or purchase that hi-fi device that the sales rep was trying to sell you. In any case, I hope you learned something new here and maybe even changed your mind about Compact Discs!