This inspired me to read up on the low-level details of CD structure. I'm curious: has anybody scanned an entire CD and shared the results, so that we could work with a raw image of a disc that contains all its quirks, as opposed to the typical .iso format?
It's really difficult. Unlike floppy disks, where you tell the drive to seek and get back raw magnetic pulses (so you can produce raw flux images), or hard disks, where you tell the drive to read an arbitrary sector and get a blob of data (so you can produce sector-level images), the protocol for talking to a CD-ROM drive involves asking for track/sector addresses. That means you have to trust the drive to interpret all the track metadata and error correction for you; you generally can't just dump the "raw" data and do the interpretation yourself.
That's why the most robust CD image format is the BIN/CUE format. The BIN file contains all the sectors the drive allows us to read, the CUE file contains the disc metadata as interpreted for us by the drive firmware.
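For a data track, each 2,352-byte sector in a BIN file still carries the 12-byte sync pattern and 4-byte header that a 2,048-byte-per-sector .iso leaves out. A minimal sketch of reading them, assuming a Mode 1 or Mode 2 data sector as drives usually return it (audio sectors have no sync or header at all); `parse_sector_header` is just an illustrative name:

```python
# Sketch: parse the sync pattern and header of one raw 2352-byte CD data sector.
# Assumes a data (Mode 1 / Mode 2) sector; audio sectors have no sync or header.

SECTOR_SIZE = 2352
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])  # 12-byte sync pattern


def bcd_to_int(b: int) -> int:
    """Decode one packed-BCD byte, e.g. 0x59 -> 59."""
    return (b >> 4) * 10 + (b & 0x0F)


def parse_sector_header(sector: bytes) -> dict:
    if len(sector) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(sector)}")
    if sector[:12] != SYNC:
        raise ValueError("no sync pattern: probably an audio sector")
    minute, second, frame = (bcd_to_int(b) for b in sector[12:15])
    mode = sector[15]
    # Absolute time in frames (75 per second); LBA 0 sits at 00:02:00.
    lba = (minute * 60 + second) * 75 + frame - 150
    return {"msf": (minute, second, frame), "mode": mode, "lba": lba}
```

A sector whose header bytes are `00 02 16 01` decodes to MSF 00:02:16, Mode 1, LBA 16.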
There are some drives which support extra "raw read" commands, but they're incredibly rare and consequently in great demand by CD preservation projects like redump.org.
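For comparison, the standard, non-vendor-specific way to ask a drive for sectors is the MMC READ CD command (opcode 0xBE). A sketch of building its 12-byte command descriptor block, based on my reading of the MMC spec; the helper name is hypothetical, and actually sending it needs an OS-specific passthrough (SG_IO on Linux, for example) that isn't shown. Even with every flag set, what comes back is data the drive has already demodulated and error-corrected for you:

```python
# Sketch: build a 12-byte MMC READ CD (opcode 0xBE) command descriptor block.
# Field layout follows my reading of the SCSI MMC spec. Actually sending it
# needs an OS-specific passthrough (e.g. SG_IO on Linux), which is not shown.
import struct

READ_CD = 0xBE

# Byte 9: which parts of each 2352-byte sector to return
FLAG_SYNC = 0x80
FLAG_ALL_HEADERS = 0x60      # header codes 11b: header + subheader
FLAG_USER_DATA = 0x10
FLAG_EDC_ECC = 0x08
RAW_2352 = FLAG_SYNC | FLAG_ALL_HEADERS | FLAG_USER_DATA | FLAG_EDC_ECC

# Byte 10: sub-channel selection
SUBCHANNEL_NONE = 0x00
SUBCHANNEL_RAW_PW = 0x01     # adds 96 bytes of P-W subchannel per sector


def build_read_cd_cdb(lba: int, count: int, flags: int = RAW_2352,
                      subchannel: int = SUBCHANNEL_NONE) -> bytes:
    if not 0 <= lba < 1 << 32:
        raise ValueError("this sketch only handles non-negative LBAs")
    if not 0 <= count < 1 << 24:
        raise ValueError("transfer length must fit in 24 bits")
    return struct.pack(
        ">BBIBHBBB",
        READ_CD,
        0x00,                    # byte 1: expected sector type 0 = any
        lba,                     # bytes 2-5: starting LBA, big-endian
        count >> 16,             # bytes 6-8: transfer length in sectors
        count & 0xFFFF,
        flags,                   # byte 9
        subchannel,              # byte 10
        0x00,                    # byte 11: control
    )
```

With `SUBCHANNEL_RAW_PW`, each sector comes back as 2,352 + 96 = 2,448 bytes, and that extra 96 is exactly what a BIN file has no room for.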
Some people have used the contents of BIN/CUE data to reconstruct what should actually be on the disc, but that's not quite the same thing. Here's a great explanation of the CD structure in all its complexity:
Even BIN/CUE is not enough. It cannot store subchannel data like CD+G, and it can only hold a single session, which breaks Blue Book (Enhanced CD) discs that mix audio and data.
We do not currently have a widely supported image format that can properly hold all the data from a CD. Aaru [0] comes close, but you still have to convert its output back to other formats like BIN/CUE to use the contents of the disc.
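With the MMC READ CD command, a drive can be asked for 96 bytes of "raw P-W" subchannel data alongside each sector; in that layout each byte carries one bit from each of the eight subchannels (P in bit 7, Q in bit 6, R through W in bits 5 to 0). A minimal sketch of pulling them apart, assuming that layout; it stops well short of real CD+G decoding, which still needs the R-W de-interleaving and error correction:

```python
# Sketch: split one sector's 96 bytes of raw P-W subchannel data into the
# P channel, the Q channel, and the 6-bit R-W symbols.
# Assumes the raw layout: byte i holds bit i of every channel, P in bit 7.
from typing import List, Tuple


def split_subchannel(raw: bytes) -> Tuple[bytes, bytes, List[int]]:
    if len(raw) != 96:
        raise ValueError("expected 96 bytes of subchannel data per sector")

    def pack_bits(bit: int) -> bytes:
        # Gather one bit position across all 96 bytes: 96 bits -> 12 bytes.
        out = bytearray(12)
        for i, byte in enumerate(raw):
            if (byte >> bit) & 1:
                out[i // 8] |= 0x80 >> (i % 8)
        return bytes(out)

    p = pack_bits(7)
    q = pack_bits(6)                      # control/ADR, position data, CRC
    rw = [byte & 0x3F for byte in raw]    # 96 six-bit symbols; CD+G lives here
    return p, q, rw
```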
Apparently MakeMKV forum members created some patched firmware that lets you raw-read Blu-ray discs for the sake of extracting metadata that's intentionally hidden for DRM. Though I'll have to recheck my understanding, since you're saying you can't actually raw-read discs anyway.
Audio CDs were never ripped/transferred as ISO files. ISO-9660 is a filesystem that came years later, and Redbook audio CDs simply do not contain files.
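On a data CD the filesystem is easy to spot: ISO 9660 puts a Primary Volume Descriptor in logical sector 16, tagged with the identifier "CD001". A minimal sketch of checking for one in a 2,048-byte-per-sector .iso image (the function name is just for illustration); an audio CD has nothing of the sort to find:

```python
# Sketch: look for an ISO 9660 Primary Volume Descriptor in a .iso image
# (2048-byte logical sectors; the volume descriptors start at sector 16).
import io
from typing import BinaryIO, Optional

SECTOR = 2048


def read_volume_id(image: BinaryIO) -> Optional[str]:
    image.seek(16 * SECTOR)
    vd = image.read(SECTOR)
    # Byte 0: type (1 = primary), bytes 1-5: "CD001", byte 6: version (1)
    if len(vd) < SECTOR or vd[0] != 1 or vd[1:6] != b"CD001":
        return None
    # Bytes 40-71: volume identifier, padded with spaces
    return vd[40:72].decode("ascii", errors="replace").rstrip()


# An image with no volume descriptor (here, all zeros) yields None.
print(read_volume_id(io.BytesIO(bytes(17 * SECTOR))))
```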
If you want to look at the structure of a whole audio CD, then one way is to rip it with a decent tool (perhaps cdrdao or EAC) and generate a bin/cue file pair as an output.
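The CUE half of that pair is plain text, with positions written as MM:SS:FF, where FF counts frames (sectors) at 75 per second. A minimal sketch, assuming a single-BIN cue sheet in which every track is stored as raw 2,352-byte sectors, of turning each track's INDEX 01 into a byte offset in the BIN (the sample cue sheet is made up, and pregap INDEX 00 entries are simply ignored):

```python
# Sketch: map each track's INDEX 01 in a single-BIN cue sheet to a byte offset
# in the BIN file. Assumes every track is stored as raw 2352-byte sectors.
import re

FRAMES_PER_SECOND = 75   # "FF" in MM:SS:FF counts 1/75 s frames (sectors)
RAW_SECTOR = 2352


def msf_to_frames(msf: str) -> int:
    minutes, seconds, frames = (int(x) for x in msf.split(":"))
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames


def track_offsets(cue_text: str) -> dict:
    offsets = {}
    track = None
    for line in cue_text.splitlines():
        if m := re.match(r"\s*TRACK\s+(\d+)\s", line):
            track = int(m.group(1))
        elif track is not None and (m := re.match(r"\s*INDEX\s+01\s+(\d+:\d+:\d+)", line)):
            offsets[track] = msf_to_frames(m.group(1)) * RAW_SECTOR
    return offsets


SAMPLE_CUE = """FILE "disc.bin" BINARY
  TRACK 01 AUDIO
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    INDEX 00 03:58:40
    INDEX 01 04:00:32
"""
print(track_offsets(SAMPLE_CUE))
```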
But that's not my goal. I'd like to be able to observe every groove and the physical encoding of the data, and see if I could implement decoding from scratch. The first problem, though, is that I don't know how to get a microscopic image of the disc.
You don't need a microscopic image of a disc to do that; a two-dimensional photograph is of essentially no advantage here.
All you need is the unmolested data from that disc. The data is arranged on a single spiral groove starting from the center and slowly winding its way towards the outside.
The data is completely linear: it begins at the beginning and continues to the very end without interruption. This is akin to (although the reverse of) how a single-track vinyl record is physically laid out, since a record plays from the outside in. The entire CD, whatever it contains, is just a continuous string of pits and lands.
And to observe that string as it appears on a real disc, all you need to get started is a regular old-school CD player and some appropriate data acquisition gear, and maybe an oscilloscope to help figure out what you're looking at.
The optics and basic motor controls are already solved problems, and it doesn't even have to be particularly fast data acquisition gear by today's standards to record what is happening.
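As a back-of-the-envelope check on how long that one continuous spiral is, here's a sketch assuming typical figures (program area from roughly 25 mm to 58 mm radius, 1.6 µm track pitch, 1.2 m/s scanning speed) and treating the spiral as closely packed rings, so that length is roughly area divided by pitch:

```python
import math

INNER_RADIUS_M = 25e-3   # assumed start of the program area
OUTER_RADIUS_M = 58e-3   # assumed outer edge of the program area
TRACK_PITCH_M = 1.6e-6   # nominal spacing between adjacent turns
LINEAR_SPEED = 1.2       # m/s, low end of the usual 1.2-1.4 m/s range

# Spiral length ~ annulus area / track pitch
area = math.pi * (OUTER_RADIUS_M**2 - INNER_RADIUS_M**2)
length_m = area / TRACK_PITCH_M
turns = (OUTER_RADIUS_M - INNER_RADIUS_M) / TRACK_PITCH_M
playing_time_min = length_m / LINEAR_SPEED / 60

print(f"spiral length ~ {length_m / 1000:.1f} km over ~{turns:,.0f} turns")
print(f"at {LINEAR_SPEED} m/s that is ~{playing_time_min:.0f} minutes")
```

Roughly five kilometres of track in about twenty thousand turns, which lines up with the ~74 minute playing time.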
Look into the Domesday Duplicator project for Laserdiscs as an example of how what ssl-3 is talking about can be done using a high sample rate input. That exact process is possible and with enough storage and processing power can be used to get the most "low level" access to the data. It is not for the faint of heart though, and can take around 1TB of storage and hours of CPU time to process full movies in this way, I know because I've done it.
I believe I've seen that there is work being done to attempt this on CDs, but it was still in the exploratory phase and not yet ready to archive with. It might seem like overkill to do this to something meant to be digitally addressed, but I've experienced enough quirks with discs and drives when ripping that I would 100% be willing to switch over to a known-complete capture system so I don't have to worry about it anymore. Post-process decoding also allows for re-decoding the data later if better methods are found.
The "unmolested data" would still have undergone error correction though, wouldn't it? I don't think a bin/cue rip would contain the redundant stuff, which GP seems interested in, nor the subcodes (of which some are represented in the cue file, while the bin file is PCM audio).
Ah, I see. So what kind of capture hardware could read from that point? I assume it's a digital signal taking the form of two voltage levels, flipping on the order of 3.6 MHz (16 billion pits to read over 74*60 seconds). With Red Book audio at 1.4 Mbps, more than half of the raw data must be devoted to things like redundancy and other non-PCM stuff, if my interpretation that pits==bits isn't far off.
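Running those numbers under the post's own one-bit-per-pit assumption (the 16 billion figure is the poster's estimate, not a spec value):

```python
# Sanity check of the estimate above, under its own one-bit-per-pit assumption.
PITS = 16e9
PLAY_SECONDS = 74 * 60
AUDIO_BPS = 44_100 * 16 * 2       # Red Book PCM: 44.1 kHz, 16-bit, stereo

raw_rate = PITS / PLAY_SECONDS     # bits per second if pits == bits
audio_share = AUDIO_BPS / raw_rate

print(f"~{raw_rate / 1e6:.1f} MHz; PCM audio would be {audio_share:.0%} of that")
```

In practice pits don't map one-to-one onto bits, so treat this as a rough estimate only.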
Aside: is your username inspired by Secure Sockets Layer or Solid State Logic?
I'm getting off into the weeds of what I know here, so take this all with a grain of salt. (I probably used to know more about all of this than I do right now.)
The difference between a pit and a land is an optical phase change. The pits and lands vary in length, and there are 9 valid variations in their lengths. This combined phase/temporal situation eventually (thanks, science folks from 1970-something!) turns into a serial binary electrical signal inside of a CD player.
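Those nine lengths are 3T through 11T, where T is one channel-bit period. In EFM, consecutive 1s in the channel bit stream are separated by at least 2 and at most 10 zeros, and each 1 marks a pit/land edge. A minimal sketch, working on an idealized list of channel bits rather than a real analog signal, that turns the stream into run lengths and flags anything outside 3T to 11T (the function names are my own):

```python
# Sketch: check EFM run lengths in an idealized stream of channel bits,
# where each 1 marks a pit/land edge and T is one channel-bit period.
from typing import List

MIN_RUN, MAX_RUN = 3, 11  # valid pit/land lengths are 3T..11T


def run_lengths(channel_bits: List[int]) -> List[int]:
    """Distances (in T) between consecutive transitions."""
    edges = [i for i, bit in enumerate(channel_bits) if bit]
    return [b - a for a, b in zip(edges, edges[1:])]


def invalid_runs(channel_bits: List[int]) -> List[int]:
    return [r for r in run_lengths(channel_bits) if not MIN_RUN <= r <= MAX_RUN]
```

A check like this would likely be one of the first sanity tests in a from-scratch decoder, ahead of EFM demodulation, de-interleaving and CIRC.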
This binary electrical signal can be recorded.
Recorded with what, you asked?
CDs have a lot more going on than just audio data. Remember, there's forward error correction at play, and (by spec, IIRC) a player is supposed to be able to completely recover data even across a gap of 1mm caused by a scratch or other interruption. There's also room for tricks like CD+G to live in the background, and certainly what may seem like an inordinate amount of data used just for clocking. CDs are CLV (constant linear velocity), so playing them happens at a continuously varying rotational speed in a tightly closed loop: buffer RAM was expensive to buy and expensive to manage, and tight speed control was cheaper to implement. Remember, this was a finished digital product that was released in 1982.
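To put numbers on the constant-linear-velocity point: holding the track speed constant means the spindle speed has to follow v / (2πr) as the pickup moves outward. A quick sketch assuming 1.3 m/s and a program area spanning about 25 mm to 58 mm radius:

```python
import math

LINEAR_SPEED = 1.3  # m/s, assumed (discs are specified at roughly 1.2-1.4 m/s)


def spindle_rpm(radius_m: float, v: float = LINEAR_SPEED) -> float:
    # One revolution covers 2*pi*r of track, so revolutions/s = v / (2*pi*r).
    return v / (2 * math.pi * radius_m) * 60


for r_mm in (25, 40, 58):
    print(f"r = {r_mm} mm: {spindle_rpm(r_mm / 1000):.0f} rpm")
```

That works out to roughly 500 rpm at the inner edge falling to a bit over 200 rpm at the outer edge, continuously.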
I found old references[0] suggesting that the raw (channel) data rate of a CD, whatever it contains, is 4.3218 Mbps.
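That 4.3218 Mbps figure falls straight out of the frame structure, as I understand it: each frame carries 24 bytes of audio (six stereo 16-bit samples) and occupies 588 channel bits on disc (a 24-bit sync pattern plus 33 EFM-coded bytes of 14 bits each, with 3 merging bits after the sync and after every byte). A sketch of the arithmetic:

```python
SAMPLE_RATE = 44_100
BYTES_PER_STEREO_SAMPLE = 4          # 2 channels x 16 bits
AUDIO_BYTES_PER_FRAME = 24           # 6 stereo samples per frame

SYNC_BITS = 24
MERGING_BITS = 3
SYMBOLS_PER_FRAME = 33               # 1 subcode + 24 audio + 8 CIRC parity
EFM_BITS_PER_SYMBOL = 14

frames_per_second = SAMPLE_RATE * BYTES_PER_STEREO_SAMPLE // AUDIO_BYTES_PER_FRAME
channel_bits_per_frame = (SYNC_BITS + MERGING_BITS) + SYMBOLS_PER_FRAME * (
    EFM_BITS_PER_SYMBOL + MERGING_BITS
)
channel_rate = frames_per_second * channel_bits_per_frame
audio_rate = SAMPLE_RATE * BYTES_PER_STEREO_SAMPLE * 8

print(frames_per_second, channel_bits_per_frame)  # 7350 588
print(f"{channel_rate / 1e6:.4f} Mbps channel, audio is {audio_rate / channel_rate:.0%}")
```

So only about a third of the channel bits are PCM audio; the rest is EFM expansion, merging bits, sync, CIRC parity and subcode.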
So, to posit some example hardware: with careful loops and decent wiring, accurately capturing this seems like it would be well within the purview of an RP2040's PIO with DMA to get that data into RAM, and also well within the ability of one of its 133 MHz 32-bit ARM cores to package up that data and deliver it over full-speed USB (12 Mbps) to a host machine that can store it for later analysis, plus or minus a transistor or two, or maybe a pullup resistor in just the right spot.
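A rough budget for that idea, assuming the captured stream is just the 4.3218 Mbps channel bits with no oversampling, all 264 KB of the RP2040's SRAM as buffer (optimistic), and the theoretical full-speed USB bulk ceiling of 19 64-byte packets per 1 ms frame (real throughput will be lower):

```python
CHANNEL_BPS = 4_321_800                 # raw CD channel bit rate
SRAM_BYTES = 264 * 1024                 # RP2040 on-chip SRAM, all of it
USB_FS_BULK_BPS = 19 * 64 * 1000 * 8    # theoretical full-speed bulk ceiling

capture_bytes_per_s = CHANNEL_BPS / 8
usb_headroom = USB_FS_BULK_BPS / CHANNEL_BPS
buffer_seconds = SRAM_BYTES / capture_bytes_per_s
disc_bytes = capture_bytes_per_s * 74 * 60

print(f"{capture_bytes_per_s / 1e3:.0f} kB/s, USB headroom x{usb_headroom:.1f}")
print(f"SRAM holds ~{buffer_seconds * 1000:.0f} ms; 74 min ~ {disc_bytes / 1e9:.1f} GB")
```

By this arithmetic the bits themselves fit with a bit over 2x headroom. Oversampling the signal instead of capturing already-clocked bits would not: even 4x (about 17 Mbps) exceeds that USB ceiling.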
(But that's just my opinion as a home hacker who has dabbled in RP2040 PIO assembler, and who is at or a bit beyond the limits of their knowledge of compact discs. I may wake up tomorrow and decide that the above is all bullshit and wish I could erase all of it. If in doubt, Philips datasheets for CD player chipsets from the first half of the 1980s can probably help a lot more than I can.)
---
As to the username: It's old. It predates Secure Sockets Layer, but it's way newer than Solid State Logic. I was just a young kid with a new modem when I dialed into a Telegard BBS and started to sign up for an account, and got stuck at the prompt to enter a "Handle". I didn't know what a handle was in this new-to-me context.
The sysop saw that I was stuck and dragged me into chat, as good sysops (hi Shawn!) tended to do upon seeing such a thing. We chatted for a bit, and I wasn't feeling creative, so he suggested that maybe I could look around for inspiration since most people used a made-up handle on his particular BBS.
I found a 5.25" floppy disk on the desk that I'd borrowed from the local public library. It was labeled "Selective Shareware Library, Volume 3." (It was also almost certainly infected with the Stoned virus[1]).
Anyhow, that was sufficiently inspiring, so ssl-3 it was.
You might do well enough with https://en.wikipedia.org/wiki/Cdparanoia without needing to use different hardware to scan the disc. Instead, it re-reads overlapping stretches of audio and compares them, so it can detect and correct for the drive losing sync with the track (jitter) and catch read errors.
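The core idea behind overlapped reading can be sketched like this: re-read a stretch that overlaps the previous read, then find where the old tail actually lines up in the new data, since the drive may have landed a few samples early or late. This is a toy illustration of the principle, not cdparanoia's actual algorithm; the function name, window sizes and sample data are made up:

```python
# Toy sketch of overlap-based jitter correction: locate where the tail of the
# previous read reappears in a new read that was meant to overlap it.
from typing import List, Optional


def find_overlap_offset(prev_tail: List[int], new_read: List[int],
                        expected: int, search: int) -> Optional[int]:
    """Index in new_read where prev_tail matches exactly, trying offsets
    closest to `expected` first and giving up beyond +/- `search` samples."""
    n = len(prev_tail)
    for delta in sorted(range(-search, search + 1), key=abs):
        start = expected + delta
        if start < 0 or start + n > len(new_read):
            continue
        if new_read[start:start + n] == prev_tail:
            return start
    return None


# Simulated drive: we asked to re-read from sample 436, it landed at 430.
audio = list(range(1000))
prev_tail = audio[436:500]            # last 64 samples we already have
new_read = audio[430:800]
offset = find_overlap_offset(prev_tail, new_read, expected=0, search=16)
print(offset)  # 6
```

New audio then resumes at `new_read[offset + len(prev_tail)]`. Real audio complicates this (long runs of digital silence match almost anywhere), which is part of why the real tool is far more elaborate.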
I wonder if you could just tear the controller out of a CD/DVD drive and build a new one from scratch, kind of like the new floppy controllers being used now to read the raw magnetic data. You could just command the head to move to the center, find the beginning of the data and just keep reading until you hit the buffers.
Floppies (most of them, anyway) have fixed track widths, and these tracks are arranged as concentric circles (the same track position on both sides forming a cylinder), with each cylinder aligned to a step of the stepper motor that actuates the head assembly.
It's relatively easy, with the right ratio betwixt step advancement and track width, to get the head moving properly on a new implementation of a floppy controller. Want to read track 1? Step the head N times to reach track 1 from wherever it started, and read it. Next, want to read track 33? Step the head N times to track 33, and read that.
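That bookkeeping is simple enough to sketch. This assumes hypothetical `pulse_step(direction)` and `at_track0()` hooks into the drive interface and an 80-track drive; real controllers also need step-rate and settle delays, which are left out:

```python
# Sketch: keep track of a floppy head's cylinder by counting step pulses.
# `pulse_step` and `at_track0` are hypothetical hooks into the drive interface.
from typing import Callable, Optional


class FloppyHead:
    def __init__(self, pulse_step: Callable[[int], None],
                 at_track0: Callable[[], bool], max_cylinder: int = 79):
        self.pulse_step = pulse_step      # +1 = one step inward, -1 = outward
        self.at_track0 = at_track0        # the drive's track-0 sensor
        self.max_cylinder = max_cylinder
        self.cylinder: Optional[int] = None   # unknown until recalibrated

    def recalibrate(self) -> None:
        # Step outward until the track-0 sensor trips.
        for _ in range(self.max_cylinder + 2):
            if self.at_track0():
                self.cylinder = 0
                return
            self.pulse_step(-1)
        raise RuntimeError("track 0 never seen")

    def seek(self, target: int) -> int:
        """Step to `target` and return how many pulses that took."""
        if self.cylinder is None:
            raise RuntimeError("recalibrate first")
        if not 0 <= target <= self.max_cylinder:
            raise ValueError("cylinder out of range")
        steps = target - self.cylinder
        for _ in range(abs(steps)):
            self.pulse_step(1 if steps > 0 else -1)
        self.cylinder = target
        return abs(steps)
```

The whole positioning problem reduces to counting, because every step lands on a track.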
But tracking the spiral groove of a CD is a very different problem to solve. Steps tend to lose their meaning. Instead of electromagnetic steps, the classic three-beam pickup uses three laser beams (typically one laser split by a diffraction grating): two side beams continuously keep the head centered on the ever-changing groove using a servo feedback loop, and the central beam reads the data from the pits and lands in the middle of that groove.
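To give a feel for that servo loop, here's a toy simulation, not a model of any real drive. The track wobbles under the pickup because of disc eccentricity (assumed 50 µm at 8 revolutions per second), the difference between the two side-beam signals is treated as directly proportional to the tracking error, and a PI controller moves the lens. The gains, loop rate and the simple "actuator velocity = control signal" plant are all assumptions chosen for illustration:

```python
import math

DT = 1e-5                    # 100 kHz control loop (assumed)
ECCENTRICITY = 50e-6         # m, track wobble amplitude (assumed)
ROTATION_HZ = 8.0            # ~480 rpm near the inner edge
TRACK_PITCH = 1.6e-6         # m
KP = 2 * math.pi * 1000      # proportional gain, ~1 kHz loop bandwidth
KI = KP**2 / 4               # integral gain chosen for critical damping


def simulate(seconds: float = 0.5, closed_loop: bool = True) -> float:
    """Return the worst tracking error over the second half of the run."""
    lens = 0.0
    integral = 0.0
    worst = 0.0
    steps = int(seconds / DT)
    for n in range(steps):
        track = ECCENTRICITY * math.sin(2 * math.pi * ROTATION_HZ * n * DT)
        error = track - lens          # stands in for the side-beam difference
        if closed_loop:
            integral += error * DT
            lens += (KP * error + KI * integral) * DT  # actuator velocity = control
        if n >= steps // 2:
            worst = max(worst, abs(error))
    return worst


print(f"open loop:   {simulate(closed_loop=False) * 1e6:.1f} um")
print(f"closed loop: {simulate() * 1e6:.3f} um (track pitch {TRACK_PITCH * 1e6:.1f} um)")
```

Without the loop the pickup drifts across dozens of track turns; with it, the residual error stays a small fraction of the 1.6 µm pitch.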
Is it doable? Sure! People with far less advanced tech than what we on HN might have lying around did it 40+ years ago.
It's just a very different nut to crack than reading a floppy is, even if the mechanical and optical bits are recycled.
(And that's just head positioning. The pits and lands still need to be read, and those reflect back from the disc as optical phase shifts, not as changes in magnetic polarity and/or amplitude.)
Coming back to this, having read some of the (great!) replies, I'm going to go out on a limb and say that in theory, this sounds possible, and fun, but highly impractical. I'll assume that by "scan" you mean a high end "flatbed scanner" optical scan which would return a 2D bitmap.
It's impractical because the resolution required to retrieve the data "flatbed scanner style" is comically high, perhaps 50k dpi, far beyond the capability of any commercial unit and well into scanning microscope territory. Sure, from my understanding, it looks technically possible. But it would be a very significant and costly project just to assemble the image in the first place. Even if you had that, the resulting file would be hilariously huge (something like 122GB), extremely difficult to work with, and you would be starting from scratch implementing some kind of visual pathfinding spiral decoder to painstakingly unravel the linear coil of data the scan just sort of blatted into two dimensions.
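For a sense of scale, here's where a figure of that magnitude comes from, assuming a square scan covering the full 120 mm disc at 50,000 dpi, stored uncompressed (the bit depths are assumptions, and the total depends heavily on them):

```python
DISC_MM = 120
DPI = 50_000
MM_PER_INCH = 25.4

pixels_per_side = DISC_MM / MM_PER_INCH * DPI
pixels = pixels_per_side ** 2
pixel_size_um = MM_PER_INCH / DPI * 1000

print(f"{pixels_per_side:,.0f} px per side, {pixels / 1e9:.0f} gigapixels")
print(f"each pixel ~{pixel_size_um:.2f} um (track pitch is 1.6 um)")
for bits in (1, 8, 16):
    print(f"{bits:>2}-bit: ~{pixels * bits / 8 / 1e9:.0f} GB")
```

At 16 bits per pixel that lands around 112 GB, the same ballpark, and a 0.5 µm pixel still only gives about three pixels across each turn of the track.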
It's a cool idea. But it's comically, exponentially harder than just using the equipment as intended to read the laser returns off the disc directly, into a far, far more easily dealt with format.
I'm adding that CD scan to my list of things I'd like to do if I ever get really rich.