
The low-level formatting has ECC, which never leaves the drive. That said, there are two cases to consider for misdirected writes. In the first, the write clobbers multiple sectors, in which case you would get uncorrectable sectors. In the second, it perfectly replaces another sector. In that case, the ECC is a perfect match, because the ECC is stored with the sector. Neither drive would report a problem, but the data would not match. This is what I described as being a problem, and traditional RAID is incapable of dealing with it.
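
As a rough illustration of the second case, here is a minimal Python sketch. `ToyDrive`, its `land_at` parameter, and the use of CRC-32 as a stand-in for the drive's real ECC are all assumptions made for the example, not how any actual firmware works.

```python
import zlib


class ToyDrive:
    """Toy model: each sector keeps its payload together with a check code."""

    def __init__(self, sectors: int):
        self.media = {n: self._encode(bytes(8)) for n in range(sectors)}

    @staticmethod
    def _encode(data: bytes):
        # CRC-32 stands in for the drive's ECC (assumption for illustration).
        return data, zlib.crc32(data)

    def write(self, lba: int, data: bytes, land_at=None):
        # land_at is a hypothetical knob that simulates a misdirected write.
        target = lba if land_at is None else land_at
        self.media[target] = self._encode(data)

    def read(self, lba: int) -> bytes:
        data, check = self.media[lba]
        if zlib.crc32(data) != check:
            raise OSError(f"uncorrectable sector {lba}")
        return data


drive = ToyDrive(8)
drive.write(5, b"old five")
drive.write(7, b"old seven")
drive.write(5, b"new five", land_at=7)  # meant for sector 5, lands on 7

print(drive.read(5))  # stale data, check code still valid
print(drive.read(7))  # wrong data, check code still valid
```

Both reads succeed in this model because the check code travels with the payload, so it says nothing about whether the payload belongs at that sector.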


And that is where I stated that drives can report a problem. If they 'seed' their ECC algorithm with the sector number (XOR-ing the result with it would be sufficient), they can (statistically) detect that, when they read sector #X, what they got wasn't what they ever wrote as sector #X.

In fact, I guess they already do. If they didn't, there would be misdirected reads, too.
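
The seeding idea above could be sketched as follows. This is a hypothetical model, not a description of real drive firmware: `seeded_check`, `SeededDrive`, and the `land_at` parameter are invented for the example, and CRC-32 stands in for the ECC.

```python
import zlib


def seeded_check(lba: int, data: bytes) -> int:
    # XOR the check code with the sector number, as proposed above.
    return zlib.crc32(data) ^ lba


class SeededDrive:
    """Toy drive whose per-sector check code depends on the sector number."""

    def __init__(self):
        self.media = {}

    def write(self, lba: int, data: bytes, land_at=None):
        # The check code is computed for the intended sector, even if the
        # write physically lands elsewhere (land_at simulates that fault).
        target = lba if land_at is None else land_at
        self.media[target] = (data, seeded_check(lba, data))

    def read(self, lba: int) -> bytes:
        data, check = self.media[lba]
        if seeded_check(lba, data) != check:
            raise OSError(f"sector {lba}: check code mismatch")
        return data


drive = SeededDrive()
drive.write(5, b"old five")
drive.write(7, b"old seven")
drive.write(5, b"new five", land_at=7)  # meant for sector 5, lands on 7

try:
    drive.read(7)
except OSError as err:
    print("detected:", err)

print(drive.read(5))  # still returns the old contents of sector 5
```

In this sketch, reading sector 7 fails the check, so the clobbered sector is caught. Sector 5 still returns its old contents without complaint, since that stale data carries a check code that was valid for sector 5.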


The low-level formatting does include a sector number, but it is not part of the ECC. I am not sure what your point is. Your theoretical description of how hard drives could work does not reflect reality. Research by CERN and others has confirmed the existence of misdirected writes. Deployed ZFS installations are detecting corruption in situations where the drives report everything is fine. Even if the storage hardware improves, having end-to-end checksums in the filesystem will continue to make sense.

That said, I think you are fixating on one way that things can go wrong. Another way that misdirected writes can occur is a bit-flip in the micro-controller's memory. This also allows for misdirected reads, as well as reading or writing data that has a single bit flipped. These devices' micro-controllers do not have ECC memory. Even if it were added, you would still need to prove via formal verification that there are no programming bugs. Given that these devices are black boxes that cannot be inspected, you cannot rely on the claim of a proof even if one is done, and there would still be the possibility of errata in the micro-controller. It is far easier to just use end-to-end checksums in the filesystem. Even if you think the device is trustworthy, end-to-end checksums give you the ability to check that it is doing what it is supposed to do. You simply do not have that with traditional RAID.
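
A minimal sketch of what end-to-end checksums buy you over a plain mirror, under stated assumptions: `MirroredStore` is a hypothetical class, plain dicts stand in for the two drives, SHA-256 is an arbitrary choice of checksum, and the checksum table is kept apart from the data it describes. It is not meant to show how ZFS itself is implemented.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MirroredStore:
    """Two-way mirror with checksums held outside the drives (toy model)."""

    def __init__(self):
        self.mirrors = [{}, {}]  # two toy drives: block number -> bytes
        self.checksums = {}      # filesystem metadata: block number -> digest

    def write(self, block: int, data: bytes):
        self.checksums[block] = digest(data)
        for drive in self.mirrors:
            drive[block] = data

    def read(self, block: int) -> bytes:
        expected = self.checksums[block]
        for drive in self.mirrors:
            data = drive.get(block)
            # Accept only a copy that matches what the filesystem wrote.
            if data is not None and digest(data) == expected:
                return data
        raise OSError(f"block {block}: no mirror copy matches its checksum")


store = MirroredStore()
store.write(5, b"version 1")
store.write(5, b"version 2")

# Simulate a write that never reached drive 0: it still holds the old data,
# and the drive itself would report nothing wrong.
store.mirrors[0][5] = b"version 1"

print(store.read(5))  # the stale copy is rejected, the good copy is returned
```

A mirror without the checksum table would have two differing copies and no way to tell which one is correct. Here the filesystem can both notice the mismatch and pick the copy that matches what it originally wrote.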



