At Amazon we had a bug caused by the combination of a compiler bug and mis-documented behaviour of Intel cores. It was intermittent and related to one core occasionally being allowed to access stale data in the cache. We debugged it with a logic analyzer, the commented nginx source and a copy of the C++11 spec.
The hardware team had some semi-custom thing from Intel that spat out (no surprise) gigabytes of trace data per second. I remember much of the pain was in constructing a lab where we could drive a test system at reasonable loads to get the buggy behavior to emerge. It was intermittent, so it took us a couple of weeks to come up with theories, another couple of days for testing and a week of analysis before we came up with triggers that allowed us to capture the data that showed the bug. It was a bit of a production.
It took longer than 2 days to fix.