Did your incident response team look at the last few changes that were executed? If they had, they could have just rolled back the change. Or just looking at the changes executed around the start of the outage could have pointed to the problem.

Didn't the services that were crashing due to OOM raise any alerts?

This is shitty at so many levels.


