You've hit the nail on the head with "devilishly hard." That phrase perfectly captures what I've felt.
What have you found to be the most "devilish" part of it? Is it defining what "normal" behavior is for a service, or is it trying to account for all the possible-but-rare failure modes?