In my experience, monitoring and alerting should be calibrated to the number of users you serve. Early on, we run load and stress tests; if the results look good, many ancillary alerts aren’t necessary. Alerts are best reserved for truly critical events, such as outages and other severe incidents. Thresholds should be tuned to real-world conditions and adjusted over time. Hope this helps.
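
To make the point about tuning thresholds concrete, here is a minimal sketch of one way to derive an alert threshold from recently observed data rather than hard-coding it. Everything in it is an assumption for illustration: the function names, the choice of latency as the metric, the 99th percentile, and the 1.5x headroom factor are hypothetical and not taken from any particular monitoring tool.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Ceiling division in integer arithmetic to avoid float rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def tuned_threshold(recent_latencies_ms, pct=99, headroom=1.5):
    """Hypothetical rule: alert only well above what production normally sees."""
    return percentile(recent_latencies_ms, pct) * headroom


def should_page(current_ms, threshold_ms):
    """Page only when the current value exceeds the tuned threshold."""
    return current_ms > threshold_ms


# Example: recompute the threshold from last week's samples, then check a reading.
last_week = list(range(1, 101))          # stand-in for real latency samples (ms)
threshold = tuned_threshold(last_week)
print(threshold, should_page(200, threshold), should_page(120, threshold))
```

Re-running something like `tuned_threshold` on a schedule (say, weekly) is one way to keep the threshold in step with real traffic as the user base grows.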