Ask HN: Can you get to AWS?

colinbartlett · on July 1, 2015

I often find out about these sorts of global internet issues early because of a little project of mine:

It just monitors the status pages of lots of different services. When I get 30 notices from various unrelated services at once all with comments like "Network connectivity issues", I know some kind of routing issue is plaguing the 'net.
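The heuristic in this comment could be sketched as follows. It assumes notices arrive as (service, time, message) records. The keyword list, the 15-minute window, and the threshold of 30 are illustrative guesses, not how StatusGator actually decides anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Notice:
    service: str    # provider name (hypothetical, e.g. "service-1")
    time: datetime  # when the provider posted the notice
    message: str    # notice text as published on the status page


# Illustrative keywords only; real notices are worded in many ways.
NETWORK_KEYWORDS = ("network", "connectivity", "routing", "dns", "packet loss")


def looks_network_related(message: str) -> bool:
    text = message.lower()
    return any(word in text for word in NETWORK_KEYWORDS)


def likely_internet_wide_issue(notices, now, window=timedelta(minutes=15), threshold=30):
    """True if at least `threshold` distinct services posted a
    network-related notice in the `window` ending at `now`."""
    services = {
        n.service
        for n in notices
        if now - window <= n.time <= now and looks_network_related(n.message)
    }
    return len(services) >= threshold


now = datetime(2015, 7, 1, 1, 0)
burst = [Notice(f"service-{i}", now, "Network connectivity issues") for i in range(30)]
print(likely_internet_wide_issue(burst, now))  # True
```

The sketch counts distinct services rather than raw notices, so a single provider posting many updates cannot trigger it on its own.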

nthitz · on July 1, 2015

What if statusgator.io or my routing to SG goes down? :)

vacri · on July 1, 2015

Same solution as every monitoring system: run a second, separate one. Whether it's the proverbial backhoe through the fibre link or a sysadmin fat-fingering a conf or deploy, there's always a way for a single system to be brought down.

I saw an absolutely crazy way to do this with Nagios[1]. Nagios saves its state in a file, and the suggested way to synchronise two separate systems was to have each one check whether the other is active. If it is, sync the state file across. If it isn't, start up Nagios locally from the most recently synced state file. I mean, the solution works, but because of how Nagios is built it's inherently hacky. It's bizarre that Nagios doesn't (didn't?) have built-in support for this kind of failover.

[1]https://allmybase.com/2010/10/04/setting-up-fully-redundant-...
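The decision loop described above comes down to a few lines. This is a sketch only: the four callables are hypothetical hooks standing in for whatever liveness check, state-file copy, and service start a real setup uses, and nothing here is a Nagios API.

```python
from typing import Callable


def failover_tick(
    peer_is_active: Callable[[], bool],
    sync_state_from_peer: Callable[[], None],
    local_is_running: Callable[[], bool],
    start_local_from_state: Callable[[], None],
) -> str:
    """Run one check on a standby node and return its resulting role."""
    if peer_is_active():
        # Peer is healthy: just keep our copy of its state file fresh.
        sync_state_from_peer()
        return "standby"
    # Peer looks dead: take over from the last state file we synced.
    if not local_is_running():
        start_local_from_state()
    return "active"
```

Each node would run this periodically, for example from cron. One known weakness of this scheme: if the two nodes merely lose sight of each other (a network partition), both can end up "active" and alert twice.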

parhamn · on July 1, 2015

What do you use to monitor the status of the respective sites? A ping or HEAD request to each site or API?

colinbartlett · on July 1, 2015

I actually don't monitor the status myself, I monitor the status that each service publishes. That is, whatever they report on their status page, StatusGator aggregates.

The whole thing was born after I was racking my brain trying to debug a problem, then I remembered to check status.whatever.com and realized it wasn't my problem to debug and the provider was already aware of it. I thought it would be nice to get notifications as well as aggregate that info from all the different services I use.
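For providers whose status page is hosted on Statuspage (statuspage.io), aggregation can work off a machine-readable summary rather than scraped HTML: such pages publish `/api/v2/status.json`, whose `status.indicator` is `none` when everything is operational and another value (for example `minor` or `major`) otherwise. The following is a minimal sketch of that approach, not StatusGator's implementation; other status-page products would need their own parsers, and the sample payloads are made up.

```python
import json
import urllib.request
from typing import Dict, List, Tuple


def fetch_status(page_url: str, timeout: float = 10.0) -> dict:
    """Fetch the summary JSON for one provider.
    Assumes page_url is the root of a Statuspage-hosted status page."""
    url = page_url.rstrip("/") + "/api/v2/status.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(statuses: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Given {service: parsed status.json}, list services reporting a problem."""
    problems = []
    for service, payload in sorted(statuses.items()):
        status = payload.get("status", {})
        if status.get("indicator", "none") != "none":
            problems.append((service, status.get("description", "unknown")))
    return problems


# Made-up sample payloads, shaped like status.json responses.
sample = {
    "example-a": {"status": {"indicator": "none", "description": "All Systems Operational"}},
    "example-b": {"status": {"indicator": "major", "description": "Partial System Outage"}},
}
print(summarize(sample))  # [('example-b', 'Partial System Outage')]
```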

mawhidby · on July 1, 2015

Cool site! Any plans to add IBM Bluemix? Their status page is at https://developer.ibm.com/bluemix/support/#status

colinbartlett · on July 1, 2015

Done.

Note that there's a feature in the pipeline that will let you subscribe to specific components or regions of a service, which will make status notifications for large cloud platforms like Bluemix, OpenShift, AWS, DigitalOcean, etc. much more useful.
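Per-component subscriptions could be a simple filter over a provider's component list. This is a hypothetical sketch: it assumes a Statuspage-style `/api/v2/summary.json` payload, where each entry in `components` has a `name` and a `status` such as `operational` or `major_outage`. The component names below are invented.

```python
from typing import Iterable, List, Tuple


def subscribed_alerts(summary: dict, subscribed: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (component, status) for subscribed components that are not operational."""
    wanted = set(subscribed)
    alerts = []
    for component in summary.get("components", []):
        name = component.get("name")
        status = component.get("status", "unknown")
        if name in wanted and status != "operational":
            alerts.append((name, status))
    return alerts


summary = {
    "components": [
        {"name": "Compute (us-east)", "status": "operational"},
        {"name": "Compute (eu-west)", "status": "major_outage"},
        {"name": "Object Storage", "status": "degraded_performance"},
    ]
}
# A user who only cares about us-east compute and storage:
print(subscribed_alerts(summary, ["Compute (us-east)", "Object Storage"]))
# [('Object Storage', 'degraded_performance')]
```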

pliu · on July 1, 2015

In a situation like this, how do you know that routing to the status page isn't the problem? Many of these companies use statuspage.io, which runs on AWS.

Edit: this is an edge case, clearly. I just think it's an interesting problem.
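One way to keep that edge case from masking an outage is to treat "couldn't fetch the status page" as its own state rather than folding it into "all clear" or "down". A minimal sketch, where `fetch_indicator` is a hypothetical callable that returns the page's reported indicator (for example `"none"`) and raises `OSError` (which covers `urllib.error.URLError` and socket timeouts) when the page can't be reached.

```python
from enum import Enum
from typing import Callable


class Observation(Enum):
    REPORTED_OK = "provider reports no issues"
    REPORTED_ISSUE = "provider reports an issue"
    UNREACHABLE = "status page unreachable from here"


def observe(fetch_indicator: Callable[[], str]) -> Observation:
    try:
        indicator = fetch_indicator()
    except OSError:
        # We learned nothing about the provider, only about our path to its page.
        return Observation.UNREACHABLE
    return Observation.REPORTED_OK if indicator == "none" else Observation.REPORTED_ISSUE
```

If many unrelated pages flip to `UNREACHABLE` at once, that points at routing between the checker and the pages, or at a shared host such as statuspage.io, rather than at the providers themselves. A second checker in a different network would help tell the two apart.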

mvelie · on July 1, 2015

Looks to be because of: https://twitter.com/Axcelx/status/616058414746202113

bks · on July 1, 2015

I run a spam-filtering company in AWS Oregon, and we are under a DDoS directed at some of our financial services clients... We know that our client is the target because the Russian team claiming responsibility opened a trouble ticket with us (not hosted in AWS) about 2 hours ago to let us know that they would DDoS us and the network.

I am sure it's just a coincidence :-)

JohnHaugeland · on July 1, 2015

It appears so. This looks like it was a BGP problem.

https://twitter.com/Axcelx/status/616058414746202113

_jomo · on July 1, 2015

There's another post on HN:

"EC2 us-east is down" - https://news.ycombinator.com/item?id=9809304

verelo · on July 1, 2015

"Between 5:25 PM and 6:07 PM PDT we experienced an Internet connectivity issue with a provider outside of our network. The issue has been resolved and the service is operating normally. "

http://status.aws.amazon.com/

I really hate how AWS uses PDT for their status page. Seriously, is there any logical reason to use anything other than UTC? When there is a report, I first need to convert the PDT time shown to UTC, then to local time, just to know whether the issue might be the one I experienced. Unless you're also in PDT, that's an extra step; couldn't they save everyone else that step by showing all service issues in UTC?
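Since PDT is a fixed UTC-7 offset (PST is UTC-8), converting a timestamp explicitly labelled PDT takes one line with Python's standard library. The sketch assumes the outage window fell on June 30, 2015, inferred from the thread's date; the status note itself gives only the times.

```python
from datetime import datetime, timedelta, timezone

# "PDT" always means UTC-7; in winter the Pacific zone is on PST, UTC-8.
PDT = timezone(timedelta(hours=-7), "PDT")


def pdt_to_utc(year: int, month: int, day: int, hour: int, minute: int) -> datetime:
    return datetime(year, month, day, hour, minute, tzinfo=PDT).astimezone(timezone.utc)


start = pdt_to_utc(2015, 6, 30, 17, 25)  # 5:25 PM PDT
end = pdt_to_utc(2015, 6, 30, 18, 7)     # 6:07 PM PDT
print(start.isoformat())  # 2015-07-01T00:25:00+00:00
print(end.isoformat())    # 2015-07-01T01:07:00+00:00
```

Calling `start.astimezone()` with no argument converts to the machine's local zone, so the UTC step can be skipped if local time is all you need.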

jonas21 · on July 1, 2015

Whichever time zone you pick, everyone not in that time zone is going to have to do a conversion. So why not pick the time zone where most of Amazon's engineers, and quite possibly most of their customers' engineers, are located?

joshribakoff · on July 1, 2015

PDT differs from PST. Conversion logic can get especially confusing if you've recorded a date/time and then the laws surrounding daylight saving time change so that it starts on a different day. Using UTC is like using a Unix timestamp: you don't have to worry about things like locales and local laws, when the date/time was recorded versus when it's being viewed, and what laws might have changed in between. For example, the dates on which daylight saving time takes effect in the US changed in 2007. If you're trying to compare old logs to an old AWS announcement, it could get confusing, whereas UTC should remain immune to changes in laws.
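To make the 2007 change concrete: the same calendar date can be PST one year and PDT the next. The following sketch hand-codes the US rules for illustration only; the helper names are hypothetical, and real code should use a maintained time zone database such as Python's `zoneinfo` rather than rules like these.

```python
from datetime import date, timedelta


def nth_sunday(year: int, month: int, n: int) -> date:
    first = date(year, month, 1)
    # date.weekday(): Monday == 0 ... Sunday == 6
    first_sunday = first + timedelta(days=(6 - first.weekday()) % 7)
    return first_sunday + timedelta(weeks=n - 1)


def last_sunday_of_october(year: int) -> date:
    last = date(year, 10, 31)
    return last - timedelta(days=(last.weekday() + 1) % 7)


def us_dst_window(year: int):
    """US daylight saving time start/end dates (rules valid from 1987 on)."""
    if year >= 2007:  # rules from the Energy Policy Act of 2005
        return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)
    return nth_sunday(year, 4, 1), last_sunday_of_october(year)


def pacific_abbreviation(day: date) -> str:
    # Date-level granularity only: ignores the 2:00 AM switchover itself.
    start, end = us_dst_window(day.year)
    return "PDT" if start <= day < end else "PST"


print(pacific_abbreviation(date(2006, 3, 20)))  # PST
print(pacific_abbreviation(date(2007, 3, 20)))  # PDT
```

A log line stamped "March 20, 10:00 Pacific" therefore maps to different UTC instants depending on the year, which is exactly the ambiguity a UTC timestamp avoids.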

verelo · on July 2, 2015

This is exactly why I want UTC timestamps.

michaelkeenan · on July 1, 2015

A Google query helps with this. For example, search for: 6:07 PM PDT to HKT

Google will return a special result at the top of the page: "6:07 PM Tuesday, Pacific Time (PT) is 9:07 AM Wednesday, Hong Kong Time (HKT)"

source99 · on July 1, 2015

What percentage of admins (or people who care about these statuses) are in PDT compared to the rest of the world? If it's in UTC, then everyone has to do one conversion, instead of nearly everyone doing one conversion and people in PDT doing none.

Also, I know my offset from PDT but not my offset from UTC, so PDT also saves me the step of looking up that offset.

brie22 · on July 1, 2015

Note at the top of http://status.aws.amazon.com/:

Internet connectivity issues

We are currently monitoring an external Internet provider issue that is causing interrupted service connectivity to AWS services for some customers. AWS services are not affected and continue to operate normally.

danieljurek · on July 1, 2015

I can't get the status page to load: http://status.aws.amazon.com/

Edit: Scratch that. Some stuff I'm hosting in AWS East is still working. Maybe someone screwed up router configs on the west coast.

Edit: Rollback the scratch. It's oscillating between up and down right now.

NeutronBoy · on July 1, 2015

Status page updated:

"We are currently monitoring an external Internet provider issue that is causing interrupted service connectivity to AWS services for some customers. AWS services are not affected and continue to operate normally."

http://status.aws.amazon.com/

aws_ls · on July 1, 2015

I woke up to find two of my instances, running in AWS Singapore, at 100% CPU and unresponsive. Rebooting didn't help either; they had to be killed. Thankfully AWS had documented this on their support page:

"Additionally, some customers have reported continued connectivity issues for some of their instances. We have seen with these reported issues that this has been caused by a leap second bug within the instance operating system, which results in 100% CPU utilization. We recommend rebooting the instance via the EC2 Management Console or API, or resetting the operating system time to resolve the issue. For further information see:

https://access.redhat.com/articles/15145"

ytjohn · on July 1, 2015

https://blog.thousandeyes.com/route-leak-causes-amazon-and-a...

richadams · on July 1, 2015

A fiber cut in Oregon could be responsible (https://puck.nether.net/pipermail/outages/2015-June/007906.h...). I've been seeing connectivity issues with us-west-2 for most of the day.

There was also a fiber cut in the San Francisco area this morning (http://www.usatoday.com/story/tech/2015/06/30/california-int...).

jstratr · on July 1, 2015

I can reach http://status.aws.amazon.com/ from my home Comcast connection in SF. The status page is showing green for all services.

mondoshawan · on July 1, 2015

Looks to be something strange with the network. Using a SOCKS proxy to a host running on he.net, I can get to US-East machines, but not from my Cogent-based connection in the office.

cornellwright · on July 1, 2015

Yeah, that's what we're seeing. We can finally get to US-West2 again from our office. Employees who were working from home didn't experience an outage at all.

ra1n85 · on July 1, 2015

https://twitter.com/Axcelx/status/616058414746202113

Yuck.

fuziontech · on July 1, 2015

I'm able to tunnel into EC2 over our VPC and get to most things that are down, like Slack and Netflix. Amazon.com is still unreachable.

Tekker · on July 1, 2015

Leap second posts?

t3f · on July 1, 2015

For what it's worth, I usually start at [1].

[1] http://internetpulse.keynote.com/

kirk21 · on July 1, 2015

Nope. A ton of websites don't work (even amazon.com).

sergiotapia · on July 1, 2015

I've been having issues all week reaching a lot of different AWS websites including atom.io.

Switching my DNS to 8.8.8.8 fixed it for me. Something is up though.

halayli · on July 1, 2015

We faced similar DNS issues using the default name server that EC2 points to. We ended up running BIND.
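To check whether a particular resolver is the culprit, you can ask a specific server directly instead of whatever the OS is configured to use. `dig @8.8.8.8 atom.io` does this from the command line; the sketch below does the same for A records using only Python's standard library. It is deliberately minimal: a fixed query ID, no retries, no TCP fallback for truncated replies, and no handling of DNS error codes.

```python
import socket
import struct


def encode_qname(hostname: str) -> bytes:
    """Encode a hostname as length-prefixed DNS labels ending in a zero byte."""
    out = b""
    for label in hostname.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_a_query(hostname: str, query_id: int = 0x1234) -> bytes:
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: QNAME, QTYPE=1 (A record), QCLASS=1 (Internet).
    return header + encode_qname(hostname) + struct.pack("!HH", 1, 1)


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if (length & 0xC0) == 0xC0:  # compression pointer occupies 2 bytes
            return offset + 2
        offset += 1 + length


def parse_a_records(msg: bytes) -> list:
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    offset = 12  # skip the fixed-size header
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4  # plus QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record: 4-byte IPv4 address
            addresses.append(socket.inet_ntoa(msg[offset:offset + 4]))
        offset += rdlength  # skip CNAMEs and other record types
    return addresses


def resolve_a(hostname: str, server: str = "8.8.8.8", timeout: float = 3.0) -> list:
    query = build_a_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        response, _ = sock.recvfrom(512)
    if response[:2] != query[:2]:
        raise ValueError("response ID does not match query ID")
    return parse_a_records(response)
```

Running `resolve_a("atom.io")` and then the same call with `server` set to your usual resolver's address shows whether the two disagree.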

imroot · on July 1, 2015

I just got an email from AboveNet: apparently they had a fiber cut and are having major routing issues affecting the San Francisco area.

TechRemarker · on July 1, 2015

Yes, it's been suffering from the massive load of launching Beats1. Beats1 from Apple went down, but looks like it's back up finally.

clebio · on July 1, 2015

Citation?

mgingras · on July 1, 2015

https://www.apple.com/ca/support/systemstatus/

rgbrenner · on July 1, 2015

And what makes you think that Apple uses AWS, considering they've invested billions in building data centers for themselves in NC and CA, and they're building two new ones in the EU and one in OR? http://www.datacenterknowledge.com/the-apple-data-center-faq... https://www.apple.com/pr/library/2015/02/23Apple-to-Invest-1...

ikeboy · on July 1, 2015

I'm connecting fine to us-east.