Ask HN: Can you get to AWS?
60 points by cornellwright on July 1, 2015 | 40 comments
We're having lots of trouble getting to many AWS/Amazon resources in US-West2 and it seems like many other people are reporting problems with US-East. Currently we cannot even reach http://status.aws.amazon.com. What are other people seeing and does anyone have any pointers to what the problem could be? There seems to also be a bit of discussion on Twitter: https://twitter.com/hashtag/aws?src=hash&vertical=default&f=tweets#


I often find out about these sorts of global internet issues early because of a little project of mine:

https://statusgator.io

It just monitors the status pages of lots of different services. When I get 30 notices from various unrelated services at once all with comments like "Network connectivity issues", I know some kind of routing issue is plaguing the 'net.


What if statusgator.io or my routing to SG goes down? :)


Same solution as every monitoring system: run a second, separate one. Whether it's the proverbial backhoe through the fibre link or a sysadmin fat-fingering a conf or deploy, there's always a way for a single system to be brought down.

I saw an absolutely crazy way to do this with Nagios[1]. Nagios saves its state in a file, and the suggested way to synchronise two separate systems was to have them look to see if the other one is active. If it is, sync the statefile across. If it isn't, start up nagios locally based on the latest sync'd statefile. I mean, the solution works, but due to Nagios it's inherently hacky. It's bizarre that Nagios doesn't (didn't?) have inbuilt support for failover this way.

[1]https://allmybase.com/2010/10/04/setting-up-fully-redundant-...
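
The failover logic described above can be sketched roughly as follows. This is a minimal illustration, not the linked article's actual setup: `failover_step` and its three callables are hypothetical names, and how you check the peer, copy the statefile, or start Nagios is left to whatever your environment provides.

```python
from typing import Callable


def failover_step(
    peer_is_active: Callable[[], bool],
    sync_state_from_peer: Callable[[], None],
    start_local_monitor: Callable[[], None],
) -> str:
    """Run one pass of the standby node's loop (all callables are hypothetical).

    If the peer is alive, copy its statefile so we stay current.
    If it is not, start monitoring locally from the last synced statefile.
    """
    if peer_is_active():
        sync_state_from_peer()
        return "synced"
    start_local_monitor()
    return "took_over"


if __name__ == "__main__":
    # Stand-in callables that only print; replace with real checks and commands.
    result = failover_step(
        peer_is_active=lambda: False,
        sync_state_from_peer=lambda: print("copying statefile from peer"),
        start_local_monitor=lambda: print("starting local monitor"),
    )
    print(result)
```

In practice you would run this on a timer on the standby host. It does not handle the case where both nodes think the other is down.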


What do you use to monitor the status of the respective sites? Ping/head request to the site/api?


I actually don't monitor the status myself, I monitor the status that each service publishes. That is, whatever they report on their status page, StatusGator aggregates.

The whole thing was born after I was racking my brain trying to debug a problem, then I remembered to check status.whatever.com and realized it wasn't my problem to debug and the provider was already aware of it. I thought it would be nice to get notifications as well as aggregate that info from all the different services I use.


Cool site! Any plans to add IBM Bluemix? Their status page is at https://developer.ibm.com/bluemix/support/#status


Done.

Note: a feature in the pipeline will let you subscribe to specific components or specific regions of services, which will make status notifications for large cloud platforms like Bluemix, OpenShift, AWS, DigitalOcean, etc. much more useful.


In a situation like this, how do you know that routing to the status page is not the problem? Many of these companies use statuspage.io which is on AWS.

Edit: this is an edge case, clearly. I just think it's an interesting problem.



I run a spam filtering company in AWS-Oregon and we are under a DDoS that is directed at some of our financial services clients... We know that our client is the target because the Russian team claiming responsibility opened a trouble ticket with us (not hosted in AWS) about 2 hours ago to let us know that they would DDoS us and the network.

I am sure it's just a coincidence :-)


It appears that it is. This appears to have been a BGP problem.

https://twitter.com/Axcelx/status/616058414746202113


There's another post on HN:

"EC2 us-east is down" - https://news.ycombinator.com/item?id=9809304


"Between 5:25 PM and 6:07 PM PDT we experienced an Internet connectivity issue with a provider outside of our network. The issue has been resolved and the service is operating normally. "

http://status.aws.amazon.com/

I really hate how AWS use PDT for their status page. Seriously, is there any logical reason to use anything other than UTC? It means that when there is a report, I first need to convert the PDT time shown to UTC, then to local time, just so I can know if the issue is possibly the one I experienced. Unless you're also in PDT, couldn't they just save everyone else one step by showing all service issues in UTC?


Whichever time zone you pick, everyone not in that time zone is going to have to do a conversion. So why not pick the time zone where most of Amazon's engineers, and quite possibly most of their customers' engineers, are located?


PDT differs from PST. Conversion logic can get especially confusing if you've specified a date and time, and then the laws surrounding daylight saving time are changed so that it starts on a different day. Using UTC is like using a Unix timestamp: you don't have to worry about things like locales and local laws, when the date and time were recorded vs. when they're being viewed, and what laws might have changed in between. For example, the date that daylight saving time takes effect was changed in 2007. If you're trying to compare old logs to an old AWS announcement, it could get confusing... whereas UTC should remain immune to changes in laws.


This is exactly why I want UTC timestamps.


A Google query helps with this. For example, search for: 6:07 PM PDT to HKT

Google will return a special result at the top of the page: "6:07 PM Tuesday, Pacific Time (PT) is 9:07 AM Wednesday, Hong Kong Time (HKT)"
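
The same conversion can be done locally. A small sketch, assuming the outage end time quoted from the status page falls on Tuesday, June 30, 2015 (the date is my assumption, inferred from the post date and the "Tuesday" above), and using fixed offsets for PDT (UTC-7) and HKT (UTC+8) rather than a time zone database:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: PDT is UTC-7, HKT is UTC+8 (no DST handling here).
PDT = timezone(timedelta(hours=-7), "PDT")
HKT = timezone(timedelta(hours=8), "HKT")

# Assumed date: June 30, 2015; the status page only gives the time of day.
end_pdt = datetime(2015, 6, 30, 18, 7, tzinfo=PDT)

end_utc = end_pdt.astimezone(timezone.utc)
end_hkt = end_pdt.astimezone(HKT)

print(end_utc.isoformat())  # 2015-07-01T01:07:00+00:00
print(end_hkt.isoformat())  # 2015-07-01T09:07:00+08:00
```

Fixed offsets are fine for a one-off like this. For dates where you don't know whether PST or PDT applied, a real time zone database is the safer choice.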


What percentage of admins (or people who care about these statuses) are in PDT compared to the rest of the world? If it's in UTC, then everyone has to do one conversion, instead of nearly everyone doing one.

Also, I know my offset from PDT but I don't know my offset from UTC, so it also saves me the step of having to look up that offset.


Note at the top of http://status.aws.amazon.com/:

Internet connectivity issues

We are currently monitoring an external Internet provider issue that is causing interrupted service connectivity to AWS services for some customers. AWS services are not affected and continue to operate normally.


I can't get the status page to load: http://status.aws.amazon.com/

Edit: Scratch that. Some stuff I'm hosting in AWS East is still working. Maybe someone screwed up router configs on the west coast.

Edit: Rollback the scratch. It's oscillating between up and down right now.


Status page updated:

"We are currently monitoring an external Internet provider issue that is causing interrupted service connectivity to AWS services for some customers. AWS services are not affected and continue to operate normally."

http://status.aws.amazon.com/


I woke up to see two of my instances, running in AWS Singapore, at 100% CPU and unresponsive. Rebooting did not help either. They had to be killed. Thankfully AWS had documented this on their support page:

"Additionally, some customers have reported continued connectivity issues for some of their instances. We have seen with these reported issues that this has been caused by a leap second bug within the instance operating system, which results in 100% CPU utilization. We recommend rebooting the instance via the EC2 Management Console or API, or resetting the operating system time to resolve the issue. For further information see:

https://access.redhat.com/articles/15145"



A fiber cut in Oregon could be responsible (https://puck.nether.net/pipermail/outages/2015-June/007906.h...). I've been seeing connectivity issues with us-west-2 for most of the day.

There was also a fiber cut in the San Francisco area this morning (http://www.usatoday.com/story/tech/2015/06/30/california-int...).


I can reach http://status.aws.amazon.com/ from my home Comcast connection in SF. The status page is showing green for all services.


Looks to be something strange with the network. Using a SOCKS proxy to a host running on he.net, I can get to US-East machines, but not from my Cogent-based connection in the office.


Yeah, that's what we're seeing. We can finally get to US-West2 again from our office. Employees who were working from home didn't experience an outage at all.



I'm able to tunnel into EC2 over our VPC and get to most things that are down, like Slack and Netflix. Amazon.com is still unreachable.


Leap second posts?


For what it's worth, I usually start at [1].

[1] http://internetpulse.keynote.com/


Nope. A ton of websites don't work (even amazon.com).


I've been having issues all week reaching a lot of different AWS websites including atom.io.

Switching my DNS to 8.8.8.8 fixed it for me. Something is up though.


Faced similar DNS issues using the default name server that EC2 points to. We ended up running BIND.


I just got an email from AboveNet -- apparently they had a fiber cut and they're having major routing issues affecting the San Francisco area.


Yes, it's been suffering from the massive load of launching Beats1. Beats1 from Apple went down, but looks like it's back up finally.


Citation?



And what makes you think that Apple uses AWS? They've invested billions in building data centers for themselves in NC and CA, and they're building two new ones in the EU and one in OR. http://www.datacenterknowledge.com/the-apple-data-center-faq... https://www.apple.com/pr/library/2015/02/23Apple-to-Invest-1...


I'm connecting fine to us-east.



