Drupal Gardens outage due to Amazon Web Services

Moderator

Chris Brookins
April 21, 2011
10:21am

Update @ 11:56PM ET We believe all sites to be recovered now. If you see any errors, please let us know and we'll investigate further.

------------

Update @ 4:34PM ET Amazon continues to enable more and more storage volumes. We understand it's been a very brutal time for all our customers. We'll keep posting progress information as we go and look for side effects of the outage.

No further AWS status update at this time.

------------

Update @ 12:50PM ET More sites are coming back online all the time.  Others continue to be down.   We continue to pressure Amazon and stay on top of the situation.   We have also posted a formal apology and an announcement regarding compensation to those affected by this outage.  Amazon has posted the following 2 status updates:

----- AWS Update follows ------
 
8:49 AM PDT We continue to see progress in recovering volumes, and have heard many additional customers confirm that they're recovering. Our current estimate is that the majority of volumes will be recovered over the next 5 to 6 hours. As we mentioned in our last post, a smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover. We will continue to keep everyone updated as we have additional information.  

6:18 AM PDT We're starting to see more meaningful progress in restoring volumes (many have been restored in the last few hours) and expect this progress to continue over the next few hours. We expect that we'll reach a point where a minority of these stuck volumes will need to be restored with a more time consuming process, using backups made to S3 yesterday (these will have longer recovery times for the affected volumes). When we get to that point, we'll let folks know. As volumes are restored, they become available to running instances, however they will not be able to be detached until we enable the API commands in the affected Availability Zone.

------------ 

Update @ 7:40AM ET Many more sites just came back on-line.  We continue to monitor the situation and we will report back here when we have more information.

 

------------

Update at 6:37AM ET - Thanks to AWS and the Drupal Gardens night-shift team, many more Drupal Gardens sites are back online, but unfortunately, due to issues at AWS, many still remain offline.  Read below for the most recent update from AWS.  Note that when AWS refers to "volumes" they mean cloud-based disk drives.  Again, no data has been lost; the volumes are just offline:

2:41 AM PDT (AWS Update) We continue to make progress in restoring volumes but don't yet have an estimated time of recovery for the remainder of the affected volumes. We will continue to update this status and provide a time frame when available.  When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.

--------------

Update at 11PM ET - It looks like affected Drupal Gardens sites will be up no earlier than tomorrow AM.  We will have people up all night in case AWS sees a breakthrough in the next few hours.  Read below for the update.

6:18 PM PDT (AWS Update) Earlier today we shared our high level ETA for a full recovery. At this point, all Availability Zones except one have been functioning normally for the past 5 hours. We have stabilized the remaining Availability Zone, but recovery is taking longer than we originally expected. We have been working hard to add the capacity that will enable us to safely re-mirror the stuck volumes. We expect to incrementally recover stuck volumes over the coming hours, but believe it will likely be several more hours until a significant number of volumes fully recover and customers are able to create new EBS-backed instances in the affected Availability Zone. We will be providing more information here as soon as we have it.

-------------

Update @ 8:37PM ET

There is no new update from Amazon.  We have called them every hour and they continue to work on the issue that is preventing access to many Drupal Gardens sites.  If it is any consolation (we realize it is not much), this has impacted some of the largest properties on the internet, including Groupon, Foursquare, NY Times comments, Reddit, etc.  We have a call scheduled with top Amazon people @ 7AM tomorrow where we will get a more detailed update and evaluate other possible options.  

In the interim we are rolling out an improved message on all affected Drupal Gardens sites to inform your visitors that says: 

"The website that you're trying to reach is having technical difficulties and is currently unavailable. We are aware of the issue and are working hard to fix it. Thanks for your patience."

The latest update from Amazon's status page is below - some Drupal Gardens sites have come back - we are hopeful this will be fully resolved soon:

---

1:48 PM PDT (AWS Update) A single Availability Zone in the US-EAST-1 Region continues to experience problems launching EBS backed instances or creating volumes. All other Availability Zones are operating normally. Customers with snapshots of their affected volumes can re-launch their volumes and instances in another zone. We recommend customers do not target a specific Availability Zone when launching instances. We have updated our service to avoid placing any instances in the impaired zone for untargeted requests.

----------

Today we experienced our first sustained outage of Drupal Gardens sites, along with many other large web services, because of an outage in one region of Amazon Web Services, where Drupal Gardens is hosted.  Initially the entire service was affected, and within a couple of hours Amazon stabilized enough of their services that all but one of our seven clusters were back up and running.  Unfortunately, sites residing on one of our clusters are still down and we are waiting on Amazon to restore service to that cluster of servers.  We are communicating with Amazon in real time on this issue and they are working hard to resolve it ASAP.

We take uptime and high availability seriously, which is why every Drupal Gardens site is running across multiple web servers and multiple clustered database servers.  If any one or even several servers should go down all sites are unaffected.  Unfortunately this time an entire region of Amazon Web Services was affected.  We are exploring methods to mitigate these kinds of outages in the future.

Drupal Gardens Actions: 
We are following Amazon's efforts to correct the situation.  
Next Update:
We apologize for the inconvenience and will post an alert here and on Twitter @drupalgardens when the Amazon issues are resolved and access to all Drupal Gardens sites is fully restored.

Status: Resolved

Comments

adminb April 21, 2011
1:45pm
Under paragraph 3 of your memo, "Amazon's efforts", the link is either broken or located on a server that is not working. BTW, is Amazon going to reimburse people for the sales that were lost due to this outage (LOL). I can bet your bottom dollar that they have got a clause in their contract that covers that. Hope that I have not started a controversial topic here. Carlyle adminb C
Moderator
Chris Brookins April 21, 2011
12:01pm

I have fixed the link.  Sorry about that.  

To answer your other question: the AWS terms of service are here http://aws.amazon.com/agreement/ and the Drupal Gardens terms of service are here http://www.drupalgardens.com/tos

We will report back when we have more information on this incident.


Chris Brookins
VP Engineering, Acquia - blog - twitter -

T@uandmii April 21, 2011
12:08pm

Please hurry, or I will cry! :-(

Moderator
Chris Brookins April 21, 2011
12:12pm

I pasted the latest update from Amazon Web Services (AWS) below.  Sorry if this is a foreign language to some of you, but it means basically they are still solving the issue that is impacting Drupal Gardens, and many other services.  BTW you can see these AWS updates for yourself as new updates become available by going to http://status.aws.amazon.com/ and clicking the "more" link next to "Amazon Elastic Compute Cloud (N. Virginia)" 

-------- Pasted notice from AWS follows -----

8:54 AM PDT We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them.


Chris Brookins
VP Engineering, Acquia - blog - twitter -

adminb April 21, 2011
12:21pm
here is another interesting article to read: http://newenterprise.allthingsd.com/20110421/amazons-cloud-crashed-overn... Carlyle adminb
Moderator
Chris Brookins April 21, 2011
1:08pm
Moderator
Chris Brookins April 21, 2011
1:10pm

The latest update from Amazon's status page is below:

 

10:26 AM PDT We have made significant progress in stabilizing the affected EBS control plane service. EC2 API calls that do not involve EBS resources in the affected Availability Zone are now seeing significantly reduced failures and latency and are continuing to recover. We have also brought additional capacity online in the affected Availability Zone and stuck EBS volumes (those that were being remirrored) are beginning to recover. We cannot yet estimate when these volumes will be completely recovered, but we will provide an estimate as soon as we have sufficient data to estimate the recovery. We have all available resources working to restore full service functionality as soon as possible. We will continue to provide updates when we have them.


Chris Brookins
VP Engineering, Acquia - blog - twitter -

ckopack April 21, 2011
1:24pm

This is such a drag and really makes me think twice about my hosting situation. We are going on 10 hours on the first full day of our "new" website being launched. My boss and co-workers are not happy with me. Is Acquia hosted on Amazon? Their site and Drupal Gardens seem to be working fine. 

Moderator
jeannie.finks April 21, 2011
1:37pm

As a result of the outage, a portion of our sites may be incurring a 502/504 Bad Gateway error message. Our operations team is working on updating that message while the sites affected are being brought back up.

A collection of website properties, including Acquia, are indeed hosted with Amazon and are being brought up in a sustainable, controlled method. We continue to monitor the outage situation.

---
Regards, Jeannie | Gardens Advisory Team

drpignotti April 21, 2011
1:37pm

What sort of refunds will you be providing and what steps will be taken to make sure this doesn't happen again?

adminb April 21, 2011
1:44pm
The problem appears to be Amazon's cloud-computing infrastructure. For those of you who, like myself, are not computer hardware savvy, if you go to the link below, you will find a very good explanation of 'cloud-computing infrastructure' http://en.wikipedia.org/wiki/Cloud_computing
Moderator
Chris Brookins April 21, 2011
1:45pm

avance6, we have meetings with Amazon.com executives (CTO, etc) coming up and will be discussing that very question.   When we have more information we will share with you.


Chris Brookins
VP Engineering, Acquia - blog - twitter -

Moderator
Chris Brookins April 21, 2011
1:47pm

 

The latest update from Amazon's status page is below:

---

11:09 AM PDT A number of people have asked us for an ETA on when we'll be fully recovered. We deeply understand why this is important and promise to share this information as soon as we have an estimate that we believe is close to accurate. Our high-level ballpark right now is that the ETA is a few hours. We can assure you that all-hands are on deck to recover as quickly as possible. We will update the community as we have more information.


Chris Brookins
VP Engineering, Acquia - blog - twitter -

fredvasse April 21, 2011
1:50pm

After having hosted my own websites for the past ten years I decided to outsource now. After long search I decided to go for the professional service at Drupal Gardens. After building my site during the past months it went live a few days ago because this Easter weekend is the opening of our new restaurant. But the site is down for a long time and still no update when it will be live again! This is not what I expected from Acquia...

Webmaster, consultant Nieuwe Media en eigenaar van Theehuis Dennenoord.

Follow me on my blog :

adminb April 21, 2011
1:52pm
In reply to ckopack's statement, I cannot understand why a large organization like Amazon does not have a backup system so that, if something like this happens, the backup automatically turns on. This is a good lesson and I just hope that Drupalgardens can learn from this and have an emergency plan so our websites would not go down. This is extremely critical for hosts and ISP providers. Carlyle
Moderator
Chris Brookins April 21, 2011
2:02pm

adminb, fredvasse, 

We have large backup systems in place for Drupal Gardens on Amazon and no data has been lost.  The problem is that when problems exist in an entire Amazon data center we can't display the sites there (just like the hundreds of other affected companies, such as Foursquare, Groupon, etc).  We are working on a plan to mitigate this risk in the future.


Chris Brookins
VP Engineering, Acquia - blog - twitter -

Moderator
Chris Brookins April 21, 2011
2:51pm

 

The latest update from Amazon's status page is below - we are hopeful this will be resolved soon:

---

12:30 PM PDT We have observed successful new launches of EBS backed instances for the past 15 minutes in all but one of the availability zones in the US-EAST-1 Region. The team is continuing to work to recover the unavailable EBS volumes as quickly as possible.


Chris Brookins
VP Engineering, Acquia - blog - twitter -

Susan MacPhee April 21, 2011
3:51pm

Thank you Acquia for doing your best. I'm sure service will be even better than before, which already exceeds any traditional hosting. Keep us posted.

"We take uptime and high availability seriously, which is why every Drupal Gardens site is running across multiple web servers and multiple clustered database servers.  If any one or even several servers should go down all sites are unaffected.  Unfortunately this time an entire region of Amazon Web Services was affected.  We are exploring methods to mitigate these kinds of outages in the future."

Moderator
Chris Brookins April 21, 2011
3:56pm

 

The latest update from Amazon's status page is below - some Drupal Gardens sites are coming back - we are hopeful this will be fully resolved soon:

---

1:48 PM PDT A single Availability Zone in the US-EAST-1 Region continues to experience problems launching EBS backed instances or creating volumes. All other Availability Zones are operating normally. Customers with snapshots of their affected volumes can re-launch their volumes and instances in another zone. We recommend customers do not target a specific Availability Zone when launching instances. We have updated our service to avoid placing any instances in the impaired zone for untargeted requests.


Chris Brookins
VP Engineering, Acquia - blog - twitter -

Michael Cisse April 21, 2011
5:33pm

I am still getting the error below. When is this going to be fixed? http://www.nylatestnews.com/

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, [no address given] and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

photoshopisland April 21, 2011
5:47pm

photoshopisland.com is also still down. The resulting traffic loss means lost growth for my email list and customers. Who do I complain to?

Randall April 21, 2011
6:41pm

Yeah my clients have been calling me today also. What is weird now is that 2 sites of mine are down and 2 sites are up.. So hopefully things get restored.

Sad day for servers I guess :(

Moderator
Chris Brookins April 21, 2011
7:36pm

Folks, there is no new update from Amazon.  We have called them every hour and they continue to work on the issue that is preventing access to many Drupal Gardens sites.  If it is any consolation (we realize it is not much), this has impacted some of the largest properties on the internet, including Groupon, Foursquare, NY Times comments, Reddit, etc.  We have a call scheduled with top Amazon people @ 7AM tomorrow where we will get a more detailed update and evaluate other possible options.  

In the interim we are rolling out an improved message on all affected Drupal Gardens sites to inform your visitors that says: 

"The website that you're trying to reach is having technical difficulties and is currently unavailable. We are aware of the issue and are working hard to fix it. Thanks for your patience."


Chris Brookins
VP Engineering, Acquia - blog - twitter -

Moderator
Chris Brookins April 21, 2011
8:48pm

 

Latest update from AWS - looks like affected Drupal Gardens sites will be up no earlier than tomorrow AM.  We will have people up all night if AWS sees a breakthrough in the next few hours.  Read below for the update.

------

6:18 PM PDT Earlier today we shared our high level ETA for a full recovery. At this point, all Availability Zones except one have been functioning normally for the past 5 hours. We have stabilized the remaining Availability Zone, but recovery is taking longer than we originally expected. We have been working hard to add the capacity that will enable us to safely re-mirror the stuck volumes. We expect to incrementally recover stuck volumes over the coming hours, but believe it will likely be several more hours until a significant number of volumes fully recover and customers are able to create new EBS-backed instances in the affected Availability Zone. We will be providing more information here as soon as we have it.

 


Chris Brookins
VP Engineering, Acquia - blog - twitter -

benbentzin April 21, 2011
11:08pm

That's it?  We will have over 24 hours of down time and all we know is "...looks like affected Drupalgardens sites will be up no earlier than tomorrow AM..."  Drupalgardens is supposed to be a next generation platform and all we rate is a 3 hour old update of gobbledygook after 21 hours of outage? Really? No regularly updated communication from Amazon?  No "here is how it happened and it will never happen again"?  Just "you are down, suck it up"?

josebasubi April 21, 2011
11:28pm

My site is (was) up and running right now. Seems I have to wait a bit more to use it, amprfff...

http://www.reina-tanaka.es/

collins April 21, 2011
11:50pm

Well, this is a ridiculous amount of downtime. I've been putting up websites for many years and I've never seen anything like this. I'm not talking about the sites going down, every hosting company has down time, but 24 hours and counting and no end in sight and no real explanation? Very unprofessional.

My main website has been up for well over a decade, and I think the longest it was ever down was an hour. Most problems have got solved in 20-30 minutes (daytime, nights, weekends, etc.), including clear explanations.

I have to say that just because you can sell books and electronics and teabags, that doesn't make you a web hosting company.

(I'm not at all commenting on DG, BTW, where there has always been a very high level of both responsiveness and expertise. Neither of which is shared by AWS)

"We take uptime and high availability seriously, which is why every Drupal Gardens site is running across multiple web servers and multiple clustered database servers.  If any one or even several servers should go down all sites are unaffected.  Unfortunately this time an entire region of Amazon Web Services was affected.  We are exploring methods to mitigate these kinds of outages in the future."

This is the part that's especially alarming, since obviously having hosting and "backups" at the same place is not a real backup strategy.  I know Amazon has been claiming that their "Availability Zones" function independently, but this is clearly not the case. Here are two good articles about that:

http://www.theregister.co.uk/2011/04/21/amazon_web_services_outages_span...

http://www.theregister.co.uk/2011/04/21/amazon_cloud_probs/

This is like if I had all my important files on a flash drive, then for safety I copied them to another flash drive, but then I carried both flash drives in the same pocket and called it my backup plan.

--
http://www.u-town.com/collins

(a writing blog)

josebasubi April 21, 2011
11:50pm

Yay ! I managed to post a blog entry.  Seems it is slowly going back to service, but very slooooow.

ckopack April 22, 2011
12:10am

I have to agree with collins. My site is still down. I was embarrassed to even go to the office today. What a drag all day. This won't happen again. Sorry Gardens I think I'm better off without you.

3foot6 April 22, 2011
12:12am

Hi - please can you explain what an availability zone is - I have been experiencing problems with my site for the past week - still down - I live in the UK

 

"1:48 PM PDT A single Availability Zone in the US-EAST-1 Region continues to experience problems launching EBS backed instances or creating volumes. All other Availability Zones are operating normally."

 

josebasubi April 22, 2011
12:41am

DG should ask for a compensation because Amazon says they assure a 99.95% uptime/year.

http://bit.ly/fmxwAu
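
For a sense of scale, here is a rough sketch of how much downtime a 99.95% yearly uptime figure would allow. It assumes a 365-day year and uses the roughly 24 hours of downtime mentioned by commenters in this thread as an illustrative figure; the function name is made up for this example and nothing here is taken from the actual AWS agreement text.

```python
def allowed_downtime_hours(uptime_percent: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime permitted in a period at the given uptime percentage."""
    return period_hours * (100.0 - uptime_percent) / 100.0


# Yearly downtime budget implied by 99.95% uptime (assumption: 365-day year).
budget = allowed_downtime_hours(99.95)
print(f"Allowed downtime: {budget:.2f} hours per year")  # prints 4.38

# Illustrative outage length taken from comments in this thread, not a measured value.
outage_hours = 24
print(f"A {outage_hours}-hour outage is about {outage_hours / budget:.1f}x that budget")
```

How a given provider actually measures uptime, and what counts against it, depends on the terms of its agreement, so treat this only as back-of-the-envelope arithmetic.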

adminb April 22, 2011
1:01am
Folks, it is now 1:46 am EST and my website is not up. I have read some of the comments and I agree with you all. In life we learn lessons the hard way. The incident has already happened and we have seen what it has done. It was not really the folks at Drupalgardens who created the problem. I am sure that they are re-thinking their strategy so that this situation will never happen again. We really do not know what happened at AWS but they will be paying a big price for this situation. Their cloud computing infrastructure has been criticized and they will have to take responsibility. Let's move forward and hope that this crisis will be resolved as quickly as possible. Let's not bash each other and create an environment that is not healthy. Carlyle adminb
pelleaardema April 22, 2011
2:20am

I understand it's very early morning for you, but... any updates on the progress Amazon is making?

Suranas Jewelove April 22, 2011
3:50am
blaireallison April 22, 2011
4:43am

Looks like others have their website back up.

Mine still shows an error:  The website encountered an unexpected error. Please try again later.

www.loveguru.net

I'm sure you're working on it. Just updating it from my end.

Thanks!

josebasubi April 22, 2011
5:36am

Mine is UP and running like always.

http://www.reina-tanaka.es/

I hope you guys get your sites connected soon.

Also, I am going to do backup of my page in case.... (every week?)

Moderator
Chris Brookins April 22, 2011
5:44am

 

Update at 6:37AM ET - Thanks to AWS and the Drupal Gardens night-shift team, many more Drupal Gardens sites are back online, but unfortunately, due to issues at AWS, many sites still remain offline.  Read below for the most recent update from AWS.  Note that when AWS refers to "volumes" they mean cloud-based disk drives.  Again, no data has been lost; the volumes are just offline:

2:41 AM PDT (AWS Update) We continue to make progress in restoring volumes but don't yet have an estimated time of recovery for the remainder of the affected volumes. We will continue to update this status and provide a time frame when available.  When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.

--------------


Chris Brookins
VP Engineering, Acquia - blog - twitter -

Moderator
Chris Brookins April 22, 2011
6:40am

Update @ 7:40AM ET.

Many more sites just came back on-line.  We continue to monitor the situation and we will report back here when we have more information.


Chris Brookins
VP Engineering, Acquia - blog - twitter -

carnellm April 22, 2011
7:27am

My main complaint, well other than the downtime of course, is that there was no communication from DG for so long. The customers of DG were on these forums for a good 6 to 10 hours complaining of the outage before DG ever offered any statement. Is there no one on duty during the night? And no automated alarms?

Additionally, no emails went out to the affected customers once the outage was discovered. So, any client would have to be surprised by finding out their site was down from an annoyed customer or, worse yet, their boss. That is not the way a data center should be run. In other words, it is being run reactively instead of proactively.

I understand outages and I understand them being with another vendor so they are mostly out of your hands, but what I don't understand is the systems controls and lack of communication.

Netridder April 22, 2011
8:28am

I understand you are a very US-oriented business - therefore I am worried that you prioritize the US customers ahead of european customers.

My www.myname.drupalgardes.com is working but my www.holbaektennis.dk does not - Please confirm this is not the fact!

welshjs April 22, 2011
8:56am

Probably already said somewhere. At the root of the problem is the fact that DG is cloud-based. Most customers of DG's service did not know that DG was hosted by Amazon. It certainly didn't occur to me that DG was hosted by a 3rd party. The feeling of powerlessness is heightened by the fact that DG was just another service waiting in line for the same information as everyone else. At least if your site is hosted by a bona-fide hosting service you have a better feeling that you're in control.

josebasubi April 22, 2011
9:18am

This is not the fact. I am using "es" in Spain and I am a customer living in Spain. And the site is working properly.

yoav April 22, 2011
9:57am

Would having a CDN help in this situation? I postponed my migration to DG partly due to its slow performance in EU/Israel. I wonder if a CDN would have solved this situation, where it seems it's all stored in one place. Blessings on DG efforts

Yoav A. Med and Media Project Management | Business Development | Expert Services

collins April 22, 2011
10:02am

I agree with carnellm.

1) There has to be some way to contact the DG support people in an emergency, other than just posting in the forum. The forum is, in general, a great way to get support and share ideas and solutions, and this forum in particular is one of the best I've ever seen, but, like all tools, it doesn't fit every need.

(BTW, I would be fine with emergency notification being one of the things people pay for, with freeloaders, like me, using the forum only.)

2) There has to be better communications to the users, via email, in an emergency situation.  For example, what if this site was down? How would emergency communication happen then?

All crises expose weaknesses in systems and plans, and I think this one has demonstrated that communications (both ways between DG and its users) need an overhaul.

--
http://www.u-town.com/collins

(a writing blog)

Moderator
Chris Brookins April 22, 2011
10:04am

A CDN would not have helped as Drupal is a content mgmt system and it generates all pages dynamically.  But it will help performance which is why we will be doing it.

The only thing that will help is for us to replicate DG sites across multiple separate datacenters, which we are investigating - but that increases costs.  Is it worth more $ for the highest level of high availability for some % of gardens customers?  There are also cheaper solutions if some shorter amount of downtime is tolerable, and we are investigating those too.


Chris Brookins
VP Engineering, Acquia - blog - twitter -

Moderator
jeannie.finks April 22, 2011
10:12am

@carnellm and fellow members,

We 100% agree with you in regards to the communications. There were several lessons learned from this particular unusual event and we were caught unprepared. I can speak on behalf of other Drupal Gardens staff, including Operations colleagues who worked tirelessly through the night to monitor and bring up servers: we are actively making changes to the communications process. If we do not get something 100% right the first time, we immediately take steps to correct and move forward.

We will be deploying efficient communication processes we've used for our other products and services to Drupal Gardens.

Kent Gale, our Senior Director of Client Support, will be issuing imminently a more comprehensive response.

Not to distract from the current conversation, but I thought some of you would like to see an open letter from a CEO of another company impacted by the outage; I thought the last paragraph was particularly compelling about the theme of communication.

http://roman.stanek.org/2011/04/21/mr-jassy-tear-down-this-wall/

---
Regards, Jeannie | Gardens Advisory Team

ckopack April 22, 2011
11:43am

Thanks again DG for getting me back up and running. Chris, this is a great response to "where do we go from here?" If I were DG I would be seriously considering the credibility of AWS as a hosting provider. I myself would gladly pay a little more every month for the peace of mind of high availability. This outage was bad news for me and DG just the same. Neither of us had control over the situation, but we do get to choose what we do moving forward. 

Susan MacPhee April 22, 2011
12:03pm

I honestly think only good can come of this. DG is such a great project, and the Acquia team is amazingly talented. Better for this to happen now, while we are in this growth spurt together.

Moderator
Chris Brookins April 22, 2011
12:04pm

Update @ 12:50PM ET More sites are coming back online all the time.  Others continue to be down.   We continue to pressure Amazon and stay on top of the situation.   We have also posted a formal apology and an announcement regarding compensation to those affected by this outage.  Amazon has posted the following 2 status updates:

----- AWS Update follows ------
 8:49 AM PDT We continue to see progress in recovering volumes, and have heard many additional customers confirm that they're recovering. Our current estimate is that the majority of volumes will be recovered over the next 5 to 6 hours. As we mentioned in our last post, a smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover. We will continue to keep everyone updated as we have additional information.
  
6:18 AM PDT We're starting to see more meaningful progress in restoring volumes (many have been restored in the last few hours) and expect this progress to continue over the next few hours. We expect that we'll reach a point where a minority of these stuck volumes will need to be restored with a more time consuming process, using backups made to S3 yesterday (these will have longer recovery times for the affected volumes). When we get to that point, we'll let folks know. As volumes are restored, they become available to running instances, however they will not be able to be detached until we enable the API commands in the affected Availability Zone.

------------ 


Chris Brookins
VP Engineering, Acquia - blog - twitter -

collins April 22, 2011
12:20pm

Still down.  The font on the "Technical Difficulties" message has been improved.

--
http://www.u-town.com/collins

(a writing blog)

This topic has been closed to further comments.
