April 21, 2011
Update @ 11:56PM ET We believe all sites to be recovered now. If you see any errors, please let us know and we'll investigate further.
Update @ 4:34PM ET Amazon continues to enable more and more storage volumes. We understand it's been a very brutal time for all our customers. We'll keep posting progress information as we go and will look for side effects of the outage.
No further AWS status update at this time.
Update @ 12:50PM ET More sites are coming back online all the time. Others continue to be down. We continue to pressure Amazon and stay on top of the situation. We have also posted a formal apology and an announcement regarding compensation to those affected by this outage. Amazon has posted the following 2 status updates:
----- AWS Update follows -----
8:49 AM PDT We continue to see progress in recovering volumes, and have heard many additional customers confirm that they're recovering. Our current estimate is that the majority of volumes will be recovered over the next 5 to 6 hours. As we mentioned in our last post, a smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover. We will continue to keep everyone updated as we have additional information.
6:18 AM PDT We're starting to see more meaningful progress in restoring volumes (many have been restored in the last few hours) and expect this progress to continue over the next few hours. We expect that we'll reach a point where a minority of these stuck volumes will need to be restored with a more time consuming process, using backups made to S3 yesterday (these will have longer recovery times for the affected volumes). When we get to that point, we'll let folks know. As volumes are restored, they become available to running instances, however they will not be able to be detached until we enable the API commands in the affected Availability Zone.
Update @ 7:40AM ET Many more sites just came back on-line. We continue to monitor the situation and we will report back here when we have more information.
----- AWS Update follows -----
Update at 6:37AM ET - Thanks to AWS and the Drupal Gardens night-shift team, many more Drupal Gardens sites are back online, but unfortunately, due to issues at AWS, many still remain offline. Read below for the most recent update from AWS. Note that when AWS refers to "volumes" they mean cloud-based disk drives. Again, no data has been lost; the volumes are just offline:
2:41 AM PDT (AWS Update) We continue to make progress in restoring volumes but don't yet have an estimated time of recovery for the remainder of the affected volumes. We will continue to update this status and provide a time frame when available. When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.
--------------
Update at 11PM ET - It looks like affected Drupal Gardens sites will be up no earlier than tomorrow AM. We will have people up all night in case AWS sees a breakthrough in the next few hours. Read below for the update.
6:18 PM PDT (AWS Update) Earlier today we shared our high level ETA for a full recovery. At this point, all Availability Zones except one have been functioning normally for the past 5 hours. We have stabilized the remaining Availability Zone, but recovery is taking longer than we originally expected. We have been working hard to add the capacity that will enable us to safely re-mirror the stuck volumes. We expect to incrementally recover stuck volumes over the coming hours, but believe it will likely be several more hours until a significant number of volumes fully recover and customers are able to create new EBS-backed instances in the affected Availability Zone. We will be providing more information here as soon as we have it.
Update @ 8:37PM ET
There is no new update from Amazon. We have called them every hour and they continue to work on the issue that is preventing access to many Drupal Gardens sites. If it is any consolation (we realize it is not much), this has impacted some of the largest properties on the internet, including Groupon, Foursquare, NY Times comments, Reddit, etc. We have a call scheduled with top Amazon people @ 7AM tomorrow where we will get a more detailed update and evaluate other possible options.
In the interim, we are rolling out an improved message on all affected Drupal Gardens sites to inform your visitors. It says:
"The website that you're trying to reach is having technical difficulties and is currently unavailable. We are aware of the issue and are working hard to fix it. Thanks for your patience."
The latest update from Amazon's status page is below - some Drupal Gardens sites have come back - we are hopeful this will be fully resolved soon:
1:48 PM PDT (AWS Update) A single Availability Zone in the US-EAST-1 Region continues to experience problems launching EBS backed instances or creating volumes. All other Availability Zones are operating normally. Customers with snapshots of their affected volumes can re-launch their volumes and instances in another zone. We recommend customers do not target a specific Availability Zone when launching instances. We have updated our service to avoid placing any instances in the impaired zone for untargeted requests.
----------
Today we experienced our first sustained outage of Drupal Gardens sites, along with many other large web services, because of an outage in one region of Amazon Web Services, where Drupal Gardens is hosted. Initially the entire service was affected, and within a couple of hours Amazon stabilized enough of their services that all but one of our seven clusters were back up and running. Unfortunately, sites residing on one of our clusters are still down and we are waiting on Amazon to restore service to that cluster of servers. We are communicating with Amazon in real time on this issue and they are working hard to resolve it ASAP.
We take uptime and high availability seriously, which is why every Drupal Gardens site runs across multiple web servers and multiple clustered database servers. If any one server, or even several servers, should go down, all sites are unaffected. Unfortunately, this time an entire region of Amazon Web Services was affected. We are exploring methods to mitigate these kinds of outages in the future.
Drupal Gardens Actions:
We are following Amazon's efforts to correct the situation.
We apologize for the inconvenience and will post an alert here and on Twitter (@drupalgardens) when the Amazon issues are resolved and access to all Drupal Gardens sites is fully restored.