Is Amazon's cloud crash a cautionary tale for government?

Problem at EC2 data center takes Reddit, Quora and other popular sites off-line for an extended stretch

Federal agencies are moving some online operations to the cloud, whether they like it or not. After Amazon Web Services crashed April 21, they might like it a little less.

Amazon’s Elastic Compute Cloud, which provides hosting services for a growing number of Web operations, suffered an outage at one of its data centers that began in the early morning and lasted at least into the afternoon. It took some popular Web 2.0 sites with it. Several federal sites hosted by EC2 appeared to be unaffected.

The outage hit at 4:41 a.m. Eastern Time at Amazon’s data center in Northern Virginia and brought down EC2 customers such as Reddit, Quora and HootSuite. At 3:10 p.m., Reddit was partially up in an emergency read-only mode, in which some submissions were displayed but users could not log in. Quora and HootSuite were still down.


In a series of updates on AWS’ status dashboard, Amazon said it was experiencing connectivity problems and latencies in the eastern United States.

Shortly before noon, Amazon said a “networking event early this morning triggered a large amount of re-mirroring of [Elastic Block Store] volumes,” which provide storage for the company’s cloud customers. The re-mirroring created a shortage of capacity and made it “difficult to create new EBS volumes,” the update said.

About 1:30 p.m., Amazon reported “significant progress” but could not estimate when all the affected storage volumes would be recovered. As for the cause of the problem, the update noted that “we always know more and understand issues better after we fully recover and dive deep into the postmortem.”

The incident could be of concern to federal agencies, which are under orders from the Office of Management and Budget to move some of their operations to the cloud. OMB’s cloud-first policy, part of its 25-point plan for IT reform, requires agencies to move three applications to the cloud within 18 months.

Commercial cloud hosting services are one option for agencies. Earlier this year, the Treasury Department moved its Treasury.gov and other public-facing websites to EC2. Those sites, including MyMoney.gov and the IRS Oversight Board's site, were working fine during the recent crash.

Any system or website faces the possibility of an outage, whether it’s cloud-based or not. But when a cloud provider goes down, it can take its customers with it. One of the concerns about cloud computing has been the implications of having a single point of failure for multiple sites or services. Amazon’s troubles could provide ammunition for that argument.

About the Author

Kevin McCaney is a former editor of Defense Systems and GCN.

Reader Comments

Mon, Jan 30, 2012 Joomla Cloud Pakistan

Still can’t blame Amazon totally, but they are indeed liable for this one. The fact that they claim perfect data back-ups and guarantee no data will be lost in their state-of-the-art cloud, and then couldn’t keep that promise, is Amazon’s fault. I do find it hard to believe that they don’t have “near-real-time” off-site data back-ups. In order to prevent a complete data loss should anything happen in one data center, a redundant data center should be set up to copy that data. This is how we set up our technology at Cloudways.com. If our main site goes down, we can use the data copied to another data center to recover. What I suspect happened with Amazon’s data loss is that the corruption of the data in one site was replicated to a remote data center before they caught it, hence the remote back-up was not good either. If this is true (Amazon has yet to offer an official explanation), then their monitoring of the data integrity system is to blame.

Fri, Apr 22, 2011 Jill

I would bet on Amazon's service being more reliable than the government's on-premises solutions any day. The article also noted 3 companies that suffered issues, but not the government business that is there now. That tells me there are different redundancies available for business continuity.

Fri, Apr 22, 2011 Anon1234

The REAL issue for Feds here is that in the ABSOLUTE RUSH to force stuff into the Public Cloud, they have decided they no longer have to care about Service Level Agreements (SLA) around UPTIME! So, whereas Private or Hybrid Cloud providers (who usually get held to SLAs) would now owe the taxpayers/government/hootsuite/whoever a refund or penalty payment for the outage downtime, Amazon gets away with it scot-free! Yay, public cloud! So, Amazon's incentive to improve is WHAT, exactly? (shakes head)

