Advance Disaster Recovery Planning Pays Off in Atlanta Snowstorm

 

Fire extinguishers. First aid kits. Spare tires. Life jackets. All of these items have something in common: we expect them to be there when we need them; we don’t usually think about them in the interim; and when we do need them, they had better work!

A Disaster Recovery (DR) facility falls into this category: generally out of sight and out of mind, but when something happens to the primary site, our business depends upon a timely and effective reconstitution of operations. So it was with relief (and a bit of pride) that we watched our DR site and plans come through with flying colors when events forced Intelsat General (IGC) to activate them last week during the snow event in Atlanta. We were able to maintain operations at IGC even though staff could not physically access our primary operations center in Ellenwood, Georgia, for about 24 hours.

To recap what happened, a few inches of snow and ice paralyzed the Atlanta region from the afternoon of January 28 until the 30th. For two days the nation was riveted by televised images of wrecked and abandoned cars along all major thoroughfares around metropolitan Atlanta.

Starting at about 4 PM on the 28th, these conditions prevented many of our staff from safely leaving home and travelling to our Intelsat Secure Operations Center (ISOC). To sustain operations, we implemented our DR plan. The personnel still at the operations center arranged themselves into two shifts, sleeping and eating on site. We had planned ahead for this possibility, and had cots, food, and hygiene facilities available.

But these shifts were staffed below the level required for full daytime operations, so we instituted a partial activation of our DR site located at IGC headquarters in Bethesda, MD. The next day, qualified technicians were on watch there and available to take “rollover” calls from the Atlanta operations center, ensuring that all operations could proceed as planned, even though the Atlanta road network was paralyzed.

A defining feature of DR events is that there is no “one size fits all” solution. A good plan must accommodate all eventualities, from complete destruction of the primary site to limited staffing for a temporary period. In this case the equipment at the primary site (servers, routers, phone switches) was all working, so there was no need to activate the backup equipment that we maintain for this purpose.

The phone switch forwarded calls to our DR site when call volume was heavy, and these calls were answered and handled by backup staff in Bethesda. This was completely transparent to our customers. If the crisis had continued, we could have rolled all calls to the DR site and continued operations indefinitely. Had the primary site been destroyed or lost power, the completely redundant system we maintain in Bethesda could have taken over seamlessly. No heroics were needed to accomplish this, just calm implementation of the DR plan.

Some “out of the box” thinking did prove useful, however. As luck would have it, one of our major customers was rolling out a new satellite communications system that week.  This was a complex and expensive effort that required action by technicians stationed around the country, including at the Atlanta operations center.

Engineers and vendor reps had travelled there to execute and supervise the rollout. As the second day of the crisis dawned, they found themselves stuck in their hotel rooms in Atlanta. Would the test have to be scrapped and rescheduled? This would affect dozens of technicians, engineers, and ships that had been staged for the test. Fortunately, one of our engineers at the ISOC was able to set up a VPN and a WebEx session that allowed the stranded technicians to continue the testing from their hotel rooms. The show must go on, as the saying goes, and it did!

By the evening of the 29th, roads had been cleared enough that a relief crew was able to get to the ISOC and restore full staffing. We then deactivated the DR site. Operations had been sustained and a major customer exercise had been able to continue, even though for a while images from Atlanta looked like something out of Dante, only with snow instead of fire. The DR site once again faded into the background, ready for any future contingencies.
