Last year, the U.S. Virgin Islands suffered unprecedented damage from two Category 5 hurricanes. These events crippled the IT infrastructure on all four islands, creating a unique opportunity for the Bureau of Information Technology, or BIT, to leverage resources and capitalize on lessons learned. Because the entire network suffered, damage to the phone system housed on the server created a new dynamic in IT product and service delivery. Throughout the recovery and restoration period, BIT maintained service and support commitments beyond our usual scope while trying to focus on recovery efforts. In essence, the bureau was challenged with “fixing the tire with the car rolling.”
On Sept. 6, Hurricane Irma, a Category 5 hurricane, decimated the islands of St. Thomas, St. John and Water Island in the U.S. Virgin Islands, causing significant damage to the islands’ power, water, and communications infrastructure. Two short weeks later, Hurricane Maria, another Category 5 storm, disrupted relief operations when it struck the island of St. Croix on Sept. 20, inflicting the same damage on that island’s infrastructure. The result was a territory-wide blackout, downed communication towers, and the destruction of thousands of residences.
In my role as director of BIT and chief information officer for the territory, I led the team through our emergency disaster preparation mode to secure facilities and equipment in both districts prior to impact. The recovery efforts undertaken by BIT personnel were critical to the entire territory and depended on accurate assessment, reliable response time and innovative solutions. Without this three-tiered approach, the territory risked not only losing the confidence of its citizens but also a dramatic slowdown of all services. Our primary objectives were to secure the data server that services the entire government and to ensure that the public safety Land Mobile Radio Network remained operational throughout and after the storms’ impact.
I assembled managers and support staff on both islands to quickly assess disaster recovery operations to account for critical utilities and hardware. The BIT staff utilized a standard system of delivery and recovery actions outlined in the Territory Emergency Operations Plan and reacted to unforeseen incidents as they occurred. The actions executed during the waning hours of the disaster provided input to be included in the next iteration of the plan.
The most critical area proved to be data storage. The territory relies on a locally maintained server located in the St. Croix facility. Data and applications stored on this device are critical to territorial government operations. In addition, the device stored data files for approximately 36 agencies operating throughout the territory. The system utilized an on-premises backup solution housed in the same location as the server. Upon my arrival in November 2016, I prioritized duplicating backups with the cloud solution, Microsoft Azure.
Between my arrival and the storms, the team worked to transfer this data to cloud storage resources. Time and resources proved to be major impediments: a tedious and newly developed backup plan had to be executed while the team continued to provide daily services and support. This was coupled with too little cloud storage space to accommodate a full backup set. A last-minute attempt to identify mission-essential programs proved to be a daunting task, so systems managers conducted a sweep to back up all files without setting priority.
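For readers facing the same constraint, the following is a minimal sketch of how a priority-driven backup plan might be laid out when cloud capacity cannot hold everything. It is not what BIT ran during the storms; the dataset names, sizes, priority tiers and the `plan_cloud_backup` function are all hypothetical, chosen only to illustrate the idea of ranking data before a storm rather than during one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str          # hypothetical label for an agency data set
    size_gb: float     # estimated size of a full backup, in gigabytes
    priority: int      # 1 = mission-essential; larger numbers = less critical


def plan_cloud_backup(datasets, capacity_gb):
    """Split datasets into (selected, deferred) for a limited cloud capacity.

    Greedy rule (an assumption, not an optimal packing): take the most
    critical tier first and, within a tier, the smallest sets first so
    that more of them fit.
    """
    selected, deferred = [], []
    remaining = capacity_gb
    for ds in sorted(datasets, key=lambda d: (d.priority, d.size_gb)):
        if ds.size_gb <= remaining:
            selected.append(ds)
            remaining -= ds.size_gb
        else:
            deferred.append(ds)
    return selected, deferred


# Hypothetical inventory, for illustration only.
inventory = [
    Dataset("payroll", 400, 1),
    Dataset("archive", 900, 3),
    Dataset("permits", 300, 2),
    Dataset("radio-config", 50, 1),
]
selected, deferred = plan_cloud_backup(inventory, capacity_gb=800)
print([d.name for d in selected])
print([d.name for d in deferred])
```

The value of a plan like this lies less in the code than in agreeing on the priority tiers ahead of time, which is the step that proved so daunting at the last minute.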
BIT employs four tower/radio personnel who operate from our St. Croix and St. Thomas offices. The towers support the territory’s land mobile radio (LMR) relay equipment and the microwave links that provide internet connectivity between the islands of St. Thomas and St. Croix.
Prior to Hurricane Irma, BIT tower personnel on both islands evaluated the status of all towers maintained by the organization, calibrated emergency responder radios, and prepositioned fuel and spare parts to mitigate power outages. Preparation proved essential and provided the head start needed to execute recovery efforts after impact. The new mission of coordinating telephony service recovery was added to the Bureau’s list of newly acquired responsibilities.
Power proved to be the limiting factor, along with roadways made inaccessible by debris scattered throughout the islands. The National Guard and prepositioned federal disaster personnel assembled on St. Croix to react quickly to the predicted devastation. The governor dispatched all Cabinet personnel living on St. Croix to St. Thomas by military-supplied helicopter to assess the damage. BIT operations continued despite major breaks in the power grids because the St. Croix office location remained intact.
Hurricane Maria proved to be the straw that nearly broke the camel’s back. St. Croix served as the hub and staging area for St. Thomas district relief operations. This single storm interrupted government-wide IT services as the main server and backup services failed after the storm subsided. Interrupted services include internet services to government agencies, and data and applications storage that provides critical services to the public. Some remain inaccessible months after the storm.
The Bureau, other government agencies and citizens of the USVI still suffer from the loss of critical applications and data services caused by the storm. The Bureau disassembled the RAID drive array and mailed it to a data recovery service in the continental United States, but the array was returned weeks later with the company stating it was damaged beyond repair. There are other services that claim to have the ability to recover this data, but at a significant cost.
In summary, USVI leadership must update the Territory Emergency Operations Plan and BIT Disaster Recovery Plan.
Backup generators provided only sporadic power, which strained IT equipment. Power failures on all four islands (St. Thomas, St. Croix, St. John and Water Island) lasted for more than six months in some areas. All BIT facilities are equipped with power generators, but these generators are designed to provide temporary relief and proved unreliable after extended periods of usage.
We also discovered the need to store data in three places. The Bureau experienced critical losses because of inadequate planning. BIT recommends organizations store data in the cloud, on local devices (if available) and in locations owned and operated by data owners. This key point can and will save organizations money and time and, most of all, will restore the faith of customers in emergencies. Of course, this requires a review of service level agreements and local policies, but consider this an essential element in disaster planning and recovery.
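One way an organization might audit itself against the three-location recommendation is sketched below. It assumes the organization keeps a simple inventory recording where a verified copy of each data set lives; the location labels (`cloud`, `local`, `owner_site`), the `missing_copies` function and the sample inventory are hypothetical and not drawn from any BIT system.

```python
# Hypothetical labels for the three recommended storage locations.
REQUIRED_LOCATIONS = frozenset({"cloud", "local", "owner_site"})


def missing_copies(inventory):
    """Return {dataset: [locations lacking a copy]} for incomplete datasets.

    `inventory` maps a dataset name to the collection of locations where a
    verified backup copy currently exists.
    """
    gaps = {}
    for dataset, locations in inventory.items():
        missing = REQUIRED_LOCATIONS - set(locations)
        if missing:
            gaps[dataset] = sorted(missing)
    return gaps


# Hypothetical inventory, for illustration only.
backup_inventory = {
    "payroll": {"cloud", "local", "owner_site"},
    "permits": {"local"},
}
gaps = missing_copies(backup_inventory)
print(gaps)
```

An organization without local devices could drop `local` from the required set; the point is to make the gaps visible before an emergency rather than after it.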
Finally, the major lesson learned throughout the recovery period is the critical role that training and education play in developing and executing a disaster recovery plan. Team members of the Bureau of Information Technology agree that real-world training is an essential element that must be factored into the plan. BIT suggests that organizations “pull the plug” on all things automated and see how support staff and customers react. It is the only realistic way to assess an organization’s ability to react to and recover from emergencies.
The United States Virgin Islands is setting the standard for disaster recovery. Lessons learned during Irma and Maria should stand as a projection platform for stateside disaster planning.