911 vendor Intrado takes responsibility for widespread outage
A 911 service outage that affected 14 states for more than an hour Monday evening was caused by a networking misconfiguration at the technology firm Intrado, the company told StateScoop.
The incident marks the latest in a series of outages in recent years that can be traced back to mistakes at Intrado, which acts as a subcontractor for larger companies that operate the nation’s telecommunications infrastructure. This most recent incident has captured the attention of federal officials and could prompt reform in public safety regulations.
In a statement, Intrado said the outages lasted for 77 minutes Monday before service was restored.
“Initial analysis identified an internal networking component that was not correctly forwarding traffic, resulting in the impairment of call delivery,” the statement reads. “Our investigation into this incident is ongoing and we remain focused on communicating transparently with our customers and stakeholders.”
Emergency services departments in the affected locations, which spanned from California to Pennsylvania, are investigating the outages to understand their full impact, such as the number of missed emergency calls. According to an early analysis of the outage in Minnesota conducted by Lumen, formerly called CenturyLink, 135 emergency calls made to 24 public safety answering points failed to reach a dispatcher. State officials there said they’re continuing to investigate and that they believe there are more missed calls than those identified in Lumen’s preliminary report.
If 911 outages are rare, large multi-state outages like the one this week are even less common. Brandon Abley, the technical issues director for the National Emergency Number Association, told StateScoop that a large outage happens maybe once every two years.
“Generally 911 systems are built to a high degree of reliability and resiliency,” Abley said. “But even with a very high reliability, having service impairments is something that just happens. It happens rarely and it should never happen, but given the number of dependencies in the number of domains and public and private entities involved, it is technology and even the most reliable technologies do have impairments from time to time.”
Two other complicating factors, he said, are that the nation’s emergency calling system is both highly federated and highly dependent on a small number of private companies that operate the systems.
“Everyone wants to know the answers right away and it’s just hard to give them because the United States does not have a single 911 system, so it’s very hard to trace the root cause,” Abley said. “There are a lot of interdependencies in the way that we provision and handle 911 services. There are a few large vendors that operate in a lot of markets and operate many 911 systems.”
Todd Miller, an executive at Rave Mobile Safety, told StateScoop that 911 outages can be caused by technical mishaps like the one behind Monday’s outage or simply by a shortage of capacity. Capacity shortages have been observed during school shootings and natural disasters, when calls spike and an overloaded system becomes inaccessible.
“Is that acceptable? No. All of us who are involved in emergency response would say that’s not acceptable,” Miller said of missed emergency calls. “We want to make sure that absolutely every call to 911 can be received and dispatched in an effective manner.”
This isn’t the first major outage caused by technical issues at Intrado.
Another multi-state outage, in April 2014, lasted six hours, affected counties in seven states and resulted in more than 6,000 failed 911 calls. During that outage too, networks operated by CenturyLink, Verizon and other telecommunications companies were affected. The root cause was a software bug in the call routing service provided by Intrado, then called West Safety Services. The company’s software had a hard limit of 40 million on an incoming-call counter, and once that number was reached, new calls were dropped until the company found the bug.
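For readers who want to see how a counter cap can silently drop calls, the following is a minimal illustrative sketch in Python. It is not Intrado's code: the class name `CallRouter`, the `route` method and the return values are all hypothetical, and only the 40 million figure comes from the description above.

```python
# Illustrative sketch only; all names here are hypothetical.
CALL_LIMIT = 40_000_000  # the hard cap described in the 2014 outage


class CallRouter:
    """Toy router that stops accepting calls once its counter hits a cap."""

    def __init__(self, limit=CALL_LIMIT):
        self.limit = limit
        self.count = 0  # running total of calls handled

    def route(self, call_id):
        # Once the counter reaches the cap, every new call is dropped.
        # Nothing alerts an operator, so the failure is easy to miss.
        if self.count >= self.limit:
            return None
        self.count += 1
        return f"routed:{call_id}"
```

With a small cap such as `CallRouter(limit=3)`, the first three calls are routed and every later call returns `None`, which mirrors how calls kept failing until the cause was identified.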
The Federal Communications Commission investigated the 2014 outage later that year and fined CenturyLink $16 million, Verizon $3.4 million and Intrado $1.4 million for their roles in the incident.
Again in August 2018, a mistake at West Safety Services caused an outage in Minnesota that lasted more than an hour and prevented nearly 700 calls from connecting. CenturyLink later attributed the outage to “human error” on the part of a West Safety employee who had “made a mistake while making a network configuration change.”
Monday’s incident immediately caught the attention of FCC Commissioner Jessica Rosenworcel, who opined on Twitter that 911 needs to work all the time. She reiterated the point in an email to StateScoop.
“When you dial 9-1-1, you need it to work,” she wrote. “The FCC needs to investigate and get to the bottom of what happened here so it doesn’t happen again.”