[ispnet-announce] ISPnet 09/15/18 unplanned service interruption

Bob Tinkelman bob at tinkelman.com
Sun Sep 16 17:35:45 EDT 2018


During a window from 13:10-14:10 on Saturday, Sept 15, there
were problems within the ISPnet network.

Different ISPnet customers were impacted to varying degrees
and for varying periods during that window.

Analysis of the events is not complete, but it appears that
two different issues may have been causing difficulties.

At 13:10, one of ISPnet's core switches at 85 Tenth Ave
started misbehaving, leading to a switch reload at 13:20.
The 13:10 event occurred after an intentional config change
we made on the switch, which appears to have triggered a bug
in Cisco's IOS software.

Telehouse reported that around 13:35 they started seeing
"flapping" of BGP sessions across the NYIIX switches.  The
NYIIX is a system of interconnected switches operated by
Telehouse and used as a public peering point.  ISPnet
connects to the NYIIX at 85 Tenth Ave, with BGP sessions to
approximately 100 peers.

Both problems (the ISPnet switch reload and the Telehouse
NYIIX issues) affected most ISPnet customers.

While the ISPnet switch that had the problem is not the
ISPnet switch that connects to the NYIIX, we asked Telehouse
to investigate whether the two issues could be linked.  They
have opened a ticket with their switch vendor but, as of
this time, Telehouse hasn't provided us with any feedback.

We saw full, stable service restored to all ISPnet customers
(with two exceptions) by 14:10.  At 14:45, Telehouse
reported that the NYIIX switch fabric was stable.

We apologize to all our customers for the problems caused by
yesterday's events.  We will investigate (in a lab
environment) the particular changes that affected our core
switch so we can avoid the same problem in the future.

--
Bob Tinkelman <bob at ispnet.net>
ISPnet, Inc. http://www.ispnet.net
+1 (718) 464-4747  office
+1 (800) 806-NETS  toll free

