Pelican Parts Forums - View Single Post

KFC911 · 11-20-2025, 12:13 AM

Quote:

Originally Posted by stealthn

It always surprises me when platforms this big cannot design their architecture properly so that a simple change can take their entire environment down…

It doesn't surprise me ... at all, and it usually isn't a design issue. My "toughest" network outages were extremely complex and due to unforeseen "stupidity" by somebody ... not a design issue.

EVERYBODY knows you don't mix routing protocols (or static routes, etc.) in a complex network ... or else.

Then when some "rookie" does something in a remote location.... it immediately CRASHES every single million $ backbone router .... all at once ... no data to look at either.

I finally captured it on the BIG BLUE box .... and after 18 months of chronic outages, a Cisco guy from RTP (not one of the average CEs we had on site every day) and I figured it out one evening in about an hour, after I rubbed his nose in the data.

Just one of many .... I "knew" what was going on for months... based upon observation, knowledge, and intuition. Getting to the cause was a head-scratcher ... until I captured an OSPF trace of the unpredictable, random, total network outages.

When something like a DNS server is "sickly", but not totally down ... it's not a design issue and will cripple EVERYTHING ....
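For anyone wondering what "sickly, but not totally down" looks like from the client side, here is a minimal Python sketch, not anything taken from the outages above. The function name `classify_lookup`, the status labels, and the 0.5 second `slow_after` threshold are all made up for illustration; the only real library call is the standard `socket.gethostbyname`.

```python
import socket
import time


def classify_lookup(hostname, resolve=socket.gethostbyname, slow_after=0.5):
    """Time one name lookup and label it 'ok', 'slow', or 'failed'.

    `resolve` is injectable so the check can be exercised without a network;
    `slow_after` (seconds) is an arbitrary illustrative threshold.
    """
    start = time.monotonic()
    try:
        resolve(hostname)
    except OSError:
        # socket.gaierror and socket.timeout are both OSError subclasses
        return "failed", time.monotonic() - start
    elapsed = time.monotonic() - start
    # The lookup answered, but a late answer is the "sickly" case
    status = "slow" if elapsed > slow_after else "ok"
    return status, elapsed
```

The point of the three labels is that a plain up/down check only separates "failed" from everything else, so a server that keeps answering late would typically pass it while every application waiting on those lookups drags.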

97.648% of "my" network outages ... were DNS issues .... give or take.

Don't miss it, but I loved the challenge of solving complex system/network issues/outages in a complex environment..... on stuff that I designed.