Cars & Coffee Killer
Join Date: Sep 2004
Location: State of Failure
Posts: 32,246
So...A Coworker Screwed Up Big Today...
A little background...
I am an application developer, and am responsible for servicing the applications I have developed. Last March, I sent some SQL to our Data team. It had a mistake in it. Specifically, through the wonders of copy and paste, my update statement had an extra semicolon BEFORE the where clause. For those of you who don't know SQL, that means the statement updates every row in the table. It's bad. Well, someone on the data team noticed my mistake and corrected it in my request by putting corrected SQL below my bad SQL. It was run successfully in test environments. When it came time to run it in production, the person who picked it up didn't notice the corrected SQL and ran the bad SQL, taking down a big application for about 2 hours. (BIG DEAL.) Fortunately the recovery was straightforward and nothing was corrupted.
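For anyone who wants to see the semicolon problem in action, here's a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the `run_script` helper are made up for illustration; this is not our actual schema or SQL, and other databases and clients may report the stray where clause differently.

```python
import sqlite3


def run_script(script):
    """Run a SQL script against a fresh 3-row table and return the statuses."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, purely for illustration.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "open"), (2, "open"), (3, "open")],
    )
    conn.commit()
    try:
        # executescript runs the statements one at a time, in order.
        conn.executescript(script)
    except sqlite3.Error as exc:
        # The orphaned "WHERE ..." fragment is a syntax error, but by the
        # time it is reached the UPDATE before it has already run.
        print("error raised after the update ran:", exc)
    rows = [r[0] for r in conn.execute("SELECT status FROM orders ORDER BY id")]
    conn.close()
    return rows


# Extra semicolon BEFORE the where clause: the UPDATE has no filter at all.
bad = run_script("UPDATE orders SET status = 'closed'; WHERE id = 2;")

# What was intended: only the row with id 2 is touched.
good = run_script("UPDATE orders SET status = 'closed' WHERE id = 2;")

print("bad: ", bad)
print("good:", good)
```

In this sketch the bad script closes all three rows while the good one closes only row 2. The error on the leftover where clause shows up only after the unfiltered update has already been applied.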
Because of this and a few other similar screw-ups, we were rewarded with more process. All SQL must be tested and reviewed three times, and it can't be run during the day. A big hassle, as suddenly a significant amount of my work had to be done after 6 p.m. The guy who actually made the screw-up was "reassigned" (demoted) to another job.
A coworker friend that I have lunch with daily, who happens to be on the data team but is not the guy who ran the wrong SQL, was pretty merciless. He made it clear that he thought me an idiot and that his team member should have been fired. He made this known to his manager. Never mind that the guideline in place at the time from his team was simply to tell them what we wanted done and they would write the SQL. I had quickly put some SQL together as a courtesy. (See above, now his team only accepts tested, reviewed SQL.) That's not to say I'm not culpable in the screw-up, but I didn't have his "throw the book at 'em" attitude about it.
So I come in to work this morning to a flurry of e-mails. Major outage. Recovery has been in progress for 15 hours, should be done in 17 hours. Real bad stuff. My coworker friend sends us lunch buddies an e-mail saying that he has been working for 26 hours straight, and hopes to be able to go home at 9:00 a.m.
It turns out that he ran some SQL in production yesterday that caused the outage. Through the wonders of copy and paste, he had run an update statement that had an extra semicolon BEFORE the where clause. It didn't bring down the application, but stuff kept running, causing further data corruption. He realized his mistake right away, but it took another two hours to identify the affected processes and stop them. It then took another 17 hours to formulate a backout plan, identify the corrupted records, and execute the backout plan.
I expect that he will avoid me for a while. After the 9 months of (sometimes mean-spirited) teasing about my screw-up, I expect he feels pretty embarrassed. But that's not the worst part. Since he was a vocal advocate for strong punishment for people who make such mistakes, I imagine he will have some uncomfortable conversations with his boss. And his boss's boss. And probably his boss's boss's boss.
__________________
Some Porsches long ago...then a Wankel...
5 liters of VVT fury now
-Chris
"There is freedom in risk, just as there is oppression in security."