I have 8 and a half working days to go before I leave the IT Service!!!1!! Here’s the story so far…
<fx style="whizzy" title="five months ago" />
| Google Recruiter | Come and work for us! Apply! Apply! |
| Me | No. I like Durham Just Too Much. |
| Recruiter | Aw, g'wan, g'wan, g'wan |
| Me | No. |
| Time passes |
| Recruiter | Thanks for the conference. Cheerio. |
| Me | Actually, I'd like to apply to work for you after all. |
| Recruiter | Well-masked internal sigh |
<fx style="snazzy" title="two months ago" />
| Head of Department | I know this will come as a shock to you but the role analysis we did says you're overpaid. |
| Me | Thinks: Coo, my plan to resign tomorrow was timely. "Ummm, oh dear." |
| Head of Department | You're taking the news well, I must say. |
| Me | Thinks: Just you wait… "Thank you." |
<fx style="swooshy" title="the following day" />
| Head of Department | I presume this is about the role evaluation… |
| Me | Actually, no. I resign! |
| Head of Department | Gasp! |
<fx style="floozy" title="back to the present" />
So here I am, having all but worked out my notice period. It would be a lie to say I'd loved every minute of my time here, but the overwhelming majority of the time has been some combination of fun, challenging, instructive and improving.
Some highlights:
The Cocos Islands Bug
The Cocos Islands form one of Australia’s two Indian Ocean Territories (the other being Christmas Island). But that’s not important right now.
Back in 2001, Solaris hosts running X were taking an age to get from the login page to the desktop. Eventually, it was tracked down to a problem with the font servers. We checked the font servers themselves, and they were fine. Then we noticed the font path that had been set. It read hostname.cc:7001. For some reason, all the Solaris hosts had switched from hostname.dur.ac.uk:7001 to hostname.cc:7001, where cc was our NIS domain.
Turns out VeriSign had decided to put in a wildcard entry for *.cc, and since hostname.cc now resolved, Solaris decided to use it.
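For the curious, here is a toy model of how a wildcard can hijack a name lookup like this. It is only a sketch: the order in which Solaris tried the candidate domains is an assumption on my part, and the function names (resolves, pick_font_server) are made up for illustration rather than taken from any real resolver.

```python
def resolves(name, zone_records, wildcard_tlds):
    """Return True if `name` resolves in this toy DNS model."""
    if name in zone_records:
        return True
    # A wildcard entry answers for any name under that top-level domain.
    tld = name.rsplit(".", 1)[-1]
    return tld in wildcard_tlds


def pick_font_server(short_name, candidate_domains, zone_records, wildcard_tlds):
    """Try each candidate domain in order; return the first FQDN that resolves."""
    for domain in candidate_domains:
        fqdn = f"{short_name}.{domain}"
        if resolves(fqdn, zone_records, wildcard_tlds):
            return fqdn
    return None


records = {"hostname.dur.ac.uk"}
order = ["cc", "dur.ac.uk"]  # assumed order: the NIS domain is tried first

before = pick_font_server("hostname", order, records, wildcard_tlds=set())
after = pick_font_server("hostname", order, records, wildcard_tlds={"cc"})
print(before)  # hostname.dur.ac.uk
print(after)   # hostname.cc
```

Before the wildcard exists, hostname.cc fails and the lookup falls through to the real name. Once *.cc answers for everything, the first guess "succeeds" and the real font server is never tried.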
Molten web server
The web server manager had gone on holiday, yet it was a Strategic Imperative to put the new (spangly and dynamic) web site into production. It quickly became apparent that the web server wasn't up to the new job. A colleague and I found a minimal-impact solution: we put a caching reverse web proxy in front of the web server. This drastically reduced server load, almost to the original level.
Sadly, when the web server manager returned, he was decidedly unhappy about the new service architecture. Even so, it was only when we replaced the back-end web server (a Sun Ultra-60) with three new Linux hosts that he felt comfortable removing the proxy.
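To show why a caching proxy takes so much load off a struggling back end, here is a minimal in-process sketch. It is not the software we actually ran, and it does not speak HTTP: CachingProxy, slow_backend and the 60-second lifetime are all hypothetical names and values chosen for illustration.

```python
import time


class CachingProxy:
    """Toy stand-in for a caching reverse proxy sitting in front of a back end."""

    def __init__(self, backend, ttl_seconds=60.0, clock=time.monotonic):
        self.backend = backend      # callable: path -> response body
        self.ttl = ttl_seconds      # how long a cached response stays fresh
        self.clock = clock
        self.cache = {}             # path -> (expiry time, response body)
        self.backend_hits = 0

    def get(self, path):
        now = self.clock()
        entry = self.cache.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]         # fresh copy: the back end is not touched
        self.backend_hits += 1
        body = self.backend(path)
        self.cache[path] = (now + self.ttl, body)
        return body


def slow_backend(path):
    # Hypothetical expensive dynamic page.
    return f"<html>page for {path}</html>"


proxy = CachingProxy(slow_backend, ttl_seconds=60.0)
for _ in range(1000):
    proxy.get("/index.html")
print(proxy.backend_hits)  # 1
```

A thousand requests for the same page cost the back end a single render, which is roughly the effect we were after.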
Exam marks
I had become the web server manager by this point in the narrative; the web servers were entirely adequate for all but the worst load. However, one Wednesday at the end of term, each server suffered a debilitating load spike, caused by 10% of the students' exam marks being released.
About 60% of the marks were due to be released the following Wednesday: an act which would result in some impressively grumpy students if we didn't do something about it.
I was able to replicate the web servers into a chroot on each of the six extra servers we recruited as a scratch web cluster. (Just tar cf - / | ssh host 'cd /chroot && tar xpf -' and, on the host, run chroot /chroot /etc/init.d/httpd start.)
To spread the load across the disparate set of hosts we'd pulled together (from quad AMD64 hosts to Sun Ultra-10s), we used Apache 2.2’s load balancer, mod_proxy_balancer.
With this and some substantial code improvements on the Web Team’s part, we were able to serve just over 400 page impressions a second to a substantial proportion of the student population. This is around ten times more than the usual rate, and none of the hosts were under significant load this time.
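The idea behind spreading load over mismatched hosts is simply to give the big machines a bigger share. The sketch below is a simplified weighted scheme, not Apache's actual algorithm, and the host names and weights are invented for illustration rather than the values we used.

```python
def pick_host(served, weights):
    """Return the host with the fewest requests served relative to its weight."""
    return min(weights, key=lambda host: served[host] / weights[host])


# Hypothetical weights: one quad AMD64 box is treated as worth eight Ultra-10s.
weights = {"amd64-quad": 8, "ultra10": 1}
served = {host: 0 for host in weights}

for _ in range(900):
    host = pick_host(served, weights)
    served[host] += 1

print(served)  # {'amd64-quad': 800, 'ultra10': 100}
```

With those weights the big host ends up with eight requests for every one sent to the small host, so neither is pushed beyond what it can handle.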
Of course, these are the sort of everyday heroics in the life of a sysadmin that usually go unreported. C'est la vie!