Down and Up again

That was cheery!

Yesterday, one of my HDs started spewing errors, and took my server's /home directory down. That wouldn't have been too bad if I had followed my own advice and had complete backups. I could have installed a new drive, and restored everything from backups. But, no, I hadn't been smart; I had no backups for the /home directory.

Further, since /home sits on the same HD as the rest of the system, a down HD meant a down system. Arrgh!!

So, down it came, at about noon yesterday.

I've spent the last 30 hours rebuilding the system. I've completely lost /home, and there's nothing I can do about that. Fortunately, there wasn't anything important on it. The rest of the system is back in working order.

I replaced the dying HD with a spare 60GB drive, partitioned it, and installed a fresh copy of Slackware Linux 12.1. To that, I applied all the outstanding patches, and removed any package that the original system didn't have.
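
Pruning the extra packages comes down to a set difference between two package lists. The sketch below is only an illustration under stated assumptions: it presumes a listing of the old system's packages is available (the sample entries are invented), and it relies on Slackware's name-version-arch-build naming for the entries it records under /var/log/packages. The function names are hypothetical.

```python
def package_base_name(entry):
    """Strip version, arch, and build from a Slackware package entry.

    Entries follow the pattern name-version-arch-build, and only the
    name part may itself contain hyphens, so split from the right.
    """
    return entry.rsplit("-", 3)[0]


def extra_packages(old_entries, new_entries):
    """Return sorted base names present on the new install but not the old one."""
    old = {package_base_name(e) for e in old_entries}
    new = {package_base_name(e) for e in new_entries}
    return sorted(new - old)


if __name__ == "__main__":
    # Invented sample data standing in for two directory listings.
    old_system = ["bash-3.1.017-i486-2", "xorg-server-1.4.2-i486-1"]
    fresh_install = [
        "bash-3.1.017-i486-3_slack12.1",
        "xorg-server-1.4.2-i486-1",
        "kdegames-3.5.9-i486-1",
    ]
    print(extra_packages(old_system, fresh_install))
```

Comparing base names rather than full entries means a patched package (same name, newer build) is not mistaken for an extra one. Each name the function returns would then be a candidate for removepkg.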

Next, I restored /usr/local and reinstalled all my "local" packages.

I duplicated my original configuration: all changed scripts, users and groups, databases, mail services, web services, and everything else that is exposed on the server.

Now, it's back in working order.


I knew that this day was coming. The machine is old and cantankerous. But, I've put off a hardware upgrade for various reasons.

I guess I can't put it off any longer. Tomorrow, I go buy a new system, and start seeding it with Slackware 13. In a couple of weeks or so, I'll swap out the old system and swap in the new one. And then we'll be stable once more.

Comments

I've got my new machine on order. It won't be a "high end" system; I have no need for MIPS and GFLOPS and MFPS. Instead, it will have a workhorse Athlon II X4 630 processor, 4GB of RAM, a 1TB SATA hard drive, two 10/100 Ethernet NICs, a DVD writer, and the usual plethora of USB, SD/MMC, and FireWire connections.

They (National Computers, in Brampton) tell me it will be ready for me tomorrow.

I received the new system on Tuesday of last week, and promptly started setting it up. I connected the system to my LAN, and installed Slackware 13 on it. I named it "Dante" because it, with my guidance, would travel through hell and purgatory to heaven, and become a better server for it.

Over the rest of the week, I migrated my existing server setup to Dante, putting in place everything that I could. For those facilities that could not be activated immediately (e.g. my DNS server, web server, and mail server), I built activation scripts. I cloned all the user data, and built a duplicate of my server on Dante.

Today, I backed up my server and took it down. I used the backups to restore any changed user data onto Dante, and then shut it down as well. It took about four hours to back up and remove my old server, install Dante in its place, and restore all the user data. The restore didn't go perfectly, but I expected that. I didn't lose any data; I just had to correct some restore scripts and rerun the affected restores.
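
The post doesn't say how the changed user data was picked out. One plausible approach, sketched here purely as an assumption, is to treat any file modified after the original clone as "changed". The function name and the cutoff parameter are hypothetical.

```python
import os


def changed_since(root, cutoff):
    """Return sorted paths under root modified after cutoff (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so a dangling symlink does not raise on lookup.
                if os.lstat(path).st_mtime > cutoff:
                    changed.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(changed)
```

Calling `changed_since("/home", clone_time)`, with `clone_time` set to the time the user data was first cloned, would give the list of files worth restoring a second time. Modification time is only a heuristic; a checksum comparison would be stricter but slower.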

Finally came the first boot as my new server, and Dante worked great. There was one minor glitch in my webserver setup (a missing / caused an Alias to fail), but that was easily fixed.
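
A missing slash in an Alias is easy to overlook by eye. Assuming the webserver is Apache httpd (the post doesn't name it), where an Alias line takes a URL path followed by a filesystem target, a small check like the one below could flag lines where one argument ends in / and the other doesn't. This is an illustrative sketch with a hypothetical function name, not the fix that was actually applied.

```python
import shlex


def mismatched_aliases(config_text):
    """Return (line_number, url_path, target) for each Alias line whose
    two arguments disagree about ending in a slash."""
    problems = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        try:
            parts = shlex.split(stripped)
        except ValueError:
            continue  # unbalanced quotes; not something this check can parse
        if len(parts) == 3 and parts[0].lower() == "alias":
            url_path, target = parts[1], parts[2]
            if url_path.endswith("/") != target.endswith("/"):
                problems.append((number, url_path, target))
    return problems


if __name__ == "__main__":
    sample = 'Alias /icons/ "/var/www/icons"\nAlias /docs /var/www/docs\n'
    print(mismatched_aliases(sample))
```

The check only looks at the plain Alias directive and ignores variants such as AliasMatch or ScriptAlias.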

So, now I've got a brand new system running a fully patched and up-to-date Slackware 13, and my hardware troubles should be behind me.