Hello folks. As you will no doubt have noticed, GeekPlanetOnline has suffered a lot of downtime over the past three weeks. Since we've always been an open community, I'd like to be honest about the reasons for this, and explain what we're doing to make sure that it never happens again.
The issues were caused by corruption in our server's file system and problems with the virtualisation software that keeps it running. These issues in turn were caused by Server Gremlins. Yes, Server Gremlins. Our hosting company, Fasthosts, can't actually provide an explanation for why their entire VPS network developed these issues three weeks ago, so I've decided to make up an entertaining one rather than bullshit anybody. Server Gremlins are microscopic in size, and look very similar to the crazy gangster mother from The Goonies, except they're shorter, bright purple and have testicles instead of ears. They like eating file systems, which are their natural food source, and we had a juicy one, so there you are.
Our most recent run of problems, which started on Monday, was due to the corruption spreading and destroying all of our data, making it unrecoverable. As a result, we have had to return to our most recent database backup (itself ten days old because our automation wasn't in place yet) and restore the podcast feeds from even older backups - which, luckily, were still in place at our old hosting package, allowing us to copy over everything published up until 20/08/2010. File by file, article by article, we were able to reconstruct the site and the feeds, re-uploading as many of the post-20/08/2010 podcasts as I had in my iTunes folder (luckily I have so much pride in our site that I subscribe to all of our shows!). There are still a couple missing here and there, but we're working on it, and by and large we're back.
So what are we going to do to prevent this from happening again? Well, first and foremost we now have an automated off-site backup which runs daily, allowing us to recover in a couple of hours or less should we suffer any downtime in future. The backup data is stored securely and its integrity is guaranteed by the provider, so we're pretty satisfied that we're protected. In addition (and perhaps most importantly), the incomparable Nik Butler has offered to come on board as our SysAdmin and maintain our server for us, working with me (whose responsibility it ultimately is) to keep things running smoothly. I'd like to thank Nik for his kind help, which he's providing free of charge, and I encourage you all to say nice things to him on Twitter.
Now... as I'm sure you will agree, the lack of an explanation from our hosting company is pretty poor customer service, just as the downtime we've suffered as a result is pretty poor customer service for our visitors and podcasters. Nik and I both feel that these issues constitute a breach of our Service Level Agreement (SLA) with Fasthosts, and we are sniffing around for a new provider. If and when a move becomes viable, our backups mean that it should be fairly seamless for our users and volunteers.