Archive for the ‘Work’ Category

Escaping for a while

Sunday, April 2nd, 2006 in General, Work

In an attempt to allow my mind to rest from work-related matters I’m heading off to Cornwall for a couple of weeks.

I’ll be back for the Easter weekend, when I’ll be shutting down all our systems at work for a campus-wide power shutdown.

Upgrading Debian

Tuesday, March 28th, 2006 in Computing, Work

If you’ve been following my blog you’ll know that I’ve been working on a new filestore project at work for a while now. After getting things working nicely on our Solaris machines, and finally moving my home directory over, I decided to tackle our Debian server. It quickly became apparent that I’d need to upgrade the machine, which was running Woody with a 2.4 kernel, to get to a decent IPsec and autofs setup.

Now, I’m not a Linux user, let alone a Debian one. So this was a new experience for me. After a quick nose around online, and with a few helpful pointers, I found some useful instructions on how to upgrade. It boils down to a fairly simple process:

  1. Make sure the system is running the latest Woody updates.
  2. Modify the apt sources.list file to change woody to sarge.
  3. Run apt-get update.
  4. Install/update aptitude.
  5. Run aptitude -f --with-recommends dist-upgrade to do the full upgrade.

Then it’s just a case of fixing up any conflicting files and changes, and you’re done. I had to remove our backup software (lgtoclnt) and re-add it though, because it messed with the X packages.
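Step 2 of the list above is the only one that involves editing a file by hand, so here is a minimal sketch of it in Python. It is an illustration rather than what I actually ran: the function name retarget_sources and the sample mirror lines are made up, and it assumes the release name only appears as a whole word on active deb and deb-src lines.

```python
import re


def retarget_sources(text, old="woody", new="sarge"):
    """Return sources.list text with the release name swapped on deb/deb-src lines."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # Only touch active entries; comments and blank lines are left alone.
        if stripped.startswith(("deb ", "deb-src ")):
            line = re.sub(rf"\b{re.escape(old)}\b", new, line)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample content, not copied from the real server.
sample = """\
# main archive
deb http://ftp.uk.debian.org/debian woody main contrib
deb-src http://ftp.uk.debian.org/debian woody main contrib
deb http://security.debian.org/ woody/updates main
"""

updated = retarget_sources(sample)
print(updated)
```

Keeping a copy of the original sources.list before writing anything back would be sensible, since the rest of the upgrade depends on it.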

I decided at this point to make sure Sarge worked before looking at the kernel. So I rebooted the system. I waited. And I waited some more. The console showed that it had gone through the BIOS and RAID POST, but nothing else. A brief trip back to the machine room showed a scary-looking “LI” message, which I knew meant lilo wasn’t working.

At this point I consulted some friends who explained what I needed to do. A short while later, and with a freshly burnt boot CD, I had the system back up and running. To reinstall lilo I’d booted the CD up to the point where it loaded the aacraid drivers, switched to another terminal, mounted my root partition, chrooted, and run lilo.

By this point I’m starting to grumble about Linux/Debian being stupid. But, I move on. I discover that I’m also going to need to upgrade to 2.6 if I’m going to get IPsec support. After a short while of looking at rebuilding kernels, and boggling at the myriad of build options available, I decide to apt-get install kernel-image-2.6. That can’t be too hard, can it? A few moments later I’m left staring at an Oops message referring to a “kernel NULL pointer dereference” which appears to have come from the install running dd.

Nasty. Anyway, to cut a long story short, I tweaked the postinst script to stop it running dd, and that allowed me to get the kernel installed. Surprisingly it worked first time, but I did have to fix the modules list afterwards to silence some error messages.

A few hours later, and after discovering the difference between autofs4 and the Solaris automounter, I now have a working system. But I’m left wondering why I’d really want to be using Debian at all.

Now what? It’s too scary to use…

Saturday, March 25th, 2006 in Work

It’s been months in the making, but it’s finally done. We have our new filestore ready to go. There’s still plenty to do, like rolling it out for the teaching machines and web filestore, but at least we’ve got the main part done.

So why has it taken so long? I spent a long time researching and testing the technologies involved. For example, choosing the file system was tricky. UFS doesn’t work well on large (>2TB) file systems, and VxFS doesn’t work with NFS and Quotas. I managed to solve that one by fixing the quota issue with VxFS. There was also the issue of how we back up this quantity of filestore, and working out how we’d make it available from the cluster to the user machines. In the end we opted for a single filesystem split into chunks on the server side for backups and used the automounter to make these divisions transparent to the end users.

The other time-consuming factor was the software development stage. We have automated systems for creating users on machines, so I needed to integrate this with the new filestore. This required writing code to facilitate the creation of directories, setting up of quotas, and automount map building.
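To give a flavour of the automount map building, here is a minimal sketch in Python. It is not the code we actually run: the function names are made up, and so is every username apart from tdb. The server name resfs.cs and the /home/cur path are borrowed from the example mount in my earlier filestore post, and the sketch assumes a plain indirect map where each line is a key followed by a server:path location.

```python
def automount_entry(user, server, export_root):
    """Build one indirect-map line: the key (username), then server:path."""
    return f"{user}\t{server}:{export_root}/{user}"


def build_map(users, server="resfs.cs", export_root="/home/cur"):
    """Build the whole map, one entry per user, sorted for stable output."""
    lines = [automount_entry(u, server, export_root) for u in sorted(users)]
    return "\n".join(lines) + "\n"


# Hypothetical user list; only "tdb" appears anywhere in these posts.
home_map = build_map(["tdb", "alice", "bob"])
print(home_map, end="")
```

Directory creation and quota setup would hang off the same user list, but those depend on local tooling so I’ve left them out of the sketch.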

Anyway, I’ve written about this before. So now it’s done, what do we do next? The logical step is to test it on myself and/or the rest of the systems group. Personally I’m in favour of testing it on everyone else first, but that doesn’t seem fair :-)

The question is, am I brave enough to actually use it?

Why I absolutely hate spam

Tuesday, March 21st, 2006 in Computing, Work

If there’s one thing that drives me completely insane in the modern world of computing it’s spam. It consumes my time, day after day, and devours the resources of our mail systems. In my own mailbox I get a few hundred spam messages a day, most of which I’ll never even see, let alone read. Thankfully most of these are filtered, but there are still at least 20 which I have to manually deal with every morning.

At work the mail systems for the Computer Science department are processing around 20,000 incoming email messages every day. A remarkable 61% of these are spam, which is quite an increase from 49% a year ago. We run two mail hubs to process the incoming email which means we’ve effectively had to buy and run one server just for processing the spam email. I don’t even want to start on the amount of time spent dealing with spam messages that make it through to our helpdesk systems.
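For anyone who wants the sums behind that claim, here is a rough back-of-the-envelope sketch in Python. The message count and percentages are the ones quoted above; the even split of load between the two hubs is purely an assumption for the sake of the arithmetic.

```python
daily_messages = 20_000   # approximate incoming messages per day
spam_share = 0.61         # fraction of incoming mail that is spam
hubs = 2                  # number of mail hubs

spam_per_day = round(daily_messages * spam_share)
ham_per_day = daily_messages - spam_per_day

# Assumption: incoming mail is shared evenly between the hubs.
per_hub_share = daily_messages / hubs

print(f"spam per day: {spam_per_day}")
print(f"legitimate per day: {ham_per_day}")
print(f"one hub's share of the load: {per_hub_share:.0f}")
print(f"spam exceeds one hub's share: {spam_per_day > per_hub_share}")
```

In other words, the spam alone amounts to more than half of everything the two hubs handle between them.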

Ever noticed how spam email comes from rather an eclectic selection of email addresses? Has one of those addresses ever been yours? If there’s one type of email even more annoying than spam it’s bounces generated as a result of spam, sometimes thousands of them. You’ve suddenly become an unwilling victim of spam. Your address abused, and maybe even your name tarnished. What gives spammers the right to do this? At least SPF and similar technologies go some way to preventing this.

And as if spam email wasn’t enough we now see it creeping into many other Internet-based systems. How long until there’s a spam comment on this weblog? Or a stack of spam referrer entries in my Apache logs (and consequently my statistics)? Or until I receive the next random message on one of my messenger services?

Whilst I’m ranting, another thing I can’t stand is those pages of junk links that appear when you try and google for something, particularly if it’s a fairly common term. Thankfully Google is trying to deal with that, but it’ll be a never-ending battle.

It seems in the non-Internet world we can easily regulate junk messages. We used to get a fair amount of sales telephone calls and general junk mail through the front door. Within weeks of registering with the Mail Preference Service and the Telephone Preference Service these have completely stopped. I’m not naive enough to believe this could be done with the Internet, but it helps put things into perspective.

One of these days I’m going to get sick of the battle and just say “screw ’em all” and unplug my ADSL modem. After all, people keep telling me I should try reading more books.

Impending doom (for our filesystems, anyway)

Friday, March 17th, 2006 in Work

Over the past year or so the space usage on our research and web filesystems has pretty much doubled to the point where we’re dangerously close to running out of space. There’s currently about 1TiB of filestore available of which less than 10% remains unused.
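To put a rough number on “dangerously close”, here is a small sketch in Python. It assumes usage carries on doubling once a year and that we’re sitting at exactly 90% full; both are simplifications, and the function name is just for illustration.

```python
import math


def days_until_full(used_fraction, doubling_period_days=365.0):
    """Days until usage hits capacity if it keeps doubling at the same rate."""
    if not 0 < used_fraction <= 1:
        raise ValueError("used_fraction must be in (0, 1]")
    # Usage after t days is used_fraction * 2 ** (t / doubling_period_days);
    # solve for the t at which that reaches 1.0 (completely full).
    return doubling_period_days * math.log2(1 / used_fraction)


remaining_days = days_until_full(0.90)
print(f"roughly {remaining_days:.0f} days until full")
```

On those assumptions the remaining space would last a little under two months, which is why the new storage can’t come soon enough.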

Teaching filestore, however, has barely grown at all during the last year. I attribute this primarily to quota control, but also to the regular turnover of undergraduate students.

Fortunately we saw this problem arising quite a while ago, so we’ve had time to purchase new storage and infrastructure that should alleviate this problem and make it easier for us to expand the storage availability in the future.

Our new system consists of a pair of Sun StorEdge 3511 arrays attached by Fibre Channel to our existing Veritas cluster. We’ll use VxFS for the filesystems, which could lead to some interesting new technologies like filesystem checkpointing; we could have a mount point of /yesterday to allow users to retrieve their files as they were at some point during the previous day, thereby reducing the need for us to do tape restores. VxFS also works quite happily with large filesystems, unlike Solaris UFS. The only problem we’ve found is that VxFS doesn’t support hard linking directories, but that’s not something we commonly, if ever, want to do. We also initially had problems integrating VxFS with the Solaris quota system over NFS, but we soon fixed that the “fun” way :-)

Currently the research and teaching servers have locally attached filestore, which means if we have a hardware failure in one of the main servers we’re unable to get at user filestore from any other systems (without moving cables). The new solution provides NFS mounts of the filestore directly to each of the servers, which will allow files to be accessed via secondary machines should one of the main servers die. This is all part of our long term plan to increase the resilience of our systems.

One other interesting point to note is the use of the Solaris automounter to individually mount user home directories. Soon there’ll be mounts a bit like this all over the place:

resfs.cs:/home/cur/tdb 1.5T 54G 1.4T 4% /home/cur/tdb

Which will make things much more interesting!