Archive for the ‘Computing’ Category

“I’ll build a new server; it’s got to be easier than patching up the old one…”

Thursday, June 19th, 2008 in Computing, FreeBSD

A few weeks back I started having problems with my file server at home. This machine is fairly important to us; it holds all our photos, music and other files. For years I’ve been bodging it together with various old parts scavenged from other machines and some new parts when needed. But, once again, it’d started to break. Disks were dropping out of the RAID unexpectedly, and the replacements were refusing to rebuild. Unsure of where the problem was I uttered the fateful words “I’ll build a new server; it’s got to be easier than patching up the old one…”. My colleagues were sceptical, but I ploughed on anyway. Maybe I should have listened to them?

It took the best part of a week to work out what I wanted. There were so many decisions to make: which RAID card, disks, motherboard, CPU, RAM, case, etc. I researched each one as much as I could, but there’s a bottomless pit of information on the Internet. Eventually I settled on a 3ware 9690SA RAID card with 4 Seagate ST31000340NS disks. The other bits were fairly decent to make sure the machine would have a good life, but not excessive.

The reason for choosing a hardware RAID solution over software RAID was simple – reliability. Now, I’m not knocking software RAID in principle (look at ZFS, for example), but the implementations for RAID 3-5 on FreeBSD aren’t great (yes, it has ZFS, but I’m not in the mood for trailblazing this time round). I wanted to stick with FreeBSD so I opted for the well-known reliability that 3ware cards provide. And the 5 year warranty on Seagate disks made them an attractive choice.

The purchasing process wasn’t as simple as it could have been. I ordered from dabs.com, span.com (they specialise in storage stuff) and overclockers.co.uk. I’ve used all three companies before, so I wasn’t too concerned about problems. The bulk of it was ordered from Dabs – it looks like they’re back to being competitive on prices. The problems started almost immediately; Dabs held my order over an issue with my address. It’s happened once before and that put me off Dabs for some time, but we use them all the time at work, so I had hopes they’d be better now. It took a working day to resolve that issue… and then next day I get an email to say my credit card company has declined the order. On the phone to them and through to their security department; seems buying lots of stuff online is unusual… not for me it isn’t. Anyway, that was resolved and then I had more waiting for Dabs to try the transaction again. Eventually I got impatient and tried their online chat thing and the matter was resolved in minutes. Meanwhile the parts ordered from the other two suppliers were sitting on my desk.

Eventually it all arrived and I took it home. Ruth wasn’t overly impressed when I cleared off the dining room table and covered it in computer parts, but I assured her it wasn’t for long. That was a couple of weeks ago – it’s all still there.

I spent a weekend putting things together and testing it all out. I routed every cable neatly and tied them carefully to the case to ensure nothing moved about. Airflow was good and the additional fans in the case were doing a great job of keeping things cool (not sure about their blue LEDs though…). All was looking good and I was enjoying the process.

Then I tried to use the RAID card. The first problems hit when I turned on the motherboard’s RAID, which I’d intended to use to mirror the system disks, whilst the 9690SA was plugged in. I’d gone for an Asus P5E3 and expected both RAID systems to work happily together, but sadly I was wrong. I experienced unusual problems such as the machine hanging on the Intel Matrix Storage (the onboard RAID) screen and disks randomly disappearing from both arrays. In the end I gave up and turned off the onboard RAID; I figured the FreeBSD RAID 1 (gmirror) is pretty solid, so I’d use that.

Thinking I’d got over the worst of the problems I moved on to setting up the 9690SA. Things looked good for a while; the interface was clear and everything was easy to set up. It wasn’t until I started trying to put data on that I noticed problems. Here’s a snippet from the error log (largely for the benefit of Google):

E=0200 T=08:26:00 : Cable CRC error
SATA Device. port = 0x0
task file written out : cd dh ch cl sn sc ft
                      : 00 70 00 00 00 1200 00
  task file read back : st dh ch cl sn sc er
                      : 00 00 00 00 00 8441 00
E=0200 T=08:26:00 P=0h: Soft reset drive
E=0200 T=08:26:00 P=0h: exitCode = 1013
Port retry not allowed
E=0200 T=08:26:00 P=0h: Prepare for command retry
exitCode = 1013

At first I wasn’t sure what to make of this. Maybe it was the cable or connection, but on all four drives? It was a special 4-in-1 (SFF8087) cable, but it still seemed odd. I logged the case with 3ware’s technical support and got back a response suggesting I try another cable. Well, duh, I could have figured that myself. I was hoping they might be able to point out any other less obvious potential causes.

So, I purchased another cable. It took a couple of days to arrive and did absolutely nothing to resolve the problem. Sigh. At the same time as this was going on I had another problem – it’s only with hindsight that I know to separate the two:

E=0204 T=18:34:36     : Port timeout (ext)
SATA Device. port = 0x2
task file written out : cd dh ch cl sn sc ft
                      : 00 04 00 00 00 00 00
Send AEN (code, time): 0x9, 06/10/2008 18:34:36
Drive timeout detected
(EC:0x09, SK=0x04, ASC=0x00, ASCQ=0x00, SEV=01, Type=0x71)
phy=6
  task file read back : st dh ch cl sn sc er
                      : 00 00 00 00 00 00 00
E=0204 T=18:34:36 P=2h: Soft reset drive
E=0204 T=18:34:36 P=2 : Inserting Set UDMA command
E=0204 T=18:34:36 P=2h: Check power cycles, initial=40, current=40
E=0204 T=18:34:36 P=2h: exitCode = 1013
Port retry not allowed
E=0204 T=18:34:36 P=2h: Prepare for command retry
exitCode = 1013
E=0204 T=18:34:36 U=0 : Retrying command

These errors happened less frequently, but eventually caused I/O to hang and the controller to reset. Again I logged this with 3ware’s technical support and got back a bunch of not so helpful responses. They suggested moving the card in the machine, testing the disks, checking the power supply, and so on. All valid points, but what annoyed me was they could only ask me to check one at a time… and they could only reply to me once a day. Plus I’d already done everything they suggested. It took a week to go through this nonsense.

In the meantime I spent a lot of time experimenting, fiddling, and web searching. Eventually I found the following two pages, although it took me a while to realise their significance:

https://www.3ware.com/3warekb/article.aspx?id=15385
https://www.3ware.com/3warekb/article.aspx?id=15171

The first of the articles explicitly mentions my controller card and drives, so it seemed to be the right thing to do. But I had the SN04 firmware on my drives and they wanted me to apply AN05. I asked both 3ware and Seagate to clarify the differences, but neither gave satisfactory answers. Seagate managed to give me the SN05 firmware to try, but it didn’t help. In slight desperation, and without anyone giving me much help, I decided to take a punt on the AN05 firmware.

IT WORKED!

There was a lot of tension for the next few hours whilst I continued testing, but eventually I was satisfied that the AN05 firmware solved the problem. Later attempts to clarify with Seagate why SN05, which they gave me, didn’t work and AN05, which 3ware pointed me at, did work, got nowhere. Seagate support actually admitted that they basically don’t know.

So on to the next issue. The second article suggested limiting the speed of the drives to work around the drive timeout issue. It’s definitely a workaround, but it was worth a shot. I’d already removed the jumpers from the drives that limited them to 1.5 Gb/s, and they were a nightmare to do – I’ve never seen such small and fiddly jumpers on a disk… it was completely unnecessary given the available space. This time I decided to do the limiting in the 9690SA’s software.

ONCE AGAIN, IT WORKED!

So at this point I’m happy. Things are looking good. That last fix is definitely a workaround, and I’ve told 3ware they need to fix it. It’s a bug, and bugs need fixing. I’m now using the array to store my data on, it’s nice and quick (a 512MB write cache helps!), and I have plenty of space. And Ruth might get the dining room table back soon… assuming I can work out how to lift this massive machine (did I mention the case was quite big?).

But I’d like to finish this post with a rant. It turned out that the solutions to my problems were both in the 3ware knowledge base. Now maybe I should have searched harder initially, but it took me some time to find these articles. But more to the point, 3ware support should definitely have known about these issues and should have directed me straight to them. I wasted a week of my time messing around with them, and I’m not happy about it. The card is great (apart from the aforementioned bug), but the support sucks. It will seriously make me think twice about going with 3ware again.

I hope this post will fill in the whole story to those I’ve been ranting at recently, and maybe it’ll help someone else on the Internet out if/when they hit the same problem. That’s assuming they can read this lengthy post in less time than it takes to figure out the solution themselves ;-)

Good night.


Connecting to an LDAP server using Kerberos authentication in Perl

Friday, January 18th, 2008 in Computing

It took me a while to figure this code out, and there seemed to be a lack of complete examples on the web to do exactly this, so I thought I’d document it.

I needed to connect to an LDAP server using a Kerberos principal for authentication from within a Perl script. This meant that it needed to do it without any external input, so it couldn’t rely on a password being entered or someone doing a kinit first.

The code is fairly simple. It basically gets the right credentials using a pre-initialised keytab and then sets up the relevant objects and uses them to bind to an LDAP server.

#!/usr/local/bin/perl -w    

# How to connect to an LDAP server using GSSAPI Kerberos auth.    

use strict;    

use Net::LDAP;
use Authen::SASL qw(Perl);
# This module makes doing the kinit much easier
use Authen::Krb5::Easy qw(kinit kdestroy kerror);    

# Location of the keytab which contains testuser's key
# exported in kadmin by: ktadd -k /tmp/test.keytab testuser
my $keytab = '/tmp/test.keytab';
# Where to store the credentials
my $ccache = '/tmp/test.ccache';    

$ENV{KRB5CCNAME} = $ccache;    

# Get credentials for testuser
kinit($keytab, 'testuser@CS.UKC.AC.UK') || die kerror();    

# Set up a SASL object
my $sasl = Authen::SASL->new(mechanism => 'GSSAPI') || die "$@";    

# Set up an LDAP connection
my $ldap = Net::LDAP->new('ldap.cs.kent.ac.uk') || die "$@";    

# Finally bind to LDAP using our SASL object
my $mesg = $ldap->bind(sasl => $sasl);    

# This should say "0 (Success)" if it worked
print "Message is ". $mesg->code ." (". $mesg->error .").\n";    

# Clear up the credentials
kdestroy();

Hopefully this will help someone else out. Comments welcome :-)


IPv6 connectivity – changing brokers

Sunday, January 6th, 2008 in Computing

It’s been nearly 2 years since I initially set up my IPv6 connectivity, and back then I had some problems with the BT Exact IPv6 tunnel broker. Now it seems that without much notice the service has been taken down permanently, so I’ve just spent quite a few hours moving over to a new provider – SixXS.

My initial impression of SixXS is that it’s much more polished than the BT service was. They have many PoPs (although a tunnel is only associated with one nearby PoP), a decent website, and all the facilities that I need. They work on a credit based system which means to use a facility you need credits. You get some credits when you set up an account, and you gain more by running a reliable tunnel. It’s an interesting idea… it encourages you to look after your setup.

Handily I got some bonus credits for my work on some Open Source projects, so I got both my home network and my colo server set up in one go. The process was as good as identical to the BT service, so there were no real problems – just the tediousness of updating configs, DNS entries and firewalls.

So there we have it – SixXS++ :-)


eBay “Customer Support”

Monday, November 26th, 2007 in Computing, General

Recently I changed my wife’s email address and user ID on eBay. It was pretty painless using their web interface… at least, that’s what I thought.

The problems came a couple of weeks later when she was still receiving solicited promotional material to her old email address. I figured it wouldn’t be that hard to find out why, so I filled in a web form asking if they could check things out. This was their first reply:

Since you have completed the change of address request, be assured that all the eBay emails are sent to your registered email address. The only possibility in this situation is that your ISP (AOL in your case) might have linked both the email addresses to your account. So, we’d suggest you to contact your ISP and confirm if this is the case. However, if this is not the case on their end, then you will need to send us an eBay email with the header.

Interesting. I had a word with her ISP, which was pretty easy given it’s me. Last time I checked I’m pretty sure I don’t run AOL either (thank goodness!). I took a look at the headers, and they look pretty conclusive to me (interesting bits only):

Received: from smfcamppool09.emailebay.com ([66.135.215.238]
	helo=smfitemap04.smf.ebay.com) ...
DomainKey-Signature: s=main; d=reply3.ebay.com; c=nofws; q=dns;
	b=NR1bQ5kTLijbb5Mc3TmFcKdB+BLWEb1YZvYiyvzns2iWz8iyi
	JVBCXP3ERh+lxAYiwwR3kbd94Zg3xyPvcW8CDscQaHYizuzh5vd
	59IOlVCKr1qwAYNvDHTmxMx5RL18;
From: "eBay" <eBay-INTL@reply3.ebay.com>
Subject: [her userid], knock his Christmas socks off this year
	with eBay
Received-SPF: pass (carrick.bishnet.net: domain of
	reply3.ebay.com designates 66.135.215.238 as permitted
	sender) client-ip=66.135.215.238;
	envelope-from=eBay-INTL.403108935.71560.0@reply3.ebay.com;
	helo=smfitemap04.smf.ebay.com;

So I sent that off to them and awaited their next reply. Here’s what they said:

Thank you for your reply. I understand that you are concerned about changing of the email address on eBay.

While checking your account status, I noticed that you have successfully changed your email address from ‘[old address]’ to ‘[new address]’ on Oct 27, 2007.

Your new email address ‘[new address]’ is now enabled on eBay.

Looks like they’ve completely missed the point and have decided just to state the obvious instead. So, once again, I explain that the problem is that email is still going to the old address.

In comes the next reply:

I really want to help you resolve this issue because I know how important it is for you to have this matter settled. However, your message didn’t include the email header, which I need in order to take action.

Now it looks like they’re repeating themselves. Funny thing is, if you scroll down their email you’ll see they’ve quoted the last time I said I provided the headers. So, with a few rants about their inability to read the case history, I give them the entire email, headers and all, again.

And here’s where it starts getting really good. I had to read this a few times to believe they actually said it:

Thank you for writing back again regarding the unsolicited email you received. I’m sorry that this matter hasn’t yet been resolved.

I’ve checked the information you sent us and I can confirm that the email was not sent by eBay, and is not endorsed by eBay in any way. However, it appears to have been sent by another eBay member.

How to cut down on spam emails …

Now hold on a minute. For a start, I’ve never said it’s unsolicited, it’s just going to the wrong address. And now, to top it all off, they’re trying to say they never sent it and that another member did!

I took a few minutes to cool down before calmly asking them to explain how exactly they came to that conclusion. I also suggested that if they can’t answer my questions they should consider escalating the query to someone who can.

Finally I manage to make contact with one of the (I surmise) 20% of their staff who know what they’re talking about:

Please understand that when you change your email address on eBay it will remain in our database for the next 30 days and once this period is over, you won’t receive any email at your old registered email address as our system releases that email address from the database.

It took a week and 10 emails for that conclusion to be reached. It’s not rocket science, is it?

All I can say is that I’m glad it wasn’t something important…


CSProjects is unleashed (at last)

Thursday, October 18th, 2007 in Computing, Work

I started working on CSProjects quite a few months ago.

Problems started early on. I began by bringing our software up-to-date. This included Apache, Python, Subversion, Trac and mod_python. It took some time, but I didn’t experience any problems… until I tried to run them. Seemingly at random, but quite frequently, the Apache children would get a Bus Error. I googled around and discovered this was a fairly common problem, but none of the solutions (mostly involving library versions, particularly expat) seemed to make any difference.

After a few weeks of recompiling, stripping things down to the bare bones, turning on debugging and staring endlessly into the output of gdb, I struck upon a solution. And annoyingly it wasn’t in any of the things I’d been staring at, but instead it appeared in the form of mod_wsgi. This wonderful piece of code does a similar job to mod_python, so I dropped it in and hoped for the best. Nope, it still crashed. But what saved me was the documentation – the author wrote, and I quote:

Do note though that some versions of the Subversion Python bindings apparently have problems when being used from within secondary Python sub interpreters rather than the main Python interpreter. The result of this will be strange Python exceptions or the Apache child processes could even crash.

To avoid such problems, the Trac application should be forced to run within the main Python interpreter. This can be done using the WSGIApplicationGroup directive with the value ‘%{GLOBAL}’.

This was precisely my problem. So I did as suggested and to much relief everything worked. And mod_wsgi ain’t half good too… in my opinion it’s much better than mod_python.
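For anyone wanting to see what that advice amounts to in practice, here’s a rough sketch of the relevant Apache configuration. The paths and the /trac mount point are placeholders I’ve made up for illustration, not the actual CSProjects setup, and access control is left out entirely:

```
# Hypothetical paths and URL prefix; adjust to the local Trac deployment.
WSGIScriptAlias /trac /usr/local/www/trac/trac.wsgi

<Directory /usr/local/www/trac>
    # Force Trac to run in the main Python interpreter rather than a
    # secondary sub interpreter, as the mod_wsgi documentation suggests.
    WSGIApplicationGroup %{GLOBAL}
</Directory>
```

The only line that matters for the crash is the WSGIApplicationGroup one; everything else is just the usual plumbing to mount a WSGI script.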

At this point I had all the software working. So I took a month off. Literally.

When I returned I had to move all of our frickin’ servers. But after that I got back to CSProjects.

With the help of Adam Sampson I got down to the business of bringing these software packages together into something we could offer our users. We did a lot of coding and a few weeks later the final CSProjects was published. Then we had to change it all and another week later CSProjects was published again. Today we launched it to our users and we already have a whole bunch of people using it. Which brings a satisfying end to a few months of work.

Oh, and the logo. I did that (mostly – I got a little bit of help). Sometimes the simple things work best…

[Image: CSProjects logo]


A new server and a new RAID setup

Friday, September 1st, 2006 in Computing, FreeBSD

So my current hosted server is getting a bit old. It’s not got enough RAM, and the disk in it is failing (yes, I did have RAID, more on that later). So it’s about time to get a replacement in.

The guys over at Netrino have just installed a new machine for me. I say new, but it’s not a brand spanking new bleeding edge state of the art all singing all dancing machine costing a million pounds. It’s just an Intel Celeron 2GHz, with 1GB of RAM, and two 80GB hard disks. The main thing is the increased RAM, and two new (and hopefully working) disks.

Things didn’t get off to a good start on day 1 – they didn’t have a FreeBSD CD to hand. They “kindly” left me a USB dongle containing a variety of Linux installers, so I had a play with them in the hope that I could somehow bootstrap a FreeBSD install from one. After some googling I found the Depenguinator that claimed to do exactly what I needed. A few hours later I discovered it didn’t – probably because it’s not been updated for more recent FreeBSD versions.

On day 2 things got off to a better start – a FreeBSD 6.1 CD arrived in the CD-ROM drive and booted nicely. I had a quick play around to check everything – particularly the network card – worked, and thankfully it all did. Next came the installation.

One area I’ve had quite a few problems with on FreeBSD is the software RAID provision. You’ll see in one of my previous posts that I had some fatal problems with gvinum, and since then I’ve had other problems recovering a RAID 5 failure using it. Another alternative is ataraid, which worked fine for me up until FreeBSD 6. Since then I’ve not been able to get it to resync a failed disk properly – it hangs at 0% forever. So those two solutions are written off.

Other than hardware RAID this leaves me with a clear choice: gmirror. I’ve been using gmirror on another machine for some time now, and I’m pleased with the results. Following this guide it’s easy to apply it after installation, which is a definite selling point – I can’t stand solutions that require dumping and restoring. On my other machine it’s also had no problems resyncing, so another box ticked (or not ticked on the reasons-to-avoid list, actually).

I’m left wondering here how hard it would be to add support for gmirror (and maybe some of the other geom providers) to the FreeBSD sysinstall program. I experimented with setting up the mirror by hand before running the installer, but it failed to notice it. If this functionality could be added it’d be a real selling point for FreeBSD.

So, with the install complete, and the disks mirrored, I’m ready to move on to building and configuring. First up is updating the world and kernel, then installing all the software. I’ve not really figured out how I’ll copy everything across though…


NFS Performance, continued

Thursday, July 13th, 2006 in Computing, Work

Back in May I wrote about the performance problems we were having with our new NFS based user filestore. It’s been a while since then, and the problems have continued. We have noticed that it appears to be load related – not just the network, but also the machine. This suggests that our theories about IPsec causing the slowdown may be correct.

Our original plan was to try a private network which would remove the need for IPsec and also remove any latency added by routing the traffic between our subnets. This still seemed like a good plan, so I asked around and another department kindly lent us a brand new gigabit switch. We’ve connected this to one of our NFS clients and to the cluster node that’s currently running our filestore.

So far we’ve noticed some serious performance boosts. There’s only a few of us using it, so it could just be that it’s a lightly loaded connection – time will tell on that one. The bottom line is that it seems to be quicker than the IPsec connection ever was, so hopefully we’re on to a winner. We’ve also got a few staff testing it out, and their responses have been positive so far.

The next step after this testing period is to look at the costs of doing this properly with our own equipment. One of the key things we’ve been doing recently is increasing the redundancy of our systems, so it’d be fairly daft to do this with just one switch. We’d need at least two, with every cluster node connected to both, and every client that we want optimum performance on connected to both. Obviously there’ll be other clients that are less important and they can continue to use the existing infrastructure.

Of course, I’ve got absolutely no idea where we’ll put these switches, or how we’ll wire them in – things are pretty tight in our racks at the moment. Suppose there’s got to be a challenge somewhere :-)

My only worry with all this is what we’ll do if it doesn’t work. I don’t have any other ideas that’d make it go quicker – to be frank, you can’t really get any quicker than a directly connected switch. Let’s hope we don’t have to worry about it.


slimp3slave – finally working

Thursday, July 13th, 2006 in Computing

In my last post about setting up a slimserver I said that I was having trouble getting slimp3slave working:

Whilst it doesn’t appear to have any problems, I didn’t have much success with the players. mpg123 got confused by the stream, and madplay kept skipping the beginnings of tracks when I hit next on the server. This could be a problem with slimp3slave – I’ll need to investigate.

The problem did turn out to be with slimp3slave. I discovered that when skipping a track the stream is restarted, which caused slimp3slave to start up a new player. The problem was that it did this before the old one had exited, thus causing the new one to die because it couldn’t access the sound device. There’s another bug here – it didn’t notice the new player dying and tried to write to it, which resulted in lots of SIGPIPE messages.

So I looked at the code for shutting down the player and noticed that it wasn’t using the right close function. This change fixed it:

RCS file: /home/pdw/vcvs/repos/slimp3slave/slimp3slave.c,v
retrieving revision 1.10
diff -u -r1.10 slimp3slave.c
--- slimp3slave.c	12 Apr 2004 08:04:52 -0000	1.10
+++ slimp3slave.c	22 Jun 2006 21:21:31 -0000
@@ -394,7 +394,7 @@
 }

 void output_pipe_close(FILE * f) {
-    fclose(f);
+    pclose(f);
 }

 unsigned long curses2ir(int key) {

I have sent this change to the author, so maybe it’ll get integrated.
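To show why that one-word change matters, here’s a minimal sketch in isolation. A stream opened with popen() is meant to be closed with pclose(), which closes the pipe and then waits for the child process to exit; fclose() typically just closes the stream and returns without waiting, which is how a new player could start while the old one still held the sound device. The helper below is purely illustrative: run_player and the commands passed to it are made up for this example and are not taken from slimp3slave.

```c
#define _POSIX_C_SOURCE 200809L /* for popen()/pclose() under strict -std= modes */

#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper (not from slimp3slave): start `command` through the
 * shell with popen(), feed it `data` on its stdin, then close with pclose().
 * Returns the command's exit status, or -1 if anything went wrong. */
int run_player(const char *command, const char *data) {
    assert(command != NULL && data != NULL);

    FILE *f = popen(command, "w");
    if (f == NULL) {
        return -1;
    }

    fputs(data, f);

    /* pclose() flushes and closes the pipe AND waits for the child to exit,
     * so whatever the child was holding is free again once it returns. */
    int status = pclose(f);
    if (status == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling run_player("cat > /dev/null", "hello\n") only returns once cat has actually exited, which is the guarantee the player-restart code needed.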

Now I have a working streaming system. The only remaining problem seems to be the wireless networking to the client dropping out from time to time – a wire would fix that one :-)

And in the past couple of days I’ve even got a client (softsqueeze on Windows this time) running at work that’s streaming the music over my ADSL connection. Very handy!


Streaming music around my home

Sunday, June 18th, 2006 in Computing

This weekend I thought I’d have a go at setting up SlimServer, an application that streams music to various audio devices. It’s primarily designed to work with the SqueezeBox, a hardware device that can wirelessly stream the music, but there are various bits of software that can use it too.

The server itself was a doddle to set up. There is a FreeBSD port of it that does all the work for you. Once installed you just run it then browse to port 9000 on the server to access it. It didn’t take long to index my MP3 collection, and then it was ready to go.

I started out by using Winamp to stream the music. That worked absolutely fine, but I wanted something I could run under FreeBSD. The idea was that I’d find an old piece of hardware that I could run headlessly in the lounge hooked up to our speaker system.

Various applications existed to do the job. The obvious choice is SoftSqueeze, a virtual SqueezeBox application provided in the SlimServer distribution. It’s java based, which makes it mildly more effort to get going on FreeBSD, but works pretty well. It has a headless mode too, which is ideal for what I want.

Next up there’s slimp3slave, which is a small C application that does the same job as SoftSqueeze. It uses an external application such as mpg123 or madplay to actually play the audio, so it’s a fairly small app. Whilst it doesn’t appear to have any problems, I didn’t have much success with the players. mpg123 got confused by the stream, and madplay kept skipping the beginnings of tracks when I hit next on the server. This could be a problem with slimp3slave – I’ll need to investigate.

Unfortunately SoftSqueeze isn’t faultless either. Whilst it plays fine, if you leave it idle for a long period of time something goes wrong and it refuses to play. I need to debug this further – it’s likely a FreeBSD related issue, since I know it works for other people on other platforms.

To finish the installation off I installed the MusicIP listener tool which can generate playlists based on any track you give it. When first started it does a scan of all your collection, which takes forever, and builds a database of information about tracks. It then uses this to match similar tracks together. It’s working surprisingly well so far.

The only problem with the MusicIP tool is that it’s a Linux binary. This meant activating Linux emulation on the server and installing the base Linux port. To use the client application (not actually needed, though, since you can do everything through the server) you need Java too – a Linux one. I only had success with the Blackdown 1.4 version.

This lot is controlled via a web browser. This is fine if you’re sitting at a PC and streaming to an application on your machine. But what about the headless machine? Fortunately I have an iPAQ with wireless, and it does the job of a remote control perfectly.

Longer term, if I can’t solve the problems with SoftSqueeze or slimp3slave I’ll consider buying a SqueezeBox. They’re expensive though; £170 for a wired version and £210 for a wireless one. Until I can justify the expense (i.e. I’d actually use it) I won’t be forking out for one, though.


I don’t have a good history with FreeBSD RAID…

Saturday, June 3rd, 2006 in Computing, FreeBSD

I’ve never got on well with software RAID systems on FreeBSD. I’ve tried gvinum (previously I used vinum), gmirror, and ataraid, all with varying degrees of success. The latest machine I built is using gmirror, and so far I’m happy.

However, over the past few days I’ve been having problems with a system I built a couple of years ago. It originally used vinum on FreeBSD 5.2.1, but I recently upgraded it to 5.5 and switched to gvinum. A week or so ago I noticed that the second disk in the mirror was marked stale – I guessed it was an artifact of the upgrade to 5.5. So on Tuesday I decided to resync it.

It went fine to start with, until syncing one partition produced a disk read error. This marked the whole original disk as bad, and I’d only half synced to the second disk. Thinking back I knew this disk had an error on, and I’d fully intended to replace it. Shame I didn’t do it at the time. Next I rebooted the machine to recover the disk from dead to stale, so I could force it back online. This is where the problems started.

GEOM_VINUM: subdisk swap.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk root.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk var.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk usr.p1.s0 state change: down -> stale

That’s what welcomed me during bootup. Not too bad I hear you say? Well, that’s all I saw after that – it didn’t boot any further. I tried various things such as unloading the geom_vinum module, booting single user, booting the other disk, pulling one disk, but nothing worked.

In desperation I booted an older kernel. It worked! Well, when I say worked, I mean it booted past this point and asked me for a root partition – but at least I could work with that. It wasn’t immediately obvious why it had worked; my theory is that it wasn’t the fact it was an older kernel, but that it was a different kernel version to the modules on the disk, making it refuse to load the geom_vinum module.

So after getting things running again I decided to update to 6.1. I figured help would be more limited when running 5.5, and I could see changes had gone into gvinum in 6.1. After a few hours this was done, but the result was the same; I booted to single user, typed “gvinum start”, and got the same message. Oddly this time the machine wasn’t entirely dead – I could still reboot it. But maybe this was because I’d launched it manually.

Regardless of the cause of the problem I’m now stuck. I’ve got everything running off one disk fine, but I can’t get the RAID going. The only possibility I can see is redoing the RAID configuration, but to do this I’ll need to blast the existing config off the disks, and I’m nervous about that.

The other option I’m considering is replacing the machine and starting again (it’s getting old now anyway). Maybe this time I’ll go for a hardware RAID solution, though :-)
