I’ve never got on well with software RAID systems on FreeBSD. I’ve tried gvinum (previously I used vinum), gmirror, and ataraid, all with varying degrees of success. The latest machine I built is using gmirror, and so far I’m happy.
However, over the past few days I’ve been having problems with a system I built a couple of years ago. It originally used vinum on FreeBSD 5.2.1, but I recently upgraded it to 5.5 and switched to gvinum. A week or so ago I noticed that the second disk in the mirror was marked stale – I guessed it was an artifact of the upgrade to 5.5. So on Tuesday I decided to resync it.
It went fine to start with, until syncing one partition produced a disk read error. This marked the whole original disk as bad, and I’d only half synced to the second disk. Thinking back, I knew this disk had an error on it, and I’d fully intended to replace it. Shame I didn’t do it at the time. Next I rebooted the machine to recover the disk from dead to stale, so I could force it back online. This is where the problems started.
GEOM_VINUM: subdisk swap.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk root.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk var.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk usr.p1.s0 state change: down -> stale
That’s what welcomed me during bootup. Not too bad, I hear you say? Well, that’s all I saw after that – it didn’t boot any further. I tried various things, such as unloading the geom_vinum module, booting single user, booting from the other disk, and pulling one disk, but nothing worked.
In desperation I booted an older kernel. It worked! Well, when I say worked, I mean it booted past this point and asked me for a root partition – but at least I could work with that. It wasn’t immediately obvious why it had worked; my theory is that it wasn’t the fact it was an older kernel, but that it was a different kernel version to the modules on the disk, making it refuse to load the geom_vinum module.
So after getting things running again I decided to update to 6.1. I figured help would be more limited when running 5.5, and I could see changes had gone into gvinum in 6.1. After a few hours this was done, but the result was the same; I booted to single user, typed “gvinum start”, and got the same message. Oddly, this time the machine wasn’t entirely dead – I could still reboot it. But maybe this was because I’d launched it manually.
Regardless of the cause of the problem I’m now stuck. I’ve got everything running off one disk fine, but I can’t get the RAID going. The only possibility I can see is redoing the RAID configuration, but to do this I’ll need to blast the existing config off the disks, and I’m nervous about that.
The other option I’m considering is replacing the machine and starting again (it’s getting old now anyway). Maybe this time I’ll go for a hardware RAID solution, though.
I have to be honest. Were I to need a server machine, I would probably run Solaris on it, if only because I know that Disksuite (or whatever it’s called now) works and works well. Other things in Solaris may be a pain in the bum, but at least it does RAID very well indeed.