Archive for the ‘FreeBSD’ Category

FreeBSD stuff

Wednesday, July 16th, 2008 in Computing, FreeBSD

I’ve done a bit of work on my FreeBSD ports lately. Firstly, after building my new server, I got round to upgrading from SlimServer to SqueezeCenter. This also meant sorting out ports for all the plugins I use. This didn’t take too long, and you can find them all over here. So far I’m liking SqueezeCenter, and I’d highly recommend it (and a SqueezeBox, of course).

I also maintain a port for a suite of software called KRoC. KRoC is written and maintained where I work, so apart from making it available to FreeBSD users I also have an interest in supporting the work done by our department. I’ve been waiting some time for a 1.5.x release of KRoC, but I finally got impatient. I automated the production of snapshots from their stable branch, and updated the port to build from that. I also run a FreeBSD 7 machine in their buildbot system to further test KRoC on my favourite operating system :-)

And in other FreeBSD news, I cast my vote in the FreeBSD Core elections. It’s hard to know who to vote for, but I gave their statements a good read and made a decision. Good luck to them all!

“I’ll build a new server; it’s got to be easier than patching up the old one…”

Thursday, June 19th, 2008 in Computing, FreeBSD

A few weeks back I started having problems with my file server at home. This machine is fairly important to us; it holds all our photos, music and other files. For years I’ve been bodging it together with various old parts scavenged from other machines and some new parts when needed. But, once again, it’d started to break. Disks were dropping out of the RAID unexpectedly, and the replacements were refusing to rebuild. Unsure of where the problem was I uttered the fateful words “I’ll build a new server; it’s got to be easier than patching up the old one…”. My colleagues were sceptical, but I ploughed on anyway. Maybe I should have listened to them?

It took the best part of a week to work out what I wanted. There were so many decisions to make: which RAID card, disks, motherboard, CPU, RAM, case, etc. I researched each one as much as I could, but there’s a bottomless pit of information on the Internet. Eventually I settled on a 3ware 9690SA RAID card with 4 Seagate ST31000340NS disks. The other bits were fairly decent to make sure the machine would have a good life, but not excessive.

The reason for choosing a hardware RAID solution over software RAID was simple - reliability. Now, I’m not knocking software RAID in principle (look at ZFS, for example), but the implementations for RAID 3-5 on FreeBSD aren’t great (yes, it has ZFS, but I’m not in the mood for trailblazing this time round). I wanted to stick with FreeBSD so I opted for the well-known reliability that 3ware cards provide. And the 5 year warranty on Seagate disks made them an attractive choice.

The purchasing process wasn’t as simple as it could have been. I ordered from dabs.com, span.com (they specialise in storage stuff) and overclockers.co.uk. I’ve used all three companies before, so I wasn’t too concerned about problems. The bulk of it was ordered from Dabs - it looks like they’re back to being competitive on prices. The problems started almost immediately; Dabs held my order over an issue with my address. It’s happened once before and that put me off Dabs for some time, but we use them all the time at work, so I had hopes they’d be better now. It took a working day to resolve that issue… and then the next day I got an email to say my credit card company had declined the order. I got on the phone to them and was put through to their security department; it seems buying lots of stuff online is unusual… not for me it isn’t. Anyway, that was resolved and then I had more waiting for Dabs to try the transaction again. Eventually I got impatient and tried their online chat thing and the matter was resolved in minutes. Meanwhile the parts ordered from the other two suppliers were sitting on my desk.

Eventually it all arrived and I took it home. Ruth wasn’t overly impressed when I cleared off the dining room table and covered it in computer parts, but I assured her it wasn’t for long. That was a couple of weeks ago - it’s all still there.

I spent a weekend putting things together and testing it all out. I routed every cable neatly and tied them carefully to the case to ensure nothing moved about. Airflow was good and the additional fans in the case were doing a great job of keeping things cool (not sure about their blue LEDs though…). All was looking good and I was enjoying the process.

Then I tried to use the RAID card. The first problems hit when I turned on the motherboard’s RAID, which I’d intended to use to mirror the system disks, whilst the 9690SA was plugged in. I’d gone for an Asus P5E3 and expected both RAID systems to work happily together, but sadly I was wrong. I experienced unusual problems such as the machine hanging on the Intel Matrix Storage (the onboard RAID) screen and disks randomly disappearing from both arrays. In the end I gave up and turned off the onboard RAID; I figured the FreeBSD RAID 1 (gmirror) is pretty solid, so I’d use that.

Thinking I’d got over the worst of the problems I moved on to setting up the 9690SA. Things looked good for a while; the interface was clear and everything was easy to set up. It wasn’t until I started trying to put data on that I noticed problems. Here’s a snippet from the error log (largely for the benefit of Google):

E=0200 T=08:26:00 : Cable CRC error
SATA Device. port = 0x0
task file written out : cd dh ch cl sn sc ft
                      : 00 70 00 00 00 1200 00
  task file read back : st dh ch cl sn sc er
                      : 00 00 00 00 00 8441 00
E=0200 T=08:26:00 P=0h: Soft reset drive
E=0200 T=08:26:00 P=0h: exitCode = 1013
Port retry not allowed
E=0200 T=08:26:00 P=0h: Prepare for command retry
exitCode = 1013

At first I wasn’t sure what to make of this. Maybe it was the cable or connection, but on all four drives? It was a special 4-in-1 (SFF8087) cable, but it still seemed odd. I logged the case with 3ware’s technical support and got back a response suggesting I try another cable. Well, duh, I could have figured that myself. I was hoping they might be able to point out any other less obvious potential causes.

So, I purchased another cable. It took a couple of days to arrive and did absolutely nothing to resolve the problem. Sigh. At the same time as this was going on I had another problem - it’s only with hindsight that I know to separate the two:

E=0204 T=18:34:36     : Port timeout (ext)
SATA Device. port = 0x2
task file written out : cd dh ch cl sn sc ft
                      : 00 04 00 00 00 00 00
Send AEN (code, time): 0x9, 06/10/2008 18:34:36
Drive timeout detected
(EC:0x09, SK=0x04, ASC=0x00, ASCQ=0x00, SEV=01, Type=0x71)
phy=6
  task file read back : st dh ch cl sn sc er
                      : 00 00 00 00 00 00 00
E=0204 T=18:34:36 P=2h: Soft reset drive
E=0204 T=18:34:36 P=2 : Inserting Set UDMA command
E=0204 T=18:34:36 P=2h: Check power cycles, initial=40, current=40
E=0204 T=18:34:36 P=2h: exitCode = 1013
Port retry not allowed
E=0204 T=18:34:36 P=2h: Prepare for command retry
exitCode = 1013
E=0204 T=18:34:36 U=0 : Retrying command

These errors happened less frequently, but eventually caused I/O to hang and the controller to reset. Again I logged this with 3ware’s technical support and got back a bunch of not so helpful responses. They suggested moving the card in the machine, testing the disks, checking the power supply, and so on. All valid points, but what annoyed me was they could only ask me to check one at a time… and they could only reply to me once a day. Plus I’d already done everything they suggested. It took a week to go through this nonsense.
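Separating the two kinds of error by eye is tedious, so here is a minimal Python sketch that groups log lines by their E= code. The line format is inferred purely from the two snippets above, not from any 3ware documentation, and the names parse_line and headline_counts are made up for illustration. Treating lines that share an E= code and a timestamp as one event is also an assumption.

```python
import re
from collections import Counter

# Format guessed from the snippets in this post: "E=<code> T=<time> [P|U=<n>[h]]: <message>"
LINE_RE = re.compile(
    r"^E=(?P<code>[0-9A-Fa-f]{4})\s+T=(?P<time>\d{2}:\d{2}:\d{2})\s*"
    r"(?:(?P<kind>[PU])=(?P<num>\d+)h?)?\s*:\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for an 'E=...' line, or None for any other line."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def headline_counts(lines):
    """Count the first message of each (code, time) group, keyed by (code, message)."""
    seen = set()
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # continuation lines such as "SATA Device. port = 0x0"
        event = (rec["code"], rec["time"])
        if event in seen:
            continue  # only the first line of each event is its headline
        seen.add(event)
        counts[(rec["code"], rec["msg"])] += 1
    return counts


SAMPLE = """\
E=0200 T=08:26:00 : Cable CRC error
SATA Device. port = 0x0
E=0200 T=08:26:00 P=0h: Soft reset drive
E=0204 T=18:34:36     : Port timeout (ext)
SATA Device. port = 0x2
E=0204 T=18:34:36 P=2h: Soft reset drive
""".splitlines()

if __name__ == "__main__":
    for (code, msg), n in sorted(headline_counts(SAMPLE).items()):
        print(code, msg, n)
```

Run over a full log, a summary like this would have made it obvious much sooner that the CRC errors and the port timeouts were separate problems.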

In the meantime I spent a lot of time experimenting, fiddling, and web searching. Eventually I found the following two pages, although it took me a while to realise their significance:

http://www.3ware.com/KB/article.aspx?id=15385
http://www.3ware.com/kb/article.aspx?id=15171

The first of the articles explicitly mentions my controller card and drives, so it seemed to be the right thing to do. But I had the SN04 firmware on my drives and they wanted me to apply AN05. I asked both 3ware and Seagate to clarify the differences, but neither gave satisfactory answers. Seagate managed to give me the SN05 firmware to try, but it didn’t help. In slight desperation, and without anyone giving me much help, I decided to take a punt on the AN05 firmware.

IT WORKED!

There was a lot of tension for the next few hours whilst I continued testing, but eventually I was satisfied that the AN05 firmware solved the problem. Later attempts to clarify with Seagate why SN05, which they gave me, didn’t work and AN05, which 3ware pointed me at, did work, got nowhere. Seagate support actually admitted that they basically don’t know.

So on to the next issue. The second article suggested limiting the speed of the drives to work around the drive timeout issue. It’s definitely a workaround, but it was worth a shot. I’d already removed the jumpers from the drives that limited them to 1.5 Gb/s, and they were a nightmare to do - I’ve never seen such small and fiddly jumpers on a disk… it was completely unnecessary given the available space. This time I decided to do the limiting in the 9690SA’s software.

ONCE AGAIN, IT WORKED!

So at this point I’m happy. Things are looking good. That last fix is definitely a workaround, and I’ve told 3ware they need to fix it. It’s a bug, and bugs need fixing. I’m now using the array to store my data on, it’s nice and quick (a 512MB write cache helps!), and I have plenty of space. And Ruth might get the dining room table back soon… assuming I can work out how to lift this massive machine (did I mention the case was quite big?).

But I’d like to finish this post with a rant. It turned out that the solutions to my problems were both in the 3ware knowledge base. Now maybe I should have searched harder initially, but it took me some time to find these articles. But more to the point, 3ware support should definitely have known about these issues and should have directed me straight to them. I wasted a week of my time messing around with them, and I’m not happy about it. The card is great (apart from the aforementioned bug), but the support sucks. It will seriously make me think twice about going with 3ware again.

I hope this post will fill in the whole story to those I’ve been ranting at recently, and maybe it’ll help someone else on the Internet out if/when they hit the same problem. That’s assuming they can read this lengthy post in less time than it takes to figure out the solution themselves ;-).

Good night.

A new server and a new RAID setup

Friday, September 1st, 2006 in Computing, FreeBSD

So my current hosted server is getting a bit old. It’s not got enough RAM, and the disk in it is failing (yes, I did have RAID, more on that later). So it’s about time to get a replacement in.

The guys over at Netrino have just installed a new machine for me. I say new, but it’s not a brand spanking new bleeding edge state of the art all singing all dancing machine costing a million pounds. It’s just an Intel Celeron 2GHz, with 1GB of RAM, and two 80GB hard disks. The main thing is the increased RAM, and two new (and hopefully working) disks.

Things didn’t get off to a good start on day 1 - they didn’t have a FreeBSD CD to hand. They “kindly” left me a USB dongle containing a variety of Linux installers, so I had a play with them in the hope that I could somehow bootstrap a FreeBSD install from one. After some googling I found the Depenguinator that claimed to do exactly what I needed. A few hours later I discovered it didn’t - probably because it’s not been updated for more recent FreeBSD versions.

On day 2 things got off to a better start - a FreeBSD 6.1 CD arrived in the CD-ROM drive and booted nicely. I had a quick play around to check everything - particularly the network card - worked, and thankfully it all did. Next came the installation.

One area I’ve had quite a few problems with on FreeBSD is the software RAID provision. You’ll see in one of my previous posts that I had some fatal problems with gvinum, and since then I’ve had other problems recovering a RAID 5 failure using it. Another alternative is ataraid, which worked fine for me up until FreeBSD 6. Since then I’ve not been able to get it to resync a failed disk properly - it hangs at 0% forever. So those two solutions are written off.

Other than hardware RAID this leaves me with a clear choice: gmirror. I’ve been using gmirror on another machine for some time now, and I’m pleased with the results. Following this guide it’s easy to apply it after installation, which is a definite selling point - I can’t stand solutions that require dumping and restoring. On my other machine it’s also had no problems resyncing, so another box ticked (or not ticked on the reasons-to-avoid list, actually).

I’m left wondering here how hard it would be to add support for gmirror (and maybe some of the other geom providers) to the FreeBSD sysinstall program. I experimented with setting up the mirror by hand before running the installer, but it failed to notice it. If this functionality could be added it’d be a real selling point for FreeBSD.

So, with the install complete, and the disks mirrored, I’m ready to move on to building and configuring. First up is updating the world and kernel, then installing all the software. I’ve not really figured out how I’ll copy everything across though…

I don’t have a good history with FreeBSD RAID…

Saturday, June 3rd, 2006 in Computing, FreeBSD

I’ve never got on well with software RAID systems on FreeBSD. I’ve tried gvinum (previously I used vinum), gmirror, and ataraid, all with varying degrees of success. The latest machine I built is using gmirror, and so far I’m happy.

However, over the past few days I’ve been having problems with a system I built a couple of years ago. It originally used vinum on FreeBSD 5.2.1, but I recently upgraded it to 5.5 and switched to gvinum. A week or so ago I noticed that the second disk in the mirror was marked stale - I guessed it was an artifact of the upgrade to 5.5. So on Tuesday I decided to resync it.

It went fine to start with, until syncing one partition produced a disk read error. This marked the whole original disk as bad, and I’d only half synced to the second disk. Thinking back I knew this disk had an error on, and I’d fully intended to replace it. Shame I didn’t do it at the time. Next I rebooted the machine to recover the disk from dead to stale, so I could force it back online. This is where the problems started.

GEOM_VINUM: subdisk swap.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk root.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk var.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk usr.p1.s0 state change: down -> stale

That’s what welcomed me during bootup. Not too bad I hear you say? Well, that’s all I saw after that - it didn’t boot any further. I tried various things such as unloading the geom_vinum module, booting single user, booting the other disk, pulling one disk, but nothing worked.

In desperation I booted an older kernel. It worked! Well, when I say worked, I mean it booted past this point and asked me for a root partition - but at least I could work with that. It wasn’t immediately obvious why it had worked; my theory is that it wasn’t the fact it was an older kernel, but that it was a different kernel version to the modules on the disk, making it refuse to load the geom_vinum module.

So after getting things running again I decided to update to 6.1. I figured help would be more limited when running 5.5, and I could see changes had gone in to gvinum in 6.1. After a few hours this was done, but the result was the same; I booted to single user, typed “gvinum start”, and got the same message. Oddly this time the machine wasn’t entirely dead - I could still reboot it. But maybe this was because I’d launched it manually.

Regardless of the cause of the problem I’m now stuck. I’ve got everything running off one disk fine, but I can’t get the RAID going. The only possibility I can see is redoing the RAID configuration, but to do this I’ll need to blast the existing config off the disks, and I’m nervous about that.

The other option I’m considering is replacing the machine and starting again (it’s getting old now anyway). Maybe this time I’ll go for a hardware RAID solution, though :-)

Finding the time

Friday, April 21st, 2006 in FreeBSD

It seems I’ve never got enough time these days.

I’m a FreeBSD ports committer, but recently I’ve hardly done anything. All I’ve managed to do is keep my own ports updated. It’s quite frustrating because I want to be more involved. Then there are the other projects like libstatgrab; they don’t even get a look in.

I blame the “day job”; if it’s not using my actual time it’s occupying my mind…

Router rebuild (or, an excuse to play with IPv6?)

Sunday, March 19th, 2006 in Computing, FreeBSD

So recently my router decided it didn’t want to whir its fans anymore and consequently gave up on life. It’s a dual CPU machine and both CPU fans had managed to wedge. After fixing them and getting things running again I heard klunking noises coming from the front of the case; one of the disks in the mirror had failed. I rapidly copied everything off the remaining disk, but didn’t have a spare to hand. Next morning the remaining disk went too. I wasn’t having much luck really, but on the positive side I did have a full backup.

After a day or so of fiddling with hardware I got something that resembled a working machine; I’d gone through a stack of various old disks by this point, most of which were dead. For a while I’d been pondering a fresh install for the machine, so this was the perfect opportunity. I decided to think about what I wanted it to do - this is what I came up with.

  1. Obviously needs ADSL connection (via rather old, but working, USB modem)
  2. I’d quite like a VPN connection to work for various (but not all) work servers
  3. IPv6 routing both internally and out to the world
  4. Internal NIC with my private and public address ranges
  5. A second internal NIC for my wireless network
  6. A better firewall setup (I decided on PF in the end)

Rather predictably I decided to do all this with FreeBSD. Nothing exciting about the install, other than I used gmirror this time. I’m still trying to find the best RAID solution on FreeBSD. So far I think gmirror has impressed me most compared to ataraid and gvinum.

So most of the things I wanted the router to do are things it did before. The new things were the VPN, IPv6 and PF. Those are what I’ll write about.

Setting up the VPN was straightforward. I installed the net/pptpclient port, bunged the sample config and my credentials in /etc/ppp/ppp.conf, and knocked up a quick RC script (let me know if you’d like a copy). I also added specific entries to ppp.conf for the hosts I wanted to route over the VPN, rather than letting it route whole subnets.

Something worth noting about ppp is the -unitN flag. Using this you can make sure ppp always uses the same numbered tun device. For example, my VPN connection has -unit1 ensuring it is always tun1. This makes firewall configuration a bit more manageable.
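To make the two ideas above concrete, here is a small Python sketch that renders per-host route lines for ppp.conf and builds a ppp command line with a fixed tun unit. It is only an illustration, not my actual RC script: the label "vpn", the choice of -ddial mode, the 192.0.2.x addresses and the function names are all hypothetical, and it assumes IPv4 hosts routed via the HISADDR keyword.

```python
import ipaddress


def host_route_lines(hosts, gateway="HISADDR"):
    """Render one ppp.conf 'add' line per IPv4 host, as a /32 route via the peer."""
    lines = []
    for host in hosts:
        addr = ipaddress.IPv4Address(host)  # raises ValueError on anything else
        # Commands inside a ppp.conf section are indented by whitespace.
        lines.append(f" add {addr}/32 {gateway}")
    return lines


def ppp_argv(label, unit, mode="-ddial"):
    """Build the argument list for ppp, pinning it to tun<unit> with -unitN."""
    if unit < 0:
        raise ValueError("unit must be non-negative")
    return ["ppp", mode, f"-unit{unit}", label]


if __name__ == "__main__":
    # Hypothetical work servers, using documentation-only addresses.
    for line in host_route_lines(["192.0.2.10", "192.0.2.25"]):
        print(line)
    print(" ".join(ppp_argv("vpn", 1)))
```

Routing single hosts this way keeps everything else on the normal ADSL route, and knowing the VPN is always tun1 means the firewall rules can name the interface directly.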

I’ve also knocked up a slightly better RC script for starting the ADSL connection (compared to the one provided with net/pppoa) that checks the line is up before returning. This allows subsequent startup scripts to be pretty much guaranteed access to the Internet. Again, let me know if you’d like a copy.

The next task was getting the IPv6 connection going. I decided to use the BT IPv6 Tunnel Broker service. In retrospect this might not have been the best choice; it’s been down for the last few days. I’ll let you know how I decide to proceed with that, but I’m reluctant to change because I’ll get a whole new address range. Getting this set up was pleasantly simple, particularly when compared with my past experiences trying to set up an IPv6 tunnel. Upon registering I was allocated an IP range and given a FreeBSD-compatible script to bring the link up. I decided to set things up more permanently using the excellent guide on the FreeBSD Diary website and the details from the broker’s script.

Surprisingly, with the relevant tunneling, routing, and advertisements going, setting up clients was a doddle. On my FreeBSD desktop machine I turned on ipv6_enable in rc.conf and it sprang to life (after a reboot). Even on our Windows systems it was as simple as running “ipv6 install”.

This finally left PF. Now that I’ve finished setting it up I can happily say it seems much nicer than IPFW, but I won’t pretend the journey was easy. It took a while to get my head around the differences, the main one being last-match versus first-match rules. I still need to figure out some of the ALTQ stuff though; my last attempt left me throttling internal traffic to 0.5Mb/s :-)
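The first-match versus last-match difference is easier to see in a toy model than in prose. The Python sketch below is not how either firewall is implemented, and the names Rule, first_match and last_match are invented for illustration. It models only the ordering behaviour: IPFW stops at the first rule that matches, while PF lets the last matching rule win unless a matching rule is marked quick. The default verdict is just a parameter of the model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Rule:
    action: str                      # "pass" or "block"
    matches: Callable[[dict], bool]  # predicate over a packet description
    quick: bool = False              # only used by last-match evaluation


def first_match(rules: Sequence[Rule], packet: dict, default: str = "block") -> str:
    """IPFW-style: the first rule that matches decides."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default


def last_match(rules: Sequence[Rule], packet: dict, default: str = "block") -> str:
    """PF-style: the last matching rule decides, unless a matching rule is quick."""
    verdict = default
    for rule in rules:
        if rule.matches(packet):
            verdict = rule.action
            if rule.quick:
                break  # quick short-circuits the rest of the ruleset
    return verdict


if __name__ == "__main__":
    ruleset = [
        Rule("block", lambda p: True),             # block everything...
        Rule("pass", lambda p: p["port"] == 22),   # ...then allow ssh
    ]
    ssh = {"port": 22}
    print("first-match:", first_match(ruleset, ssh))
    print("last-match: ", last_match(ruleset, ssh))
```

The same two rules in the same order block ssh under first-match and allow it under last-match, which is exactly the sort of thing that catches you out when porting a ruleset from one to the other.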

Upgrading from FreeBSD 5.2 to 5.5 (RELENG_5)

Wednesday, March 15th, 2006 in FreeBSD

I’ve been putting off upgrading my remotely hosted server from FreeBSD 5.2.1 for a while now, but after I started getting random problems I decided I had to move forward.

To start with there were a few reasons putting me off doing the upgrade:

  1. From version 5.3 of FreeBSD vinum was pretty much broken, so I’d need to switch to gvinum.
  2. The threading library changed from libc_r to libpthread.
  3. A handful of key libraries in /lib and /usr/lib had their versions bumped.

The first problem is pretty straightforward to work around. Changing “vinum_load” to “gvinum_load” in /boot/loader.conf, and changing “vinum” to “gvinum” in /etc/fstab was all it took. To be on the safe side I did a fsck of the filesystems after rebooting in to 5.5.
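For anyone with more than a couple of volumes, the /etc/fstab half of that change can be scripted. This is a minimal Python sketch under the assumption that the volumes appear in fstab as /dev/vinum/<name> and should become /dev/gvinum/<name>; the function name is made up, commented-out lines are deliberately left alone, and the loader.conf change is not handled here.

```python
def fstab_to_gvinum(text: str) -> str:
    """Rewrite /dev/vinum/ device paths to /dev/gvinum/ on non-comment lines."""
    out = []
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("#"):
            out.append(line)  # leave comments untouched
        else:
            # "/dev/vinum/" cannot match inside "/dev/gvinum/", so this is
            # safe to run more than once.
            out.append(line.replace("/dev/vinum/", "/dev/gvinum/"))
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical fstab contents for demonstration only.
    sample = (
        "/dev/vinum/root  /     ufs   rw  1  1\n"
        "/dev/vinum/usr   /usr  ufs   rw  2  2\n"
        "# /dev/vinum/old /old  ufs   rw  2  2\n"
    )
    print(fstab_to_gvinum(sample), end="")
```

I would write the result to a copy and diff it against the original before replacing /etc/fstab on a remote machine.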

The other two problems can be dealt with after a reboot by rebuilding all the applications on the machine. This is as simple as running “portupgrade -af”, but I chose to do it in chunks so I could get key things up and running quickly. One trick I used was to touch a file in each directory under /var/db/pkg, which would get removed when the package was upgraded. This allowed me to easily see what I still needed to do.
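The marker-file trick can be sketched in a few lines of Python. This is an illustration rather than what I actually ran: the marker name and function names are hypothetical, and it assumes an upgrade replaces the package's directory under the database path, which is what makes the marker disappear.

```python
from pathlib import Path
from typing import List

MARKER = ".needs-rebuild"  # hypothetical marker file name


def mark_all(pkg_db: Path) -> int:
    """Touch a marker file in every package directory; return how many were marked."""
    count = 0
    for entry in pkg_db.iterdir():
        if entry.is_dir():
            (entry / MARKER).touch()
            count += 1
    return count


def still_pending(pkg_db: Path) -> List[str]:
    """List package directories that still contain the marker, i.e. not yet rebuilt."""
    return sorted(e.name for e in pkg_db.iterdir() if (e / MARKER).exists())


if __name__ == "__main__":
    db = Path("/var/db/pkg")
    if db.is_dir():
        # Reporting only; call mark_all(db) once before starting the rebuilds.
        for name in still_pending(db):
            print(name)
```

Marking once up front and then checking the pending list between chunks of rebuilding shows at a glance which packages are still linked against the old libraries.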

Most of this information was gathered from looking at mailing list archives and most importantly from reading /usr/src/UPDATING.

So, what problems had I been having before this that forced me to do the upgrade?

  1. When piping a message to a command in mutt the pipe would never close. So, for example, piping to cat would display the message and then just hang.
  2. PHP would hang when applications used the PHP mail() function; this forked sendmail and piped the message to it. I suspect this is the same problem as above.
  3. Various things, most noticeably MySQL 5, would not compile. The problem was the recent libtool upgrade; libtool filters out -lc_r linker arguments.

After doing the upgrade these problems went away. The whole procedure was far less painful than I had been expecting.

Of course, the next challenge is upgrading to RELENG_6. But maybe I’ll leave that for another couple of years… :-)