Pocket GPS World charge for speed camera data!

I’ve been using the speed camera database at Pocket GPS World with my satnav system for a while now. With the necessary tools installed on my PC it would automatically pull the latest data and install it on my PDA. Fantastic system, and no hassle at all to keep updated.

When doing an update today I thought to check the Pocket GPS World website. To my amazement they’re now charging for the database. It’s not that expensive in reality, but I don’t like services that start off free to get everyone hooked and then start charging.

So I’ve forked out a massive 2 quid to get the latest database, but whether I’ll keep doing that I don’t know.


The T3 lives?

After yesterday’s saga I was looking forward to an easier day today, but I didn’t get it.

At the end of my last post I was trying to disable the primary controller in the array. It took a while, but it didn’t help. However, after some more discussion with Paul at Sun we noticed a lot of errors for the primary loop. I disabled that and the errors instantly stopped. Success!

So I then had a couple of options:

  1. Let the on-site engineer come today and try replacing parts, which would cause more downtime for myrtle.
  2. Leave the array as it is until after the bank holiday weekend and hope it keeps working.
  3. Start the planned migration to new filestore immediately, and fix the hardware later.

I decided that the third option was best, and cancelled the engineer that was arranged for today.

Today didn’t start so well though. At 7:15 Sun Dispatch phoned to confirm the ETA for the engineer. I explained it should have been cancelled, which was fine except that the parts had already been shipped. At 7:20 I got a call from a couple of DHL drivers who were sitting in the Maths car park at work. I was at home, only just awake. Fortunately they agreed to bring the parts to my house instead.

Arriving in the office this morning I noticed a failure on one of our cluster nodes. It looks like hardware, so that should be easy enough to fix. Thankfully there are three other nodes that fairly seamlessly took over the workload of this node. This won’t affect the migration of filestore to the cluster.

I’m now in the process of copying people off the old arrays. It’s going quite slowly – maybe the array isn’t running at full capacity, I’m not sure. Once this is done we can take the arrays off myrtle and let Sun fix them.
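The copy itself is nothing clever – just streaming each home directory from the old mount to the new one. The sketch below shows the general shape using scratch directories in place of our real mounts, so it’s safe to run anywhere; on the real arrays a ufsdump/ufsrestore pipe would preserve even more filesystem metadata:

```shell
# Migrating one user's home directory to the new filestore.
# Scratch directories stand in for the real array and cluster mounts.
OLD=$(mktemp -d)   # stands in for the old array
NEW=$(mktemp -d)   # stands in for the new filestore
mkdir -p "$OLD/alice"
echo "important thesis" > "$OLD/alice/notes.txt"

# Stream the tree across, preserving permissions and timestamps
(cd "$OLD" && tar cf - alice) | (cd "$NEW" && tar xpf -)

cat "$NEW/alice/notes.txt"   # -> important thesis
```

The username and file are made up for illustration; the point is the tar-to-tar pipe, which copies a whole tree without needing intermediate space.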


A T3 goes bang

We have a fairly long standing hatred of the Sun T3 storage arrays, and last night they once again proved why we feel that way.

At around 7pm last night I noticed a lot of SCSI errors on myrtle (our staff and research Solaris server) which I quickly tracked down to a problem with one of the attached T3 arrays. I was rather surprised to see what I found in the T3 logs:

W: u1d6 SCSI Disk Error Occurred (path = 0x0)
W: Sense Key = 0xb, Asc = 0x47, Ascq = 0x0
W: Sense Data Description = SCSI Parity Error
W: Valid Information = 0x2049a82
N: u1ctr ISP2200[0] Received LIP(f7,e8) async event

And pages and pages of the above and other fairly obscure looking messages. It seemed every single disk had a failure on it, which was quite unlikely. I tried to power cycle the array but it refused to shut down.

Thankfully this machine has a gold level support contract with Sun, so I phoned up their “UK Mission Critical Solution Centre” for some assistance. We didn’t really achieve too much other than sending logs back and forth, and prodding a few things. Eventually, seemingly by itself, the array decided that it would disable one of the disks and then everything seemed to go quiet. It was gone 10pm by this point, so I was quite relieved by the spontaneous fix.

It had tried to rebuild on to the hot spare, but that had failed too. So we were left with a slightly creaky, but working, RAID 5 array with no redundancy at all. I mounted the file system up and scheduled a full backup overnight, and surprisingly by the morning it was still working. We still had disk errors, though only for one disk, which was now disabled:

N: u1d6 sid 111 stype 2024 disk error 3

Later today a Sun engineer arrived to replace both of the disks that had shown errors (one of which was the hot spare). With both replaced, the rebuilds started, but with a lot of error messages. We decided it was best to power everything down and kick the rebuilds off again.

The array went round in a loop a few times: sync to spare, sync back, sync to spare, sync back. Eventually it stopped, and I reconnected it to the host system, which of course didn’t detect it. Time for another reboot 🙂

And, much to my annoyance, that didn’t work. It seems the LUNs are fine when unmounted, but as soon as the OS gets at them we get problems. Back on the phone with Sun, and they’ve agreed to send new parts for just about everything, but that’ll mean another 12 hours or so without home directories on myrtle (for half the users).

I’m trying one last thing, though – disabling the primary controller. It probably won’t work, but it’s worth a try.

Did I mention I hate T3s?


Car service

Yesterday my car went in for its first yearly service at Invicta Motors in Canterbury. All in all not a bad experience; they gave me a discount for being a valued customer, cleaned the car for me (inside and out!), and had it waiting by the door when I got there. They even fixed the mudflap clip I managed to knock off 🙂

My only complaint really is about communication. I tried to book the service online (via the Ford website – I hadn’t found the Summit one then), but it seems that vanished into the ether – I ended up booking again over the phone. They also said they’d phone me when the car was ready, but they didn’t.

Anyway, that’s all made better by the free pen they gave me 😉


It’s that time of year again: car insurance

A few days ago my car insurance renewal quote came through from Ford Insure. I’d already done some online quotes to get an idea of what I could expect to pay, so I was quite surprised to see the Ford quote come in at 50% more than the majority of the other quotes.

Even when I queried this with them they basically said “we’re better than the rest, so you have to pay more”. It started to get amusing when I pointed out that their own online quoting system had given me a far cheaper price. The response went something like: “Oh, you can’t trust those online systems, they’re never right”. Confidence inspiring, eh?

So, I was after a new insurer. I had a list of insurers from last year, so first off I went through them and got a load of quotes (all online, it’s easier that way, but pretty tedious). Then I discovered InsureSupermarket, which performs a similar function to Confused.com. Its interface was straightforward, and it gave me a good set of quotes which were similar to what I’d got myself. Interestingly, though, the prices through here were cheaper – even when I used its links to go through to the insurer’s site and complete the quote. Maybe that can be attributed to the random nature of insurance, or maybe they get a special discount.

At the end of this process I’m left with two fairly similar prices from RAC and Norwich Union. Not much to choose between them really, other than whether I like yellow or orange. Then I recall my breakdown cover is with RAC. I punch those details in too and the RAC quote drops some more, and it says I can become a “Plus” member. RAC it is then.

Just as I finish writing this I see that “RAC plc has been acquired by Aviva plc” and that it’s being “integrated into the Norwich Union Insurance division”. Maybe that’s why the choice was so hard? 🙂

Reflecting on what I’ve written I can see that I’ve made the process sound pretty easy. It’s not – it’s a minefield. It’s hard and repetitive work going through all those sites to get quotes, and differentiating between what you get is an impossible task. The interfaces seem to come in about three or four different varieties, but all with their own subtle differences. I’m left wondering how anyone is really meant to make an informed decision without weeks of research.

At the end of the day insurance companies don’t want to help you, they just want your money. I really hope I don’t have an accident, because I’ll be screwed over for years…


Finding the time

It seems I’ve never got enough time these days.

I’m a FreeBSD ports committer, but recently I’ve hardly done anything. All I’ve managed to do is keep my own ports updated. It’s quite frustrating because I want to be more involved. Then there’s the other projects like libstatgrab, they don’t even get a look in.

I blame the “day job”; if it’s not using my actual time it’s occupying my mind…


What a weekend

On Thursday I arrived back from Cornwall after a fairly lengthy drive, and to get back into the swing of things I dived right into the deep end at work.

This weekend there was a complete power shutdown on the campus for some “essential electrical work”. This required us to shut down all our machines, wait a few hours, and then start them all up again. Doesn’t sound too hard, does it?

That’s what I thought anyway. So, to spice things up a bit I figured I’d patch Solaris on all our servers, patch the OBP firmware on all the Sun kit, and update our Veritas cluster with a maintenance pack. My logic behind doing this is that all these things require downtime, and the cluster in particular would be quite disruptive. So what better time to do it than when everything is already down? Doing it the usual way would require downtime on Tuesday mornings for the next month.

On Friday I began patching all the machines I could safely reboot without impacting any of our users. This would have been made easier if our console servers were working, but a quick drive to work fixed that one. This was closely followed by another drive to work to turn the keylock on the servers so I could update the OBP firmwares. At the end of the day I was left with just a few core machines to patch.

Saturday morning started early, some time around 6.30am. I stumbled over to my desk at home to be presented with a dead X session – it looks like Xorg crashed during the night. After about 15-20 minutes of faffing I had everything back running again, and all the relevant tools opened up. I started patching the remaining machines, and the last few OBP firmwares. After a quick shower I popped up to work for around 8am. The power was scheduled to go off at 9am, so I had an hour to make sure everything was shut down. No real problems there, and finished with time to spare. Then I twiddled my thumbs for a further 30 minutes waiting for the power to actually go off.

It’s remarkably eerie to be in the machine room in the dark with most things off. The silence is only broken by the beeps from dying UPSs and the relatively quiet whirring from the core networking equipment (which has an impressively large UPS). Anyway, I digress…

Then follows the boring bit – waiting for the power to come back on. I decided to go in to town, do some shopping, then head off to Sainsburys for the weekly shop. By the time I’d done all this, and watched a bit of TV, it was 2pm and time to head back to work.

The power was scheduled to come back on by 5pm at the latest, but handily it came on earlier. Sometime around 3pm the lights came back on and the air conditioning kicked off with a massive roar. We waited for a further 30 minutes to get the all clear from maintenance before starting to power things on though. I used this time to move a machine, repatch some cables, and get the networking back online.

Earlier I mentioned I also wanted to patch the cluster. I decided to do this as the first thing after bringing the power back online. I’d already arranged for the cluster to not fully start up, so all I needed to do was bring the relevant machines back online and kick off the patch installer. Patching went fine, albeit taking a while; I found a Snickers bar was a good way to fill this time. Next I had to start the cluster up, which wasn’t so easy. After getting it running I couldn’t start any services – it kept returning something similar to the following:

Service group has not been fully probed on the system

It took a fair amount of head scratching, and a bit of googling, to realise that I needed to take a copy of the latest types.cf and put it in my VCS config directory. Did I miss that in the upgrade documentation, or was it just not there? Either way, after doing that the cluster started up without any further problems.
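For anyone who hits the same “Service group has not been fully probed” message after a VCS upgrade, the fix amounts to copying the new type definitions over the stale running copy and restarting the engine. The /etc/VRTSvcs paths below are from memory, so check your own install; the snippet demonstrates against a scratch directory so it’s safe to run:

```shell
# Real paths (from memory -- verify on your system):
#   /etc/VRTSvcs/conf/types.cf         <- new types.cf from the maintenance pack
#   /etc/VRTSvcs/conf/config/types.cf  <- stale copy the running config uses
# A scratch directory is used here so the sketch is harmless to run:
CONF=$(mktemp -d)
mkdir -p "$CONF/config"
echo "new type definitions" > "$CONF/types.cf"
echo "old type definitions" > "$CONF/config/types.cf"

# The actual fix: overwrite the stale copy with the latest one
cp "$CONF/types.cf" "$CONF/config/types.cf"
cat "$CONF/config/types.cf"   # -> new type definitions

# ...then restart the cluster engine: hastop -all -force && hastart
```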

Next I powered up all the remaining systems in order, which took a while. I did have a couple of problems though:

  1. One of the mirror service machines panicked on boot. Trying a different kernel fixed that, but the config really should have been right in the first place.
  2. Our web server has a failing system disk. It’s mirrored, so it’s not a big deal, but the disk keeps limping rather than failing – the result being a pretty slow machine.

At around 7pm I’d got everything going again, so I headed to the office to quickly check my email. When my email client refused to load I was too exhausted to care, so headed home for dinner (and Dr Who).

An hour or so later I went back to try and figure out what was going on. Handily a colleague had also seen the problem and queried whether lockd was running. That made sense; I assume my mail client couldn’t lock its mailbox on the NFS server. A quick check revealed it wasn’t running on the cluster NFS server. I haven’t investigated why, yet, but I hope it’s something I’ve done wrong rather than yet another bug.
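The check itself is only a one-liner. The daemon path and service name in the comments are from memory and vary by Solaris release, so treat them as examples rather than gospel:

```shell
# Is the NFS lock daemon running? (The [l]ockd trick stops grep
# from matching its own process in the ps listing.)
if ps -ef | grep '[l]ockd' > /dev/null; then
    echo "lockd is running"
else
    echo "lockd is NOT running"
    # Restart it with something like (release-dependent, from memory):
    #   /usr/lib/nfs/lockd                         # Solaris 9 and earlier
    #   svcadm restart svc:/network/nfs/nlockmgr   # Solaris 10
fi
```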

By the time that was sorted, and I’d read all my email, the only remaining thing to do was log the disk fault with Sun. I was surprised to have an engineer get back to me so late on a weekend, but I guess that’s the advantage of Gold support. I eventually gave up conversing by email at around midnight and went to bed.

I awoke later than usual on Sunday thinking everything was fine. I wandered over to my computer, which hadn’t crashed this time, and was rather annoyed at what I saw: a whole bunch of error messages about not being able to contact work servers. A load of things went through my head:

  • Has myrtle crashed? Hard to tell – I can’t get at the console.
  • Has the power gone off again? I can’t get at any of our machines, but I can get to a service one, so seemingly not.
  • Has the networking died? I can get to service equipment and can ping at least one of our routers.

So back up to work, again, to see what’s going on. I concluded it was a problem with the service router that our router is connected to (it’s happened before), so I pulled the cable out. After a short pause the failover link from our second router to the second service router kicked in, and most stuff started ticking again. There are still routing problems, though, which means amongst other things that we’re only getting some emails.

Back at home I waded through a whole mass of emails generated by the network outage, and found a few from Sun. They’ve also concluded the disk in the webserver is dead, so hopefully we’ll get that replaced on Tuesday.

That leaves me with a few things to sort out:

  1. Investigate why the cluster didn’t start lockd.
  2. Find out what happened to the networking.
  3. Get the disk changed in the webserver.

But they can certainly wait until I’m back at work.


An end to comment spam

I finally decided to put an end to comment spam – I’ve installed a captcha. Whilst I was away on holiday I received around 3-5 spams a day, so rather than just deleting them I figured I needed to prevent them.

The WordPress docs are a really useful resource. I decided to go with SecureImage since it looked fairly good, and wasn’t complicated to set up. First I installed ImageMagick, then I dropped the PHP in place. I did have to tweak it a little bit to replace the short PHP tags with the full ones, but then my configuration isn’t standard.
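The tag tweak was mechanical enough that sed can do the bulk of it. A rough sketch – it misses a bare <? at the very end of a line, so eyeball the result before deploying; it’s shown here on an inline sample rather than the real file (to apply it, point the same sed expressions at your copy of the PHP):

```shell
# Expand PHP short open tags to the full form:
#   "<?="  becomes  "<?php echo"
#   "<?x"  becomes  "<?php" followed by x (unless x is 'p', i.e. already full)
printf '<?= $code ?> and <? echo $img; ?>\n' \
  | sed -e 's/<?=/<?php echo/g' -e 's/<?\([^p=]\)/<?php\1/g'
# -> <?php echo $code ?> and <?php echo $img; ?>
```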

Since doing that I’ve had no spam. I’ve also had no other comments, but hopefully that doesn’t mean I’ve broken my blog 🙂


Comment spam :-(

Not that long ago I questioned how long it would be until I got comment spam on this blog. It didn’t take long for that to be answered; over the past day or two I’ve had 3 of them. Admittedly that’s not too many, but given this blog has only been running for two weeks that’s a pretty depressing start. How did they manage to find me so fast? I guess Google is useful to spammers too.

Moaning aside, it won’t affect anyone reading this blog since I’ll catch the spam comments at the moderation phase. But it’s yet more work I have to do to combat spam. I really am sick of it.

SIGH. Make that 4…