The T3 lives? | Tim Bishop

After yesterday’s saga I was looking forward to an easier day today, but I didn’t get it.

At the end of my last post I was trying to disable the primary controller in the array. It took a while, but it didn’t help. However, after some more discussion with Paul at Sun we noticed a lot of errors for the primary loop. I disabled that and the errors instantly stopped. Success!

So I then had a couple of options:

Let the on-site engineer come today and try replacing parts, which would cause more downtime for myrtle.
Leave the array as it is until after the bank holiday weekend and hope it keeps working.
Start the planned migration to new filestore immediately, and fix the hardware later.

I decided that the third option was best, and cancelled the engineer that was arranged for today.

Today didn’t start so well though. At 7:15 Sun Dispatch phoned to confirm the ETA for the engineer. I explained it should have been cancelled, which was fine except that the parts had already been shipped. At 7:20 I got a call from a couple of DHL drivers who were sitting in the Maths car park at work. I was at home, only just awake. Fortunately they agreed to bring the parts to my house instead.

Arriving in the office this morning I noticed a failure on one of our cluster nodes. It looks like hardware, so that should be easy enough to fix. Thankfully there are three other nodes that fairly seamlessly took over the workload of this node. This won’t affect the migration of filestore to the cluster.

I’m now in the process of copying people off the old arrays. It’s going quite slowly – maybe the array isn’t running at full capacity, I’m not sure. Once this is done we can take the arrays off myrtle and let Sun fix them.
