After yesterday’s saga I was looking forward to an easier day today, but I didn’t get it.
At the end of my last post I was trying to disable the primary controller in the array. It took a while, but it didn’t help. However, after some more discussion with Paul at Sun we noticed a lot of errors for the primary loop. I disabled that and the errors instantly stopped. Success!
So I then had a couple of options:
- Let the on-site engineer come today and try replacing parts, which would cause more downtime for myrtle.
- Leave the array as it is until after the bank holiday weekend and hope it keeps working.
- Start the planned migration to new filestore immediately, and fix the hardware later.
I decided that the third option was best, and cancelled the engineer who had been booked for today.
Today didn’t start so well though. At 7:15 Sun Dispatch phoned to confirm the ETA for the engineer. I explained that the visit should have been cancelled, which was fine except that the parts had already been shipped. At 7:20 I got a call from a couple of DHL drivers who were sitting in the Maths car park at work. I was at home, only just awake. Fortunately they agreed to bring the parts to my house instead.
Arriving in the office this morning I noticed a failure on one of our cluster nodes. It looks like a hardware fault, so that should be easy enough to fix. Thankfully there are three other nodes that fairly seamlessly took over the workload of this node. This won’t affect the migration of filestore to the cluster.
I’m now in the process of copying people off the old arrays. It’s going quite slowly – maybe the array isn’t running at full capacity, I’m not sure. Once this is done we can take the arrays off myrtle and let Sun fix them.