{"id":36,"date":"2006-05-12T11:43:16","date_gmt":"2006-05-12T11:43:16","guid":{"rendered":"https:\/\/www.bishnet.net\/tim\/blog\/2006\/05\/12\/erm-whoops\/"},"modified":"2010-11-11T13:01:57","modified_gmt":"2010-11-11T13:01:57","slug":"erm-whoops","status":"publish","type":"post","link":"https:\/\/www.bishnet.net\/tim\/blog\/2006\/05\/12\/erm-whoops\/","title":{"rendered":"Erm, whoops?"},"content":{"rendered":"<p>I&#8217;d finally finished migrating everything off the old myrtle disk arrays, so I was feeling quite pleased. I&#8217;d just unplugged the last array from myrtle and plugged it in to the test machine for wiping. Then I tried to log in to the machine room SunRay, but strangely it didn&#8217;t work.<\/p>\n<p>I checked the console logs for myrtle and was surprised to see it counting &#8220;12%&#8230; 13%&#8230; 14%&#8221;. I glanced up and saw my colleagues attempting to come in to the machine room and tell me something, but for some reason they were unable to open the door. Scrolling back over the console logs I saw what it was up to:<\/p>\n<blockquote><p>panic[cpu2]\/thread=2a100105d40: md: Panic due to lack of DiskSuite state database replicas. Fewer than 50% of the total were available, so panic to ensure data integrity.<\/p><\/blockquote>\n<p>That made immediate sense to me, and I gave myself a bit of a kick. The RAID system we use for internal disks, DiskSuite (actually Volume Manager now, but it seems they haven&#8217;t updated this error message), has state databases stored on every disk. On myrtle we had 6 &#8211; two on the internal disks, and one on each of the four disk arrays. You need <em>at least<\/em> 50% for things to work.<\/p>\n<p>A week or so ago I removed the first pair of arrays without any problems. At that point we had 4 out of 6 databases. 
Today I removed the last 2, leaving only 2 of the 6, which is less than 50%, and the machine dutifully panicked.<\/p>\n<p>Fixing it was made tricky by the fact that it could no longer mount the root filesystem because the RAID wouldn&#8217;t start. Thankfully the arrays were still to hand, so I just plugged them back in. After booting I removed the databases from the arrays, and added an additional one on each of the internal disks &#8211; this gives us 4 in total, 2 on each disk, which is what we normally do.<\/p>\n<p>I also used the handy opportunity to mount the new filestore directly on \/home and \/proj, rather than using symlinks.<\/p>\n<p>I&#8217;ll end this post with a bit of a rant. I can understand why the system won&#8217;t boot with less than 50% of the state databases &#8211; it has no way of knowing if they represent the correct state of things. But what I don&#8217;t understand is why it needs to panic the system when it has less than 50%. It knows the remaining ones are valid because they&#8217;re currently in use. In fact panicking just makes it harder for the sysadmin to deal with the problem. Or am I missing something?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;d finally finished migrating everything off the old myrtle disk arrays, so I was feeling quite pleased. I&#8217;d just unplugged the last array from myrtle and plugged it in to the test machine for wiping. Then I tried to log in to the machine room SunRay, but strangely it didn&#8217;t work. 
I checked the console logs for myrtle and was surprised &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-36","post","type-post","status-publish","format-standard","hentry","category-work"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.bishnet.net\/tim\/blog\/wp-json\/wp\/v2\/posts\/36","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.bishnet.net\/tim\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bishnet.net\/tim\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bishnet.net\/tim\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bishnet.net\/tim\/blog\/wp-json\/wp\/v2\/comments?post=36"}],"version-history":[{"count":1,"href":"https:\/\/www.bishnet.net\/tim\/blog\/wp-json\/wp\/v2\/posts\/36\/revisions"}],"predecessor-version":[{"id":377,"href":"https:\/\/www.bishnet.net\/tim\/blog\/wp-json\/wp\/v2\/posts\/36\/revisions\/377"}],"wp:attachment":[{"href":"https:\/\/www.bishnet.net\/tim\/blog\/wp-json\/wp\/v2\/media?parent=36"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bishnet.net\/tim\/blog\/wp-json\/wp\/v2\/categories?post=36"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bishnet.net\/tim\/blog\/wp-json\/wp\/v2\/tags?post=36"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}