Thu Feb 26 14:18:59 PST 2009

New Versions of hmcclinic and hmcthesis LaTeX Classes Available

I've updated the Clinic and thesis LaTeX classes to explicitly load the booktabs and natbib packages, and to define bibliographic punctuation as described in the Clinic and thesis handbooks. That is, in theses or Clinic reports, you no longer need to load those packages or set the bibliographic punctuation yourself in the preamble of your document's master file.

The new versions are available on the math Linux cluster (in /shared/local/share/texmf/tex/latex/{hmcclinic,hmcthesis}), where they will be used automatically for typesetting Clinic reports or theses on a math cluster machine.

For use on your own machine, you can download a tar.gz or zip archive from the following URLs:

As always, the very latest (development) versions are available from our Subversion repository, at

hmcclinic class
hmcthesis class
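
For the archive route mentioned above, here is a minimal sketch of unpacking a downloaded class into a personal texmf tree. The function name install_class, the default location ~/texmf, and the idea that the archive unpacks into a single directory named after the class are all assumptions for illustration, not details from this post; check where your own TeX distribution looks for local files before relying on it.

```python
import shutil
from pathlib import Path


def install_class(archive, texmf_home=Path.home() / "texmf"):
    """Unpack a class archive (tar.gz or zip) under <texmf_home>/tex/latex/.

    Assumption: the archive contains one top-level directory named after
    the class (for example hmcthesis/), mirroring the layout used on the
    math cluster under .../texmf/tex/latex/.
    """
    target = Path(texmf_home) / "tex" / "latex"
    target.mkdir(parents=True, exist_ok=True)
    # shutil picks the unpacker (zip or gzipped tar) from the file extension.
    shutil.unpack_archive(str(archive), str(target))
    return target


# Hypothetical usage; the file name is a placeholder, not a real download:
# install_class("hmcthesis.tar.gz")
```

Some TeX installations need their file-name database refreshed before a newly installed class is found; whether that applies depends on your setup.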

As usual, if you have any questions or problems, please let me know.

Posted by Claire Connelly | Permalink

Fri Feb 20 11:05:19 PST 2009

Hex Back In Service

hex, a compute server operated by the mathematics department and the Quantitative Life Sciences Center, is now back in service.

I got the original machine back on Wednesday and have completed updates and reinstalled it in our machine room. All data from the loaner machine has been restored to the /scratch partition, which is current as of 2009 February 18.

Please report any problems or other issues.

Posted by Claire Connelly | Permalink

Mon Feb 16 12:04:36 PST 2009

Continued Mirror Issues

Mixed up in all the issues with the main array (which runs off a dedicated array-controller card) was a failure of one of the hard drives in the software mirror supporting the OS. We saw a few minutes of downtime this morning as I identified and pulled the problem drive. Once I get the replacement drive, there will be a few more minutes of downtime while I install the drive and get the mirrors synced up again.

Thanks for your patience.

Posted by Claire Connelly | Permalink

Thu Feb 12 09:27:53 PST 2009

Mirror Back In Service

I've now brought the mirror server back online. The array checked out okay, which was good, but getting the machine back up was complicated by the GRUB configuration on the OS disks somehow getting lost. Once I'd managed to get that working, and the machine actually booting consistently, I was able to fsck the file systems on the various mirror volumes.

Most of the mirrors were fine, but there was some corruption on the RPMforge mirror, which I cleaned up.

I then ran some resync jobs to get everything up to date. Most of those ran fine, but I wasn't able to get the CentOS mirror updated on that run (too many users on the server), and it was quite late, so I decided to just let the machine be overnight, and to let the morning syncs run. As those seemed to be in order, I restarted the web server and resumed serving mirrors this morning.

As ever, if you encounter any problems, please let me know.

Posted by Claire Connelly | Permalink

Wed Feb 11 12:20:55 PST 2009

Mirror Array Status

I've upgraded the firmware on the controller, as the update included fixes for some issues with SATA drives being incorrectly marked as dead as well as some other fixes that could be useful. After some plugging and unplugging of drives and tinkering with the controller's UI, I theoretically have a valid and active array.

Just to be careful, however, I'm going to keep the machine offline while I run a consistency check, which will take something on the order of eight to ten hours. Once that's complete, I'll bring the machine back up and get the mirrors synced up to upstream again before I bring the web server back online (and thus start serving content again).
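
As a rough sanity check on that estimate, the sketch below works out the per-drive read rate an eight-to-ten-hour check would imply. It assumes (none of this is stated in the post) that every drive is read in full, that the drives are read in parallel, and that each holds 1 TB counted as 10**12 bytes, the size given elsewhere in this log.

```python
# Assumption: 1 TB drives, counted as 10**12 bytes each.
DRIVE_BYTES = 10**12


def implied_rate_mb_per_s(hours, drive_bytes=DRIVE_BYTES):
    """Average per-drive read rate (MB/s) if a full pass takes `hours`."""
    seconds = hours * 3600
    return drive_bytes / seconds / 10**6


for hours in (8, 10):
    print(f"{hours} h -> {implied_rate_mb_per_s(hours):.1f} MB/s per drive")
# prints:
# 8 h -> 34.7 MB/s per drive
# 10 h -> 27.8 MB/s per drive
```

Under those assumptions the estimate corresponds to roughly 28 to 35 MB/s per drive.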

Posted by Claire Connelly | Permalink

Tue Feb 10 22:06:06 PST 2009

Mirrors Down Again

Sadly, I'm not surprised that the same problem recurred. Sorry for the inconvenience; I'll take a look at the machine when I'm back in the office.

Posted by Claire Connelly | Permalink

Tue Feb 10 17:18:23 PST 2009

Mirror Hiccups

This morning I found that the array on the mirror had some serious issues, with three (out of eight) disks offline. The disk in slot 6 had some sort of glitch around 12:30 AM, causing the controller to take it offline. A minute later, the disk in slot 5 had a similar problem, and the controller took it offline as well. Twenty minutes later, the disk in slot 1 failed as well, taking the array offline.
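
The sequence above can be sketched as a small model of how many drive losses an array survives. The post does not say which RAID level is in use; a tolerance of two failures is an assumption chosen only because it matches the array going offline at the third failure. The names EVENTS and first_fatal_event are illustrative, and the offsets read "twenty minutes later" as twenty minutes after the second failure.

```python
# (minutes after the first glitch around 12:30 AM, slot number), per the post.
EVENTS = [(0, 6), (1, 5), (21, 1)]


def first_fatal_event(events, tolerated_failures):
    """Return the (minutes, slot) event that takes the array offline.

    The array is modelled as surviving up to `tolerated_failures` offline
    drives; one more takes it down. Returns None if it stays up.
    """
    offline = set()
    for minutes, slot in sorted(events):
        offline.add(slot)
        if len(offline) > tolerated_failures:
            return minutes, slot
    return None


print(first_fatal_event(EVENTS, tolerated_failures=2))  # prints (21, 1)
print(first_fatal_event(EVENTS, tolerated_failures=1))  # prints (1, 5)
```

With a tolerance of one failure, the same timeline would have taken the array down a minute after the first glitch rather than about twenty minutes later.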

I physically removed and reseated the drives, and also opened the case and checked all the cables in case any of them had been loosened. To my surprise, one of the drives (slot 1) came back, and I was able to get the other two drives to start rebuilding.

It's not yet clear to me whether the array will completely recover or not -- we could still lose it if we have additional drive failures. Keep your fingers crossed.

More Details

The drives in question are all 1 TB Seagate 7200.ES2 drives, which are theoretically at risk due to the Seagate firmware bug that was recently discovered. I had checked the firmware on these drives, and it was earlier than the problematic firmware. Checking the serial numbers against a new tool Seagate added also shows that these drives should not be subject to the firmware problem.

At the moment, my suspicions are on the array-controller card. I've seen cheaper controllers have similar issues with dropping a drive, and it may be that whatever happened with the first drive affected the controller and resulted in the other two drives going offline. In any case, I will be keeping an eye on the situation.

Posted by Claire Connelly | Permalink

Tue Feb 10 17:03:23 PST 2009

Status Update on hex

Shortly before the semester started, I got the CPU expansion board back from the vendor. Apparently it had tested out fine, so I reinstalled it, and found additional problems. Eventually they sent out some technicians to take a look in person, which resulted in them taking the whole machine back with them.

Happily, they were able to loan us a roughly equivalent machine, which is now online as hex with a similar OS and software load. The replacement machine has 32 GB of RAM, just like our hex, but it actually has 24 processor cores (four six-core Intel CPUs!). Also very nice from my perspective is that this machine is only one rack unit (1.75") high instead of five.

Until we get the original machine back, we'll continue to use the loaner. As usual, if you have problems or questions, please ask.

Posted by Claire Connelly | Permalink