Mon Nov 17 12:04:33 PST 2008

More on hex

I have pulled out the CPU expansion board from hex and sent it back to the vendor, who is going to try to get a replacement from the manufacturer. In the meantime, hex is running with eight cores (four CPUs) and 16 GB of RAM, and it seems to be stable.

Please let me know about any problems, and check back here for updates.


Posted by Claire Connelly | Permalink | Categories: News, System Maintenance, Amber

Wed Jun 4 15:06:43 PDT 2008

Systems Work: Saturday, June 7, and Sunday, June 8

When

I will be doing some systems work this weekend, June 7--8.

Work will probably begin around 11:00 AM on Saturday, June 7, and will continue for several hours. If necessary, additional work may be done on Sunday, June 8, within a similar block of time.

What Will Be Affected

The work will disrupt most of our networked services, including e-mail, file service, interactive sessions, and the web server for periods of several minutes to an hour over the course of the work.

I also want to make sure that all of our Macs are running the latest security updates, so I will be updating these machines during this time period as well.

What You Should Do

If you're using a Mac or Linux system that mounts file systems from our servers, before you leave on Friday evening,

  • Save all open files;
  • Close all applications;
  • Log out;
  • Leave your machine running.

Why

This work is necessary for us to ensure the security and improve the stability of the overall system. In particular, I am hoping that ongoing issues with our web server will be resolved as a result of this work.

I will do my best to keep as much of the system functional as possible for as much of the time as I can, but there will still be some outages.

Additional Background

Last semester we had some serious issues with interactions between the NFS support on our new file server and on our workstations and older servers, exacerbated by the HVAC failure. I was able to stabilize things, but we still see some flaky behavior (especially from the web server, which needs to be rebooted periodically).

On the Linux server side, I plan to update to the latest kernel releases and do some experimentation to see if everything will work together happily. I will need to reboot various servers and workstations an arbitrary number of times to explore all the possible interactions.

For Macs, I will install the latest updates, most of which require the machines to be rebooted. As Tiger (Mac OS X 10.4) has problems when an NFS server disappears and reappears, these machines would need to be rebooted anyway.

Comments/Problems/Other Issues

As usual, if there are problems with the scheduling of this work, requests or any other comments, please let me know.

Updates/Status Reports

As usual, updates on the status of the systems and progress reports will be posted to the ``sysblog'', on our web server at

http://www.math.hmc.edu/computing/blog/

Thanks for your cooperation!


Posted by Claire Connelly | Permalink | Categories: Mail, News, System Maintenance, Linux, Macintosh, Website, Amber

Fri Feb 29 01:51:54 PST 2008

Server Work May Require Workstation Reboots (YMMV)

I ended up doing some fairly significant work in the machine room Thursday afternoon and evening, which involved rewiring the entire rack. In order to be sure that some of the systems were working properly, I rebooted several of the machines in the rack, including the department's main file server (gytha) and our parallel compute server hex. As a result, some workstations -- especially Mac OS X machines -- may be confused about their NFS mounts. If you have problems logging in or if you can log in but you can't access your home directory or applications or other materials stored in /shared/local, please reboot the machine and try again.

I'm about to go to bed, but I will be reachable at home or by cell tomorrow if there are any unforeseen issues.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Linux, Printers

Sat Jul 7 13:36:41 PDT 2007

Reboot Complete

As anticipated, the system ran fsck to check the disks on most of the home-directory partitions, taking on the order of half an hour to complete.

The partitions came up clean and the system rebooted. The tape drive is working again, and I am flushing the previous day's backups to tape.

We now resume normal service. However, if you come across any problems, please let us know by sending e-mail to system@math.hmc.edu.


Posted by Claire Connelly | Permalink | Categories: News, System Maintenance

Fri Jul 6 17:45:15 PDT 2007

System Outage, Saturday, 2007 July 7, 1:00 PM

The Facts

The department's main server will be rebooted Saturday afternoon to clear a stuck IO process.

Services affected will include

  • Logins on department Linux and Macintosh machines
  • File services to Linux and Macintosh machines
  • Print services for Linux
  • E-Mail (incoming and outgoing)
  • Web service for personal accounts (~user URLs)

Length of Outage: Approximately one hour.

A Bit More Detail

The SCSI driver for our tape drive is stuck in a low-level IO loop that can't be interrupted. As we can't use the tape drive until this process is cleared, and the only way to clear it is by rebooting, we need to reboot the server.

The minimum time for a reboot of this system is around ten minutes based on various hardware tests and initializations. The actual reboot will probably take longer, especially if the system needs to run checks on disk partitions, in which case the reboot time could extend to around forty minutes or so. Rebooting can also reveal unforeseen consequences of some configuration changes, which can add additional delays before all services are available.

I will send messages to all logged-in users about ten minutes before I start the reboot. If you happen to log in shortly after the reboot, don't expect that the system will remain up unless the department's system blog has been updated with a message stating that the system is back up and maintenance is complete.

As usual, we apologize for any inconvenience that this downtime will impose, but occasional maintenance is required to keep the system running.


Posted by Claire Connelly | Permalink | Categories: News, System Maintenance

Sat Jan 20 18:05:56 PST 2007

Repairs Complete

I have completed the repairs to our primary server, and everything should be working as usual. If not, please let me know!


Posted by Claire Connelly | Permalink | Categories: News, System Maintenance

Fri Jan 19 16:10:42 PST 2007

System Offline for Maintenance Saturday, 2007 January 20

We have a hardware problem with our main server, esme, which requires me to take the machine offline in order to replace some parts.

I will shut the server down at 2:00 PM tomorrow, Saturday, January 20. The work will either take about twenty minutes or will require much more extensive part swapping, which could take several hours. Please check this blog (which will remain available) for updates and notification about everything being back on line.

Because the problem is with our primary server, e-mail, logins, and printing will not be available during the outage. Home directories will also not be available, so class websites hosted out of professors' home directories (which is most of them) will also not be available during the outage. Our web server is a separate machine and will remain available, with all content not kept in home directories.

Sorry for the inconvenience and short notice; I've only just received the necessary parts.


Posted by Claire Connelly | Permalink | Categories: News, System Maintenance

Mon Jan 23 12:22:50 PST 2006

PHP Support Ends

As announced in November, 2005, support for the use of PHP, a popular, but problematic, web-programming language, has now ceased.

Any pages that relied on the Apache PHP module being available will no longer render properly.

If the lack of PHP poses a problem for you, please let me know and we can look into alternatives that pose less of a security risk for our system.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Website

Fri Jan 20 16:24:02 PST 2006

Updated Firefox to 1.5

I've updated the version of Firefox in /shared/local to 1.5, which is the latest release.

You can run Firefox by typing firefox at a terminal prompt or by creating a GNOME Panel launcher by right-clicking on a panel, choosing Add to Panel, then choosing Custom Application Launcher and filling in the fields in the dialog box that will appear.

You can find a Firefox icon in /shared/local/firefox/icons. The canonical path to the application until such time as it is installed by default on individual machines is /shared/local/firefox/firefox. For most people (unless you've tinkered with your PATH), just putting firefox in the Command field will do the trick.

Among other improvements, Firefox 1.5 supports RSS, Atom, and other feed protocols in a much more convenient way than previous versions of Firefox did.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Linux

Wed Jan 18 12:11:42 PST 2006

Changes to Guest and Emeriti Accounts

Despite the dramatic-sounding title of this entry, I expect that there will be little or no actual change in the way that the system works for about 99% of the affected users.

As another short-term way of dealing with the ongoing disk space crisis on /home/faculty, I have migrated the emeriti and former-faculty accounts that had their home directories in /home/faculty to a new partition on the server.

Practically speaking, there should be no real impact from this change for anyone, even the people whose accounts were moved, as I have added links to preserve the appearance of the file system.

If your account has been moved (you can tell by logging in and running pwd, which will tell you your present working directory) and you notice some issues, or if you try to reach a personal web resource (i.e., one that has a URL similar to http://www.math.hmc.edu/~someaccount) that is no longer available, please report the problem to me so I can track it down and fix it.

The Nitty-Gritty Details

If you're interested in the details of what was done, most of it is pretty visible, kind of like post-surgical scars.

/home/guests is now a link farm, with symbolic links pointing to actual directories that are located in /home/guests-one or /home/guests-two. The account database has been set up so that home directories for migrated accounts are in /home/guests (that is, they point to the links that point to the real directories).

Because of the limitations of NFS, we now have to export /home/guests, /home/guests-one, and /home/guests-two, and mount all three of those shares on each machine that is available for general use.

The original directories in /home/faculty have been replaced with links that point to the directories in /home/guests, so any web-related links will still work.
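The layout described above can be tried out safely in a scratch directory. This is only a sketch: the directory names mirror the ones in this entry, but everything is created under a temporary directory, "username" and notes.txt are placeholders, and nothing here touches the real /home.

```shell
#!/bin/sh
# Sketch of the link-farm layout, built in a scratch directory.
root=$(mktemp -d)

mkdir -p "$root/guests-two/username" "$root/guests" "$root/faculty"

# Stand-in for /home/guests/username -> the real directory in /home/guests-two
ln -s "$root/guests-two/username" "$root/guests/username"

# Stand-in for /home/faculty/username -> the link in /home/guests
ln -s "$root/guests/username" "$root/faculty/username"

# A file written through the old path lands in the real directory.
echo "hello" > "$root/faculty/username/notes.txt"
cat "$root/guests-two/username/notes.txt"

# pwd -P prints the physical directory behind the chain of links.
physical=$(cd "$root/faculty/username" && pwd -P)
echo "$physical"
```

Running pwd -P in your own home directory shows the same thing on the real system: the physical location rather than the path through the links.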

Potential Issues: Hard-Coded Home Directory Paths

Because of the links, everything should work as it always has. At some point down the road, however, I hope to be able to add some additional disk space, which will allow me to do some rejuggling of account locations. At that point I will probably try to clean up some of the remaining links to make everything neat and less complex.

With the removal of the links, scripts or other materials that refer to hard-coded, complete paths to your home directory or directories within your home directory may break. In other words, if you had a script that looked for files in your home directory and specified them as

/home/faculty/username/some/directory/or/file

(where username is your username), but your physical home directory is now located in /home/guests-two/username, and is referred to by the system as /home/guests/username, you will have problems when one or more of the links is removed or changed.

If you're working with shell scripts, the best way to refer to your home directory is with the environment variable $HOME, which is pretty much guaranteed to resolve to the correct answer no matter what shell you or your script use. For many modern shells (and scripts written in those shells' languages), you can use the tilde (~) to refer to your home directory, but $HOME is safer and more likely to work no matter what. (You'll want to use ~ on the command line, of course.)
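As a small illustration of the difference, here is a hypothetical script fragment. The path some/directory is just the placeholder used earlier in this entry, and the variable name data_dir is made up for the example.

```shell
#!/bin/sh
# Fragile: a hard-coded absolute path breaks if the account is moved
# or the links are removed.
# data_dir=/home/faculty/username/some/directory

# Portable: let the system say where the home directory is.
data_dir="$HOME/some/directory"

echo "Looking for files in $data_dir"
```

The second form keeps working wherever the home directory physically lives, because the login process sets $HOME from the account database.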

The Future: Solutions in the Pipeline

This mess will be cleaned up after we've obtained more disk space, which is on the agenda for a departmental computing-committee meeting on Friday. I hope that we will be able to find the money to move quickly on that project, and that I will be able to put additional disk space online over spring break (March 10 - 19).

In the meantime, please keep an eye on your disk usage and avoid excessive usage (which I would define as usage that's significantly more than others with home directories in your partition). Doing so is, and will always be, a good habit: it benefits everyone else all the time, and it benefits you when you have a sudden, short-term need for a larger amount of disk space.

Thanks for your cooperation.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Linux

Thu Jan 12 13:54:59 PST 2006

Thunderbird Updated to 1.5

Version 1.5 of Mozilla Thunderbird, the Mozilla Foundation's e-mail client, was released today.

I have installed it in /shared/local/thunderbird, where it takes the place of the previous release (which was 1.0.7). The old release will still be available in /shared/local/thunderbird-1.0.7 for at least a couple of weeks.

Please enjoy the new release, and let me know about any problems that you have with it.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Linux

Fri Dec 30 17:52:58 PST 2005

Power Back; Most Systems Running

The testing is complete. I have restarted all workstations and printers, so everything should be back to normal.

We did suffer one casualty, a faculty workstation whose power supply died. It's not clear that the problem was caused by the testing, but I have called the vendor for a replacement. (I have notified the machine's primary user by e-mail, so if you didn't get e-mail from me about your machine dying, your machine should be working -- please let me know if it isn't!)


Posted by Claire Connelly | Permalink | Categories: System Maintenance

Fri Dec 30 10:52:28 PST 2005

Systems Unavailable During Power Outage Period

Power-system and generator testing by the Claremont Colleges Physical Plant staff will affect all non-UPS-backed workstations on the mathematics department network. Because we cannot be certain about the duration of the outages, we will also be shutting down nonessential workstations.

The following systems will be unavailable:

  • Scientific Computing Lab Workstations (including the shell.math.hmc.edu alias)
  • Clinic Lab Workstations
  • Faculty Workstations
  • Networked Printers

We expect that the servers and other equipment in our machine room (backed by UPSs and local generators) will continue to operate during the outage period. ponder.math.hmc.edu should be available for checking e-mail and other simple usage during the outage.

The testing is scheduled for completion at 4:00 PM. If all goes well, the affected systems should be back soon after that time. If there are problems, systems will either be back by 6:00 PM or will not be running until sometime tomorrow, Saturday, 2005 December 31.

Please check back here, http://www.math.hmc.edu/computing/blog/, for updates on system status.


Posted by Claire Connelly | Permalink | Categories: System Maintenance

Fri Dec 23 00:19:15 PST 2005

Power Outage Scheduled

The Claremont Colleges have scheduled some power-related work during the holiday break. The outages on December 27 and 29 do not affect our systems, but the outage on December 30 affects the entire campus.

These outages are to replace the last of the "G&W boxes" that were responsible for the (unscheduled) power outage earlier this semester.

I'm checking with Theresa Potter to verify whether the outages will affect our servers. They will definitely affect our workstations, however.

Impact

At a minimum, power will be interrupted for significant periods of time between 12:00 PM (noon) and 4:00 PM (or later -- Theresa's message indicates that the end time is approximate). Because workstations in faculty offices, the Clinic lab, and the scientific-computing lab have short-run or no UPSs, they will not be available during this time period.

During this time, no workstations will be available, including individual office and lab machines and the shell.math.hmc.edu alias.

If machine-room power is interrupted, mail, file, print, and web services will also be interrupted.

If machine-room power is maintained, people using POP or IMAP will be able to access their mail. Web service should also continue to be available, as should ponder.math.hmc.edu.

Timeframe

I will shut down any workstations (including faculty, Clinic, and scientific-computing lab machines) that are running on Friday morning (around 10:00 AM).

If it turns out that the outage will affect the servers, I will arrange to monitor them in person or remotely and shut some or all of them down if that seems to be required.

If I'm not already on campus to deal with things when the outage ends, I will come by in the evening to check on the situation and (possibly) restart machines. Given the previous record on electrical work requiring power outages, I am not expecting it to end at the scheduled time. If power is not restored until sometime after early evening on Friday, I will come in and try to restart machines on Saturday.

Check back here for updates and notice of restored service.


Posted by Claire Connelly | Permalink | Categories: System Maintenance

Wed Nov 30 17:38:14 PST 2005

Maple Updated to 10.02 (and x86_64 Support)

I have updated the version of Maple installed on our server to 10.02. It's set as the default, so just typing maple or xmaple should launch the latest version.

If you have problems, (1) please tell me, and then (2) run the previous version by specifying the full path to the maple or xmaple executables, as in /shared/local/maple10.01/bin/{maple|xmaple}.

I noticed that there was support for the AMD 64/x86_64 64-bit processors in the update, but found that I didn't have the original installation media for the 64-bit version of Maple. I got a copy from CIS, so we now have both 32-bit and 64-bit versions of Maple available for your use (assuming you're using one of the 64-bit workstations the department has, of course).

To be honest, I'm not sure what having the 64-bit version buys you, as Maple is a symbolic math application rather than a major number cruncher, but 64 bits must be cooler than 32 bits, right?


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Linux

Thu Nov 24 22:49:24 PST 2005

Server Reboot Successful

I came in today and rebooted our main server, esme. As I had expected, the home partitions needed checking. Once that process had finished, however, the machine came back up and was running just fine.

I was able to move the new tape library onto esme and verify that it works. Very cool.

While I was working with the machine, I took the opportunity to update various firmware packages (BIOS, SCSI RAID, etc.). As far as I can tell, those updates worked fine, too.

I rebooted the scientific-computing laboratory machines. Faculty and Clinic workstations should probably also be rebooted; I will look at rebooting the Clinic machines over the next couple of days. Faculty should reboot their machines sometime next week (ideally when I'm in the office, just in case there are any issues).

Thanks for everyone's patience and cooperation. As usual, if you notice any problems with the systems, please send mail to system@math.hmc.edu describing the problems you're having.


Posted by Claire Connelly | Permalink | Categories: System Maintenance

11.23.2005 10:38

Server Reboots Over Thanksgiving Holiday

I will be rebooting the department's server systems over the Thanksgiving holiday. Exactly when, I'm not sure, but our main server, which provides file, print, mail, and some other services, hasn't been rebooted in over 200 days. As there's been a major update (from CentOS 3.5 to CentOS 3.6) during that period, we're more than due for a reboot.

If you were planning on running processes over the Thanksgiving holiday, please contact me immediately. As of right now, no one has spoken to me about any such processes (which, you'll recall, is a requirement of the department's long-jobs policy), so I'm assuming it's safe for me to reboot the systems whenever it's most convenient for me to do so.


Posted by Claire Connelly | Permalink | Categories: System Maintenance

11.16.2005 14:20

New Tape Library Arrives!

This summer, we learned about a matching grant program run by IBM. Mudders who had gone on to work at IBM had donated money to be given to Mudd, with IBM matching those funds. Altogether, the donation was around $40,000.

After the department chairs hashed out who would get how much of the pool of funds, the mathematics department opted for a 3581 Tape Autoloader, a device that contains a single Ultrium LTO 3 drive and a robotic tape carousel that can hold eight Ultrium 3 tapes.

Each Ultrium 3 tape can hold 400 GB of data uncompressed, or up to 800 GB of data if compression is used. Our current backup system, which uses DLT IV tape, can hold 40 GB uncompressed, 80 GB compressed per tape, so the new system represents a tenfold increase in capacity.

The eight-cartridge carousel also means less tape changing -- the system can be set up to cycle through the tapes in order or to select particular tapes based on the ``slot'' in which they're loaded.

I'm currently in the process of testing the new tape system. It's installed in our rack, but actually getting the servers to talk to it and make it do what we want is going to require a bit of fiddling. I hope to have it online by next semester.

This new tape library clears the way for increasing our disk-space capacity. Now that we can back up larger amounts of data, we can start working toward obtaining additional disk space, knowing that we will be able to protect that data.

This IBM 3581 Tape Autoloader, with rack-mount kit and a SCSI cable, sells for $9,293. We would like to say ``Thank you, IBM,'' and especially to thank the Mudders now working there for contributing to this fund.


Posted by Claire Connelly | Permalink | Categories: System Maintenance

11.14.2005 10:25

End of Support for PHP

I am in the process of converting the only system pages that make use of PHP to a form that does not use PHP. Once that conversion is complete (probably by the end of the day on Monday, 2005 November 14), I will be removing all support for the use of PHP on the department's web server(s).

PHP is a server-side programming language that allows developers to write web pages with computer code embedded in them. It is widely used in the hobbyist market for writing web log, bulletin board, and forum-type applications. Unfortunately, PHP appears to be insecure by design, as numerous security holes continue to be found in the core PHP Apache module even though the system is about ten years old and has undergone several major rewrites and reimplementations.

Note that I am not speaking of insecure code written in PHP -- such buggy code is trivial to produce in any language. But we are still seeing numerous flaws in the Apache module that implements the core language itself. Such flaws can open up the entire server to attack, and the risks are greater than the benefits.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Website

11.11.2005 10:43

Unexpected Power Outage in SciCompLab

As I've mentioned before, work is underway to replace the "brains" of the air conditioning system in the department's machine room and hook it into the general HVAC monitoring system.

Among the other things we keep in that room is a magic button that cuts the power for the servers in that room, a relic of the days when water-cooled mainframe computers might need to be shut down all at once to prevent electrocution.

These days, of course, our systems are air cooled. And they're all on UPS power, which means that hitting the panic button just switches them onto battery power. But the button is still there, waiting to be pressed....

Which is what happened this morning. The Physical Plant folks working on the air conditioning accidentally triggered the power cutoff. To make matters worse (and more confusing), the cutoff doesn't just affect the power in our machine room, but also the power in the scientific-computing lab, the publications room, and, I believe, at least one or two of the biology labs nearby.

Our servers, with their UPSs, were fine. But any jobs that were running on the scientific-computing lab machines were stopped when the power went out and the machines crashed. The machines rebooted, as they were set to, but they didn't restart your jobs -- you'll have to restart them yourselves.

Before you do that, however, I encourage you to review the department's policy on long jobs. Let me summarize for you: You're not supposed to leave processes running when you're not sitting in front of a machine unless you check with me first. The lab machines are meant for use by people sitting in front of them first, with people logging in remotely to run interactive jobs next. Long, unattended jobs should be run so that they don't dominate the processing power of the machine when someone is trying to do things at the console.

That means that you should

  1. Tell me that you have a job that needs to run for a long period of time.
  2. Run your job with the nice command, as in

    nice -n 19 your_process_name

Ideally, you should also write your code so that it periodically writes out its status and results, and can resume by reading in that information and starting from where it left off. Writing such code is a bit more difficult, but it might save you from having to redo hours of computations when the power fails, someone reboots the machine because it's running too slowly, or other unforeseen events stop your job from running.
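The idea of saving and resuming can be sketched in a few lines of shell. This is an illustration under stated assumptions, not a template for any particular job: the "work" (summing integers), the function name run_job, and the state-file name are all invented for the example, and a real computation would save whatever state it needs in whatever format suits it.

```shell
#!/bin/sh
# Hypothetical long job: sums the integers 1..TOTAL, saving progress as it goes.

# run_job CHECKPOINT_FILE TOTAL
run_job() {
    checkpoint=$1
    total=$2
    i=0
    sum=0
    # Resume from the saved state if an earlier run left one behind.
    if [ -f "$checkpoint" ]; then
        read -r i sum < "$checkpoint"
    fi
    while [ "$i" -lt "$total" ]; do
        i=$((i + 1))
        sum=$((sum + i))
        # Every 100 steps, write the state to a temporary file and rename it,
        # so a crash in mid-write never leaves a half-written checkpoint.
        if [ $((i % 100)) -eq 0 ]; then
            printf '%s %s\n' "$i" "$sum" > "$checkpoint.tmp"
            mv "$checkpoint.tmp" "$checkpoint"
        fi
    done
    echo "$sum"
}

# Demonstration in a scratch directory: pretend an earlier run stopped
# after step 500, then let the job pick up from there.
scratch=$(mktemp -d)
echo "500 125250" > "$scratch/job.state"
resumed=$(run_job "$scratch/job.state" 1000)
echo "resumed result: $resumed"
```

A script like this would still be started with nice, as shown above, so that it stays out of the way of whoever is sitting at the console.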

If you're still looking for reasons to tell me about your long jobs, let me point out that I routinely update packages on the lab machines for security issues, and some of those updates require a reboot to take effect. If I don't know your job is running, I might reboot the machine without checking with you first. Letting me know means that I can maintain a list of machines to avoid rebooting without notice.

Please remember that the lab machines are a shared resource, and sharing requires that everyone using them behave responsibly and respect the other users.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Linux

11.02.2005 15:01

A/C Work in Progress

It turns out that there are some significant issues with the air-conditioning unit in the mathematics department machine room. They're being looked into by F&M and the CUC Physical Plant HVAC people.

There should be no disruption of services, but should it become necessary for something major to be done, I will let people know as far in advance as I can. I would also hope that we could arrange for any significant disruption to occur on a weekend or over a break period.

Thanks for your patience during this work.


Posted by Claire Connelly | Permalink | Categories: System Maintenance

11.02.2005 14:54

Amber Cluster Move Complete

The Amber cluster has been successfully moved into its new home. Tim and I will probably be doing some additional shuffling around over the next few weeks or months, but we should be able to either make those disruptions short enough to be unnoticeable or announce the disruptions in advance.

There may still be some issues that users might notice that I'm not seeing; if you have any issues, please report them to system@math.hmc.edu.

Thanks for your patience and cooperation!


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Amber

11.01.2005 11:19

Amber Cluster Move Scheduled

The mathematics and computer-science departments' Beowulf cluster, Amber, is going to be moving from the mathematics department's machine room to the much more commodious CS machine room.

We will be moving the cluster sometime tomorrow, Wednesday, 2005 November 2.

If all goes perfectly, the cluster move will be simple and quick. If things get a bit more complicated, we will have to disassemble and reassemble the cluster, which means disconnecting sixteen computers (power & Ethernet), moving them in groups of three or four, then reconnecting everything in the new location, which will require at least an hour, maybe longer.

To make the process as easy as possible, we're asking that anyone who is actively using the cluster stop their work by 10:00 AM on Wednesday. We will post here when the cluster is back up.

(People who are authorized to use the Amber cluster have already received e-mail messages at their math addresses with this information, and will also receive a message when the cluster is running again.)

The Amber cluster has sixteen Dell PowerEdge 400SC nodes, each with a 2.8 GHz Pentium 4 processor and 1 or 1.25 GiB of RAM. The nodes communicate over a gigabit Ethernet switch. The cluster is running CentOS 3 with various additional cluster-related software packages (notably LAM/MPI). Use of the cluster is limited to faculty, students, and staff of the colleges who are doing computationally intensive research, especially research that requires or can take advantage of parallel-computing techniques.

Amber cluster nodes were purchased with funds from several CS faculty members. Systems integration and support is provided by the mathematics department.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Amber

10.11.2005 13:06

Fixed Java Path for {,t}csh Users

In the process of cleaning up the path for most users, I inadvertently left most people using the java and similar scripts installed by the libgcj package. As those scripts don't actually do anything, that situation wasn't ideal. ;-)

I've added some code to the global.tcshrc and global.cshrc files, so you should now have /shared/local/java/bin added to your path if you're using the tcsh. (If you're not sure what I'm talking about here, then you are using the tcsh and shouldn't have to do anything.) If you're using another shell, you're already having to massage your path; you're welcome to take a look at the global.tcshrc file (which is in ~setup/global.tcshrc) to see how my code works.
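For people using a Bourne-style shell (sh, bash, zsh) rather than the tcsh, a rough equivalent might look like the following. This is not the code from global.tcshrc, just a sketch of the same idea; the variable name java_bin is made up, and the lines would go in a startup file such as ~/.profile.

```shell
# Put /shared/local/java/bin at the front of PATH, once.
java_bin=/shared/local/java/bin
case ":$PATH:" in
    *":$java_bin:"*) ;;    # already on the path; leave it alone
    *) PATH="$java_bin:$PATH"
       export PATH ;;      # prepend so it is found before the libgcj scripts
esac
```

The case test keeps the directory from being added again each time the startup file is read.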

The longer-term solution, I think, is to figure out how to add the /shared/local/java binaries to the alternatives system such that they're used in preference to the libgcj scripts, but have a lower priority than (and are replaced by) the binaries from a locally installed Java package.


Posted by Claire Connelly | Permalink | Categories: System Maintenance

10.03.2005 10:13

A/C Working, Cluster Back

Late last week F&M was able to take a look at our machine-room air conditioning. It turned out that there was a loose wire in the thermostat that was periodically breaking contact and resetting the system. (At a guess, it's possible that as the room cooled down, the wire contracted and broke contact. Once the room warmed up again, the wire expanded and the system worked again.)

Whatever the exact details were, the air conditioning is now running again, and I have restarted the Amber cluster. Please let me know if you have any issues with the cluster.

In related news, I have swapped out the thermally compromised drive from our backup array with a new drive I'd purchased for that purpose a few months ago. The array is now working as expected, as is our disk-based backup system.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Amber

09.23.2005 12:13

Air Conditioning Problems Continue

The air conditioning unit for our machine room is continuing to have problems. I have entered the room twice and found the controller flashing OFF. The buttons on the controller don't seem to work, and I have to power off the whole system before the controller responds again and the air conditioner runs.

I have reported the problem to F&M, but until they can fix it, I will have to keep the Amber cluster offline.

Sorry for any inconvenience.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Amber

09.20.2005 11:32

Power Outage Effects

Yesterday's power outage was fun for all, but it's had some negative effects on our computing services.

Our servers are all supplied with power through Uninterruptible Power Supplies -- big batteries, basically. The most important servers (the ones that provide file, print, e-mail, and web service) actually have two power supplies, each of which has a UPS that is, in turn, connected to a different power source. One of those sources has a local generator, so when the power went out, the servers first went on UPS, then were able to run off the generator power.

Unfortunately, our air conditioning unit for our server room, while separate from the Libra complex's air conditioning, does rely on actual power. And all the air conditioning systems share a set of chillers and other support equipment. That equipment is the source of the current air conditioning outage, and that is affecting our machine room.

The primary effect you may have noticed is that our Amber Beowulf cluster is offline. Sixteen machines generate a lot of heat, and therefore the Amber cluster will be offline until air conditioning is restored.

A secondary effect is that our disk-based backup system is suffering a thermal-related issue with one of the drives in the array. I first noticed the problem just before the power outage -- the air conditioning in the machine room wasn't working, and the machine had gotten hot enough for the drive to seize up. I was able to cool the system down and get the array rebuilding, but then we had the power failure followed by the air conditioning failure, so our disk-based backups will be offline until such time as the temperature drops, as well.

Note that we also do regular tape backups, so we do still have backups; it's just that they are slightly less current and much less convenient. Please try not to delete anything important until we have A/C back!

Personal UPS Units and Power Outages

Faculty and staff (mostly) have small UPS units for their desktop machines. Those UPSs are meant to smooth the transition between line power and generator power during power outages and not to allow you to continue working through a significant power outage. These UPS units will not keep a typical desktop system running for more than about 5-15 minutes at the longest.

Because yesterday's power outage occurred on campus, the disruption prevented both line and generator power from being distributed. Thus your UPSs may have been run down to the point that your machines shut down (or crashed when the power stopped).

You should save your work as soon as a power outage occurs. If the power isn't back after about three minutes, you should log out and shut your machine down manually.

Once the power is back, the UPS batteries will begin charging again. It should be safe to work with your machine once the power is back, but be aware that if additional power outages occur while the battery is charging, your machine will have less runtime than it did when the battery was fully charged.


Posted by Claire Connelly | Permalink | Categories: System Maintenance

09.15.2005 12:20

Maple Updated to 10.01

I have updated our network install of Maple to 10.01, as mentioned in a previous entry.

I also have updates for standalone copies, so if you have one and you haven't updated by choosing the ``Check for Updates...'' option from the Tools menu, you can download the updates and apply them manually using the links in the previous entry.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Linux

09.02.2005 15:26

Updated MATLAB to R14, Service Pack 3

Just in time for the new semester, both MathWorks, makers of MATLAB, and Maplesoft, makers of Maple, came out with new updates for their products.

MATLAB

Details about Service Pack 3 are available. The network install has been updated; if you type matlab to start MATLAB on a math Linux system, you'll get the service pack 3 version.

I haven't yet figured out the best way of distributing updates to locally installed copies of MATLAB, although it sounds like we're basically going to need to reinstall the app on each machine.

See our MATLAB support page for information and ways to run different versions (including the classroom or research licenses or older versions).

Maple

Details on version 10.01.

As of this writing, Maplesoft only has updates for the single-user version of Maple. If you have Maple 10 installed on your system and you would like to update it yourself, you can download the updates from our site (this link will only work for machines with an hmc.edu address) or get the updates direct from Maplesoft.

We will be updating our network install of Maple as soon as the media are available. Check back here for updates.

Our Maple support page may have additional information you might find interesting or useful.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Linux

08.30.2005 18:30

Firefox, Thunderbird, and Nvu

Bringing us along into 2005, I have installed Firefox, the official Mozilla standalone web browser; Thunderbird, the Mozilla project's standalone mail client; and Nvu, the Linspire web-page editor, which happens to be based on Mozilla code.

All of these programs are installed in the /shared/local partition, and should be usable from any math department Linux system by simply typing firefox, thunderbird, or nvu, respectively.

If you want to add an icon to your GNOME Panel or KDE Kicker, please do so. Icons are hiding in sneakily named icons directories inside the installation directories in /shared/local.

I would strongly encourage you to consider using Thunderbird with our IMAP server, imap.math.hmc.edu, which will allow you to read mail with Thunderbird while you're at a machine in your office or one of the labs, but also read mail from a text-based mail client if you're so inclined, and read mail using an IMAP mail client from home.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Linux

08.22.2005 16:17

wuffles is printable

You should be able to print to wuffles now using math department Linux systems, Macintoshes with Mac OS X, and Windows machines.

Please see my earlier message for details on the name and IP address of the printer. Drivers are available from our wuffles page.

Note for Linuxy types trying to set things up on their own: I haven't been able to get the copier to behave using straight CUPS and the PPD file, but it seems to work just fine when I install the BrightQ drivers available from canon.codehost.com. I still want to get the thing working without the additional software, but in light of the looming start of the semester, I'm tabling it 'til later.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Printers

08.17.2005 19:14

Support Information, Drivers, Available for New Copier

I have added support information, including drivers and some of the secondary applications (e.g., scanner-interface software) for the new Canon imageRunner 8070 copier to the department's computing website.

The copier will be called wuffles, at least on the math network, and has the IP address 134.173.34.138 for those of you playing from home.

wuffles will, we hope, be up and running on Friday.

Enjoy!


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Printers

08.12.2005 11:41

New Copier Coming...

You may have heard (or seen, as it's in the hallway) that we're getting a new networked copier. The new machine is a Canon imageRunner 8070, and will be replacing our existing imageRunner 5000, fluffy.

The new copier has several major improvements over the old model, including

  1. Much faster -- 80 ppm instead of 50 ppm
  2. Higher resolution -- 2400x600 dpi instead of 600 dpi
  3. More stapled sets -- 100 stapled sets (or 1000 pages) instead of 30 sets (or 1000 pages)
  4. Thicker stapled documents -- 100 pages instead of 50
  5. Better Mac support! (Including a Mac port of Command Workstation and scanner software; alas, they're Java.)
  6. Hole-punching on-the-fly, eliminating the expense of buying prepunched paper and problems feeding it through the machine.
  7. No longer a departmental copier; this one is a ``production'' copier. (I think that basically means that if we spent a bunch more money we could get several additional bits that would allow it to produce neatly trimmed booklets with heavier, colored covers; or to generate form letters ready for stuffing and mailing after being folded, stapled, and so forth. We don't have any of that stuff, of course.)

From the manual, it sounds like it could potentially do a whole slew of additional things, some of which might actually be useful; however, we apparently haven't actually paid to turn any of that additional functionality on. We won't know for absolute certain what we have and what we don't until we can plug the thing in and get it running.

The downside of the newer, faster, stronger model is that it uses more electricity. As a result, we will need to have the electrical socket rewired. As fluffy also uses more juice than your average household incinerator, and there isn't room for two crazy sockets in the same box, we will have to take the old copier offline, then have the electrical work done, then have Canon come out and assemble and configure the new copier. That pretty much guarantees a downtime of a day or two while we coordinate several teams of workers. Oh, and Canon shipped us (or ordered us) the wrong finishing unit, so we can't really go ahead until we have the right one anyway.

You'll hear more when we know it -- in the meantime, I am in the process of assembling webpages with pointers to the software that you'll need to use the new copier. I'll announce that here once it's in place.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Printers

07.21.2005 17:49

New Software: Maple 10

Apparently the problems I was having getting Maple 10 working on the cluster were related to the problems with getting Maple 10 running on other machines. But they're sorted out now, so I have made Maple 10 the default Maple installation on the mathematics cluster!

Maple 10's interface has changed dramatically. Not surprisingly, it has lots of new features (and probably some new bugs, too). So I am keeping Maple 9.5 around for a while.

Running maple or xmaple will now launch Maple 10. If you need to run the older version, you can do so by typing the full path to the command, as in

/shared/local/maple9.5/bin/{,x}maple

or by adding the old path to your PATH environment variable using one of the following methods:

setenv PATH /shared/local/maple9.5/bin:$PATH (for tcsh, csh)
or
export PATH=/shared/local/maple9.5/bin:$PATH (for bash, zsh, sh, etc.)

I expect that Maple 9.5 will be removed sometime before the end of the fall semester or whenever CIS's license server stops working for Maple 9.5.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Linux

07.21.2005 10:17

Chiller Back, Amber, Too

The chiller is back up, which means that we have air conditioning in the machine room again, so I've restarted the Amber cluster.

Sorry for the disruption in service, but some things are out of my hands. The servers have to come first.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Amber

07.21.2005 09:27

Amber Cluster Offline Due to Air Conditioning Issues

Tom Shaffer, the college's plant engineer, has informed us that the separate air conditioning system that supplies cooling for various labs and computer machine rooms is offline. As a result, I have taken down the Amber cluster until air conditioning is restored.

I may have to take some additional servers offline in the near future, but we'll keep our fingers crossed that it won't come to that.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Amber

07.19.2005 17:03

Amber Cluster Move, Part I

Some tiny number of you may have noticed that the Amber cluster was off line for a couple of hours. During that time, the cluster was completely disassembled, stacked up in the hallway, and then moved to its new (temporary) location in the department's small machine room.

The cluster was moved because with the summer heat, my office was running around 85 - 90° Fahrenheit, which isn't healthy for people or computers. This move is temporary because the department's machine room is now completely filled up with machines, leaving little room for humans to move around and do any maintenance.

The plan is still for the cluster to move to the CS machine room by the end of the summer. It will remain on the mathematics department subnet, and will continue to be available to people who are in the amber group.

The old cluster will be retired; at this point I'm thinking that we will probably maintain the head node in some form so that people who haven't already done so can retrieve their data, but the rest of the machines will probably be stripped or scrapped outright. If you're in the market for a Pentium II machine, let me know and we might be able to hook you up.


Posted by Claire Connelly | Permalink | Categories: System Maintenance, Amber

06.08.2005 10:46

New Intel FORTRAN 90 and C++ Compilers Available

I have installed the latest versions of Intel's FORTRAN and C++ compilers for 32-bit architectures in /shared/local/intel.

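You can set up your environment for the compilers by sourcing the appropriate script for your shell (use source in csh or tcsh, and . in Bourne-style shells).

For the Intel C++ compiler

source /shared/local/intel/bin/iccvars.csh
. /shared/local/intel/bin/iccvars.sh

For the Intel FORTRAN compiler

source /shared/local/intel/bin/ifortvars.csh
. /shared/local/intel/bin/ifortvars.sh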

These commands set various environment variables (PATH, MANPATH, LD_LIBRARY_PATH, etc.) to include directories needed to run the compilers. Their effects end when you quit the shell you run them in (e.g., log out, close the terminal window). If you should find yourself using these compilers all the time, you can add the contents of these files to your own startup files.
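
Because these scripts only change environment variables, they affect your session only when they are read into the current shell (with source in csh or tcsh, or . in Bourne-style shells) rather than run as separate programs. The sketch below illustrates the difference with a throwaway stand-in script; demo_vars.sh and DEMO_COMPILER_DIR are hypothetical names and are not part of the Intel distribution.

```shell
# Create a throwaway stand-in for a vars script (hypothetical contents).
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo_vars.sh" <<'EOF'
DEMO_COMPILER_DIR=/shared/local/intel/bin
export DEMO_COMPILER_DIR
EOF

# Running the script in a child shell leaves the current shell unchanged.
sh "$tmpdir/demo_vars.sh"
after_run=${DEMO_COMPILER_DIR:-unset}

# Reading it in with "." applies the settings to the current shell.
. "$tmpdir/demo_vars.sh"
after_source=${DEMO_COMPILER_DIR:-unset}

echo "after running:  $after_run"
echo "after sourcing: $after_source"
rm -rf "$tmpdir"
```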

The main advantage of the Intel compilers over the GNU Compiler Collection (gcc) compilers is that, in theory, Intel compilers take better advantage of the quirks in various CPU models. In practice, most code will not see a significant performance change when compiled with the Intel compilers, but there are exceptions. YMMV.

The Intel FORTRAN compiler also supports FORTRAN90 and FORTRAN95, whereas g77, the GNU FORTRAN compiler, only supports FORTRAN77 (as its name implies).

I have also installed the Intel Math Kernel Library, which provides mathematical functions optimized for use on Intel processors.

Documentation for these compilers and the Intel Math Kernel Library is available in /shared/local/intel/doc, and includes PDF and HTML manuals and training material.

Please note that our license for using these materials requires that they be used solely for noncommercial purposes. If you're planning to compile code that you hope to make money on, please use the standard GCC compilers or download your own Intel compilers. (Even better, don't do commercial work on our systems.)

Enjoy!


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Linux

05.16.2005 18:11

Old Accounts Purged

I've just purged the system of accounts that were marked as expired as of 2004. The purge has gained us about 14 GB of space on the /home/students partition, and bits and pieces elsewhere.

If, by chance, I accidentally deleted an account that should still exist, please let me know as soon as possible. I can still restore such accounts from our disk or tape backups.

Faculty folks: If you end up working with a student whose account has been removed, we have a tape archive of the older accounts, so we can restore their contents if need be. There will be a delay, however, as I plan to store that tape in another physical location.


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Linux

05.05.2005 18:07

Stupid MATLAB Tricks

There's been a problem with the open file dialog in MATLAB ever since we upgraded to MATLAB 7. The problem manifests itself as follows: you click on File->Open from the menu bar or you click on the open icon on the tool bar. You get a file picker dialog. You move to the directory where your .mat file lives, then click it to select it and click open or simply double-click the file name. One of two things then happens: You get a dialog telling you ``File not Found'' or you get an error message similar to

java.lang.InterruptedException
at javax.swing.filechooser.FileSystemView.getFiles(Unknown Source)
at javax.swing.plaf.basic.BasicDirectoryModel$LoadFilesThread.run(Unknown Source)

It turns out that this problem is somehow triggered by the LANG environment variable (it looks like something to do with Unicode). There are a couple of workarounds:

  1. Type the full filename and path in the dialog
  2. Use the load or edit commands in the MATLAB Command Window
  3. Navigate to your working directory (either before you start MATLAB, with cd in your terminal window or with the navigation buttons in the MATLAB Current Directory browser pane) and open files from the Current Directory browser
  4. Unset the LANG environment variable before starting MATLAB

I have replaced the link to the latest MATLAB binary in /shared/local/bin with a small script that unsets the LANG environment variable and then starts MATLAB. The change should be transparent to end users, but it should be possible to open files directly from the open file dialog with this change.
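
A wrapper along these lines could look like the following minimal sketch. It is not the actual script installed in /shared/local/bin: the MATLAB_BIN variable and the run_matlab function are hypothetical names, and the default path is only an assumed location for the real MATLAB binary.

```shell
#!/bin/sh
# Sketch of a wrapper that clears LANG before starting MATLAB.
# MATLAB_BIN is a hypothetical override; the default path is an assumption.
MATLAB_BIN=${MATLAB_BIN:-/shared/local/matlab/bin/matlab}

run_matlab() {
    # Do the work in a subshell so the caller's LANG is left untouched.
    (
        unset LANG
        exec "$MATLAB_BIN" "$@"
    )
}

# In a real wrapper script the last line would be:  run_matlab "$@"
```

Passing "$@" through means that any command-line options you give to the wrapper reach MATLAB unchanged.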

Note that if you start MATLAB in any way other than using the matlab in /shared/local/bin, this change won't help you. You can check to see what your shell thinks it should run when you type matlab by typing the following:

linux% which matlab

You should see

/shared/local/bin/matlab

If you don't, you can get the same effect by typing something similar to

For csh variants:

linux% ( unsetenv LANG ; /path/to/my/matlab ) &

For Bourne-shell variants:

linux$ ( unset LANG ; /path/to/my/matlab ) &

Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Linux

05.05.2005 12:23

Server Burps

Last night around 5:30 we had some issues with the department's main server. They originally manifested as problems reading mail; investigation showed that there was something up with NIS, which caused problems logging in, lsing files, and so forth.

The server seemed to be thrashing badly; most of the system's resources appeared to be devoted to the kswapd daemon. ypserv was running, but not listening to any network ports.

After trying various less drastic means to try to get the system working properly (including dumping it down to single-user mode and then back to network-server level, which initially seemed to work but didn't last), I rebooted the server. When it came back up, it (of course) had to run a check on the various /home filesystems. As these total around 200 GB, this process took a considerable amount of time. Once the checking was complete, the server came back up and appears to be running normally at this time.

I had been thinking about scheduling a reboot for this system in the near future, after classes and exams were over, so actually having to reboot it wasn't the worst thing in the world. (It had been running for 186 days without a reboot.) I do, however, apologize for any inconvenience you may have experienced when the server was unavailable.

If you notice any problems, please let me know ASAP so that I can take a look at them and get them resolved as quickly as possible.


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Linux

04.21.2005 16:09

Change Sucks

There's probably no good time to upgrade your operating system, and I think that goes quadruple or more for a systems administrator. Suddenly your familiar working environment is completely different. Icons and menus have changed or are in different places. Some bits are missing. New functionality has replaced the old, familiar (working) functionality. Keys are remapped so they don't do the same things. Programs you used to have aren't there any more, because you don't have packages for this OS....

Of course, given a couple of days to concentrate, you could clear up the problem in no time. Just write that script to clean up old SRPMs and build shiny new RPMs you can install. Take the time to port your old configuration files over to the new system. Figure out where they moved things (and speculate on why). But a couple of days off are pretty rare in this biz, and when a user asks you a question, it's hard to say no. So you stumble along from issue to issue (A computer just died! No, two! Someone needs an application built! Someone else needs some technical advice on a paper they're writing! The printers aren't working! The mail system is broken!) and gradually piece your world back together.

All that is my way of saying, relax, I'll get back to you as soon as I can. As soon as I can get my editor to work, the mail server to send mail, printers to print, TeX to TeX, and so on.... Just relax....


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Linux

04.11.2005 09:33

End of Support for Red Hat 7.3

The mathematics department hasn't had any systems running Red Hat Linux 7.3 for almost a year now. Accordingly, we are announcing the end of support for Red Hat Linux 7.3, and we are removing packages built for Red Hat Linux 7.3 from our mirror server.

The removal of these obsolete packages will free up some space for supported systems and will also allow us to clean up our directory structure a bit.

Support for Red Hat Linux 9 and the Fedora Legacy packages for RHL 9 will continue until sometime this summer, when the last of our RHL 9 systems will be retired, replaced, or rebuilt.


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Linux

04.01.2005 12:40

gaspode -- our new color printer -- installed!

I have installed gaspode in the Math Workroom (Olin-1264). It should be available for general use as of this writing.

You can obtain drivers, instructions, and other useful information for using this printer from its new webpage.

I have also added similar pages for the other ``public'' printers. They're all accessible from our printing support page (which has been up for some time).

Enjoy!


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Printers

03.30.2005 16:20

New Color Printer Arrives!

We have taken delivery of a new color printer, gaspode, a Hewlett Packard Color LaserJet 5550dtn.

This printer was a gift from Hewlett Packard and its Hardcopy Technologies Lab's director, John Meyer. We would not have received this generous gift without the work of Professor Mike Raugh, our department's Clinic Director.

gaspode replaces winter, our Minolta-QMS magiColor 6100. The new printer is much faster (up to 27 pages per minute in black and white or color) and uses HP's imageRET technology to achieve resolutions of up to 3600 dpi.

Information on printing to the new printer is available on our math computing support website. (Please use this link, as this information is not yet tied into the site as a whole; I expect to create similar pages for each printer in the near future.)

Please thank Mike Raugh for obtaining this printer for the department.


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Printers

03.23.2005 11:53

Updated Java Fixes Security Hole

I have updated Sun's Java Software Development Kits (SDKs) to the latest versions -- 1.4.2_07 and 1.5.0_02. The permission-elevation problem in the 1.4.2 series is addressed in the 07 update.

The standard Java remains 1.4.2. To use Java 5 (really 1.5), you will have to run the binaries by typing their full pathnames or add the Java 5 directory to your PATH. I recommend that rather than using the release-specific directory, you use /shared/local/java5, which is a link that will be updated to point to the latest version installed on the system.
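
The idea behind a version-independent link can be sketched as follows. This demo works in a temporary directory rather than in /shared/local, and the directory names j2sdk1.5.0_01 and j2sdk1.5.0_02 are hypothetical stand-ins for release-specific install directories.

```shell
# Demo in a scratch directory; nothing under /shared/local is touched.
demo=$(mktemp -d)
mkdir -p "$demo/j2sdk1.5.0_01/bin" "$demo/j2sdk1.5.0_02/bin"

# Point the stable name at the older release first...
ln -s j2sdk1.5.0_01 "$demo/java5"
# ...then repoint it when a newer release is installed.
rm "$demo/java5"
ln -s j2sdk1.5.0_02 "$demo/java5"

# A PATH entry that uses the stable name never needs to change.
java5_bin="$demo/java5/bin"
resolved=$(cd "$java5_bin" && pwd -P)
echo "java5/bin resolves to: $resolved"
rm -rf "$demo"
```

A PATH entry written in terms of the stable name keeps working across updates, whereas one written in terms of a release-specific directory has to be edited each time.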


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Linux

03.22.2005 13:42

MATLAB R14SP2 Released

MathWorks has announced the release of Service Pack 2 for Release 14 of MATLAB. I will be installing it as soon as I get hold of the media, but in the meantime, you can read about the changes in this release.

Please note that I will be installing the new release in parallel with the existing release. To use it, you will need to specify the complete path to the new version of MATLAB on the command line, or add the new directory to your PATH before the old directory.
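
The order of directories in PATH matters because the shell searches them from left to right and runs the first match. The sketch below shows that behavior using two scratch directories, each holding a stub named matlab; the directory layout and the stubs are hypothetical and are not the real install locations.

```shell
# Scratch demo: two hypothetical install directories, each with its own "matlab".
demo=$(mktemp -d)
mkdir -p "$demo/old/bin" "$demo/new/bin"
for v in old new; do
    printf '#!/bin/sh\necho "%s release"\n' "$v" > "$demo/$v/bin/matlab"
    chmod +x "$demo/$v/bin/matlab"
done

# The shell searches PATH left to right, so the first match wins.
PATH="$demo/new/bin:$demo/old/bin:$PATH"
export PATH
hash -r 2>/dev/null            # forget any cached command locations
picked=$(command -v matlab)
echo "matlab resolves to: $picked"
```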


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Linux

03.01.2005 13:29

Mathematica TEXMF Tree Available

I've added the TEXMF tree provided by Wolfram for use in compiling TeX documents exported from Mathematica to the system.

The files are located in /shared/local/share/Mathematica/texmf; to use them, modify your TEXMF environment variable by adding that directory. You will probably want something like

setenv TEXMF "$TEXMF, \!\!/shared/local/share/Mathematica/texmf" (*csh)
or
export TEXMF="$TEXMF, \!\!/shared/local/share/Mathematica/texmf" (Bourne/Korn variants)

which adds the files from the Mathematica TEXMF tree after the rest of the files in the standard TEXMF path.


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, LaTeX

02.21.2005 17:39

New Fugu (1.1.2) Available

Those of you using the Fugu SCP/SFTP client for Mac OS X should update to the latest version of the program.

It's available from the upstream site or from yum.math.


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Macintosh

02.21.2005 13:10

RHEL 4 Released; CentOS 4 Imminent

Red Hat released version 4 of their Red Hat Enterprise Linux products last week. RHEL 4 is based on Fedora Core, Red Hat's ``free'' distribution, and includes features such as GNOME 2.8, SELinux, and the 2.6 Linux kernel.

RHEL 4 also drops the Mozilla suite in favor of Firefox and Thunderbird, and changes a whole bunch of other stuff in ways I haven't yet discovered.

CentOS 4 will be coming out soon, incorporating these changes.

I have been running a release candidate of CentOS 4 on a machine in my office, and thus far my impression is that it has many shiny improvements over CentOS 3, but that the changes may cause some issues if they aren't handled carefully. I expect to install CentOS 4 on my workstation and run it for a while before making a decision about rolling the new version of the OS out onto desktops. (Among other things, there's a fair amount of locally built and deployed software that will need to be rebuilt, updated, or replaced before a rollout can happen.)

Exactly when we upgrade workstations to CentOS 4 is unclear at this time, although it's likely that the upgrade will happen this summer at the latest, and probably sooner than that for lab workstations.

I may update the Amber cluster sooner, to see whether the changes affect some problems that have been seen there. Our servers will remain on CentOS 3 until I can see clear evidence that updating them would add enough valuable features to be worthwhile.

As usual, if you have any questions or comments, please feel free to write to me at cmc@math.hmc.edu.


Posted by C.M. Connelly | Permalink | Categories: System Maintenance, Linux, Website