I ended up doing some fairly significant work in the machine
room Thursday afternoon and evening, which involved rewiring the
entire rack. In order to be sure that some of the systems were
working properly, I rebooted several of the machines in the rack,
including the department's main file server (gytha)
and our parallel compute server hex. As a result, some
workstations -- especially Mac OS X machines -- may be confused
about their NFS mounts. If you have problems logging in or if you
can log in but you can't access your home directory or applications
or other materials stored in /shared/local, please
reboot the machine and try again.
I'm about to go to bed, but I will be reachable at home or by cell tomorrow if there are any unforeseen issues.
As anticipated, the system ran fsck to check the
disks on most of the home-directory partitions, taking on the order
of half an hour to complete.
The partitions came up clean and the system rebooted. The tape drive is working again, and I am flushing the previous day's backups to tape.
We now resume normal service. However, if you come across any
problems, please let us know by sending e-mail to system@math.hmc.edu.
The department's main server will be rebooted Saturday afternoon to clear a stuck IO process.
Services affected will include
Length of Outage: Approximately one hour.
The SCSI driver for our tape drive is stuck in a low-level IO loop that can't be interrupted. As we can't use the tape drive until this process is cleared, and the only way to clear it is by rebooting, we need to reboot the server.
The minimum time for a reboot of this system is around ten minutes based on various hardware tests and initializations. The actual reboot will probably take longer, especially if the system needs to run checks on disk partitions, in which case the reboot time could extend to around forty minutes or so. Rebooting can also reveal unforeseen consequences of some configuration changes, which can add additional delays before all services are available.
I will send messages to all logged-in users about ten minutes before I start the reboot. If you happen to log in shortly after the reboot, don't expect that the system will remain up unless the department's system blog has been updated with a message stating that the system is back up and maintenance is complete.
As usual, we apologize for any inconvenience that this downtime will impose, but occasional maintenance is required to keep the system running.
I have completed the repairs to our primary server, and everything should be working as usual. If not, please let me know!
We have a hardware problem with our main server,
esme, which requires me to take the machine offline in
order to replace some parts.
I will shut the server down at 2:00 PM tomorrow, Saturday, January 20. The work will either take about twenty minutes or will require much more extensive part swapping, which could take several hours. Please check this blog (which will remain available) for updates and notification about everything being back on line.
Because the problem is with our primary server, e-mail, logins, and printing will not be available during the outage. Home directories will also not be available, so class websites hosted out of professors' home directories (which is most of them) will also not be available during the outage. Our web server is a separate machine and will remain available, with all content not kept in home directories.
Sorry for the inconvenience and short notice; I've only just received the necessary parts.
As announced in November, 2005, support for the use of PHP, a popular, but problematic, web-programming language, has now ceased.
Any pages that relied on the Apache PHP module being available will no longer render properly.
If the lack of PHP poses a problem for you, please let me know and we can look into alternatives that pose less of a security risk for our system.
I've updated the version of Firefox in
/shared/local to 1.5, which is the latest release.
You can run Firefox by typing firefox at a terminal
prompt or by creating a GNOME Panel launcher by right-clicking on a
panel, choosing Add to Panel, then choosing Custom Application
Launcher and filling in the fields in the dialog box that will
appear.
You can find a Firefox icon in
/shared/local/firefox/icons. The canonical path to the
application until such time as it is installed by default on
individual machines is /shared/local/firefox/firefox.
For most people (unless you've tinkered with your
PATH), just putting firefox in the
Command field will do the trick.
Among other improvements, Firefox 1.5 supports RSS, Atom, and other feed protocols in a much more convenient way than previous versions of Firefox did.
Despite the dramatic-sounding title of this entry, I expect that there will be little or no actual change in the way that the system works for about 99% of the affected users.
As another short-term way of dealing with the ongoing disk space
crisis on /home/faculty, I have migrated the emeriti
and former-faculty accounts that had their home directories in
/home/faculty to a new partition on the server.
Practically speaking, there should be no real impact from this change for anyone, even the people whose accounts were moved, as I have added links to preserve the appearance of the file system.
If your account has been moved (you can tell by logging in and
running pwd, which will tell you your present working
directory) and you notice some issues, or if you try to reach a
personal web resource (i.e., one that has a URL similar to
http://www.math.hmc.edu/~someaccount) that is
no longer available, please report the problem to me so I can track
it down and fix it.
If you're interested in the details of what was done, most of it is pretty visible, kind of like post-surgical scars.
/home/guests is now a link farm, with symbolic
links pointing to actual directories that are located in
/home/guests-one or /home/guests-two. The
account database has been set up so that home directories for
migrated accounts are in /home/guests (that is, they
point to the links that point to the real directories).
Because of the limitations of NFS, we now have to export
/home/guests, /home/guests-one, and
/home/guests-two, and mount all three of those shares
on each machine that is available for general use.
The original directories in /home/faculty have been
replaced with links that point to the directories in
/home/guests, so any web-related links will still
work.
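For the curious, the layout described above can be tried out in a scratch directory. This is only a sketch: it builds a miniature copy of the directory names under a temporary root rather than touching the real /home, and "username" and note.txt are placeholder names, not real accounts or files.

```shell
#!/bin/sh
# Sketch of the link-farm layout, built under a temporary root directory.
# Nothing here touches the real /home; "username" is a placeholder account.
set -eu
root=$(mktemp -d)
mkdir -p "$root/home/guests-two/username" "$root/home/guests" "$root/home/faculty"

# /home/guests/username is a symbolic link to the real directory.
ln -s "$root/home/guests-two/username" "$root/home/guests/username"
# The old /home/faculty/username location is a link that points to that link.
ln -s "$root/home/guests/username" "$root/home/faculty/username"

# A file written through the old path lands in the real directory.
echo "hello" > "$root/home/faculty/username/note.txt"
cat "$root/home/guests-two/username/note.txt"

# pwd -P prints the physical directory behind the chain of links.
physical=$(cd "$root/home/faculty/username" && pwd -P)
echo "$physical"
```

Running pwd -P in your own home directory is a quick way to see where your files physically live; the scratch directory can be removed afterward with rm -rf.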
Because of the links, everything should work as it always has. At some point down the road, however, I hope to be able to add some additional disk space, which will allow me to do some rejuggling of account locations. At that point I will probably try to clean up some of the remaining links to make everything neat and less complex.
With the removal of the links, scripts or other materials that refer to hard-coded, complete paths to your home directory or directories within your home directory may break. In other words, if you had a script that looked for files in your home directory and specified them as
/home/faculty/username/some/directory/or/file
(where username is your username), but
your physical home directory is now located in
/home/guests-two/username, and is referred to
by the system as /home/guests/username, you
will have problems when one or more of the links is removed or
changed.
If you're working with shell scripts, the best way to refer to
your home directory is with the environment variable
$HOME, which is pretty much guaranteed to resolve to
the correct answer no matter what shell you or your script use. For
many modern shells (and scripts written in those shells' languages),
you can use the tilde (~) to refer to your home
directory, but $HOME is safer and more likely to work
no matter what. (You'll want to use ~ on the command
line, of course.)
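As a small illustration of that advice, here is a sketch of a script that builds its paths from $HOME rather than from a hard-coded /home/faculty/username prefix. The names some/directory and file are placeholders carried over from the example path above, not real files.

```shell
#!/bin/sh
# Build paths from $HOME instead of hard-coding /home/faculty/username/...
# "some/directory" and "file" are placeholder names for illustration.
data_dir="$HOME/some/directory"
data_file="$data_dir/file"

if [ -f "$data_file" ]; then
    echo "Found $data_file"
else
    echo "No file at $data_file"
fi
```

Because the script never mentions the partition name, it should keep working if your home directory is moved again or the links are removed.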
This mess will be cleaned up after we've obtained more disk space, which is on the agenda for a departmental computing-committee meeting on Friday. I hope that we will be able to find the money to move quickly on that project, and that I will be able to put additional disk space online over spring break (March 10 - 19).
In the meantime, please keep an eye on your disk usage and avoid excessive usage (which I would define as usage that's significantly more than others with home directories in your partition). Doing so benefits everyone else all the time, and benefits you when you have a sudden, short-term need for a larger amount of disk space.
Thanks for your cooperation.
Version 1.5 of Mozilla Thunderbird, the Mozilla Foundation's e-mail client, was released today.
I have installed it in /shared/local/thunderbird,
where it takes the place of the previous release (which was 1.0.7).
The old release will still be available in
/shared/local/thunderbird-1.0.7 for at least a couple
of weeks.
Please enjoy the new release, and let me know about any problems that you have with it.
The testing is complete. I have restarted all workstations and printers, so everything should be back to normal.
We did suffer one casualty, a faculty workstation whose power supply died. It's not clear that the problem was caused by the testing, but I have called the vendor for a replacement. (I have notified the machine's primary user by e-mail, so if you didn't get e-mail from me about your machine dying, your machine should be working -- please let me know if it isn't!)
Power-system and generator testing by the Claremont Colleges Physical Plant staff will affect all non-UPS-backed workstations on the mathematics department network. Because we cannot be certain about the duration of the outages, we will also be shutting down nonessential workstations.
The following systems will be unavailable:
shell.math.hmc.edu alias)
We expect that the servers and other equipment in our machine
room (backed by UPSs and local generators) will continue to operate
during the outage period. ponder.math.hmc.edu should
be available for checking e-mail and other simple usage during the
outage.
The testing is scheduled for completion at 4:00 PM. If all goes well, the affected systems should be back soon after that time. If there are problems, systems will either be back by 6:00 PM or will not be running until sometime tomorrow, Saturday, 2005 December 31.
Please check back here, http://www.math.hmc.edu/computing/blog/,
for updates on system status.
The Claremont Colleges have scheduled some power-related work during the holiday break. The outages on December 27 and 29 do not affect our systems, but the outage on December 30 affects the entire campus.
These outages are to replace the last of the "G&W boxes" that were responsible for the (unscheduled) power outage earlier this semester.
I'm checking with Theresa Potter to verify whether the outages will affect our servers. They will definitely affect our workstations, however.
At a minimum, power will be interrupted for significant periods of time between 12:00 PM (noon) and 4:00 PM (or later -- Theresa's message indicates that the end time is approximate). Because workstations in faculty offices, the Clinic lab, and the scientific-computing lab have short-run or no UPSs, they will not be available during this time period.
During this time, no workstations will be available, including
individual office and lab machines and the
shell.math.hmc.edu alias.
If machine-room power will be interrupted, mail, file, print, and web services will also be interrupted.
If machine-room power is maintained, people using POP or IMAP
will be able to access their mail. Web service should also continue
to be available, as should ponder.math.hmc.edu.
I will shut down any workstations (including faculty, Clinic, and scientific-computing lab machines) that are running on Friday morning (around 10:00 AM).
If it turns out that the outage will affect the servers, I will arrange to monitor them in person or remotely and shut some or all of them down if that seems to be required.
If I'm not already on campus to deal with things when the outage ends, I will come by in the evening to check on the situation and (possibly) restart machines. Given the previous record on electrical work requiring power outages, I am not expecting it to end at the scheduled time. If power is not restored until sometime after early evening on Friday, I will come in and try to restart machines on Saturday.
Check back here for updates and notice of restored service.
I have updated the version of Maple installed on our server to
10.02. It's set as the default, so just typing maple
or xmaple should launch the latest version.
If you have problems, (1) please
tell me, and then (2) run the previous version by specifying
the full path to the maple or xmaple
executables, as in
/shared/local/maple10.01/bin/{maple|xmaple}.
I noticed that there was support for the AMD 64/x86_64 64-bit processors in the update, but found that I didn't have the original installation media for the 64-bit version of Maple. I got a copy from CIS, so we now have both 32-bit and 64-bit versions of Maple available for your use (assuming you're using one of the 64-bit workstations the department has, of course).
To be honest, I'm not sure what having the 64-bit version buys you, as Maple is a symbolic math application rather than a major number cruncher, but 64 bits must be cooler than 32 bits, right?
I came in today and rebooted our main server, esme.
As I had expected, the home partitions needed checking. Once that
process had finished, however, the machine came back up and was
running just fine.
I was able to move the new tape library onto esme
and verify that it works. Very cool.
While I was working with the machine, I took the opportunity to update various firmware packages (BIOS, SCSI RAID, etc.). As far as I can tell, those updates worked fine, too.
I rebooted the scientific-computing laboratory machines. Faculty and Clinic workstations should probably also be rebooted; I will look at rebooting the Clinic machines over the next couple of days. Faculty should reboot their machines sometime next week (ideally when I'm in the office, just in case there are any issues).
Thanks for everyone's patience and cooperation. As usual, if you
notice any problems with the systems, please send mail to
system@math.hmc.edu describing the problems you're
having.
I will be rebooting the department's server systems over the Thanksgiving holiday. Exactly when, I'm not sure, but our main server, which provides file, print, mail, and some other services, hasn't been rebooted in over 200 days. As there's been a major update (from CentOS 3.5 to CentOS 3.6) during that period, we're more than due for a reboot.
If you were planning on running processes over the Thanksgiving holiday, please contact me immediately. As of right now, no one has spoken to me about any such processes (which, you'll recall, is a requirement of the department's long-jobs policy), so I'm assuming it's safe for me to reboot the systems whenever it's most convenient for me to do so.
This summer, we learned about a matching grant program run by IBM. Mudders who had gone on to work at IBM had donated money to be given to Mudd, with IBM matching those funds. Altogether, the donation was around $40,000.
After the department chairs hashed out who would get how much of the pool of funds, the mathematics department opted for a 3581 Tape Autoloader, a device that contains a single Ultrium LTO 3 drive and a robotic tape carousel that can hold eight Ultrium 3 tapes.
Each Ultrium 3 tape can hold 400 GB of data uncompressed, or up to 800 GB of data if compression is used. Our current backup system, which uses DLT IV tape, can hold 40 GB uncompressed, 80 GB compressed per tape, so the new system represents a tenfold increase in capacity.
The eight-cartridge carousel also means less tape changing -- the system can be set up to cycle through the tapes in order or to select particular tapes based on the ``slot'' in which they're loaded.
I'm currently in the process of testing the new tape system. It's installed in our rack, but actually getting the servers to talk to it and make it do what we want is going to require a bit of fiddling. I hope to have it online by next semester.
This new tape library clears the way for increasing our disk-space capacity. Now that we can back up larger amounts of data, we can start working toward obtaining additional disk space, knowing that we will be able to protect that data.
This IBM 3581 Tape Autoloader, with rack-mount kit and a SCSI cable, sells for $9,293. We would like to say ``Thank you, IBM,'' and especially to thank the Mudders now working there for contributing to this fund.
I am in the process of converting the only system pages that make use of PHP to a form that does not use PHP. Once that conversion is complete (probably by the end of the day on Monday, 2005 November 14), I will be removing all support for the use of PHP on the department's web server(s).
PHP is a server-side programming language that allows developers to write web pages with computer code embedded in them. It is widely used in the hobbyist market for writing web log, bulletin board, and forum-type applications. Unfortunately, PHP appears to be insecure by design, as numerous security holes continue to be found in the core PHP Apache module even though the system is about ten years old and has undergone several major rewrites and reimplementations.
Note that I am not speaking of insecure code written in PHP -- such buggy code is trivial to produce in any language. But we are still seeing numerous flaws in the Apache module that implements the core language itself. Such flaws can open up the entire server to attack, and the risks are greater than the benefits.
As I've mentioned before, work is underway to replace the "brains" of the air conditioning system in the department's machine room and hook it into the general HVAC monitoring system.
Among the other things we keep in that room is a magic button that cuts the power for the servers in that room, a relic of the days when water-cooled mainframe computers might need to be shut down all at once to prevent electrocution.
These days, of course, our systems are air cooled. And they're all on UPS power, which means that hitting the panic button just switches them onto battery power. But the button is still there, waiting to be pressed....
Which is what happened this morning. The Physical Plant folks working on the air conditioning accidentally triggered the power cutoff. To make matters worse (and more confusing), the cutoff doesn't just affect the power in our machine room, but also the power in the scientific-computing lab, the publications room, and, I believe, at least one or two of the biology labs nearby.
Our servers, with their UPSs, were fine. But any jobs that were running on the scientific-computing lab machines were stopped when the power went out and the machines crashed. The machines rebooted, as they were set to, but they didn't restart your jobs -- you'll have to restart them yourselves.
Before you do that, however, I encourage you to review the department's policy on long jobs. Let me summarize for you: You're not supposed to leave processes running when you're not sitting in front of a machine unless you check with me first. The lab machines are meant for use by people sitting in front of them first, with people logging in remotely to run interactive jobs next. Long, unattended jobs should be run so that they don't dominate the processing power of the machine when someone is trying to do things at the console.
That means that you should run long jobs using the nice command, as in
nice -n 19 your_process_name
Ideally, you should also write your code so that it periodically writes out its status and results, and can resume by reading in that information and starting from where it left off. Writing such code is a bit more difficult, but it might save you from having to redo hours of computations when the power fails, someone reboots the machine because it's running too slowly, or other unforeseen events stop your job from running.
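To make the idea concrete, here is a minimal checkpoint-and-resume sketch. The "computation" is a stand-in (summing the integers from 1 to TOTAL), and STATE_FILE, TOTAL, and the 100-step interval are hypothetical choices made for this example; a real job would save whatever state it needs to pick up where it left off.

```shell
#!/bin/sh
# Checkpoint/resume sketch. The "computation" is a stand-in: summing 1..TOTAL.
# STATE_FILE and TOTAL are hypothetical names chosen for this example.
set -eu
STATE_FILE=${STATE_FILE:-./long_job.state}
TOTAL=${TOTAL:-1000}

# Resume from the last checkpoint if one exists; otherwise start fresh.
i=0
sum=0
if [ -f "$STATE_FILE" ]; then
    read -r i sum < "$STATE_FILE"
    echo "Resuming after step $i (running sum $sum)"
fi

while [ "$i" -lt "$TOTAL" ]; do
    i=$((i + 1))
    sum=$((sum + i))
    # Every 100 steps, write the state to a temporary file and rename it,
    # so an interruption mid-write never leaves a half-written checkpoint.
    if [ $((i % 100)) -eq 0 ]; then
        printf '%s %s\n' "$i" "$sum" > "$STATE_FILE.tmp"
        mv "$STATE_FILE.tmp" "$STATE_FILE"
    fi
done

echo "Finished: sum of 1..$TOTAL is $sum"
```

If the script is saved as, say, long_job.sh (a made-up name), it can be started at low priority with nice -n 19 sh long_job.sh. If it is interrupted, running the same command again picks up from the last saved step instead of from the beginning.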
If you're still looking for reasons to tell me about your long jobs, let me point out that I routinely update packages on the lab machines for security issues, and some of those updates require a reboot to take effect. If I don't know your job is running, I might reboot the machine without checking with you first. Letting me know means that I can maintain a list of machines to avoid rebooting without notice.
Please remember that the lab machines are a shared resource, and sharing requires that everyone using them behave responsibly and respect the other users.
It turns out that there are some significant issues with the air-conditioning unit in the mathematics department machine room. They're being looked into by F&M and the CUC Physical Plant HVAC people.
There should be no disruption of services, but should it become necessary for something major to be done, I will let people know as far in advance as I can. I would also hope that we could arrange for any significant disruption to occur on a weekend or over a break period.
Thanks for your patience during this work.
The Amber cluster has been successfully moved into its new home. Tim and I will probably be doing some additional shuffling around over the next few weeks or months, but we should be able to either make those disruptions short enough to be unnoticeable or announce them in advance.
There may still be some issues that users might notice that I'm
not seeing; if you have any issues, please report them to
system@math.hmc.edu.
Thanks for your patience and cooperation!
The mathematics and computer-science departments' Beowulf cluster, Amber, is going to be moving from the mathematics department's machine room to the much more commodious CS machine room.
We will be moving the cluster sometime tomorrow, Wednesday, 2005 November 2.
If all goes perfectly, the cluster move will be simple and quick. If things get a bit more complicated, we will have to disassemble and reassemble the cluster, which means disconnecting sixteen computers (power & Ethernet), moving them in groups of three or four, then reconnecting everything in the new location, which will require at least an hour, maybe longer.
To make the process as easy as possible, we're asking that anyone who is actively using the cluster stop their work by 10:00 AM on Wednesday. We will post here when the cluster is back up.
(People who are authorized to use the Amber cluster have already received e-mail messages at their math addresses with this information, and will also receive a message when the cluster is running again.)
The Amber cluster has sixteen Dell PowerEdge 400SC nodes, each with a 2.8 GHz Pentium 4 processor and 1 or 1.25 GiB of RAM. The nodes communicate over a gigabit Ethernet switch. The cluster is running CentOS 3 with various additional cluster-related software packages (notably LAM/MPI). Use of the cluster is limited to faculty, students, and staff of the colleges who are doing computationally intensive research, especially research that requires or can take advantage of parallel-computing techniques.
Amber cluster nodes were purchased with funds from several CS faculty members. Systems integration and support is provided by the mathematics department.
In the process of cleaning up the path for most users, I
inadvertently left most people using the java and
similar scripts installed by the libgcj package. As
those scripts don't actually do anything, that situation wasn't
ideal. ;-)
I've added some code to the global.tcshrc and
global.cshrc files, so you should now have
/shared/local/java/bin added to your path if you're
using the tcsh. (If you're not sure what I'm talking
about here, then you are using the tcsh and shouldn't
have to do anything.) If you're using another shell, you're already
having to massage your path; you're welcome to take a look at the
global.tcshrc file (which is in
~setup/global.tcshrc) to see how my code works.
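For people using an sh-compatible shell instead, something along the following lines could go in a startup file such as ~/.profile. This is a sketch, not the code in global.tcshrc: it assumes that putting /shared/local/java/bin at the front of the path is what you want, and the variable name java_bin is made up for the example.

```shell
# Sketch for sh-compatible shells (e.g., in ~/.profile); this is not the
# tcsh code from global.tcshrc. It adds the shared Java binaries to the
# front of PATH unless they are already listed.
java_bin=/shared/local/java/bin
case ":$PATH:" in
    *":$java_bin:"*) ;;              # already on the path; nothing to do
    *) PATH="$java_bin:$PATH" ;;     # prepend so it is found before other java scripts
esac
export PATH
```

Wrapping PATH in colons before matching means the check works whether the directory appears first, last, or in the middle of the list, and it avoids adding a duplicate entry each time the file is read.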
The longer term solution, I think, is to figure out how to add
the /shared/local/java binaries to the alternatives
system such that they're used in preference to the
libgcj scripts, but have a lower priority than (and are
replaced by) the binaries from a locally installed Java
package.
Late last week F&M was able to take a look at our machine-room air conditioning. It turned out that there was a loose wire in the thermostat that was periodically breaking contact and resetting the system. (At a guess, it's possible that as the room cooled down, the wire contracted and broke contact. Once the room warmed up again, the wire expanded and the system worked again.)
Whatever the exact details were, the air conditioning is now running again, and I have restarted the Amber cluster. Please let me know if you have any issues with the cluster.
In related news, I have swapped out the thermally compromised drive from our backup array with a new drive I'd purchased for that purpose a few months ago. The array is now working as expected, as is our disk-based backup system.
The air conditioning unit for our machine room is continuing to have problems. I have entered the room twice and found the controller flashing OFF. The buttons on the controller don't seem to work, and I have to power off the whole system before the controller responds again and the air conditioner runs.
I have reported the problem to F&M, but until they can fix it, I will have to keep the Amber cluster offline.
Sorry for any inconvenience.
Yesterday's power outage was fun for all, but it's had some negative effects on our computing services.
Our servers are all supplied with power through Uninterruptible Power Supplies -- big batteries, basically. The most important servers (the ones that provide file, print, e-mail, and web service) actually have two power supplies, each of which has a UPS that is, in turn, connected to a different power source. One of those sources has a local generator, so when the power went out, the servers first went on UPS, then were able to run off the generator power.
Unfortunately, our air conditioning unit for our server room, while separate from the Libra complex's air conditioning, does rely on actual power. And all the air conditioning systems share a set of chillers and other support equipment. That equipment is the source of the current air conditioning outage, and that is affecting our machine room.
The primary effect you may have noticed is that our Amber Beowulf cluster is offline. Sixteen machines generate a lot of heat, and therefore the Amber cluster will be offline until air conditioning is restored.
A secondary effect is that our disk-based backup system is suffering a thermal-related issue with one of the drives in the array. I first noticed the problem just before the power outage -- the air conditioning in the machine room wasn't working, and the machine had gotten hot enough for the drive to seize up. I was able to cool the system down and get the array rebuilding, but then we had the power failure followed by the air conditioning failure, so our disk-based backups will be offline until such time as the temperature drops, as well.
Note that we also do regular tape backups, so we do still have backups; it's just that they are slightly less current and much less convenient. Please try not to delete anything important until we have A/C back!
Faculty and staff (mostly) have small UPS units for their desktop machines. Those UPSs are meant to smooth the transition between line power and generator power during power outages and not to allow you to continue working through a significant power outage. These UPS units will not keep a typical desktop system running for more than about 5-15 minutes at the longest.
Because yesterday's power outage occurred on campus, the disruption prevented both line and generator power from being distributed. Thus your UPSs may have been run down to the point that your machines shut down (or crashed when the power stopped).
You should save your work as soon as a power outage occurs. If the power isn't back after about three minutes, you should log out and shut your machine down manually.
Once the power is back, the UPS batteries will begin charging again. It should be safe to work with your machine once the power is back, but be aware that if additional power outages occur while the battery is charging, your machine will have less runtime than it did when the battery was fully charged.
I have updated our network install of Maple to 10.01, as mentioned in a previous entry.
I also have updates for standalone copies, so if you have one and you haven't updated by choosing the ``Check for Updates...'' option from the Tools menu, you can download the updates and apply them manually using the links in the previous entry.
Just in time for the new semester, both MathWorks, makers of MATLAB, and MapleSoft, makers of Maple, came out with new updates for their products.
Details about Service Pack 3 are available. The network install
has been updated; if you type matlab to start MATLAB
on a math Linux system, you'll get the service pack 3 version.
I haven't yet figured out the best way of distributing updates to locally installed copies of MATLAB, although it sounds like we're basically going to need to reinstall the app on each machine.
See our MATLAB support page for information and ways to run different versions (including the classroom or research licenses or older versions).
As of this writing, Maplesoft only has updates for the
single-user version of Maple. If you have Maple 10 installed on
your system and you would like to update it yourself, you can
download the
updates from our site (this link will only work for machines
with an hmc.edu address) or
get the updates direct from Maplesoft.
We will be updating our network install of Maple as soon as the media are available. Check back here for updates.
Our Maple support page may have additional information you might find interesting or useful.
Bringing us along into 2005, I have installed Firefox, the official Mozilla standalone web browser; Thunderbird, the Mozilla project's standalone mail client; and Nvu, the Linspire web-page editor, which happens to be based on Mozilla code.
All of these programs are installed in the
/shared/local partition, and should be usable from any
math department Linux system by simply typing firefox,
thunderbird, or nvu, respectively.
If you want to add an icon to your GNOME Panel or KDE Kicker,
please do so. Icons are hiding in sneakily named icons
directories inside the installation directories in
/shared/local.
I would strongly encourage you to consider using Thunderbird
with our IMAP server,
imap.math.hmc.edu, which will allow you to read mail
with Thunderbird while you're at a machine in your office or one of
the labs, but also read mail from a text-based mail client if
you're so inclined, and read mail using an IMAP mail client from
home.
You should be able to print to wuffles now using
math department Linux systems, Macintoshes with Mac OS X, and
Windows machines.
Please see
my earlier message for details on the name and IP address of
the printer. Drivers are available from our
wuffles page.
Note for Linuxy types trying to set things up on their own: I
haven't been able to get the copier to behave using straight CUPS
and the PPD file; it seems to work just fine when I install the
BrightQ drivers available from canon.codehost.com. I
still want to get the thing working without the additional
software, but in light of the looming start of the semester, I'm
tabling it 'til later.
I have added support information, including drivers and some of the secondary applications (e.g., scanner-interface software) for the new Canon imageRunner 8070 copier to the department's computing website.
The copier will be called wuffles, at least on the
math network, and has the IP address 134.173.34.138 for those of
you playing from home.
wuffles will, we hope, be up and running on
Friday.
Enjoy!
You may have heard (or seen, as it's in the hallway) that we're
getting a new networked copier. The new machine is a Canon
imageRunner 8070, and will be replacing our existing imageRunner
5000, fluffy.
The new copier has several major improvements over the old model, including
From the manual, it sounds like it could potentially do a whole slew of additional things, some of which might actually be useful; however, we apparently haven't actually paid to turn any of that additional functionality on. We won't know for absolute certain what we have and what we don't until we can plug the thing in and get it running.
The downside of the newer, faster, stronger model is that it
uses more electricity. As a result, we will need to have the
electrical socket rewired. As fluffy also uses more
juice than your average household incinerator, and there isn't room
for two crazy sockets in the same box, we will have to take the old
copier offline, then have the electrical work done, then have Canon
come out and assemble and configure the new copier. That pretty
much guarantees a downtime of a day or two while we coordinate
several teams of workers. Oh, and Canon shipped us (or ordered us)
the wrong finishing unit, so we can't really go ahead until we have
the right one anyway.
You'll hear more when we know it -- in the meantime, I am in the process of assembling webpages with pointers to the software that you'll need to use the new copier. I'll announce that here once it's in place.
Apparently the problems I was having getting Maple 10 working on the cluster were related to the problems with getting Maple 10 running on other machines. But they're sorted out now, so I have made Maple 10 the default Maple installation on the mathematics cluster!
Maple 10's interface has changed dramatically. Not surprisingly, it has lots of new features (and probably some new bugs, too). So I am keeping Maple 9.5 around for a while.
Running maple or xmaple will now
launch Maple 10. If you need to run the older version, you can do
so by typing the full path to the command, as in
/shared/local/maple9.5/bin/{,x}maple
or by adding the old path to your PATH environment
variable using one of the following methods:
setenv PATH /shared/local/maple9.5/bin:$PATH    (for tcsh, csh)
or
export PATH=/shared/local/maple9.5/bin:$PATH    (for bash, zsh, sh, etc.)
I expect that Maple 9.5 will be removed sometime before the end of the fall semester or whenever CIS's license server stops working for Maple 9.5.
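If you switch back and forth a lot, a small helper can keep the old path from piling up in your PATH every time you add it. This is only a sketch for Bourne-shell variants (bash, zsh, sh); the function name prepend_path is made up for illustration and is not something installed on our systems.

```sh
# prepend_path DIR: put DIR at the front of PATH unless it is already
# somewhere in PATH. (Hypothetical helper, for illustration only.)
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present; leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Make Maple 9.5 take precedence over the default Maple 10 in this shell.
prepend_path /shared/local/maple9.5/bin
```

As with the commands above, the change lasts only for the shell you run it in.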
The chiller is back up, which means that we have air conditioning in the machine room again, so I've restarted the Amber cluster.
Sorry for the disruption in service, but some things are out of my hands. The servers have to come first.
Tom Shaffer, the college's plant engineer, has informed us that the separate air conditioning system that supplies cooling for various labs and computer machine rooms is offline. As a result, I have taken down the Amber cluster until air conditioning is restored.
I may have to take some additional servers offline in the near future, but we'll keep our fingers crossed that it won't come to that.
Some tiny number of you may have noticed that the Amber cluster was offline for a couple of hours. During that time, the cluster was completely disassembled, stacked up in the hallway, and then moved to its new (temporary) location in the department's small machine room.
The cluster was moved because with the summer heat, my office was running around 85 - 90° Fahrenheit, which isn't healthy for people or computers. This move is temporary because the department's machine room is now completely filled up with machines, leaving little room for humans to move around and do any maintenance.
The plan is still for the cluster to move to the CS machine room by the end of the summer. It will remain on the mathematics department subnet, and will continue to be available to people who are in the amber group.
The old cluster will be retired; at this point I'm thinking that we will probably maintain the head node in some form so that people who haven't already done so can retrieve their data, but the rest of the machines will probably be stripped or scrapped outright. If you're in the market for a Pentium II machine, let me know and we might be able to hook you up.
I have installed the latest versions of Intel's FORTRAN and C++ compilers for
32-bit architectures in /shared/local/intel.
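You can use the compilers by sourcing the appropriate setup script in
your shell (the .csh scripts are for csh variants, the .sh scripts
for Bourne-shell variants):
For the Intel C++ compiler
source /shared/local/intel/bin/iccvars.csh
. /shared/local/intel/bin/iccvars.sh
For the Intel FORTRAN compiler
source /shared/local/intel/bin/ifortvars.csh
. /shared/local/intel/bin/ifortvars.sh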
These commands set various environment variables
(PATH, MANPATH,
LD_LIBRARY_PATH, etc.) to include directories needed
to run the compilers. Their effects end when you quit the shell you
run them in (e.g., log out, close the terminal window). If you
should find yourself using these compilers all the time, you can
add the contents of these files to your own startup files.
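If you would rather not copy the contents of the scripts, one cautious alternative for a Bourne-style startup file (for example ~/.bashrc) is to source each script only when it is readable, so that logins still work on a machine where /shared/local isn't mounted. This is a sketch, not a tested recipe; the loop variable name is arbitrary, and csh users would need the .csh scripts and csh syntax instead.

```sh
# Source the Intel compiler setup scripts if (and only if) they are readable.
for script in /shared/local/intel/bin/iccvars.sh \
              /shared/local/intel/bin/ifortvars.sh; do
    if [ -r "$script" ]; then
        . "$script"
    fi
done
unset script    # don't leave the loop variable behind in the login shell
```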
The main advantage of the Intel compilers over the GNU Compiler Collection (gcc) compilers is that, in theory, Intel compilers take better advantage of the quirks in various CPU models. In practice, most code will not see a significant performance change when compiled with the Intel compilers, but there are exceptions. YMMV.
The Intel FORTRAN compiler also supports FORTRAN90 and
FORTRAN95, whereas g77, the GNU FORTRAN compiler, only
supports FORTRAN77 (as its name implies).
I have also installed the Intel Math Kernel Library, which provides mathematical functions optimized for use on Intel processors.
Documentation for these compilers and the Intel Math Kernel
Library is available in /shared/local/intel/doc, and
includes PDF and HTML manuals and training material.
Please note that our license for using these materials requires that they be used solely for noncommercial purposes. If you're planning to compile code that you hope to make money on, please use the standard GCC compilers or download your own Intel compilers. (Even better, don't do commercial work on our systems.)
Enjoy!
I've just purged the system of accounts that were marked as
expired as of 2004. The purge has gained us about 14 GB of space on
the /home/students partition, and bits and pieces
elsewhere.
If, by chance, I accidentally deleted an account that should still exist, please let me know as soon as possible. I can still restore such accounts from our disk or tape backups.
Faculty folks: If you end up working with a student whose account has been removed, we have a tape archive of the older accounts, so we can restore their contents if need be. There will be a delay, however, as I plan to store that tape in another physical location.
There's been a problem with the open file dialog in MATLAB ever
since we upgraded to MATLAB 7. The problem manifests itself as
follows: you click on File->Open from the menu bar or you click
on the open icon on the tool bar. You get a file picker dialog. You
move to the directory where your .mat file lives, then
click it to select it and click open or simply double-click the
file name. One of two things then happens: You get a dialog telling
you ``File not Found'' or you get an error message similar to
java.lang.InterruptedException
    at javax.swing.filechooser.FileSystemView.getFiles(Unknown Source)
    at javax.swing.plaf.basic.BasicDirectoryModel$LoadFilesThread.run(Unknown Source)
It turns out that this problem is somehow triggered by the
LANG environment variable (it looks like something to
do with Unicode). There are a couple of workarounds:
- Use the load or edit commands in the MATLAB Command Window.
- Change to the directory containing your file (using cd in your
  terminal window or with the navigation buttons in the MATLAB
  Current Directory browser pane) and open files from the Current
  Directory browser.
- Unset the LANG environment variable before starting MATLAB.
I have replaced the link to the latest MATLAB binary in
/shared/local/bin with a small script that unsets the
LANG environment variable and then starts MATLAB. The
change should be transparent to end users, but it should be
possible to open files directly from the open file dialog with this
change.
Note that if you start MATLAB in any way other than using the
matlab in /shared/local/bin, this change
won't help you. You can check to see what your shell thinks it
should run when you type matlab by typing the
following:
linux% which matlab
You should see
/shared/local/matlab/bin/matlab
If you don't, you can get the same effect by typing something similar to
For csh variants:
linux% ( unsetenv LANG ; /path/to/my/matlab ) &
For Bourne-shell variants:
linux$ ( unset LANG ; /path/to/my/matlab ) &
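For the curious, a wrapper of this kind needs only a few lines. What follows is a sketch of the idea for Bourne-shell variants, not the script actually installed in /shared/local/bin; the function name run_without_lang is made up, and /path/to/my/matlab is the same placeholder used above.

```sh
# run_without_lang CMD [ARGS...]: run CMD with LANG removed from its
# environment, leaving the calling shell's own LANG untouched.
run_without_lang() {
    (
        unset LANG      # only affects this subshell
        exec "$@"       # replace the subshell with the requested command
    )
}

# Example (placeholder path; substitute your own MATLAB location):
#   run_without_lang /path/to/my/matlab &
```

Running the command in a subshell means your own LANG setting is still in place for everything else you start from that terminal.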
Last night around 5:30 we had some issues with the department's
main server. They originally manifested as problems reading mail;
investigation showed that there was something up with
NIS, which
caused problems logging in, lsing files, and so
forth.
The server seemed to be thrashing badly; most of the system's
resources appeared to be devoted to the kswapd daemon.
ypserv was running, but not listening to any network
ports.
After trying various less drastic means of getting the system
working properly (including dumping it down to single-user mode and
then back to network-server level, which initially seemed to work
but didn't last), I rebooted the server. When it came back up, it
(of course) had to run a check on the various /home
filesystems. As these total around 200 GB, this process took a
considerable amount of time. Once the checking was complete, the
server came back up and appears to be running normally at this
time.
I had been thinking about scheduling a reboot for this system in the near future, after classes and exams were over, so actually having to reboot it wasn't the worst thing in the world. (It had been running for 186 days without a reboot.) I do, however, apologize for any inconvenience you may have experienced when the server was unavailable.
If you notice any problems, please let me know ASAP so that I can take a look at them and get them resolved as quickly as possible.
There's probably no good time to upgrade your operating system, and I think that goes quadruple or more for a systems administrator. Suddenly your familiar working environment is completely different. Icons and menus have changed or are in different places. Some bits are missing. New functionality has replaced the old, familiar (working) functionality. Keys are remapped so they don't do the same things. Programs you used to have aren't there any more, because you don't have packages for this OS....
Of course, given a couple of days to concentrate, you could clear up the problem in no time. Just write that script to clean up old SRPMs and build shiny new RPMs you can install. Take the time to port your old configuration files over to the new system. Figure out where they moved things (and speculate on why). But a couple of days off are pretty rare in this biz, and when a user asks you a question, it's hard to say no. So you stumble along from issue to issue (A computer just died! No, two! Someone needs an application built! Someone else needs some technical advice on a paper they're writing! The printers aren't working! The mail system is broken!) and gradually piece your world back together.
All that is my way of saying, relax, I'll get back to you as soon as I can. As soon as I can get my editor to work, the mail server to send mail, printers to print, TeX to TeX, and so on.... Just relax....
The mathematics department hasn't had any systems running Red Hat Linux 7.3 for almost a year now. Accordingly, we are announcing the end of support for Red Hat Linux 7.3, and we are removing packages built for Red Hat Linux 7.3 from our mirror server.
The removal of these obsolete packages will free up some space for supported systems and will also allow us to clean up our directory structure a bit.
Support for Red Hat Linux 9 and the Fedora Legacy packages for RHL 9 will continue until sometime this summer, when the last of our RHL 9 systems will be retired, replaced, or rebuilt.
I have installed gaspode in the Math Workroom
(Olin-1264). It should be available for general use as of this
writing.
You can obtain drivers, instructions, and other useful information for using this printer from its new webpage.
I have also added similar pages for the other ``public'' printers. They're all accessible from our printing support page (which has been up for some time).
Enjoy!
We have taken delivery of a new color printer,
gaspode, a
Hewlett Packard Color LaserJet 5550dtn.
This printer was a gift from Hewlett Packard and its Hardcopy Technologies Lab's director, John Meyer. We would not have received this generous gift without the work of Professor Mike Raugh, our department's Clinic Director.
gaspode replaces winter, our
Minolta-QMS magiColor 6100. The new printer is much faster (up to
27 pages per minute in black and white or color) and uses HP's
imageRET technology to achieve resolutions of up to 3600 dpi.
Information on printing to the new printer is available on our math computing support website. (Please use this link, as this information is not yet tied into the site as a whole; I expect to create similar pages for each printer in the near future.)
Please thank Mike Raugh for obtaining this printer for the department.
I have updated the versions of Sun's Java Software Development Kits (SDKs) to the latest versions -- 1.4.2_07 and 1.5.0_02. The permission-elevation problem in the 1.4.2 series is addressed in the 07 update.
The standard Java remains 1.4.2. To use Java 5 (really 1.5), you
will have to run the binaries by typing their full pathnames or add
the Java 5 directory to your PATH. I recommend that
rather than using the release-specific directory, you use
/shared/local/java5, which is a link that will be
updated to point to the latest version installed on the system.
MathWorks has announced the release of Service Pack 2 for Release 14 of MATLAB. I will be installing it as soon as I get hold of the media, but in the meantime, you can read about the changes in this release.
Please note that I will be installing the new release in
parallel with the existing release. To use it, you will need to
specify the complete path to the new version of MATLAB on the
command line, or add the new directory to your PATH
before the old directory.
I've added the TEXMF tree provided by Wolfram for
use in compiling TeX documents exported from Mathematica to the
system.
The files are located in
/shared/local/share/Mathematica/texmf; to use them,
modify your TEXMF environment variable by adding that
directory. You will probably want something like
setenv TEXMF "$TEXMF, \!\!/shared/local/share/Mathematica/texmf"    (*csh)
or
export TEXMF="$TEXMF, \!\!/shared/local/share/Mathematica/texmf"    (Bourne/Korn variants)
which adds the files from the Mathematica TEXMF
tree after the rest of the files in the standard TEXMF
path.
Those of you using the Fugu SCP/SFTP client for Mac OS X should update to the latest version of the program.
It's available from the upstream
site or from yum.math.
Red Hat released version 4 of their Red Hat Enterprise Linux products last week. RHEL 4 is based on Fedora Core, Red Hat's ``free'' distribution, and includes features such as GNOME 2.8, SELinux, and the 2.6 Linux kernel.
RHEL 4 also drops the Mozilla suite in favor of Firefox and Thunderbird, and changes a whole bunch of other stuff in ways I haven't yet discovered.
CentOS 4 will be coming out soon, incorporating these changes.
I have been running a release candidate of CentOS 4 on a machine in my office, and thus far my impression is that it has many shiny improvements over CentOS 3, but that the changes may cause some issues if they aren't handled carefully. I expect to install CentOS 4 on my workstation and run it for a while before making a decision about rolling the new version of the OS out onto desktops. (Among other things, there's a fair amount of locally built and deployed software that will need to be rebuilt, updated, or replaced before a rollout can happen.)
Exactly when we upgrade workstations to CentOS 4 is unclear at this time, although it's likely that the upgrade will happen this summer at the latest, and probably sooner than that for lab workstations.
I may update the Amber cluster sooner, to see whether the changes affect some problems that have been seen there. Our servers will remain on CentOS 3 until I can see clear evidence that updating them would add enough valuable features to be worthwhile.
As usual, if you have any questions or comments, please feel
free to write to me at cmc@math.hmc.edu.
One of the MCM/ICM contest participants just brought a problem
with the shell startup scripts to my attention. The basic problem
was that there were no .tcshrc or .login
files in the home directories for accounts created since
mid-September, which, coincidentally, was when I upgraded our main
server from Red Hat Linux 9 to CentOS.
The result was that new accounts weren't sourcing the
global.tcshrc and global.login scripts,
which meant that their PATHs and other important
variables and aliases weren't being initialized properly.
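A check for affected accounts can be sketched in a few lines of Bourne shell. This is an illustration rather than the procedure actually used here; the function name list_missing_startup is hypothetical, and it assumes home directories sit directly under one parent directory such as /home/students.

```sh
# list_missing_startup HOMEROOT: print each directory directly under
# HOMEROOT that lacks a .tcshrc or a .login file.
list_missing_startup() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue          # skip if the glob matched nothing
        if [ ! -e "${dir}.tcshrc" ] || [ ! -e "${dir}.login" ]; then
            printf '%s\n' "${dir%/}"       # strip the trailing slash
        fi
    done
}

# Example:
#   list_missing_startup /home/students
```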
I've fixed the problem. The longer term solution is to actually modify the default startup scripts on the machines, which is the ``right'' way to do things, and I will get around to doing that before too terribly long. I hope.
In the meantime, all should be back to normal. Remember, if you can't run standard software such as MATLAB, Maple, Java, and Acrobat Reader, please let me know. Making things work, and fixing them when they break, is why I'm here. But I don't use every tool available on the system, and I'm not always aware when things break.
To my amazement, New Egg
came through incredibly quickly and elijah and
ramandu are now back on line with shiny new video
cards and readable screens. Enjoy!