<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:admin="http://webns.net/mvcb/">
<channel>
<title>HMC Mathematics Computing Weblog</title>
<link>http://www.math.hmc.edu/computing/blog</link>
<description>HMC Math Computing Weblog</description>
<dc:language></dc:language>
<dc:creator>Claire Connelly</dc:creator>
<dc:date>2008-08-04T18:25:46-07:00</dc:date>
<admin:generatorAgent rdf:resource="http://nanoblogger.sourceforge.net" />
<item>
<link>http://www.math.hmc.edu/computing/blog/archives/2008/08/#e2008-08-04T18_25_29.txt</link>
<title>Mail Outages Resolved</title>
<dc:date>2008-08-04T18:25:29-07:00</dc:date>
<dc:creator>Claire Connelly</dc:creator>
<description>
<![CDATA[<p>An old process that was set up to analyze e-mail content ate up
all the space on the <code>/var</code> partition of our mail server,
which caused the server to stop accepting mail for delivery from
both internal and external clients.</p>

<p>I have removed all the components of that system, so we shouldn't
see this issue again.  If you do have problems with mail, however,
please let me know.  As problems with e-mail generally mean that
e-mailing me won't help, you might want to ask Suzanne or DruAnn to
contact me by phone.</p>

<p>I apologize for any inconvenience.</p>]]>
</description>
</item>
<item>
<link>http://www.math.hmc.edu/computing/blog/archives/2008/06/#e2008-06-07T17_43_01.txt</link>
<title>Server Work Complete</title>
<dc:date>2008-06-07T17:43:01-07:00</dc:date>
<dc:creator>Claire Connelly</dc:creator>
<description>
<![CDATA[<p>At this point I'm done rebooting systems.  As far as I can tell,
everything went well.  We're now running the current kernels
on everything, and things seem to be working the way I would expect
them to work.  I will be keeping an eye on the systems to make sure
that nothing unusual starts happening; should you notice any
problems, I would very much appreciate knowing about them.</p>]]>
</description>
</item>
<item>
<link>http://www.math.hmc.edu/computing/blog/archives/2008/06/#e2008-06-04T17_00_03.txt</link>
<title>CTAN and CentOS Mirrors Moved off yum.math.hmc.edu</title>
<dc:date>2008-06-04T17:00:03-07:00</dc:date>
<dc:creator>Claire Connelly</dc:creator>
<dc:subject>mirror, News, LaTeX</dc:subject>
<description>
<![CDATA[<p>I have moved the current <a
href="http://www.centos.org/">CentOS</a> and <a
href="http://www.ctan.org/"><acronym title="Comprehensive TeX
Archive Network">CTAN</acronym></a> mirrors off
<code>yum.math.hmc.edu</code> and onto our new mirror server, <a
href="http://mirror.hmc.edu"><code>http://mirror.hmc.edu/</code></a>.</p>

<p>The new base URLs for these mirrors are</p>
<dl>
<dt>CentOS</dt>
<dd><a
        href="http://mirror.hmc.edu/centos/"><code>http://mirror.hmc.edu/centos/</code></a></dd>

<dt><acronym title="Comprehensive TeX Archive Network">CTAN</acronym></dt>
<dd><a
        href="http://mirror.hmc.edu/ctan/"><code>http://mirror.hmc.edu/ctan/</code></a></dd>
</dl>

<p>For the time being, I have set up redirects so that attempts to
retrieve data from URLs pointing to <code>yum</code> will
automatically be redirected to <code>mirror.hmc.edu</code>.</p>

<p>At some point the machine that the old mirror server runs on may
be retired or repurposed, so if you're using our CentOS mirror to
update your machines or our CTAN mirror to download TeX-related
material, I would encourage you to change the URLs in your YUM
configuration files and any saved CTAN links to point to the new
location.</p>]]>
</description>
</item>
<item>
<link>http://www.math.hmc.edu/computing/blog/archives/2008/06/#e2008-06-04T15_06_43.txt</link>
<title>Systems Work: Saturday, June 7, and Sunday, June 8</title>
<dc:date>2008-06-04T15:06:43-07:00</dc:date>
<dc:creator>Claire Connelly</dc:creator>
<dc:subject>Mail, News, System Maintenance, Linux, Macintosh, Website, Amber</dc:subject>
<description>
<![CDATA[<h2>When</h2>

<p>I will be doing some systems work this weekend, June 7&ndash;8.</p>

<p>Work will probably begin around 11:00 AM on Saturday, June 7, and
will continue for several hours.  If necessary, additional work
may be done on Sunday, June 8, within a similar block of time.</p>


<h2>What Will Be Affected</h2>

<p>The work will disrupt most of our networked services, including
e-mail, file service, interactive sessions, and the web server for
periods of several minutes to an hour over the course of the work.</p>

<p>I also want to make sure that all of our Macs are running the
latest security updates, so I will be updating these machines during
this time period as well.</p>


<h2>What You Should Do</h2>

<p>If you're using a Mac or Linux system that mounts file systems
from our servers, before you leave on Friday evening,</p>
<ul>
<li>Save all open files;</li>
<li>Close all applications;</li>
<li>Log out;</li>
<li>Leave your machine running.</li>
</ul>


<h2>Why</h2>

<p>This work is necessary for us to ensure the security and improve
the stability of the overall system.  In particular, I am hoping
that ongoing issues with our web server will be resolved as a
result of this work.</p>

<p>I will do my best to keep as much of the system functional as
possible for as much of the time as I can, but there will still be
some outages.</p>


<h2>Additional Background</h2>

<p>Last semester we had some serious issues with interactions between
the NFS support on our new file server and on our workstations and
older servers, exacerbated by the HVAC failure.  I was able to
stabilize things, but we still see some flaky behavior (especially
from the web server, which needs to be rebooted periodically).</p>

<p>On the Linux server side, I plan to update to the latest kernel
releases and do some experimentation to see if everything will
work together happily.  I will need to reboot various servers and
workstations an arbitrary number of times to explore all the
possible interactions.</p>

<p>For Macs, I will install the latest updates, most of which require
the machines to be rebooted.  As Tiger (Mac OS X 10.4) has
problems when an NFS server disappears and reappears, these
machines would need to be rebooted anyway.</p>


<h2>Comments/Problems/Other Issues</h2>

<p>As usual, if you have problems with the scheduling of this work,
requests, or any other comments, please let me know.</p>


<h2>Updates/Status Reports</h2>

<p>As usual, updates on the status of the systems and progress
reports will be posted to the <q>sysblog</q>, on our web server at</p>
<blockquote>
   <a
       href="http://www.math.hmc.edu/computing/blog/"><code>http://www.math.hmc.edu/computing/blog/</code></a>
</blockquote>

<p>Thanks for your cooperation!</p>]]>
</description>
</item>
<item>
<link>http://www.math.hmc.edu/computing/blog/archives/2008/04/#e2008-04-26T23_47_18.txt</link>
<title>Status Update</title>
<dc:date>2008-04-26T23:47:18-07:00</dc:date>
<dc:creator>Claire Connelly</dc:creator>
<description>
<![CDATA[<p>Since the air conditioning was restored, we've been seeing a
crash on <code>esme</code> at about 2:00 AM each morning.  The
crashes appeared to be connected to some interaction between the daily
backups and our NFS configuration.  To check that, I temporarily
disabled tape backups on the production servers, and spent all day
Friday testing configuration changes on a pair of test machines.</p>

<p>And <code>esme</code> <em>did</em> survive Friday night.  But the
rack's Ethernet switch did not.  Luckily I had a spare, and I went
in and replaced the dead switch.  I also updated some of the
configuration files on <code>gytha</code>, and I'm hoping that we'll
see improved stability (although there may be some <q>stuck</q>
machines that will require rebooting).</p>

<p>I have one more set of configuration updates to test and then
implement, but I'm going to hold off on those until I'm in the
office, probably around lunch on Monday.  I will also probably try
running a backup during the day to see whether it will trigger a
crash before we return to running automated backups.</p>

<p>Given the failure of the switch, it's possible that the key to
the problem was related to the switch handling the large amount of
bandwidth consumed by the backups rather than anything connected
directly to the NFS configurations or the backups (none of that
configuration had been changed until last night).  And if losing the
switch is the only hardware failure from the HVAC failure, we will
have gotten off pretty easy.  I'll be keeping an eye on things in
case there are additional failures.</p>

<p>In the meantime, I appreciate your patience while I work to
restabilize the systems, and while I do some additional testing.</p>]]>
</description>
</item>
<item>
<link>http://www.math.hmc.edu/computing/blog/archives/2008/04/#e2008-04-23T00_00_25.txt</link>
<title>Status Update</title>
<dc:date>2008-04-23T00:00:25-07:00</dc:date>
<dc:creator>Claire Connelly</dc:creator>
<description>
<![CDATA[<p>I'm very pleased to announce that F&amp;M did a great job in
getting a replacement motor for our HVAC system, and that system is
on line and cooling our machine room down to its usual chilly
68&deg; F.</p>

<p>As a result, I have brought most everything back up.  The lab
systems (responding to <code>shell.math.hmc.edu</code>),
<code>ponder</code>, the mail and web servers, the Amber cluster,
and even our new mirror server are up and running.</p>

<p><code>hex</code> is also running, but there's something
preventing SSH connections from working (for me, too).  I will look
into that on Wednesday morning and will, I hope, have it working
properly by the afternoon.</p>

<p>Once again, I apologize for the incredible inconvenience of this
outage right during the crunch time at the end of the semester.  I
am still watching the systems fairly closely, as I have seen
some <q>weird</q> (that's a technical term) behavior related to NFS
and RPC.  I am doing my best to keep things as stable and available
as I can.</p>]]>
</description>
</item>
<item>
<link>http://www.math.hmc.edu/computing/blog/archives/2008/04/#e2008-04-21T15_24_50.txt</link>
<title>hmcposter LaTeX Class Version 3.0 on Website</title>
<dc:date>2008-04-21T15:24:50-07:00</dc:date>
<dc:creator>Claire Connelly</dc:creator>
<description>
<![CDATA[<p>Earlier this month, <a
href="http://www.math.hmc.edu/computing/blog/archives/2008/04/#e2008-04-04T14_57_51.txt">I
announced</a> a new version of the <code>hmcposter</code> class.
Unfortunately, I hadn't updated the symlinks to the <q>current</q>
versions, so they weren't updated on the website.</p>

<p>I took advantage of the opportunity to add two shim classes that
will catch attempts to use the older classes and tell you what you
need to do to switch to the newer version.</p>

<p>The new version is available from <a
href="http://www.math.hmc.edu/computing/support/tex/classes/hmcposter/">the
poster class page</a> and is also installed on the math cluster (in
<code>/shared/local/share/texmf/tex/latex/hmcposter</code>).</p>]]>
</description>
</item>
<item>
<link>http://www.math.hmc.edu/computing/blog/archives/2008/04/#e2008-04-21T15_11_59.txt</link>
<title>HVAC Outage Continues; Limited Services Restored</title>
<dc:date>2008-04-21T15:11:59-07:00</dc:date>
<dc:creator>Claire Connelly</dc:creator>
<description>
<![CDATA[<p>Apparently the manufacturer of our AC system won't be able to get
us a replacement motor until May 8.  So F&amp;M are looking for an
equivalent motor that they can install.  That's still going to take
until tomorrow.</p>

<p>In the meantime, I have the file, mail, and authentication
servers running.  I have also brought <code>ponder</code> on line
for general use.  It's okay to (re)boot faculty, lab, and classroom
workstations for use.</p>

<p>The Amber cluster, <code>hex</code>, and our new mirror server (but
<em>not</em> <code>yum.math.hmc.edu</code>) will remain offline
until we get proper cooling back.</p>

<p>If you <em>need</em> information that is stored on
<code>hex</code> or the Amber cluster, please send me e-mail so we
can work out some way of getting you access.  That won't happen
before tomorrow, though, and we may have those systems back on line
by then.</p>]]>
</description>
</item>
<item>
<link>http://www.math.hmc.edu/computing/blog/archives/2008/04/#e2008-04-21T09_18_15.txt</link>
<title>Servers Off Line Until HVAC Is Repaired</title>
<dc:date>2008-04-21T09:18:15-07:00</dc:date>
<dc:creator>Claire Connelly</dc:creator>
<description>
<![CDATA[<p>We needed to move the rack over to allow access to the HVAC
unit.  I have taken down the mail, authentication, and file servers
until the repairs can be made.</p>

<p><strong>All computing services will be unavailable until our HVAC
is back on line.  Once we have cooling, I will begin restarting
machines.</strong></p>

<p>Apologies for the inconvenience.</p>]]>
</description>
</item>
<item>
<link>http://www.math.hmc.edu/computing/blog/archives/2008/04/#e2008-04-20T22_08_46.txt</link>
<title>Partial Recovery; Repairs Tomorrow</title>
<dc:date>2008-04-20T22:08:46-07:00</dc:date>
<dc:creator>Claire Connelly</dc:creator>
<description>
<![CDATA[<p>After much work, the core servers are working as they did before
the failure.  I am currently processing the backlog of mail (most of
which, of course, is spam), and I have turned mail delivery back
on.</p>

<p>I have not turned the IMAP and POP servers on, so getting mail
will not be possible until I do.  I have also shut down most of the
Linux machines and Macs, and I won't be turning those back on until
after things are more stable.</p>

<p>Note that tomorrow morning around 8:00 AM, we will have an HVAC
engineer working to replace the dead motor in our HVAC unit.  We may
need to shut systems down in order to move things in the machine
room to provide access to the HVAC equipment.  Please check back
here (the core web server is in my office and will remain
operational) for more information tomorrow morning.</p>

<p>Please <em>don't</em> rush into starting up office, classroom, or
lab machines unless I've said here that things are back up and
reliable.</p>

<p>I apologize for this unforeseeable and extremely annoying
outage.  Please be assured that I am doing everything that I can to
bring our systems back on line in a timely but safe manner.</p>]]>
</description>
</item>
</channel>
</rss>
