Fri Apr 23 11:49:20 PDT 2010
Network Switch Failure (Temporarily Resolved)
At about 6:30 this morning, the gigabit switch that provides connectivity to our server cluster died, causing loss of all server-related functionality, including web service (for the department, the ODE Toolkit site, the CODEE.org site, and mirror.hmc.edu). Also interrupted were file service, logins, and incoming and outgoing mail.
I found out about the problem at about 8:30 when Suzanne called me. With DruAnn's and Roger's help, we figured out that the switch had died; Roger tried to reboot it, but it remained wedged. Service was restored at 10:21 AM after I replaced the dead switch with the one from the Amber cluster. The Amber cluster will be offline until we have a more permanent solution.
I will be investigating our options for replacing the network switch and, I hope, adding some redundancy to our connectivity. I will post more details as I learn them.
From what I can tell, all services should now be working again on all workstations. If you experience any problems, please let me know ASAP.