Fri Apr 23 11:49:20 PDT 2010

Network Switch Failure (Temporarily Resolved)

At about 6:30 this morning, the gigabit switch that provides connectivity to our server cluster died, causing loss of all server-related functionality, including web service (for the department, the ODE Toolkit site, the CODEE.org site, and mirror.hmc.edu). Also interrupted were file service, logins, and incoming and outgoing mail.

I found out about the problem at about 8:30 when Suzanne called me. With DruAnn and Roger's help, we figured out that the switch had died; Roger tried to reboot it, but it remained wedged. Service was restored at 10:21 AM after I replaced the dead switch with the one from the Amber cluster. The Amber cluster will be offline until we have a more permanent solution.

I will be investigating our options for replacing the network switch and, I hope, adding some redundancy to our connectivity. I will post more details as I learn them.

From what I can tell, all services should now be working again on all workstations. If you experience any problems, please let me know ASAP.


Posted by Claire Connelly