Hardware:Status

From CAC Wiki

Revision as of 13:36, 3 October 2017

This page lists all system status updates for the SW cluster and will be updated as new events arise.

System Status Messages
{| class="wikitable"
! Date/Time !! Affected Systems !! Issue !! Details !! Resolved?
|-
| 3/21/2017 - 1:30 PM || All Compute / Login || Power blip / outage || Shutdown of all compute clusters and login nodes. || Yes
|-
| 3/22/2017 - 10:30 AM || All Compute / Login || Recovery from power outage || Login nodes, system, and data access restored. Compute cluster still down; scheduler queues disabled. || Yes
|-
| 3/24/2017 - 8:00 AM || All Compute || Recovery from power outage || Compute cluster nodes cac013-cac099 up and running. Scheduler queues restricted/disabled. || Yes
|-
| 3/24/2017 - 2:00 PM || All Compute || Recovery from power outage || Scheduler queues for SW (Linux) compute cluster re-opened. Cluster is up and running. SNO (SX) cluster queues still disabled. || Yes
|-
| 3/24/2017 - 3:00 PM || All Compute || Recovery from power outage || Scheduler queues for SX (SNO, Linux) compute cluster re-opened. Cluster is up and running. || Yes
|-
| 3/27/2017 - 2:00 PM || File system (disk arrays 1 and 2) || Troubleshooting disk arrays || Replacing disks and rebooting head units; intermittent login and disk access issues are to be expected. || Yes
|-
| 3/28/2017 - 2:00 PM || cac029 (compute) || cac029 offline || cac029 is undergoing memory maintenance. || Yes
|-
| 4/13/2017 - 8:00 AM || swlogin1 (login/workup) || Login problems || Connectivity issues on swlogin1 prevent or delay login from sflogin0. || Yes
|-
| 5/02/2017 - 8:00 AM || All nodes (login & "SW" production) || Grid Engine scheduler not functional || The Grid Engine scheduler is not functional; qstat/qsub/qmon not available. || Yes
|-
| 5/02/2017 - 1:00 PM || swlogin1 || Grid Engine scheduler not functional || Network issues on SGE_PROD. || Yes
|-
| 5/10/2017 - 8:00 AM || swlogin1 / cac012-26 || Connectivity issues || Access to storage temporarily lost. || Yes
|-
| 5/10/2017 - 9:00 AM || swlogin1 || Connectivity restored || Login restored. || Yes
|-
| 5/10/2017 - 12:00 PM || cac012-27 || Reboot || Queues temporarily disabled. || Yes
|-
| 7/13/2017 - 10:00 PM || All systems || Issues with Grid Engine qmaster || Resolved. || Yes
|-
| 7/13/2017 - 10:00 PM || swlogin1 || Unreachable through ssh || Resolved. || Yes
|-
| 10/02/2017 - 8:00 AM || head-6b || Disk array full || Partly resolved (freed 4 TB). || Yes
|-
| 10/03/2017 - 8:00 AM || head-6b || Disk array near capacity || Working on reducing usage. || '''No'''
|-
!colspan="5"| '''Currently no issues'''
|}