Difference between revisions of "Hardware:Status"
From CAC Wiki
Revision as of 17:17, 2 October 2017
This page lists all system status updates for the SW cluster and will be updated as new events arise.
System Status Messages
Date/Time | Affected Systems | Issue | Details | Resolved? |
---|---|---|---|---|
3/21/2017 - 1:30 PM | All Compute / Login | Power blip / outage | Shutdown of all compute clusters and login nodes. | Yes |
3/22/2017 - 10:30 AM | All Compute / Login | Recovery from power outage | Login nodes, system, and data access restored. Compute cluster still down, scheduler queues disabled. | Yes |
3/24/2017 - 8:00 AM | All Compute | Recovery from power outage | Compute cluster nodes cac013-cac099 up and running. Scheduler queues restricted/disabled. | Yes |
3/24/2017 - 2:00 PM | All Compute | Recovery from power outage | Scheduler queues for SW (Linux) compute cluster re-opened. Cluster is up and running. SNO (SX) cluster queues still disabled. | Yes |
3/24/2017 - 3:00 PM | All Compute | Recovery from power outage | Scheduler queues for SX (SNO, Linux) compute cluster re-opened. Cluster is up and running. | Yes |
3/27/2017 - 2:00 PM | File system (disk arrays 1 and 2) | Troubleshooting disk arrays | Replacing disks, rebooting head units; intermittent login and disk access issues to be expected. | Yes |
3/28/2017 - 2:00 PM | cac029 (compute) | cac029 off-line | cac029 is undergoing memory maintenance. | Yes |
4/13/2017 - 8:00 AM | swlogin1 (login/workup) | login problems | connectivity issues on swlogin1 prevent or delay login from sflogin0 | Yes |
5/02/2017 - 8:00 AM | all nodes (login & "SW" production) | Grid Engine scheduler not functional | The Grid Engine scheduler is currently not functional; qstat/qsub/qmon not available | Yes |
5/02/2017 - 1:00 PM | swlogin1 | Grid Engine scheduler not functional | Network issues on SGE_PROD | Yes |
5/10/2017 - 8:00 AM | swlogin1 / cac012-26 | Connectivity issues | Access to storage temporarily lost | Yes |
5/10/2017 - 9:00 AM | swlogin1 | Connectivity restored | Login restored | Yes |
5/10/2017 - 12:00 PM | cac012-27 | Reboot | Queues temporarily disabled | Yes |
7/13/2017 - 10:00 PM | all systems | issues with Grid Engine qmaster | resolved | Yes |
7/13/2017 - 10:00 PM | swlogin1 | unreachable through ssh | resolved | Yes |
10/02/2017 - 8:00 AM | head-6b | disk array full | partly resolved (freed 4 TB) | Yes |
Currently no issues |