Difference between revisions of "Hardware:Status"

Revision as of 13:32, 23 November 2017

This page shows information about the status of systems at the Centre for Advanced Computing. It will be updated with additional information as new events arise.

System Status Messages
Date/Time	Affected Systems	Issue	Details	Resolved?
7/13/2017 - 10:00 PM	all systems	issues with Grid Engine qmaster	resolved	Yes
7/13/2017 - 10:00 PM	swlogin1	unreachable through ssh	resolved	Yes
10/02/2017 - 8:00 AM	head-6b	disk array full	partly resolved (freed 4 TB)	Yes
10/03/2017 - 8:00 AM	head-6b	disk array near capacity	working on reducing usage	Yes
10/30/2017 - 8:00 AM	swlogin1 (login node)	No login possible	login restored	Yes
10/30/2017 - 8:00 AM	multiple production nodes unreachable	scheduler lost contact with production nodes	nodes will be transferred to Frontenac	Yes
11/21/2017 - 11:00 PM	Frontenac (all nodes)	Temporary unmount of /global file system	re-mounted, file system accessible	Yes

@@ Line 4: / Line 4: @@
 {| class="wikitable" style="float:left; margin-right: 25px;"
 !colspan="5"| '''System Status Messages'''
-|-
-|'''Date/Time'''
-|'''Affected Systems'''
-|'''Issue'''
-|'''Details'''
-|'''Resolved ?'''
-|-
-| 11/22/2017 - 11:00 AM
-| Frontenac (all nodes)
-| production jobs terminated due to FS issues
-| please re-submit your jobs
-| yes
-|-
-| 3/21/2017 - 1:30 PM
-| All Compute / Login
-| Power blip / outage
-| Shutdown of all compute clusters and login nodes.
-| Yes
-|-
-| 3/22/2017 - 10:30 AM
-| All Compute / Login
-| Recovery from power outage
-| Login nodes, system, and data access restored. Compute cluster still down, scheduler queues disabled.
-| Yes
-|-
-| 3/24/2017 - 8:00 AM
-| All Compute
-| Recovery from power outage
-| Compute cluster nodes cac013-cac099 up and running. Scheduler queues restricted/disabled.
-| Yes
-|-
-| 3/24/2017 - 2:00 PM
-| All Compute
-| Recovery from power outage
-| Scheduler queues for SW (Linux) compute cluster re-opened. Cluster is up and running. SNO (SX) cluster queues still disabled.
-| Yes
-|-
-| 3/24/2017 - 3:00 PM
-| All Compute
-| Recovery from power outage
-| Scheduler queues for SX (SNO, Linux) compute cluster re-opened. Cluster is up and running.
-| Yes
-|-
-| 3/27/2017 - 2:00 PM
-| File system (disk arrays 1 and 2)
-| Troubleshooting on disk arrays
-| Replacing disks, rebooting head units; intermittent login and disk access issues to be expected.
-| Yes
-|-
-| 3/28/2017 - 2:00 PM
-| cac029 (compute)
-| cac029 off-line
-| cac029 is undergoing memory maintenance.
-| Yes
-|-
-| 4/13/2017 - 8:00 AM
-| swlogin1 (login/workup)
-| login problems
-| connectivity issues on swlogin1 prevent or delay login from sflogin0
-| Yes
-|-
-| 5/02/2017 - 8:00 AM
-| all nodes (login & "SW" production)
-| Grid Engine scheduler not functional
-| The Grid Engine scheduler is currently not functional; qstat/qsub/qmon not available
-| Yes
-|-
-| 5/02/2017 - 1:00 PM
-| swlogin1
-| Grid Engine scheduler not functional
-| Network issues on SGE_PROD
-| Yes
-|-
-| 5/10/2017 - 8:00 AM
-| swlogin1 / cac012-26
-| Connectivity issues
-| Access to storage temporarily lost
-| Yes
-|-
-| 5/10/2017 - 9:00 AM
-| swlogin1
-| Connectivity restored
-| Login restored
-| Yes
-|-
-| 5/10/2017 - 12:00 PM
-| cac012-27
-| Reboot
-| Queues temporarily disabled
-| Yes
 |-
 | 7/13/2017 - 10:00 PM
