Difference between revisions of "Hardware:Status"

Revision as of 19:48, 27 February 2019

This page shows information about the status of systems at the Centre for Advanced Computing. It will be updated with additional information as new events arise.

System Status Messages
Date	Affected systems	Details/reason	Resolution
02/27/2019 - 10:00 AM	caclogin02	caclogin02 VM down	login traffic directed to caclogin03/04; reboot resolved the issue
12/10/2018 - 8:00 AM	caclogin02	caclogin02 VM down	login traffic directed to caclogin03/04
12/04/2018 - 8:00 PM	caclogin03/04	issues with login nodes; caclogin02 works	resolved after reboot
10/29/2018 - 4:00 PM	caclogin03	root (/) file system full	resolved
08/06/2018 - 08/10/2018	Cluster downtime	Scheduled filesystem upgrade	planned downtime
07/12/2018 - 10:30 AM	GPFS outage	Filesystem temporarily unavailable	resolved
06/27/2018 - 9:00 AM	Login node shutdown	Maintenance (unscheduled)	node back in service
06/20/2018 - 8:00 AM	Login node non-responsive	Cause: out of memory	resolved, login restored (take-down, reboot)
05/01/2018 - 9:00 AM	Scheduler maintenance	Scheduled upgrade/downtime of scheduler	resolved
04/23/2018 - 7:00 AM	Frontenac login node	login issues, reboot	functional after reboot
04/19/2018 - 3:30 PM	Frontenac login node	lost access to file system, reboot	resolved after reboot
03/16/2018 - 11:00 AM	Scheduler upgrade	Scheduled upgrade/downtime of scheduler	Upgrade complete, working on X11 support
01/28/2018 - 5:00 AM	Frontenac login node caclogin02	Unscheduled outage	login restored, investigating causes
01/18/2018 - 11:30 AM	Frontenac login node caclogin01	Unscheduled shutdown/reboot (~45 min)	updates/maintenance
11/21/2017 - 11:00 PM	Frontenac (all nodes)	Temporary unmount of /global file system	re-mounted, file system accessible
10/30/2017 - 8:00 AM	multiple production nodes unreachable	scheduler lost contact with production nodes	nodes will be transferred to Frontenac
10/30/2017 - 8:00 AM	swlogin1 (login node)	No login possible	login restored
10/03/2017 - 8:00 AM	head-6b	disk array near capacity	working on reducing usage
10/02/2017 - 8:00 AM	head-6b	disk array full	partly resolved (freed 4 TB)
07/13/2017 - 10:00 AM	swlogin1	unreachable through ssh	resolved
07/13/2017 - 8:00 AM	caclogin01	temporary maintenance shutdown	back up


