Service status updates

Subscribe via RSS | Twitter: @SRCFstatus

See also our Nagios automated test output.

Java problems on pip believed to be resolved

The intermittent problems with Java-based applications on pip (the main SRCF shell and web server) are now believed to have been corrected.  Apologies for any inconvenience.  If you are still having problems with Java (or anything else), please report them to the sysadmins.

(The problem arises from Java's use of a large amount of virtual address space: it has previously been observed to reserve, immediately on startup, a little over a quarter of the server's total memory.  Since the recent hardware upgrade, a quarter of the total memory is rather large, and it now exceeds the default per-process virtual memory limit.  We have increased the virtual memory limit to accommodate this.)
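The arithmetic behind this is straightforward to check by hand.  Below is a minimal Python sketch, assuming a Unix-like host where the limit in question is the per-process address-space limit (`RLIMIT_AS`, the one `ulimit -v` sets); this post does not say exactly how pip's limit is configured, and the helper names are hypothetical.  It compares a quarter of physical RAM against the current soft limit.

```python
import os
import resource

GIB = 1024 ** 3


def quarter_of_physical_memory() -> int:
    """Bytes in a quarter of this machine's physical RAM (via POSIX sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // 4


def fits_under_as_limit(already_mapped: int, reservation: int, soft_limit: int) -> bool:
    """Would reserving `reservation` more bytes keep the process under RLIMIT_AS?

    RLIMIT_AS caps a process's *total* virtual address space, so the new
    reservation is added to whatever the process has already mapped.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no address-space limit configured
    return already_mapped + reservation <= soft_limit


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    reservation = quarter_of_physical_memory()
    # Simplification: ignore memory the process has already mapped.
    ok = fits_under_as_limit(0, reservation, soft)
    limit_text = "unlimited" if soft == resource.RLIM_INFINITY else f"{soft / GIB:.1f} GiB"
    print(f"quarter of RAM: {reservation / GIB:.1f} GiB; RLIMIT_AS soft limit: {limit_text}")
    print("reservation fits" if ok else "reservation would exceed the limit")
```

With purely hypothetical figures (not pip's): a quarter of an 8 GiB machine is 2 GiB and fits under a 4 GiB limit, whereas a quarter of a 64 GiB machine is 16 GiB and does not, even though the limit itself never changed.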

Web server problem caused unreliability 13:45-19:30

Since about 13:45 today, occasional web requests have been rejected with an internal server error message (HTTP 500).  This is because the Apache web server on pip has started crashing unpredictably.  The cause is under investigation; more information here when we have it.

Update, 23:44: the problem has been identified and fixed.  (Since the hardware upgrade, Apache's virtual memory usage had increased, and it was now hitting a long-standing per-process virtual memory limit.)

Desktop and a few miscellaneous services down

The upgrade of pip tonight completed successfully with under an hour’s downtime.

However, unrelated work started during our regular scheduled maintenance hour (Sundays 2-3am) ran into a hitch which is currently causing a few peripheral services to be unavailable.  This will continue to be the case until I’m able to go on-site later today, since the out-of-band management chip in the affected server (the virtual machine disk storage server, earthquake) has crashed.

Affected services are:

  • Desktop (cyclone)
  • Usenet news (flood)
  • User game servers (cavein)
  • Assorted internal development and management systems

Estimated time of service recovery: early Sunday afternoon.

Major server upgrade 00:00 tonight (2014-06-15)

Tonight, starting at midnight, we intend to replace the main SRCF server (pip).  This will improve performance and reliability for all SRCF services.  The replacement does, however, mean that most SRCF services (specifically, all services except IRC) will be unavailable for a while tonight.  The maintenance could last several hours, but we hope to complete it within an hour.  We apologise for any inconvenience this may cause.

This upgrade has had to be performed at short notice, somewhat sooner than anticipated, because of a problem with the old server which led to last week’s downtime overrunning considerably.  (Since a recent disk upgrade, the old server has had problems rebooting due to a suspected firmware bug; we need to complete the move to all-new hardware urgently to avoid further extended outages caused by this bug.)

Maintenance 02:00 tonight (2014-06-08)

Tonight, from 02:00, there will be a short disruption to most SRCF services as we will be installing an urgent system update on our servers.

This will involve a reboot, so if you are using our services this evening please make sure that you save your work and exit any applications you have running on the shell service (pip), the desktop service (cyclone) or the user server service (cavein).

Apologies for the short notice; this patch is important for the continued safety and security of the facility.

Maintenance 01:00 tonight (2014-05-13)

Tonight, from 01:00, there will be a short disruption to most SRCF services as we will be installing an urgent system update on our servers.

This will involve a reboot, so if you are using our services this evening please make sure that you save your work and exit any applications you have running either on the shell service (pip) or on the desktop service (cyclone).

Apologies for the short notice; this patch is important for the continued safety and security of the facility.

Emergency maintenance, 2014-04-08

A critical vulnerability in OpenSSL, reported a few hours ago, means that most Linux systems which use SSL or TLS in any form can be made to disclose private data.

All SRCF servers will be undergoing maintenance to patch this issue tonight.  In most cases this will involve a reboot.

We apologise for the disruption caused, but urgent action is required to protect users' data.  (If you have Linux systems elsewhere, you should ensure that these are patched as well, even if they are not servers.)

Outgoing mail delays

Since 22:17 on 26th February, some outgoing mail from the SRCF has been experiencing delays (in some cases, of several hours).  This is due to connectivity problems with the University’s central mail servers.

We are working with the University Computing Service to investigate and resolve the problem.  Meanwhile, apologies for the disruption.

Update 19:45: we deployed a workaround at 18:20; there should be no further delays.

The cause was an interoperability bug between the TLS implementations on our mail server and the University's, which led to "Bad record MAC" errors and aborted SMTP connections.  Previously these connections generally hadn't used TLS at all; the University Computing Service enabled it last night.

IPv6 unavailable on pip, 06:58-08:21

Due to an erroneous Ubuntu update triggering a long-standing Linux kernel misfeature, the main SRCF shell/web server (pip) lost IPv6 connectivity at 06:58.  This was repaired at 08:21.

IPv4 connectivity was unaffected, so few people are likely to have noticed.  If it did affect you (e.g. long-standing IPv6 connections were dropped, applications which don't support RFC 6555 "Happy Eyeballs" performed poorly, or IPv6-only hosts could not reach pip), we apologise for the inconvenience.
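Why would applications without Happy Eyeballs perform poorly while IPv4 still worked?  The following is a rough Python model, not a real network client, assuming the broken IPv6 path simply times out rather than failing fast.  The timeout and latency figures are hypothetical, and the 250 ms head start given to IPv6 is an assumed value.  It contrasts a client that tries IPv6 and only falls back to IPv4 after a timeout with one that races the two address families in the RFC 6555 style.

```python
from typing import Optional


def sequential_connect_ms(v6_ms: Optional[float], v4_ms: float, v6_timeout_ms: float) -> float:
    """Try IPv6 first; fall back to IPv4 only once the IPv6 attempt times out.

    v6_ms is None when IPv6 is broken and that attempt never succeeds.
    """
    if v6_ms is not None:
        return v6_ms
    return v6_timeout_ms + v4_ms


def happy_eyeballs_connect_ms(v6_ms: Optional[float], v4_ms: float, delay_ms: float = 250.0) -> float:
    """RFC 6555-style racing: start IPv6, start IPv4 after delay_ms if IPv6
    has not connected yet, and use whichever connection completes first."""
    if v6_ms is not None and v6_ms <= delay_ms:
        return v6_ms  # IPv6 won before IPv4 was even attempted
    v4_done = delay_ms + v4_ms
    if v6_ms is None:
        return v4_done
    return min(v6_ms, v4_done)


if __name__ == "__main__":
    # Hypothetical: IPv6 broken, IPv4 round trip 20 ms, 20 s connect timeout.
    print(sequential_connect_ms(None, 20.0, 20000.0))   # 20020.0
    print(happy_eyeballs_connect_ms(None, 20.0))        # 270.0
```

Under these assumptions the sequential client stalls for the whole timeout on every new connection, while the racing client pays only the head start.  For real Python code, asyncio exposes this behaviour through the `happy_eyeballs_delay` argument to `loop.create_connection`.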

Email delays, 2013-08-16

Due to a spam incident on Thursday evening, mail leaving the SRCF servers is currently being severely rate-limited by the University mail servers and we therefore have a backlog of undelivered mail.

Mail sent from the SRCF server or forwarded through the SRCF is likely to be delayed by several hours until at least mid-morning on Friday.

Apologies for any inconvenience caused.
