The University of Sussex is a leading higher education and research institution near Brighton, in the south of England.
The University has over 14,000 students, of whom over a third are postgraduates.
Sussex has developed a reputation for innovation and inspiration, attracting leading researchers from around the globe. There are currently over 2,100 staff, including 1,000 teaching and research staff.
The University of Sussex runs an e-Submissions and e-Feedback system known as "ESEF", which enables students to upload their course assignments at their own convenience. A key component of ESEF is Study Direct, a customised version of the open-source Moodle platform.
In 2014, a monitoring system was built for Study Direct, based on the open-source ELK stack (now known as Elastic Stack). The system gave staff a better view of ESEF behaviour, diagnostics for problems that arose, and visibility of what was going on in the system when students claimed they were unable to submit work.
In late 2015, the monitoring system began to encounter serious stability issues, resulting in its complete breakdown from the end of October. This meant the University had no way of knowing whether claims of unsubmitted work were justified. Moreover, the original developer had since left, and the University had very little documentation with which to fix the system.
The University of Sussex contacted us and asked for our help. We performed a three-day assessment of the monitoring system. This health check revealed that the main cause of the problem lay in the complicated and often resource-hungry architecture. We proposed an optimised architecture that simplified processes and upgraded to the latest technology. We also added OS monitoring, which had previously been absent.
Over three weeks, we revised the architecture, migrated the existing system from ELK stack 1.4 to Elastic Stack 2.4, and added new Grafana dashboards to monitor system metrics such as disk, memory, CPU and network. We then cleaned and migrated the existing data and dashboards to the new system, and put a retention policy in place to avoid any future resource issues.
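A retention policy of this kind can be sketched roughly as follows. The sketch assumes daily indices named in the Logstash default style (logstash-YYYY.MM.dd) and a 30-day window. Both are illustrative assumptions rather than the settings used at Sussex, and the hypothetical expired_indices function only selects the indices that have aged out, leaving the deletion itself to whatever tooling is in use.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, retention_days=30, prefix="logstash-"):
    """Return the daily indices that fall outside the retention window.

    Assumption: indices are named <prefix>YYYY.MM.dd. Anything that does
    not match that pattern is ignored rather than treated as expired.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of the dated indices this policy manages
        try:
            index_day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if index_day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logstash-2015.12.31", "logstash-2016.01.01", ".kibana"]
    print(expired_indices(names, today=date(2016, 1, 31)))
```

Keeping the selection separate from the deletion makes it easy to review the list of indices before anything is removed.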
Following the engagement, the University of Sussex’s IT department was able to monitor its systems efficiently for errors and performance issues, resulting in greater levels of service for its users. Using dashboards designed by Rittman Mead, staff could see performance levels, as well as detailed and historic metrics surrounding any issues with the system. Furthermore, the new solution means the University is now capable of accurately investigating claims of unsubmitted work.
The overhead of monitoring was minimal, yet the benefit was significant. By monitoring server metrics, it was also possible to drive real-time alerting for immediate problems, as well as to analyse historical trends in order to support capacity planning. Without this kind of monitoring, capacity planning has no data to work from. Furthermore, troubleshooting becomes entirely reactive, responding to problems after they occur rather than preventing them in the first place.
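As a rough illustration of how historical metrics can feed capacity planning, the sketch below fits a straight line to daily disk-usage percentages and estimates how many days remain before the disk fills. The function name, the sample figures and the assumption of linear growth are all hypothetical; this is not the method used in the engagement.

```python
def days_until_full(daily_used_pct, capacity_pct=100.0):
    """Estimate days until usage reaches capacity, using a least-squares line.

    daily_used_pct holds one reading per day, oldest first. Returns None
    when there are too few readings or usage is not growing.
    """
    n = len(daily_used_pct)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(daily_used_pct) / n
    # Ordinary least-squares slope and intercept, with day number as x.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_used_pct))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None  # flat or shrinking usage: no projected fill date
    intercept = y_mean - slope * x_mean
    day_full = (capacity_pct - intercept) / slope
    # Count from the most recent reading, never returning a negative value.
    return max(0.0, day_full - (n - 1))


if __name__ == "__main__":
    print(days_until_full([60, 62, 64, 66, 68]))
```

A projection like this is only as good as the assumption of steady growth, so it is better treated as an early warning than as a forecast.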
The implementation of a monitoring and diagnostics system is an essential step for any organisation that wants to ensure the highest levels of service for its users. Once this is in place, we recommend using the data to further enhance the system, for example through anomaly detection and smart alerting.
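To give a flavour of what anomaly detection on monitoring data might look like, here is a minimal sketch that flags a reading when it deviates strongly from the readings just before it. It uses a simple trailing-window z-score. The function name, window size and threshold are illustrative assumptions, and a production system would likely use something more robust.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the positions of readings that deviate sharply from recent history.

    Each reading is compared with the mean and sample standard deviation
    of the preceding `window` readings (window must be at least 2).
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    cpu_pct = [50, 52, 51, 49, 50, 51, 95, 50]
    print(find_anomalies(cpu_pct))
```

Comparing each reading with its own recent history, rather than with a fixed limit, lets the same check work for metrics with very different normal ranges.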
System usage data can also be used as the basis for analysing levels of user engagement with a platform, and identifying any potential areas for improving the accessibility of a system.