A backend release was executed on Tuesday June 17th, 2020, which included core platform stability and scaling improvements. The upgrade included in this release forced ZIVVER’s platform to change the way resources are created in our RESTful web services. This led to thread exhaustion in the Java Virtual Machine of the backend nodes.
Users were not able to access ZIVVER through a client. Data integrity was not impacted.
7:14 AM to 8:30 AM (CEST)
The backend nodes were restarted continuously while we rolled back to the previous stable version. In the evening, a release was executed that included the necessary bug fixes.
Lessons learned and next steps:
Moving forward, we will firstly treat upgrades of major versions of our core 3rd party libraries as separate releases. Secondly, we are extending the JVM-specific alerting we have in place so that it reports on trends before resources are exhausted. Thirdly, we are implementing better distributed load testing to catch patterns of exhaustion or memory leaks across the whole collection of backend nodes.
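To illustrate what trend-based JVM alerting can look like, here is a minimal sketch in Java. It is not ZIVVER's actual alerting: the class name `ThreadTrendMonitor`, the thread limit of 500, the sampling interval and the straight-line extrapolation are all assumptions made for the example. Only `ManagementFactory.getThreadMXBean()` and `ThreadMXBean.getThreadCount()` are standard JDK calls.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical example class; not part of the ZIVVER platform.
final class ThreadTrendMonitor {

    /** Returns the current number of live threads in this JVM. */
    public static int liveThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    /**
     * Returns true when the latest sample is already at the limit, or when a
     * straight-line extrapolation from the first to the last sample reaches
     * the limit within `horizon` further sampling intervals.
     */
    public static boolean trendingTowardsLimit(int[] samples, int limit, int horizon) {
        if (samples.length == 0) {
            return false;
        }
        int last = samples[samples.length - 1];
        if (last >= limit || samples.length < 2) {
            return last >= limit;
        }
        // Average growth per sampling interval over the whole window.
        double slope = (double) (last - samples[0]) / (samples.length - 1);
        double projected = last + slope * horizon;
        return projected >= limit;
    }

    public static void main(String[] args) throws InterruptedException {
        int limit = 500;           // assumed thread budget for the example
        int[] window = new int[5]; // assumed window of five samples

        for (int i = 0; i < window.length; i++) {
            window[i] = liveThreadCount();
            Thread.sleep(200);     // assumed sampling interval
        }

        if (trendingTowardsLimit(window, limit, 10)) {
            System.out.println("WARN: thread count is trending towards the limit");
        } else {
            System.out.println("Thread count is within expected bounds");
        }
    }
}
```

The point of the sketch is that the alert fires on the direction of the thread count rather than on a fixed threshold alone, which leaves time to react before the JVM runs out of threads.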
A permanent fix has been deployed.
We have experienced some performance issues this morning. However, the platform is stable again.
We are further investigating the issue and we will get back to you with an update.
The backend system is currently unable to handle most of the incoming requests.