Zivver Relay

Post-Mortem - SMTP incident 18-10-2022

Resolved
Assessed

Situation:
On October 19th 2022 at 09:51 CEST, multiple customers contacted Zivver to report that their SMTP connection with Zivver wasn’t working. Zivver’s engineers then identified the source of the issue and mitigated it at 10:15 CEST the same day.

Impact:
Customers’ email servers could not connect to our SMTP server and therefore retried sending at a later time. As a result, emails sent while our SMTP server was unavailable were delayed.

Solution:
Zivver’s engineers saw that a storage exception was triggered, resulting in the outage of the service. Changing storage parameters mitigated the incident.

Root-cause:

  1. A significant increase in the number of researchers using Zivver’s SMTP solution to send data automatically caused a similarly significant increase in disk usage.
  2. This level of usage was not considered possible at the time, so no alarms were set on the disk usage threshold.

Mitigating Actions:
These actions will prevent this issue from recurring:

  1. Significant increase in storage capacity.
  2. Set up alerting for the SMTP servers to prevent future similar incidents.
  3. Change log rotation for the SMTP servers to prevent excessive storage requirements.
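
As an illustration of the alerting action above, the following is a minimal sketch of a disk-usage threshold check. It does not describe Zivver's actual monitoring setup: the function names, the 80% and 90% thresholds, and the checked path are all assumptions chosen for the example.

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Return the percentage of disk space in use on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_threshold(percent_used: float, warn: float = 80.0, crit: float = 90.0) -> str:
    """Map a usage percentage to an alert level.

    The default thresholds are hypothetical values for illustration only.
    """
    if percent_used >= crit:
        return "CRITICAL"
    if percent_used >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # "/" is an assumed mount point; a real check would target the mail spool or log volume.
    percent = disk_usage_percent("/")
    print(f"disk usage {percent:.1f}%: {check_threshold(percent)}")
```

A check like this would typically run on a schedule and forward any WARNING or CRITICAL result to an on-call channel, so that growing disk usage is noticed before the service is affected.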