Last week some of you may have noticed a brief outage in ChatMail’s core services. We are happy to report our team worked quickly to restore all connections with minimal downtime. We take our uptime and our commitment to transparency very seriously, and we would like to break down what occurred to help our resellers and clients understand what happened and how we plan to improve our network reliability in the future.
On December 5th, 2022, we were notified by our NOC (Network Operations Center) and TAC (Radware) team that we had a service outage in our primary data center at Myntex Headquarters. Effectively, our internet had gone down and taken our critical infrastructure with it (including websites, email, portal, VPN, and core ChatMail services).
How long did it last?
Roughly 4 hours, from 8:03 a.m. to 12:01 p.m. Mountain Time on December 5th, 2022.
What steps were taken to fix it?
Our first step was to contact our ISP (Internet Service Provider) to verify their services were still operational. After some further troubleshooting, we were able to rule out our internet connection as the culprit. We then began the meticulous process of isolating the issue within our own network. Eventually we were able to narrow it down to the primary Core Edge Router.
Was there a security vulnerability?
No. This was a hardware failure, nothing to do with the security of our network.
What was the solution?
As part of our disaster recovery plan, we keep spare critical hardware on hand, allowing us to quickly hot-swap network equipment if required. We replaced the failed Core Edge Router with a new one, and our networking team then configured the new device. Once that was completed, all functions were fully restored, and we immediately notified our network of ChatMail partners of the issue and that all services were back online.
How will we improve for the future?
We’ve replenished our stock of critical network hardware so that if this happens again, we’ll have the equipment on hand to make the necessary repairs. To allow for quicker troubleshooting and faster deployment, we will be preloading the spare network hardware with our latest stable configurations.
Our users’ experience can only be as good as our uptime, which is why the home page of Myntex.com boasts a “99.9% server uptime”. The last outage we experienced before this one was back in March of 2021 (following a forced fibre relocation). Even after this December’s outage, we are still achieving 99.95% uptime, which works out to less than 4.38 hours of downtime per year.
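For readers who want to check the arithmetic behind those uptime figures, here is a minimal sketch. It assumes a 365-day year and that the December 5th outage was the only downtime in the 12-month window; the function names are illustrative and not part of any Myntex tooling.

```python
from datetime import datetime

# Assumption: a non-leap year of 365 days (8,760 hours).
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(uptime_percent: float) -> float:
    """Hours of downtime per year allowed by a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def uptime_percent(downtime_hours: float) -> float:
    """Uptime percentage over one year for a given number of downtime hours."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)


# Outage window as reported above: 8:03 a.m. to 12:01 p.m. on December 5th, 2022.
outage_start = datetime(2022, 12, 5, 8, 3)
outage_end = datetime(2022, 12, 5, 12, 1)
outage_hours = (outage_end - outage_start).total_seconds() / 3600

print(f"99.95% uptime allows {downtime_budget_hours(99.95):.2f} hours of downtime per year")
print(f"This outage lasted {outage_hours:.2f} hours")
print(f"Uptime for the year with only this outage: {uptime_percent(outage_hours):.3f}%")
```

Under these assumptions, the roughly 4-hour outage fits inside the 4.38-hour yearly budget that 99.95% uptime implies.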
No one can guarantee 100% uptime, but we’re determined to get as close as possible.