System Outage
10:05 AM PDT (UTC-7) August 3rd 2021
What happened?
On July 30th 2021, a new deployment caused our load balancers to enter a rare invalid state. While servers were being rolled over to new versions, the old instances were removed, but the load balancers could not use the new instances because their configuration had been generated incorrectly.
What was the impact?
All ART19 services, including RSS feeds and the CMS, were partially unavailable starting at 1:58 pm PT and completely unavailable between 2:36 pm and 2:51 pm PT.
How did we resolve the issue?
We initially suspected heavy load on our application servers and scaled them up, but this had no effect. Our investigation then revealed a problem in our load balancer configuration templates that surfaces when this rare state occurs. We excluded the affected resources and then restarted our load balancer configuration.
What steps are we taking to prevent future incidents?
We have updated our configuration templates to be more resilient. We will also capture load balancer metrics and use them for auto-scaling.
Again, we apologize for any inconvenience caused. Feel free to reach out to support@art19.com with any other questions or concerns.
3:05 PM PDT (UTC-7) July 30th 2021
The system outage has been resolved. We will provide a postmortem within the next few days. We apologize again for any inconvenience. If you have any questions or concerns, don’t hesitate to reach out to us at support@art19.com.
2:35 PM PDT (UTC-7) July 30th 2021
ART19 is currently experiencing a system-wide outage. We’re actively investigating and working on a resolution. As soon as we know more, we will post a follow-up. We’re sincerely sorry for the inconvenience and appreciate your patience. If you have any questions or concerns, don’t hesitate to reach out to us at support@art19.com.