During a planned infrastructure update in the sandbox environment, we encountered an unexpected issue that led to temporary API downtime.
The update involved upgrading the Amazon Machine Images (AMIs) used by our underlying infrastructure. While this maintenance was intended to be non-disruptive, the new AMI version introduced a bug(https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/2101914) in the iptables logic, which blocked traffic from nodes to internal service components. This led to a disruption in internal networking and affected API availability.
The issue was mitigated by rolling back to the previous stable AMI version, after which full service was restored. The sandbox environment was unavailable for approximately 2.5 hours, with no impact on production systems.