MK8s: Partial Connectivity Degradation to Control Planes
Mar 16, 19:34 UTC Update - The changes applied to some control plane clusters have had positive effects. The team is continuing the rollout to other affected clusters.
Mar 16, 13:56 UTC Update - We are marking DBaaS as recovered. Our Kubernetes Team is currently working on stabilizing the Kubernetes Control Plane, focusing on mitigating the recurring load spikes that are affecting stability.
Mar 16, 12:57 UTC Update - We are marking the Container Registry Service as recovered.
Mar 16, 12:22 UTC Update - We are closing the incident for the AI Model Hub. All metrics have recovered, and the service should be operating normally again.
Mar 16, 11:57 UTC Update - We are adding the Container Registry as an affected service. Customers may currently experience issues pulling images from and pushing images to the Registry.
Mar 16, 11:37 UTC Update - Our Kubernetes Team has deployed a fix for the affected AI Model Hub Database Services. Metrics are currently improving, and we are monitoring the situation closely.
Mar 16, 11:05 UTC Update - We are expanding the scope of this incident to include DBaaS and AI Model Hub. We have observed an increased number of errors originating from PostgresDB on Kubernetes. Additionally, to improve transparency, the previously reported, separate incident regarding the AI Model Hub (https://status.ionos.cloud/incidents/rmgs845klm32) is being merged into this primary incident.
Mar 16, 10:04 UTC Identified - The team has identified the root cause as a resource constraint within the etcd database. Mitigation efforts are currently underway.
Mar 16, 08:24 UTC Investigating - Some customers may experience connection problems to the control plane and degraded Kubernetes functionality. Our teams are investigating and working on a resolution.