Inquiry About Datasets for AI-Driven Cloud Load Balancing and Auto-Scaling of Instances

Hugging Face Forums [Unofficial] March 4, 2026

sohamk28:

I’m currently building a Smart Load Balancer with Auto-Scaling Instances and exploring ways to optimize cloud performance using AI-based techniques.

I’m looking for a dataset that contains:

  • Server or VM utilization data (CPU, memory, network usage)
  • Task or request distribution logs
  • Auto-scaling or workload patterns over time
  • Any real or simulated cloud performance metrics

I’d really appreciate it if anyone could suggest:

  • Publicly available cloud workload datasets
  • Google, Alibaba, or Azure cluster traces
  • Or any datasets that can help in modeling or testing AI-based load balancing algorithms

Thanks in advance for your help and suggestions.

Soham Kale

Hi Soham, you can cover this in two ways: use public traces for realism, and synthetic traces for controlled stress testing.

Public datasets worth checking:

  • Google cluster traces (Borg) for job/task scheduling and resource usage patterns

  • Alibaba cluster trace for container workloads and utilization over time

  • Azure traces and other public workload datasets from academic benchmarking papers

  • Also search the Hub for keywords such as “cluster trace”, “workload trace”, “autoscaling trace”, “request trace”, “datacenter telemetry”, and “Kubernetes trace”

If you cannot find a dataset with all signals in one place, a common approach is to fuse:

  • a request arrival trace (per service), and

  • a resource utilization trace (per node or pod),

then derive autoscaling events by simulating a scaling policy over the fused data.
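To make the fuse-then-simulate idea concrete, here is a minimal sketch. It assumes both traces share a timestamp key, that the utilization trace reports the total CPU cores the service consumed, and that each replica gets one core. The function names `fuse_traces` and `simulate_hpa` and the sample numbers are illustrative, not from any published dataset. The scaling rule is the proportional formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current replicas × metric / target)) with a tolerance band; a real policy would also need cooldowns and startup delays.

```python
import math

def fuse_traces(requests, utilization):
    """Inner-join two traces on timestamp.

    requests:    iterable of (timestamp, requests_per_second)
    utilization: iterable of (timestamp, total_cpu_cores_used)
    Returns a list of (timestamp, rps, cpu_cores) for timestamps in both.
    """
    cpu_by_ts = dict(utilization)
    return [(ts, rps, cpu_by_ts[ts]) for ts, rps in requests if ts in cpu_by_ts]

def simulate_hpa(fused, target_util=0.5, min_replicas=1, max_replicas=10,
                 tolerance=0.1):
    """Replay a fused trace through a proportional scaling rule.

    Returns a list of (timestamp, old_replicas, new_replicas) events.
    """
    replicas = min_replicas
    events = []
    for ts, _rps, cpu_cores in fused:
        # Assumption: one core per replica, load spread evenly.
        util = cpu_cores / replicas
        ratio = util / target_util
        if abs(ratio - 1.0) <= tolerance:
            continue  # close enough to target, do nothing
        desired = math.ceil(replicas * ratio)
        desired = max(min_replicas, min(max_replicas, desired))
        if desired != replicas:
            events.append((ts, replicas, desired))
            replicas = desired
    return events

# Hypothetical sample traces (timestamps are step indices).
requests = [(0, 100), (1, 300), (2, 300), (3, 50)]
utilization = [(0, 0.4), (1, 1.2), (2, 1.2), (3, 0.2), (4, 0.1)]

fused = fuse_traces(requests, utilization)
print(simulate_hpa(fused))  # [(1, 1, 3), (3, 3, 1)]
```

The resulting event list can serve as labels for a supervised model, or as a baseline to compare a learned policy against.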

How I can help you directly:

  • Provide a ready-to-use synthetic dataset generator that produces time series for CPU, memory, network, request rate, latency, and error rate, plus autoscaling actions under different policies (HPA-style, predictive, RL-style)

  • Include bursty traffic, diurnal seasonality, noisy telemetry, failures, and multi-service interference

  • Output formats that plug into training easily, such as Parquet plus a Gym-style environment spec for RL, or a supervised dataset for predicting scale-up and scale-down actions

  • Add evaluation scripts for cost, latency, SLO violations, and stability metrics, so you can compare heuristics against learned policies
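As a rough illustration of what such a generator and evaluation script could look like, here is a small standard-library sketch. It covers only the request-rate signal (diurnal cycle, Gaussian noise, rare bursts) and three simple scores. The names `synth_request_rate` and `evaluate`, the default parameters, and the rule "a step violates the SLO when demand exceeds replica capacity" are all assumptions made for the example, not a fixed design.

```python
import math
import random

def synth_request_rate(steps, period=288, base=200.0, amplitude=120.0,
                       noise_sd=15.0, burst_prob=0.01, burst_scale=3.0,
                       seed=0):
    """Generate a synthetic request-rate series.

    Each sample is a sinusoidal daily cycle plus Gaussian noise, with an
    occasional multiplicative burst. period=288 models 5-minute steps
    over one day. Values are clamped at zero.
    """
    rng = random.Random(seed)  # local RNG so runs are reproducible
    series = []
    for t in range(steps):
        rate = base + amplitude * math.sin(2 * math.pi * t / period)
        rate += rng.gauss(0.0, noise_sd)
        if rng.random() < burst_prob:
            rate *= burst_scale
        series.append(max(0.0, rate))
    return series

def evaluate(rates, replicas, capacity_per_replica=100.0):
    """Score a replica schedule against a request-rate series.

    cost_replica_steps: total replica-steps provisioned
    slo_violation_rate: fraction of steps where demand exceeded capacity
    scale_changes:      number of steps where the replica count changed
    """
    if not rates or len(rates) != len(replicas):
        raise ValueError("rates and replicas must be non-empty and equal length")
    overloaded = sum(1 for r, n in zip(rates, replicas)
                     if r > n * capacity_per_replica)
    changes = sum(1 for a, b in zip(replicas, replicas[1:]) if a != b)
    return {
        "cost_replica_steps": sum(replicas),
        "slo_violation_rate": overloaded / len(rates),
        "scale_changes": changes,
    }

# Compare two static schedules on one synthetic day.
rates = synth_request_rate(288, seed=42)
for n in (2, 4):
    print(n, evaluate(rates, [n] * len(rates)))
```

Swapping the static schedules for the output of a heuristic or a learned policy gives a like-for-like comparison on the same traffic.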
