{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreia364dvidj5m4fm3rhztv2p6rhi3riqa7ditox5nnepmv46wjydwe",
    "uri": "at://did:plc:qzjwstutqk2cy7df7jbzd2hx/app.bsky.feed.post/3mfecg2rbuow2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreifgdyxpoq3bgfebwknpj3zkahh7edlabsfvqvwl5zkvghvqbbdezm"
    },
    "mimeType": "image/jpeg",
    "size": 7568703
  },
  "path": "/article/4135277/arrcus-targets-ai-inference-bottleneck-with-policy-aware-network-fabric.html",
  "publishedAt": "2026-02-20T14:41:54.000Z",
  "site": "https://www.networkworld.com",
  "tags": [
    "Artificial Intelligence, Network Management Software, Networking",
    "threefold bookings growth in 2025",
    "AINF",
    "Shekar Ayyar",
    "SONiC",
    "scale out data center capacity"
  ],
  "textContent": "As AI usage continues to scale, a distinct type of application traffic is having an impact on networking. Training isn’t the issue; inference is.\n\nTraining runs in centralized clusters on predictable schedules. Inference is distributed, latency-sensitive and subject to real-time constraints around power availability, data sovereignty, and cost. The network fabric that is routing that traffic is increasingly the bottleneck, and traditional hardware-defined networking was not built to handle it.\n\nThat is the problem Arrcus is moving to address. The San Jose-based networking software company has spent a decade building ArcOS, a network operating system designed to decouple routing and switching workloads from proprietary hardware. The company sells into data center, telco, and enterprise markets, and its software runs in production across thousands of network nodes globally. This week, Arrcus reported threefold bookings growth in 2025 and announced the Arrcus Inference Network Fabric (AINF), a product built to dynamically steer AI inference traffic across distributed infrastructure.\n\n“To enhance agentic AI adoption by improving response times, networks need to become AI-aware,” Shekar Ayyar, chairman and CEO of Arrcus, told _Network World_.\n\n## How ArcOS differs from SONiC and NSX\n\nUnderstanding what Arrcus is doing with AINF requires understanding what ArcOS is and where it sits relative to other networking technologies like SONiC or VMware’s NSX.\n\nSONiC is a switching-focused operating environment suited to operators that want to scale out data center capacity with straightforward packet forwarding. NSX operates at the virtualization layer as a network overlay for compute environments. ArcOS works at Layer 3 and is designed for policy-rich routing use cases: 5G network slicing for carriers, data center interconnects, and environments where programmable traffic steering matters. 
SoftBank’s deployment of Arrcus for SRv6 mobile user plane is one publicly disclosed example.\n\n“Switching is essentially a simpler operation. You just kind of send a packet or not,” Ayyar explained. “Routing is a more complex operation. You tell the packet where to go and what to do. You have a lot more richness and policy in what you do on the routing front.”\n\nThat policy-rich routing foundation is what Arrcus is now applying to AI inference.\n\n## The inference problem and how AINF addresses it\n\nAs AI workloads shift from centralized training to distributed inference, the network faces a different class of demands.\n\nInference nodes are geographically dispersed and must satisfy simultaneous constraints around latency, throughput, power capacity, data residency, and cost. Those constraints vary by location and change in real time, and traditional hardware-defined networking was not designed to handle them dynamically.\n\n“These inference nodes are now going to become super critical in understanding exactly what the constraints are at those inference points,” Ayyar said. “Do you have a power constraint? Do you have a latency constraint? Do you have a throughput constraint? And if you do, how are you going to direct and steer your traffic?”\n\nAINF addresses this by introducing a policy abstraction layer that sits between Kubernetes-based orchestration and the underlying silicon. Models expose their requirements via an API, disclosing the parameters they need. 
Those requirements flow down to the routing layer, which steers traffic accordingly.\n\n“Think about us as speeding up the process of how all of those requirements find their way to the router, and then instructing the routing node at the appropriate location in this giant web of networking nodes to do the right thing so that it satisfies the inference policy,” Ayyar said.\n\nOperators define business policies including latency targets, data sovereignty boundaries, model preferences, and power constraints. AINF evaluates those conditions in real time and steers inference traffic to the optimal node or cache. Components include query-based inference routing with policy management, interconnect routers, and edge networking. The system integrates with vLLM, SGLang, and Triton inference frameworks. Prefix awareness is used to optimize KV cache usage and help inferencing applications meet service-level objectives for throughput, latency, data sovereignty, power, and cost.\n\n## Challenges and outlook\n\nAyyar identified two near-term obstacles to adoption. The first is awareness. He noted that many potential customers have been designing inference architectures without considering policy-aware fabrics as an option. The second is incumbent lock-in, with Cisco and Juniper shops needing assurance that AINF can interoperate cleanly alongside existing infrastructure. Ayyar said Arrcus has invested heavily in interoperability testing to address this.\n\nArrcus projects it will cross $100 million in bookings in 2026, a target set before any contribution from AINF. The company plans to demonstrate the product at MWC Barcelona and Nvidia GTC in San Jose.\n\n“All the talk we’re seeing about AI and the infrastructure related to AI is mostly the tip of the iceberg,” Ayyar said. “What people are not appreciating yet is what is underneath the water, where we believe the efficiency gains as well as the effectiveness gains are hidden and lurking underneath. 
As soon as that comes to light, it’s almost like throwing X-ray vision on top of this and saying, look, this is where the world is headed. Begin now.”",
  "title": "Arrcus targets AI inference bottleneck with policy-aware network fabric"
}