{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreigjdnrr2dyaxt2tk7uxj7t2r7iwwfs65tdixphb47c7imuaqjvlfy",
    "uri": "at://did:plc:3fychdutjjusoqeq24ljch6q/app.bsky.feed.post/3miidxqxok5p2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiflo6xt7is6b2iafwghkjahlgggocme5jwjsbeuqqwcywuvjhmszm"
    },
    "mimeType": "image/png",
    "size": 24783
  },
  "path": "/abs/2604.01086v1",
  "publishedAt": "2026-04-02T00:00:00.000Z",
  "site": "https://arxiv.org",
  "tags": [
    "Guokai Li",
    "Jiaxin Liang",
    "Mo Liu",
    "Yanzhe Lei",
    "Stefanus Jasin",
    "Fenghua Yang",
    "Preet Baxi"
  ],
  "textContent": "**Authors:** Guokai Li, Jiaxin Liang, Mo Liu, Yanzhe Lei, Stefanus Jasin, Fenghua Yang, Preet Baxi\n\nWe study a Bayesian binary sequential hypothesis testing problem with multiple large language models (LLMs). Each LLM $j$ has per-query cost $c_j>0$, a random waiting time with mean $μ_j>0$ and sub-Gaussian tails, and *asymmetric* accuracies: the probability of returning the correct label depends on the true hypothesis $θ\\in\\{A,B\\}$ and need not be the same under $A$ and $B$. This asymmetry induces two distinct information rates $(I_{j,A}, I_{j,B})$ per LLM, one under each hypothesis. The decision-maker chooses LLMs sequentially, observes their noisy binary answers, and stops when the posterior probability of one hypothesis exceeds $1-α$. The objective is to minimize the sum of expected query cost and expected waiting cost, $\\mathbb{E}[C_π] + \\mathbb{E}[g(W_π)]$, where $C_π$ is the total query cost, $W_π$ is the total waiting time, and $g$ is a polynomial function (e.g., $g(x)=x^ρ$ with $ρ\\ge 1$). We prove that as the error tolerance $α\\to 0$, the optimal policy is asymptotically equivalent to one that uses at most two LLMs. In this setting, a single-LLM policy is *not* generically optimal: optimality requires exploiting a two-dimensional tradeoff between information under $A$ and information under $B$. Any admissible policy induces an expected information-allocation vector in $\\mathbb{R}_+^2$, and we show that the optimal allocation lies at an extreme point of the associated convex set when $α$ is sufficiently small, and hence uses at most two LLMs. We construct belief-dependent policies that first mix between two LLMs when the posterior is ambiguous, and then switch to a single \"specialist\" LLM when the posterior is sufficiently close to one of the hypotheses. These policies match the universal lower bound up to a $(1+o(1))$ factor as $α\\to 0$.",
  "title": "Asymptotically Optimal Sequential Testing with Heterogeneous LLMs"
}