{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreie4j6fxutzt4al6r2rgfr33r6jp2bphggx2xx4as4qpdf5c7ouqni",
    "uri": "at://did:plc:3fychdutjjusoqeq24ljch6q/app.bsky.feed.post/3mkkzp6c77gy2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiflo6xt7is6b2iafwghkjahlgggocme5jwjsbeuqqwcywuvjhmszm"
    },
    "mimeType": "image/png",
    "size": 24783
  },
  "path": "/abs/2604.24737v1",
  "publishedAt": "2026-04-28T00:00:00.000Z",
  "site": "https://arxiv.org",
  "tags": [
    "Nirmit Joshi",
    "Roey Magen",
    "Nathan Srebro",
    "Nikolaos Tsilivis",
    "Gal Vardi"
  ],
  "textContent": "**Authors:** Nirmit Joshi, Roey Magen, Nathan Srebro, Nikolaos Tsilivis, Gal Vardi\n\nWe study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a small amount of CoT data per thinker that is completely independent of the target accuracy $\\varepsilon$, a moderate number of thinkers that scales as $\\log \\frac{1}{\\varepsilon}\\log \\log \\frac{1}{\\varepsilon}$, and sufficient passive end-result data that scales as $\\frac{1}{\\varepsilon}\\cdot poly\\log\\frac{1}{\\varepsilon}$.",
  "title": "Learning to Think from Multiple Thinkers"
}