Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreic74b2plzf4qirerhlvubj7bsas6awumju4ugidu35hscediridv4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkszlwe6v552"
  },
  "path": "/t/is-an-agent-harness-evaluation-preprint-suitable-for-arxiv-cs-ai/175693#post_2",
  "publishedAt": "2026-05-01T20:08:21.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I took a quick look at the repo. This looks like a real empirical evaluation note, not just a blog-post-style claim. Having the configs, trial logs, snapshots, and analysis code public helps a lot.\n\nFor category fit, cs.AI seems defensible if the paper is framed as agent evaluation / scaffold effects. I would also look carefully at cs.MA, since arXiv treats intelligent agents and multi-agent systems as a separate CS category. I would not pick cs.CL as the primary category unless the paper is mainly about language modelling or NLP rather than agent harnesses and evaluation setup.\n\nOn endorsement, I would not overthink it, but I also would not cold-message half of cs.AI. Submit first, get the endorsement link from arXiv, then send it to one or two people whose recent papers are actually close to this topic. The ask should be narrow: “does this belong in the area well enough for endorsement?”, not “please review my paper” or “please vouch for the result.”\n\nAlso, be prepared for the moderator to move the category. That is not a disaster. Choose a reasonable primary category and do not oversell the scope.",
  "title": "Is an agent-harness evaluation preprint suitable for arXiv cs.AI?"
}