{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiaeayygvnyzf7q3puzausi5c5v3xsvpfs6azowieu5jveazrgyhhi",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkry2hytzfj2"
},
"path": "/t/is-an-agent-harness-evaluation-preprint-suitable-for-arxiv-cs-ai/175693#post_1",
"publishedAt": "2026-05-01T09:15:14.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"https://doi.org/10.5281/zenodo.19819492",
"https://github.com/namanvats/scaffold-effects"
],
"textContent": "I’m an independent researcher working on agent systems and LLM evaluation. I recently prepared a small empirical preprint and am trying to understand the right way to share it with the research community.\n\nThe paper studies how different agent harnesses (scaffolds) can affect measured benchmark performance and token cost under a controlled setup. It compares Goose, OpenCode, and OpenHands-SDK on a fixed Terminal-Bench-Pro task slice across two models.\n\nPaper (DOI): https://doi.org/10.5281/zenodo.19819492\nCode: https://github.com/namanvats/scaffold-effects\n\nI’m looking for advice from people familiar with arXiv cs.AI submissions: does this look appropriately scoped for cs.AI, and what is the respectful way for a first-time independent author to approach the endorsement process?\n\nI’m not asking for a review of the paper’s claims, only for guidance on category fit and the endorsement process.",
"title": "Is an agent-harness evaluation preprint suitable for arXiv cs.AI?"
}