{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibz6zlowtiduujhsn2zystbpe463innfty2b2hahsqmw53d2kp554",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjxtzckp5eo2"
},
"path": "/t/request-seeking-arxiv-cs-ai-endorsement-independent-researcher-llm-metacognition-benchmark-live-kaggle-leaderboard-8-frontier-models-n-69-human-panel/175421#post_1",
"publishedAt": "2026-04-21T00:10:43.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"https://www.kaggle.com/benchmarks/rctoliveira/metacognitive-probe-measuring-llm-self-awareness"
],
"textContent": "Hey folks!\n\nI’m an independent AI researcher seeking an arXiv endorsement for the **cs.AI** category (cross-list: cs.CL, cs.LG, stat.ML). This is my first arXiv submission and I don’t have an institutional affiliation, so I need a personal endorsement from someone who has published in a related category.\n\n### About the paper\n\n**Title:** “The Metacognitive Probe: Decomposing LLM Self-Knowledge into Five Measurable Dimensions”\n\nThe paper presents a 5-task diagnostic benchmark that decomposes LLM self-knowledge into separately measurable dimensions: confidence calibration, epistemic vigilance, knowledge boundaries, calibration range, and reasoning-chain validation. Standard benchmarks (MMLU, BIG-Bench, HELM) measure _what_ models know; this instrument measures _what models know about what they know_.\n\n**Headline finding:** A 47-point within-model dissociation in Gemini 2.5 Flash: it achieves the panel’s best within-task calibration (T1-CC = 88) but the worst cross-task confidence prediction (T4-CR = 41). Flash reports confidence ≈ 100 on every factoid, including ones it gets wrong, so a deployment system that gates on self-reported confidence would let those wrong answers through.\n\nThe benchmark is evaluated on 8 frontier models (Claude Opus/Sonnet, Gemini Pro/Flash, DeepSeek-R1, GLM-5, Qwen 3, Gemma 3) and a human calibration panel (N=69). 
All code, data, prompts, and scoring rubrics are publicly released.\n\n### Verifiable materials\n\n * **Live Kaggle benchmark:** https://www.kaggle.com/benchmarks/rctoliveira/metacognitive-probe-measuring-llm-self-awareness\n\n * **Google DeepMind Hackathon entry** (Measuring Progress Toward AGI — Cognitive Abilities track)\n\n * Happy to share the full PDF privately before you decide\n\n### Endorsement details\n\n * **Category:** cs.AI (primary), cross-list cs.CL, cs.LG, stat.ML\n\n * **Endorsement code:** I4G6HG\n\n * To endorse, the endorser needs to have submitted 3+ papers to any cs.* category on arXiv within the last 5 years\n\nIf you’re an active arXiv author in any of these categories and willing to help, I’d really appreciate it. The endorsement takes about 30 seconds — just clicking a link and confirming. I’m happy to send you the paper first if you’d like to review it.\n\nThanks for your time!\n\nRafael Oliveira",
"title": "[Request] Seeking arXiv cs.AI endorsement — independent researcher, LLM metacognition benchmark (live Kaggle leaderboard, 8 frontier models, N=69 human panel)"
}