{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiessx2dzsnvjovmhowjtrlkfzcv4lw66ougjg5mppvkg45a6alb64",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3ml6c3yqhu7b2"
},
"path": "/t/pure-prompt-vs-cognitive-runtime-for-pr-review-a-reproducible-case-study/175694#post_4",
"publishedAt": "2026-05-06T07:16:37.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "This is an excellent feedback John as usual, thank you! I agree with your core reframing: this is better described as **governed PR/release approval** than generic “AI code review.”\n\nThe main claim we want to defend is exactly: **prompts can review, runtimes can gate**.\n\nAlso aligned on metrics: headline accuracy is secondary; **unsafe approvals / critical false positives** are the primary safety signal.\n\nWe’ll incorporate your strongest methodological points in the next iteration:\n\n 1. per-fixture **per-policy** expected labels,\n\n 2. stronger baseline ladder (including schema-constrained prompt + policy-only gate),\n\n 3. richer trace artifacts and reproducibility metadata.\n\n\n\n\nOn architecture, we also agree with the direction to make final enforcement fully deterministic (LLM for interpretation, policy code for authority).\n\nIn short: the goal is not replacing human review; it is preventing unstructured LLM inference from acting as policy authority in CI/CD.",
"title": "Pure Prompt vs Cognitive Runtime for PR Review: A Reproducible Case Study"
}