{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiessx2dzsnvjovmhowjtrlkfzcv4lw66ougjg5mppvkg45a6alb64",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3ml6c3yqhu7b2"
  },
  "path": "/t/pure-prompt-vs-cognitive-runtime-for-pr-review-a-reproducible-case-study/175694#post_4",
  "publishedAt": "2026-05-06T07:16:37.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "This is excellent feedback as usual, John, thank you! I agree with your core reframing: this is better described as **governed PR/release approval** than as generic “AI code review.”\n\nThe main claim we want to defend is exactly this: **prompts can review, runtimes can gate**.\n\nWe are also aligned on metrics: headline accuracy is secondary; **unsafe approvals / critical false positives** are the primary safety signal.\n\nWe’ll incorporate your strongest methodological points in the next iteration:\n\n  1. per-fixture, **per-policy** expected labels,\n\n  2. a stronger baseline ladder (including schema-constrained prompt + policy-only gate),\n\n  3. richer trace artifacts and reproducibility metadata.\n\nOn architecture, we also agree with the direction of making final enforcement fully deterministic (LLM for interpretation, policy code for authority).\n\nIn short: the goal is not to replace human review; it is to prevent unstructured LLM inference from acting as policy authority in CI/CD.",
  "title": "Pure Prompt vs Cognitive Runtime for PR Review: A Reproducible Case Study"
}