{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihkfqoatw4zi373xjjly2c6xk4h54mi23f3bzdx3uwk4ye2g6mm2u",
    "uri": "at://did:plc:4n6wgsqsqm6q2hjncgwmreey/app.bsky.feed.post/3miscu5a775w2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiasuizyizxcc2kr5gp7hoydsgczt3cuylvd2b5drfohnpr5jn6gby"
    },
    "mimeType": "image/jpeg",
    "size": 49826
  },
  "path": "/post/48335755",
  "publishedAt": "2026-04-05T16:32:28.000Z",
  "site": "https://programming.dev",
  "tags": [
    "Programmer Humor",
    "artwork",
    "programmer_humor",
    "24 comments",
    "https://futurism.com/artificial-intelligence/anthropic-suddenly-cares-about-intellectual-property-claude-leak",
    "ibbit.at/post/219495",
    "Fark.com RSS",
    "this RSS feed",
    "here",
    "Source",
    "arxiv.org/pdf/2601.02671"
  ],
  "textContent": "submitted by artwork to programmer_humor\n366 points | 24 comments\nhttps://futurism.com/artificial-intelligence/anthropic-suddenly-cares-about-intellectual-property-claude-leak\n\ncross-posted from: ibbit.at/post/219495\n\n> _From Fark.com RSS via this RSS feed_. Fark comments are available here.\n\n-–\n\n> By Wednesday morning, Anthropic representatives had used a copyright takedown request to force the removal of more than 8,000 copies and adaptations of the raw Claude Code instructions - known as source code - that developers had shared on programming platform GitHub.\n>  It later narrowed its takedown request to cover just 96 copies and adaptations, saying its initial ask had reached more GitHub accounts than intended.\n>\n> Source [web-archive]\n\n-–\n\n> Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model’s weights during training, and whether those memorized data can be extracted in the model’s outputs.\n>\n> While many believe that LLMs do not memorize much of their training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models… We investigate this question using a two-phase procedure…\n>\n> We evaluate our procedure on four production LLMs: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3, and we measure extraction success with a score computed from a block-based approximation of longest common substring…\n>\n> Taken together, our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs…\n>\n> …we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984…\n>\n> Source: arxiv.org/pdf/2601.02671",
  "title": "And just like that, the AI industry started caring about intellectual property"
}