{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicn3i6gnnshgcxjeoessf5rk2koz6o5xnqet2lnoomy3gj7b4bvfu",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moj2wehihxt2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreia7ufynxchhkwiilzrj4bqiz3zlkyisdvzmiqj4lh3rjdvubhgmqm"
    },
    "mimeType": "image/webp",
    "size": 46962
  },
  "path": "/trevorthecreativeguy/how-to-redact-pii-before-sending-prompts-to-openai-claude-or-gemini-25gg",
  "publishedAt": "2026-06-17T19:21:56.000Z",
  "site": "https://dev.to",
  "tags": [
    "ai",
    "privacy",
    "security",
    "webdev",
    "LLM Privacy Shield on RapidAPI"
  ],
  "textContent": "If you send user text to an LLM, you are probably sending personal data with it without meaning to. Think of a support message, a chat transcript, or a pasted form: each one carries names, emails, phone numbers, and sometimes card numbers, and all of it ends up in your prompt. Once that prompt leaves your server, the personal data is sitting in someone else's logs, which is a real problem under GDPR and HIPAA.\n\nThe fix is simpler than most people expect, and it does not mean giving up the model. You redact the personal data before the prompt goes out, send the safe version to the model, then put the real values back into the answer. This post walks through that pattern with working code.\n\n## The pattern in three steps\n\n  1. Redact. Take your raw text and swap each piece of personal data for a placeholder like `[EMAIL_1]`. Keep a small map on your side that records which placeholder stands for which real value.\n  2. Call the model with the redacted text. The model only ever sees placeholders, so the real data never lands in the model provider's logs.\n  3. Restore. Take the model's reply and your map, and put the real values back so the final output is useful to your user.\n\nThat is the whole idea. The personal data takes a round trip through placeholders and never reaches the model provider.\n\n## A real example\n\nSay you run a support tool that uses GPT to summarize customer emails. A customer writes in with their email address and phone number. You want the summary, but you do not want to ship their contact details to OpenAI.\n\nHere is the flow end to end. I am using a small API I built for the redact and restore steps, but the shape is the same no matter how you handle those two calls.\n\n    // Real example: summarize a customer support email with GPT,\n    // without sending any personal data to OpenAI.\n    import OpenAI from \"openai\";\n\n    const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment\n    const SHIELD = \"https://llm-privacy-shield.p.rapidapi.com\";\n    const shieldHeaders = {\n      \"content-type\": \"application/json\",\n      \"x-rapidapi-host\": \"llm-privacy-shield.p.rapidapi.com\",\n      \"x-rapidapi-key\": \"YOUR-RAPIDAPI-KEY\"\n    };\n\n    async function summarizeSafely(customerEmail) {\n      // 1. Redact PII before the text is sent to the model.\n      const { redacted, token_map } = await fetch(SHIELD + \"/api/redact\", {\n        method: \"POST\",\n        headers: shieldHeaders,\n        body: JSON.stringify({ text: customerEmail })\n      }).then(res => res.json());\n\n      // customerEmail:\n      //   \"Hi, my confirmation went to john@acme.com but nothing arrived.\n      //    Please call me back on 415-555-0132.\"\n      // redacted:\n      //   \"Hi, my confirmation went to [EMAIL_1] but nothing arrived.\n      //    Please call me back on [PHONE_1].\"\n\n      // 2. Send only the safe version to your model. OpenAI is shown here.\n      //    Anthropic and Gemini work the same way; just swap this call.\n      const completion = await openai.chat.completions.create({\n        model: \"gpt-4o-mini\",\n        messages: [\n          { role: \"system\", content: \"Summarize this support email in one sentence.\" },\n          { role: \"user\", content: redacted }\n        ]\n      });\n      const summary = completion.choices[0].message.content;\n\n      // 3. Put the real contact details back for your support agent.\n      const { restored } = await fetch(SHIELD + \"/api/restore\", {\n        method: \"POST\",\n        headers: shieldHeaders,\n        body: JSON.stringify({ text: summary, token_map })\n      }).then(res => res.json());\n\n      return restored;\n    }\n\nThe customer's email address and phone number never reach OpenAI. The model summarizes placeholder text, and you swap the real values back in at the end for your support agent to read.\n\n## The two calls, side by side\n\nIf you just want to see the redact and restore calls on their own:\n\n    // The two calls, at a glance\n    const HOST = \"https://llm-privacy-shield.p.rapidapi.com\";\n    const headers = {\n      \"content-type\": \"application/json\",\n      \"x-rapidapi-host\": \"llm-privacy-shield.p.rapidapi.com\",\n      \"x-rapidapi-key\": \"YOUR-RAPIDAPI-KEY\"   // from your RapidAPI dashboard\n    };\n\n    const r = await fetch(HOST + \"/api/redact\", {\n      method: \"POST\",\n      headers,\n      body: JSON.stringify({ text: \"Email john@acme.com about invoice 4521\" })\n    }).then(x => x.json());\n\n    // r.redacted   => \"Email [EMAIL_1] about invoice 4521\"\n    // r.token_map  => { \"[EMAIL_1]\": \"john@acme.com\" }\n\n    // modelReply is whatever your LLM returned when you sent it r.redacted.\n    const back = await fetch(HOST + \"/api/restore\", {\n      method: \"POST\",\n      headers,\n      body: JSON.stringify({ text: modelReply, token_map: r.token_map })\n    }).then(x => x.json());\n    // back.restored => the model reply with the real values put back\n\nThe redact call returns the safe text and the `token_map`. You hold the map. The restore call takes that map and the model's reply and rebuilds the real answer.\n\n## The three modes\n\nRedaction is not one size fits all. There are three modes worth knowing:\n\n  * **Tokenize** (reversible). Replaces each value with a placeholder you can reverse later. Use this when you need the real values back in the answer.\n  * **Mask**. Replaces a value with a generic label like `<EMAIL>`. Use this when you never need the value back.\n  * **Remove**. Deletes the value entirely.\n\nTokenize is the default mode and the one the examples above rely on, because the restore step needs its placeholders and map.\n\n## Do you have to use a hosted API?\n\nNo. If you would rather self-host, Microsoft Presidio is a solid open-source option for detecting and anonymizing PII. The redact-then-restore pattern is the part that matters, and it works the same whether you run it yourself or call a service.\n\nI built a hosted version because I wanted redact and restore in one place. Detection runs in-process inside the service, so your text is not forwarded to any other third party, and processing takes under a millisecond. It detects emails, phone numbers, SSNs, credit card numbers, IP addresses, and API keys, and there is a free tier if you want to try it without committing to anything paid:\n\nLLM Privacy Shield on RapidAPI\n\n## Wrapping up\n\nIf your app sends user text to an LLM, run it through a redact step first. Keep the map, send placeholders to the model, and restore at the end. Your users get the same useful output, and the personal data stays where it belongs.\n\nWhat new app or existing pipeline could you make more private with this API?",
  "title": "How to Redact PII before sending prompts to OpenAI, Claude, or Gemini"
}