Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreieknuj652rh6gtiqsugleet2jh5jd5qhilndyzc6ih7dpfayzcfcy",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mmyxlqkkfzg2"
  },
  "path": "/t/stealth-model-swap-gpt-5-5-high-claims-knowledge-cutoff-is-june-2024/1381918#post_9",
  "publishedAt": "2026-05-29T15:48:54.000Z",
  "site": "https://community.openai.com",
  "textContent": "Had some free time and decided to get some clarity on this.\n\nGPT-5.5 Prompt:\n\n_Explain how frontier model training is orchestrated between iterations. For example, how does training data differ from gpt-5.4 to gpt-5.5. Is previous training data retained? Why do old cutoff dates persist in new frontier models? To what extent, if any, is new domain-specific training performed and made available for new model iterations? And to what extent is previous domain-specific training not updated for new model iterations and why?_\n\nResponse:\n\n**Frontier Model Training Between Iterations**\n\n**I. Scope And Caveat**\n\nNo public source can reliably describe the exact training data differences between proprietary versions such as “gpt-5.4” and “gpt-5.5” unless the lab discloses them. Frontier labs typically treat data mixtures, filtering rules, checkpoint lineage, and post-training recipes as confidential. The best answer is therefore a general explanation of how frontier model iterations are usually orchestrated.\n\n**II. Iteration Is Not Always A Full Retrain**\n\nA new model iteration is not necessarily trained from scratch. It may be a full new pretraining run, continued pretraining from an earlier checkpoint, a post-training refresh, a distillation of a larger model, an inference-stack update, or some combination of these. Adjacent versions often differ more in data mixture, safety tuning, preference training, tool-use behavior, evaluation fixes, and serving configuration than in a complete replacement of all underlying knowledge.\n\n**III. How Training Is Usually Orchestrated**\n\nA frontier iteration usually begins with a target capability plan, such as better reasoning, coding, multilingual performance, instruction following, safety, tool use, latency, or cost. The lab then assembles or revises data mixtures, trains many candidate checkpoints, evaluates them on internal and external benchmarks, performs supervised and reinforcement-based post-training, conducts safety testing and red teaming, and selects a release candidate. The released model is usually the result of many failed or discarded branches, not a single linear continuation.\n\n**IV. How Training Data May Differ Between Adjacent Versions**\n\nBetween a hypothetical “gpt-5.4” and “gpt-5.5”, the newer version might contain newer licensed data, updated web or code data, more curated academic or technical material, additional synthetic reasoning data, new human preference data, more safety data, more tool-use traces, and fixes for failures discovered in deployment. It might also contain less of some older data if that data was low quality, duplicated, legally restricted, privacy-sensitive, unsafe, or found to harm performance. The difference is usually a changed recipe, not simply “the old dataset plus everything new.”\n\n**V. Whether Previous Training Data Is Retained**\n\nPrevious training data is often retained in some form, but not always as an unchanged corpus. If a model is continued from an earlier checkpoint, the effects of previous data remain in the weights even if the raw data is not replayed. If a new model is trained from scratch, much historical data may be reused because it still teaches language, reasoning, world knowledge, code, and domain structure. However, data can be deduplicated, reweighted, filtered, replaced, or removed. Labs also keep some older material in the mixture to prevent catastrophic forgetting, where a model improves on new material but loses older capabilities.\n\n**VI. Why Old Cutoff Dates Persist**\n\nOld cutoff dates persist because a model’s broad knowledge cutoff usually refers to the main pretraining data mixture, not every later tuning dataset. A model can receive post-training after the cutoff date without its general knowledge cutoff changing. For example, it may learn better instruction following, safety behavior, coding style, or tool use from newer data while still lacking broad, reliable coverage of recent world events. Labs may also report a conservative cutoff date because only some newer data was included, because the newer data was narrow or synthetic, or because broad data collection, filtering, licensing, training, evaluation, and deployment can lag real time by months.\n\n**VII. Cutoff Dates Are Not Perfect Knowledge Boundaries**\n\nA cutoff date is not a sharp wall. A model may know some facts after the stated cutoff if they appeared in post-training, safety data, code data, synthetic examples, or evaluation-related material. It may also fail to know facts before the cutoff if they were absent, rare, filtered out, poorly represented, or forgotten during training. The cutoff is best understood as a conservative description of the model’s broad training horizon, not a guarantee about every fact it knows or does not know.\n\n**VIII. Domain-Specific Training In New Iterations**\n\nNew domain-specific training is often performed when it supports product goals or measured capability improvements. Domains such as coding, mathematics, medicine, law, finance, science, cybersecurity, and enterprise workflows may receive targeted data, expert demonstrations, benchmark-driven corrections, reinforcement learning tasks, synthetic problem sets, or tool-assisted training. If the domain training is integrated into the general model, users may see better performance in that domain without using a separate specialist model. In other cases, domain improvements may be delivered through tools, retrieval systems, specialized fine-tunes, or enterprise-specific deployments rather than through the base model alone.\n\n**IX. Why Some Domain Training Is Not Updated**\n\nPrevious domain-specific training may not be refreshed in every iteration because high-quality domain data is expensive, scarce, licensed, or difficult to verify. Some domains require expert review, legal clearance, privacy filtering, or safety analysis. Updating every domain for every model version can also be computationally inefficient and can introduce regressions, benchmark contamination, overfitting, or unsafe new capabilities. In fast-changing domains, labs may prefer retrieval or tool access over embedding every new fact into model weights, because retrieval can be updated more quickly and audited more easily.\n\n**X. Why Newer Models Can Still Have Stale Domains**\n\nA newer frontier model can be better overall while still stale in a particular domain. The release may have focused on reasoning, latency, multilingual ability, coding, or safety rather than on updating every field of knowledge. The model may inherit older domain knowledge from a base checkpoint, receive only narrow post-training updates, or rely on external tools for freshness. This is why a newer version can outperform an older one on many tasks while still reflecting older assumptions in law, medicine, finance, software libraries, or recent scientific literature.\n\n**XI. The Practical Summary**\n\nFrontier model training is usually an iterative pipeline of checkpoint inheritance, revised data mixtures, targeted capability training, safety tuning, evaluation, and deployment testing. Previous data is often retained indirectly through weights and often reused directly in curated mixtures, but it is not guaranteed to be preserved unchanged. Old cutoff dates persist because broad pretraining, post-training, and release timing are different stages. New domain-specific training may be added when it is valuable and validated, while other domains may remain unchanged because of cost, data quality, legal constraints, safety risk, or a preference for retrieval-based freshness.",
  "title": "Stealth Model Swap? GPT-5.5 High Claims Knowledge Cutoff is June 2024"
}