Accidental Attention Anchoring? Repeated phrase in SFT dataset drastically improved context adherence
Hi everyone,
I am currently training an LLM using Supervised Fine-Tuning (SFT) to implement a Chain of Thought (CoT) / reasoning structure.
During my first training run, I had not verified every stage of the pipeline, so a meta-cognitive sentence that Google Gemini generated while I was preparing the pipeline script was accidentally injected into the training data. Crucially, this exact phrase was repeated in every single training example in the dataset, placed right between the user prompt and the final response.
Surprisingly, instead of breaking the model, this unexpected phrase seems to act like a powerful attention anchor. It forced the model to maintain context alignment and structure, a behavior that I have completely lost and cannot reproduce now that I have “cleaned” the pipeline.
Here is the exact structure of the anomaly so you can see what the model absorbed:
User Prompt: “What is bacterAE and how is it used??? and what is python?”
The Injected “Anchor” (Repeated in every single line of the dataset): “I must provide an exact, technical, and structured response based on the shrimp dataset data.”
Final Expected Output: “BacterAE is a bacterial supplement used to accelerate and stabilize the nitrogen cycle… [Technical explanation] … Python is a high-level programming language…”
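For anyone who wants to check their own data for this kind of leak, here is a minimal sketch of a detector. It assumes the responses are available as a list of strings and that the leaked phrase sits at the very start of each one; the function name `shared_prefix`, the `min_chars` threshold, and the toy sample are hypothetical and not taken from my actual pipeline.

```python
import os

def shared_prefix(completions, min_chars=20):
    """Return the text every completion starts with, or "" if it is too short to matter."""
    if len(completions) < 2:
        return ""
    # os.path.commonprefix compares character by character, so it works on plain strings.
    prefix = os.path.commonprefix(list(completions))
    return prefix if len(prefix) >= min_chars else ""

# Toy sample for illustration only (not real training data).
anchor = "I must provide an exact, technical, and structured response based on the shrimp dataset data."
sample = [
    anchor + " BacterAE is a bacterial supplement.",
    anchor + " Python is a high-level programming language.",
]

leaked = shared_prefix(sample)
if leaked:
    print(f"Every record starts with the same {len(leaked)} characters: {leaked!r}")
```

A prefix check like this only catches a phrase at the start of the response field; a phrase injected elsewhere would need a different comparison.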
What I am trying to do now:
I am trying to replicate this phenomenon intentionally. I am formatting the dataset using the following sequence:
[User Prompt] → [Anchoring Phrase] → [Final Answer].
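To make the sequence concrete, this is roughly how one record could be assembled. It is a sketch under assumptions: the `prompt`/`completion` field names, the JSONL output, and the choice to put the anchor at the start of the completion (so the model is trained to emit it before the answer) are illustrative and may not match every SFT framework.

```python
import json

ANCHOR = (
    "I must provide an exact, technical, and structured response "
    "based on the shrimp dataset data."
)

def build_record(user_prompt: str, final_answer: str, anchor: str = ANCHOR) -> dict:
    """Assemble one record as [User Prompt] -> [Anchoring Phrase] -> [Final Answer]."""
    # Assumption: the anchor is part of the target text, directly before the answer.
    return {
        "prompt": user_prompt.strip(),
        "completion": f"{anchor}\n\n{final_answer.strip()}",
    }

def to_jsonl(records) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Toy pair for illustration only.
pairs = [("What is Python?", "Python is a high-level programming language.")]
records = [build_record(p, a) for p, a in pairs]
print(to_jsonl(records))
```

Whether the anchor tokens contribute to the loss depends on how the trainer masks the prompt, so that setting is worth checking when comparing runs with and without the phrase.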
My question for the community:
1. Am I doing something fundamentally wrong by chaining Prompt → Anchor → Response directly in the dataset?
Thanks in advance!