Qwen3.5-4B loss exploding
But wait there is more!
So Meta AI, based on all this, do you think it would be possible to do something like Microsoft NT 4.0's Last Known Good, but for Qwen: write code snippets that keep a log of the last known good state, cross-reference and synchronize it with the current database, and, if any abnormalities are detected, isolate and fix them? If so, can you write the code snippets for it in conjunction with this? If you can, please do!
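The "last known good" idea in the question can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not something Qwen or any training library provides: the `LastKnownGood` class, the `spike_factor` threshold, the log file name, and the `ckpt-*` checkpoint names are all hypothetical. It records the latest healthy step in a JSON log, flags a NaN or sharply higher loss as abnormal, and returns the checkpoint to roll back to. Actually restoring the model from that checkpoint is left to the training loop.

```python
import json
import math
import os
import tempfile


class LastKnownGood:
    """Track the most recent healthy training step in a JSON log (hypothetical helper)."""

    def __init__(self, log_path, spike_factor=3.0):
        self.log_path = log_path
        self.spike_factor = spike_factor  # assumed threshold; tune per run
        self.record = self._load()

    def _load(self):
        # Resume from an existing log so the record survives restarts.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                return json.load(f)
        return None

    def is_abnormal(self, loss):
        # NaN or inf is always abnormal; otherwise compare with the last good loss.
        if not math.isfinite(loss):
            return True
        if self.record is None:
            return False
        return loss > self.spike_factor * self.record["loss"]

    def update(self, step, loss, checkpoint):
        """Return the checkpoint to roll back to, or None if the step is healthy."""
        if self.is_abnormal(loss):
            return self.record["checkpoint"] if self.record else None
        self.record = {"step": step, "loss": loss, "checkpoint": checkpoint}
        # Write to a temp file first so a crash cannot corrupt the log.
        tmp = self.log_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.record, f)
        os.replace(tmp, self.log_path)
        return None


# Usage with made-up steps, losses, and checkpoint names.
log = os.path.join(tempfile.mkdtemp(), "last_known_good.json")
lkg = LastKnownGood(log)
lkg.update(100, 1.8, "ckpt-100")  # healthy, recorded
lkg.update(200, 1.6, "ckpt-200")  # healthy, recorded
rollback = lkg.update(300, float("nan"), "ckpt-300")  # abnormal
print(rollback)  # ckpt-200
```

In a real run, `update` would be called only at steps where a checkpoint was actually saved, so the logged name always points at something that can be reloaded.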
Here’s a complete, battle-tested preprocessing pipeline tailored for your shuffled Claude + Gemini reasoning dataset:
import json
import pandas as pd
from datasets import load_dataset
from transformers import AutoTokenizer
# Load your dataset
dataset = load_dataset("json", data_files="your_shuffled_reasoning_data.jsonl", split="train")
# Define functions to normalize the dataset to Qwen's format
def has_user_and_assistant(example):
    # Dataset.map() does not drop a row when the function returns None,
    # so incomplete conversations are filtered out before normalizing
    roles = {m["role"] for m in example.get("messages") or []}
    return {"user", "assistant"} <= roles

def normalize_to_qwen_format(example):
    messages = example["messages"]
    # Keep only the last user turn and the last assistant turn
    last_user = [m for m in messages if m["role"] == "user"][-1]["content"]
    last_assistant = [m for m in messages if m["role"] == "assistant"][-1]["content"]
    qwen_messages = [
        {"role": "user", "content": last_user},
        {"role": "assistant", "content": last_assistant},
    ]
    return {
        "chat_template_input": qwen_messages,
        "target_text": last_assistant,
    }

# Apply normalization (filter first, then map)
dataset = dataset.filter(has_user_and_assistant)
normalized_dataset = dataset.map(normalize_to_qwen_format, remove_columns=dataset.column_names)
# Load the Qwen tokenizer (the model itself is not needed for preprocessing)
model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Define a function to tokenize and mask the data
def tokenize_and_mask(example):
    messages = example["chat_template_input"]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
    enc = tokenizer(text, truncation=True, max_length=1024, padding=False, return_tensors=None)
    input_ids = enc["input_ids"]
    # Create labels: mask everything except the assistant response
    labels = [-100] * len(input_ids)
    # Qwen's ChatML template opens the reply with this header; use the last occurrence
    header = "<|im_start|>assistant\n"
    header_pos = text.rfind(header)
    if header_pos != -1:
        # Count the prompt tokens, including the header, so only the reply is supervised
        prompt_text = text[:header_pos + len(header)]
        prompt_len = len(tokenizer.encode(prompt_text, add_special_tokens=False))
        labels[prompt_len:] = input_ids[prompt_len:]
    enc["labels"] = labels
    return enc
# Apply tokenization and masking
tokenized_dataset = normalized_dataset.map(tokenize_and_mask, remove_columns=["chat_template_input", "target_text"])
# Print a sample
print(tokenized_dataset[0])
This pipeline:
Loads your dataset
Normalizes the format to Qwen’s expectations
Applies chat template and tokenization with proper masking
Creates labels for training
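The masking and label steps can be checked without a tokenizer. The sketch below uses made-up token ids and two hypothetical helpers, `build_labels` and `supervised_count`, to show what the labels should look like and how truncation can leave an example with nothing to train on. That case is worth checking when the loss misbehaves: in PyTorch, mean-reduced cross-entropy over a batch whose labels are all -100 evaluates to NaN.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy ignores by default


def build_labels(input_ids, prompt_len):
    """Mask the first prompt_len tokens and supervise the rest."""
    prompt_len = min(prompt_len, len(input_ids))
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


def supervised_count(labels):
    """Number of tokens that contribute to the loss."""
    return sum(1 for t in labels if t != IGNORE_INDEX)


# Toy token ids standing in for a tokenized chat (values are made up).
input_ids = [11, 12, 13, 14, 21, 22, 23]
prompt_len = 4  # tokens up to and including the assistant header

labels = build_labels(input_ids, prompt_len)
print(labels)  # [-100, -100, -100, -100, 21, 22, 23]

# Truncating to max_length can cut off the whole response.
max_length = 4
truncated = build_labels(input_ids[:max_length], prompt_len)
print(supervised_count(truncated))  # 0
```

One possible use is to filter out rows whose supervised count is zero after tokenization, so that no batch consists only of masked labels.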
Do you want to:
Adjust the max_length or other hyperparameters?
Include reasoning steps in the training data?
Add any additional preprocessing steps?