External Publication

Qwen3.5-4B loss exploding

Hugging Face Forums [Unofficial] March 14, 2026

But wait, there is more!

So Meta AI, based on all this, do you think it would be possible to do something like Microsoft NT 4.0's LAST KNOWN GOOD, but for Qwen? That is, write code snippets that keep a log of the last known good state, cross-reference and synchronize it with the current database, and, if any abnormalities are detected, isolate and fix them? If so, can you write the code snippets for it, in conjunction with all this? If so, please do!

Here's a preprocessing pipeline tailored for your shuffled Claude + Gemini reasoning dataset:

from datasets import load_dataset
from transformers import AutoTokenizer

# Load your dataset
dataset = load_dataset("json", data_files="your_shuffled_reasoning_data.jsonl", split="train")

# Define a function to normalize the dataset to Qwen's format
def normalize_to_qwen_format(example):
    messages = example.get("messages", [])
    user_msgs = [m for m in messages if m["role"] == "user"]
    assistant_msgs = [m for m in messages if m["role"] == "assistant"]
    if not user_msgs or not assistant_msgs:
        return None
    # Keep only the final user turn and the final assistant turn
    last_user = user_msgs[-1]["content"]
    last_assistant = assistant_msgs[-1]["content"]
    qwen_messages = [
        {"role": "user", "content": last_user},
        {"role": "assistant", "content": last_assistant},
    ]
    return {
        "chat_template_input": qwen_messages,
        "target_text": last_assistant,
    }

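# Drop rows that lack a user or assistant turn, then apply normalization.
# The filter has to run first: map() does not drop rows for which the function
# returns None, and a later `x is not None` filter always passes because every
# row reaches it as a dict.
def has_both_roles(example):
    roles = {m["role"] for m in (example.get("messages") or [])}
    return {"user", "assistant"} <= roles

dataset = dataset.filter(has_both_roles)
normalized_dataset = dataset.map(normalize_to_qwen_format, remove_columns=dataset.column_names)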

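# Load the Qwen tokenizer (only the tokenizer is needed for preprocessing).
# Use the checkpoint you are actually fine-tuning; the thread title mentions Qwen3.5-4B.
model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token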

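# Define a function to tokenize and mask the data
def tokenize_and_mask(example):
    messages = example["chat_template_input"]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
    # The chat template already inserts the special tokens, so do not add them again
    enc = tokenizer(text, truncation=True, max_length=1024, padding=False, add_special_tokens=False)
    input_ids = enc["input_ids"]
    # Create labels: mask everything except the assistant response
    labels = [-100] * len(input_ids)
    # Search for the full ChatML header, and take the last match so that the
    # word "assistant" inside the prompt is never mistaken for the response start
    header = "<|im_start|>assistant\n"
    header_pos = text.rfind(header)
    if header_pos != -1:
        # Token count up to and including the header (assumes the header ends on a token boundary)
        response_start = len(tokenizer.encode(text[:header_pos + len(header)], add_special_tokens=False))
        labels[response_start:] = input_ids[response_start:]
    enc["labels"] = labels
    return enc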

# Apply tokenization and masking
tokenized_dataset = normalized_dataset.map(tokenize_and_mask, remove_columns=["chat_template_input", "target_text"])

# Print a sample
print(tokenized_dataset[0])
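
Beyond printing one sample, it may be worth checking how many label positions are actually supervised. If truncation at max_length cuts off the whole response, every label is -100, and a batch made only of such rows can yield a NaN loss under PyTorch's mean-reduced cross-entropy. The following is a minimal sketch in plain Python; `supervised_fraction` and `keep_example` are hypothetical helper names, not part of the pipeline above, and the rows are toy data.

```python
IGNORE_INDEX = -100  # label value that the loss function skips


def supervised_fraction(labels):
    """Return the share of label positions that contribute to the loss."""
    if not labels:
        return 0.0
    kept = sum(1 for t in labels if t != IGNORE_INDEX)
    return kept / len(labels)


def keep_example(example, min_tokens=1):
    """True if at least `min_tokens` label positions are supervised."""
    kept = sum(1 for t in example["labels"] if t != IGNORE_INDEX)
    return kept >= min_tokens


# Toy rows standing in for tokenized examples
rows = [
    {"labels": [-100, -100, 11, 12]},  # half of the positions are supervised
    {"labels": [-100, -100, -100]},    # fully masked, should be dropped
]
usable = [r for r in rows if keep_example(r)]
print(len(usable), supervised_fraction(rows[0]["labels"]))
```

With the datasets library, the same predicate could be applied as `tokenized_dataset.filter(keep_example)`.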

This pipeline:

Loads your dataset
Normalizes the format to Qwen’s expectations
Applies chat template and tokenization with proper masking
Creates labels for training
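
The masking step can also be done directly on token IDs instead of searching the rendered string, which sidesteps any mismatch between character offsets and token offsets. The sketch below is an illustration under assumptions: `find_last_subsequence` and `mask_before_last_header` are hypothetical helpers, and the IDs in the example are made-up numbers, not real Qwen token IDs. In practice the header IDs would come from tokenizing the assistant header once.

```python
IGNORE_INDEX = -100


def find_last_subsequence(seq, sub):
    """Return the start index of the last occurrence of `sub` in `seq`, or -1."""
    n, m = len(seq), len(sub)
    if m == 0 or m > n:
        return -1
    for start in range(n - m, -1, -1):
        if seq[start:start + m] == sub:
            return start
    return -1


def mask_before_last_header(input_ids, header_ids):
    """Build labels that supervise only the tokens after the last header."""
    labels = [IGNORE_INDEX] * len(input_ids)
    start = find_last_subsequence(input_ids, header_ids)
    if start == -1:
        return labels  # header missing (e.g. truncated away): supervise nothing
    response_start = start + len(header_ids)
    labels[response_start:] = input_ids[response_start:]
    return labels


# Made-up IDs: 1 = start of turn, 7 = "assistant", 9 = newline, 2 = end of turn
header = [1, 7, 9]
ids = [1, 5, 9, 40, 41, 2, 1, 7, 9, 50, 51, 2]
print(mask_before_last_header(ids, header))
# [-100, -100, -100, -100, -100, -100, -100, -100, -100, 50, 51, 2]
```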

Do you want to:

Adjust the max_length or other hyperparameters?
Include reasoning steps in the training data?
Add any additional preprocessing steps?
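
The question at the top asked for a Windows NT style "last known good" mechanism. One way to read that for training is to remember the most recent step whose loss looked healthy, and to flag a rollback when the loss becomes non-finite or jumps well above its recent average. The sketch below is an illustration in plain Python, not a tested recovery system: the class name, the JSON log layout, the `spike_factor` threshold and the window size are all hypothetical choices, and saving or restoring the actual model weights is left to the training framework.

```python
import json
import math
import os
import tempfile
from collections import deque


class LastKnownGood:
    """Track the latest healthy training step and flag abnormal losses."""

    def __init__(self, log_path, window=20, spike_factor=3.0):
        self.log_path = log_path
        self.recent = deque(maxlen=window)  # most recent healthy losses
        self.spike_factor = spike_factor
        self.good = None  # dict with "step", "loss", "checkpoint"

    def is_abnormal(self, loss):
        if not math.isfinite(loss):
            return True  # NaN or inf is always abnormal
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough history yet to judge a spike
        mean = sum(self.recent) / len(self.recent)
        return loss > self.spike_factor * mean

    def update(self, step, loss, checkpoint):
        """Record a step. Returns True if healthy, False if a rollback is advised."""
        if self.is_abnormal(loss):
            return False
        self.recent.append(loss)
        self.good = {"step": step, "loss": loss, "checkpoint": checkpoint}
        with open(self.log_path, "w") as f:
            json.dump(self.good, f)  # persist so the record survives a crash
        return True


# Toy loss curve: healthy, then a spike, then NaN
tracker = LastKnownGood(os.path.join(tempfile.mkdtemp(), "lkg.json"), window=3)
for step, loss in enumerate([2.0, 1.9, 1.8, 1.7, 9.5, float("nan")]):
    if not tracker.update(step, loss, f"ckpt-{step}"):
        print(f"step {step}: abnormal loss {loss}, roll back to {tracker.good}")
```

In a real run the `checkpoint` value would point at weights saved periodically rather than at every step, and the batch that triggered the abnormal loss could be logged for inspection, which is the "isolate" part of the request.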