Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihyr3duv3wslvfxlltozc2xpt7y4pbb2p5jekcz2aivm7z2ra2xbi",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mnccgbadugn2"
  },
  "path": "/t/fine-tuning-an-slm-for-a-low-resource-language/176467#post_1",
  "publishedAt": "2026-06-02T08:30:25.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "Hello, I am fine-tuning an SLM for an AI festival, and my goal is to make the model stronger in a specific language. Unfortunately, few fine-tuning-ready datasets exist for that language, and because of my hardware limitations and internet restrictions, I cannot do continued pretraining of the model. I have two questions: Can I use LoRA to simulate continued pretraining? And how can I build a QA dataset from raw Wikipedia dumps?",
  "title": "Fine-Tuning an SLM for a Low-Resource Language"
}