{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreif3jq7walq364zluhpbdsnt6vimsfograyd7q4frfgqbzonx3fu7y",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3miys6x4nm552"
},
"path": "/t/i-trained-a-90m-parameter-embedding-model-from-scratch/175077#post_1",
"publishedAt": "2026-04-08T15:52:55.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary · Datasets at Hugging Face",
"pranavupadhyaya52/rocky-embed · Hugging Face"
],
"textContent": "I trained a 90M-parameter encoder-only (embedding) model from scratch. I trained it mostly on Google Colab with a Colab Pro+ subscription. This was roughly the fifth run, as earlier runs had issues with exploding gradients.\n\n* * *\n\nIt was a fun project, but the model is not yet near SOTA quality. I also managed to run inference with it through AutoModel. It uses the e5-base-v2 tokeniser.\n\nTraining was distillation-based, using the CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary dataset on Hugging Face. Contrastive training would likely have taken more time. 5000 learning steps starting from a learning rate of 1e-5.\n\n50k total steps, with a general model health check and a checkpoint every 5k steps until 50k.\n\nI evaluated it on the STS benchmark.\n\n* * *\n\nSpearman correlation: 0.5453\n\n* * *\n\nIf anyone would like to try the model, its Hugging Face page is pranavupadhyaya52/rocky-embed.",
"title": "I trained a 90M parameter embedding model from scratch"
}