{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreif3jq7walq364zluhpbdsnt6vimsfograyd7q4frfgqbzonx3fu7y",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3miyyvupyo7i2"
  },
  "path": "/t/i-trained-a-90m-parameter-embedding-model-from-scratch/175077#post_1",
  "publishedAt": "2026-04-08T15:52:55.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary · Datasets at Hugging Face",
    "pranavupadhyaya52/rocky-embed · Hugging Face"
  ],
  "textContent": "I trained a 90M parameter encoder only (embedding) model from scratch. I mostly trained in on google colab on a colab pro plus subscription. this was like the 5th run as previously I had issues with exploding gradients.\n\n* * *\n\nIt was a fun project but not yet near SOTA quality. I also managed to successfully infer it with Auto model. it uses e5-base-v2 tokeniser.\n\nIt was distillation based training from CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary · Datasets at Hugging Face . Contrastive training would have likely taken more time. 5000 learning steps starting from 1e-5 lr.\n\n50k total steps. General model health check and checkpointing every additional 5k steps until 50k steps.\n\nI evaluated it on STS benchmark.\n\n* * *\n\nSpearman Correlation: 0.5453\n\n* * *\n\nIf anyone would like to try the model. The huggingface page of the model is - pranavupadhyaya52/rocky-embed · Hugging Face",
  "title": "I trained a 90M parameter embedding model from scratch"
}