
Cosine Similarity Variance When Migrating from text-embedding-ada-002 to text-embedding-3-small

OpenAI Developer Community March 24, 2026

We have a tutoring chatbot that relies on embedding-based relevance scoring for user queries. We are in the process of evaluating a migration from text-embedding-ada-002 to text-embedding-3-small. Although changes in cosine similarity values across embedding models are expected, our evaluation indicates that similarity scores produced by text-embedding-3-small are significantly lower and not consistently ordered relative to those from text-embedding-ada-002.

Issue Summary

For the same query–context pairs, cosine similarity scores from text-embedding-3-small are substantially lower than those from the legacy model text-embedding-ada-002, and the relative ordering of scores across queries differs between the two models.

This behavior raises concerns that semantic relevance scoring may be altered when migrating from ada-002 to text-embedding-3-small.
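For reference, here is a minimal sketch of the cosine similarity computation the scores in this post refer to. It assumes the query and context embeddings have already been fetched as lists of floats from the same model; the function name and the toy vectors are illustrative, not taken from our production code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
query_vec = [1.0, 0.0, 1.0]
context_vec = [1.0, 1.0, 0.0]
score = cosine_similarity(query_vec, context_vec)
```

Both vectors must come from the same model: an ada-002 vector and a text-embedding-3-small vector live in different spaces, so a similarity computed between them is not meaningful.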

Issue Details (With Example)

Context

Question shown to the student:
<p>Find the prime factorization of the following number.</p> <p>(15)</p>

Solution of the question is:
<p>Factor (15) into two factors, (3) and (5).</p>

Queries Evaluated

Query 1: “The best statistical software to tackle this problem would be…”
Query 2: “How does this concept apply to everyday situations?”
Query 3: “How does this topic connect to other areas of statistics or mathematics?”

Cosine Similarity Results

text-embedding-ada-002

Query	Cosine Similarity
Query 1	0.774218917944234
Query 2	0.781920253363479
Query 3	0.789893634044595

Observation: Cosine similarity values show a clear increasing trend across the three queries.

text-embedding-3-small

Query	Cosine Similarity
Query 1	0.247923658700569
Query 2	0.195844709264796
Query 3	0.217488219437886

Observation: Cosine similarity values are much lower overall and do not follow the same order: Query 1 scores highest, followed by Query 3, then Query 2.

Key Observations

The absolute cosine similarity scores from text-embedding-3-small are significantly lower than those from text-embedding-ada-002 for the same query–context pairs.
The relative ranking of queries by similarity differs between the two models.
In ada-002, similarity scores increase monotonically across the example queries.
In text-embedding-3-small, similarity scores fluctuate (increase and decrease), even when the same trend is expected.
This inconsistency suggests that semantic relevance interpretation differs substantially between the old and new models.
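The ranking difference can be checked directly from the reported numbers. The sketch below copies the scores from the two result tables and sorts the queries by similarity for each model; the variable and function names are ours, chosen for illustration.

```python
# Scores copied from the two result tables above.
ada_002 = {
    "Query 1": 0.774218917944234,
    "Query 2": 0.781920253363479,
    "Query 3": 0.789893634044595,
}
small_3 = {
    "Query 1": 0.247923658700569,
    "Query 2": 0.195844709264796,
    "Query 3": 0.217488219437886,
}

def rank_by_score(scores):
    """Return query labels ordered from most to least similar."""
    return sorted(scores, key=scores.get, reverse=True)

ada_order = rank_by_score(ada_002)      # ['Query 3', 'Query 2', 'Query 1']
small_order = rank_by_score(small_3)    # ['Query 1', 'Query 3', 'Query 2']

# Gap between the highest and lowest score for each model.
ada_spread = max(ada_002.values()) - min(ada_002.values())      # about 0.016
small_spread = max(small_3.values()) - min(small_3.values())    # about 0.052
```

Note that the three ada-002 scores sit within roughly 0.016 of each other, while the text-embedding-3-small scores span roughly 0.052, so the two models differ in spread as well as in absolute level and order.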

Conclusion / Concern

For applications relying on cosine similarity thresholds, ranking, or relevance ordering, this change may lead to unexpected or degraded results after migration.

Clarification is requested on whether:

There are recommended normalization, threshold, or evaluation adjustments when switching to the new embedding models.
Given that our current cosine similarity threshold with the legacy embedding model text-embedding-ada-002 is 0.7, is it appropriate to use a threshold of 0.2 after upgrading to text-embedding-3-small, or is a different threshold recommended?
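To make the threshold question concrete, here is a small sketch that applies both thresholds to the three example scores. The 0.7 and 0.2 values are the ones from the question above; the helper name is hypothetical.

```python
# Scores copied from the result tables earlier in this post.
ada_002 = {
    "Query 1": 0.774218917944234,
    "Query 2": 0.781920253363479,
    "Query 3": 0.789893634044595,
}
small_3 = {
    "Query 1": 0.247923658700569,
    "Query 2": 0.195844709264796,
    "Query 3": 0.217488219437886,
}

ADA_THRESHOLD = 0.7     # current production threshold with ada-002
SMALL_THRESHOLD = 0.2   # candidate threshold for text-embedding-3-small

def passing(scores, threshold):
    """Return the queries whose similarity meets or exceeds the threshold."""
    return [query for query, score in scores.items() if score >= threshold]

ada_pass = passing(ada_002, ADA_THRESHOLD)        # all three queries pass
small_pass = passing(small_3, SMALL_THRESHOLD)    # Query 1 and Query 3 pass
```

On this example the two thresholds do not select the same queries: Query 2 passes at 0.7 under ada-002 but falls below 0.2 under text-embedding-3-small. Three scores are far too few to choose a threshold from, so this only illustrates why we are asking.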