
Vector Store search: correct chunk only retrieved at top_k ≥ 45

OpenAI Developer Community May 19, 2026

When searching a Vector Store via the /vector_stores/{id}/search endpoint, the chunk with the highest similarity score is not returned unless max_num_results is set to 45 or higher. This violates the basic invariant that top_k=N results should be a strict prefix of top_k=M results when M > N (top_k here refers to the max_num_results parameter).

This appears to be a severe HNSW recall issue, not a ranking issue — the correct chunk exists in the index with a high score (0.88), but the retrieval engine fails to surface it at small top_k values.

Environment

API: POST /v1/vector_stores/{vector_store_id}/search
Vector store: ~40 markdown files, default chunking strategy
Embedding model: default
No filters, no custom ranking options, no rewrite_query
Query: a short non-English search phrase (4 words)
Expected top result: a specific document (referred to as Doc A below) that is the most semantically relevant to the query

Reproduction

Same query, same vector store, same API call — only max_num_results changes.
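For anyone who wants to reproduce this, here is a minimal sketch of the two calls using only the Python standard library. The request body fields (`query`, `max_num_results`) come from the endpoint described above; reading `filename` and `score` from each entry under `data` reflects my understanding of the response shape, so treat those as assumptions. The `search` helper name and the `opener` parameter are my own, added only so the call can be exercised without network access.

```python
import json
import urllib.request

API_BASE = "https://api.openai.com/v1"


def build_search_request(vector_store_id, query, max_num_results, api_key):
    """Build the POST request for /vector_stores/{id}/search."""
    body = json.dumps(
        {"query": query, "max_num_results": max_num_results}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/vector_stores/{vector_store_id}/search",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def search(vector_store_id, query, max_num_results, api_key,
           opener=urllib.request.urlopen):
    """Return [(filename, score), ...] in the order the API returns them.

    Assumes each entry under "data" carries "filename" and "score".
    """
    req = build_search_request(vector_store_id, query, max_num_results, api_key)
    with opener(req) as resp:
        payload = json.load(resp)
    return [(item["filename"], item["score"]) for item in payload["data"]]


# Usage (placeholders, substitute your own values):
#   for k in (50, 2):
#       for rank, (name, score) in enumerate(search(STORE_ID, QUERY, k, KEY), 1):
#           print(f"k={k}  Rank {rank}: {name}  score = {score:.4f}")
```

Nothing else is set in the request (no filters, no ranking options, no rewrite_query), so the only variable between runs is max_num_results.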

Run 1: `max_num_results = 50`

Rank 1: Doc A    score = 0.8838  ← correct, highest score
Rank 2: Doc B    score = 0.8300
Rank 3: Doc C    score = 0.7991
Rank 4: Doc D    score = 0.7846
...

Doc A ranks #1 with the highest score (0.8838).

Run 2: `max_num_results = 2`

Rank 1: Doc B    score = 0.8300  ← wrong document
Rank 2: Doc C    score = 0.7991

Doc A is completely missing, despite having the highest score (0.8838) in Run 1.

Threshold testing

I tested max_num_results at multiple values to find when Doc A first appears in the results:

max_num_results	Doc A in results?	Doc A rank
2	No	–
5	No	–
10	No	–
20	No	–
30	No	–
44	No	–
45	Yes	1
50	Yes	1

Doc A only appears starting at max_num_results = 45, and when it does appear, it is ranked #1 with the highest score by a clear margin.
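The sweep itself is easy to script. The sketch below assumes you already have some `search_fn(k)` that returns the ranked document names for a given max_num_results (for example, a thin wrapper around the search endpoint); `sweep` and `first_k_containing` are hypothetical helper names, not part of any SDK.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Row = Tuple[int, bool, Optional[int]]


def sweep(search_fn: Callable[[int], Sequence[str]],
          target: str,
          ks: Iterable[int]) -> List[Row]:
    """Run search_fn for each k (ascending) and record whether target appears.

    Returns rows of (k, found, rank), where rank is 1-based or None.
    """
    rows = []
    for k in sorted(ks):
        results = list(search_fn(k))
        rank = results.index(target) + 1 if target in results else None
        rows.append((k, rank is not None, rank))
    return rows


def first_k_containing(rows: Iterable[Row]) -> Optional[int]:
    """Return the smallest k whose results contained the target, else None."""
    hits = [k for k, found, _ in rows if found]
    return min(hits) if hits else None


# Usage (search_fn is whatever wrapper you use to call the endpoint):
#   rows = sweep(search_fn, "Doc A", [2, 5, 10, 20, 30, 44, 45, 50])
#   for k, found, rank in rows:
#       print(k, "Yes" if found else "No", rank if rank else "-")
#   print("first k with Doc A:", first_k_containing(rows))
```

Sweeping a handful of values like this only brackets the threshold; testing every integer between the last miss and the first hit (44 and 45 here) is what pins it down exactly.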

Why this is a bug, not expected behavior

In a correctly functioning vector search:

top_k = N results MUST be a strict prefix of top_k = M results when M > N, assuming deterministic ranking by score
Recall@10 on a small index (~few hundred chunks) should be ≥95% for HNSW with reasonable parameters
A chunk with score 0.8838 should never be excluded from results that include chunks with scores 0.7991 and below

All three properties are violated here. The most likely root cause is that ef_search (HNSW exploration parameter) is set too low and/or scales with top_k, causing graph traversal to terminate before reaching the node containing Doc A’s embedding.
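The prefix and score-ordering properties can be checked mechanically from the two runs. This is a small sketch using only the ranks and scores reported above (the rows elided with "..." in Run 1 are left out); the function names are my own.

```python
def is_prefix(small, large):
    """True if the smaller result list equals the head of the larger one."""
    return list(large[:len(small)]) == list(small)


def missing_higher_scored(small, large):
    """Items in `large` that are absent from `small` even though they
    score above the lowest-scoring item that `small` did return."""
    if not small:
        return []
    floor = min(score for _, score in small)
    present = {name for name, _ in small}
    return [(name, score) for name, score in large
            if name not in present and score > floor]


# Scores as reported in Run 1 (first four rows) and Run 2.
run_50 = [("Doc A", 0.8838), ("Doc B", 0.8300),
          ("Doc C", 0.7991), ("Doc D", 0.7846)]
run_2 = [("Doc B", 0.8300), ("Doc C", 0.7991)]

print(is_prefix(run_2, run_50))              # False
print(missing_higher_scored(run_2, run_50))  # [('Doc A', 0.8838)]
```

Both checks fail for the same reason: Doc A outscores everything Run 2 returned, yet it is not in Run 2.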
