OpenSearch Semantic Search

Chris June 11, 2026
As part of my recent work, I've been using OpenSearch's vector search a lot. As it's quite a new topic for me, I thought it would be worth writing up my thoughts and understanding of the tech. Hopefully you find it useful too!

Vector Embeddings

The starting point for vector search (sometimes known as semantic search) is setting up an index to store your vector embeddings. So what is a vector embedding!?

Vector embeddings are essentially co-ordinates, in the form of a vector such as [0.4, -0.87, 0.1263, ...], that represent the meaning of a word or chunk of text. (Most unstructured data can also be vectorised, including images! However, I've not done much work on semantic searching of images... yet.)

In the diagram above, you can see some example words mapped onto a very simple two-dimensional space with two parameters: loudness (x-axis) and positivity (y-axis). In this case the words would have the following vectors:

Cheer [1, 1]
Laugh [0.5, 1]
Giggle [-0.2, 0.6]
Whimper [-0.7, -0.5]
Sob [-1, -1]

From these, we can easily see which words have similar meanings within our two parameters. For example, a laugh and a cheer are both very positive and quite loud, so they are close in semantic meaning (and physically close on our map).

Realistically, of course, the embeddings generated by language models have far more dimensions (typically 384, 768, 1024 or even higher, depending on the model). This lets them store much more semantic information than my two-dimensional diagram above! These models capture much more context, including synonyms, themes and so on. For example, "dog" and "chihuahua" would sit relatively close together in that high-dimensional space.

How does this become a search?

OpenSearch measures the mathematical distance between the vector for a query that a user has entered and the vectors stored in our index: the smaller the distance, the closer the meaning. When a user enters a query, that query is run through the exact same embedding model that was used to create the stored embeddings.
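
As a rough illustration of "close on our map", here is a minimal pure-Python sketch using the five toy vectors from the loudness/positivity example. The helper names (`euclidean`, `cosine_similarity`, `nearest_words`) are my own and are not part of OpenSearch. The sketch compares one word's vector against every other vector and ranks them by Euclidean distance, which is the brute-force (exact) version of what a vector search does.

```python
import math

# Toy 2-D embeddings from the example: [loudness, positivity]
WORDS = {
    "cheer": [1.0, 1.0],
    "laugh": [0.5, 1.0],
    "giggle": [-0.2, 0.6],
    "whimper": [-0.7, -0.5],
    "sob": [-1.0, -1.0],
}


def euclidean(a, b):
    """Straight-line distance between two vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1 = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(word, vectors, k=2):
    """Brute-force k-nearest-neighbour lookup, excluding the query word itself."""
    query = vectors[word]
    others = [w for w in vectors if w != word]
    # Sort every other word by its distance to the query vector
    return sorted(others, key=lambda w: euclidean(query, vectors[w]))[:k]


if __name__ == "__main__":
    for word in ("laugh", "sob"):
        print(word, "->", nearest_words(word, WORDS, k=2))
```

Running it prints `laugh -> ['cheer', 'giggle']` and `sob -> ['whimper', 'giggle']`. Checking every stored vector like this is fine for five words but not for millions of documents, which is why OpenSearch's approximate k-NN search uses graph-based indexes such as HNSW instead.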

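For the index itself, OpenSearch's k-NN plugin stores embeddings in a field of type `knn_vector` and searches it with a `knn` query. The sketch below only builds the two request bodies as plain Python dicts, so it runs without a cluster. The index layout is an assumption for illustration: the field names `text` and `embedding`, the 384-dimension size, and the HNSW method on the Lucene engine with cosine similarity are my choices, so check them against the k-NN documentation for your OpenSearch version.

```python
import json

# Assumption: a 384-dimensional embedding model. This must match your model's output size.
EMBEDDING_DIM = 384


def knn_index_body(dimension=EMBEDDING_DIM):
    """Body for creating an index with a k-NN vector field (field names are hypothetical)."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                "text": {"type": "text"},  # the original chunk of text
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",               # approximate nearest-neighbour graph
                        "space_type": "cosinesimil",  # compare vectors by cosine similarity
                        "engine": "lucene",
                    },
                },
            }
        },
    }


def knn_query_body(query_vector, k=5, dimension=EMBEDDING_DIM):
    """Body for a k-NN search returning the k stored vectors closest to query_vector."""
    if len(query_vector) != dimension:
        raise ValueError(
            f"query vector has {len(query_vector)} dimensions, index expects {dimension}"
        )
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": list(query_vector), "k": k}}},
    }


if __name__ == "__main__":
    print(json.dumps(knn_index_body(), indent=2))
```

With the `opensearch-py` client, these bodies could be passed to `client.indices.create(index="documents", body=knn_index_body())` and `client.search(index="documents", body=knn_query_body(query_vector))`, where `documents` is a hypothetical index name. Whatever the client, `query_vector` has to come from the same embedding model as the stored vectors, so that it has the same dimension and lives in the same vector space.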