Rakesh Pandey

How to use word embeddings to improve search?

by Rakesh Pandey

What is a word embedding?

A word embedding is a numerical representation of a word. Embeddings enhance search by capturing semantic relationships between words, which improves how well a system understands text. Each word is represented as a dense vector in a continuous vector space. Several methods can produce these vectors; among the most common is Word2Vec, which is trained on large text corpora. Searching with such vectors makes results more relevant to what the user is looking for.

For example, in a 3-dimensional space we can assign the following vectors to the given words:
Dimensions: Age, Gender, Size
Cow: 0.7, 1, 0.65
Calf: 0.35, 0, 0.25
Dog: 1, 1, 0.25
Puppy: 0.25, 0.125
Computer: 0, 0, 0.15
In real-world applications, we typically use 200-300 dimensions for word vectors.
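
To make the idea concrete, here is a minimal sketch that takes four of the toy vectors above and ranks words by how close they sit to "cow". The distance measure (Euclidean distance via Python's `math.dist`) and the helper name `nearest_words` are our own choices for illustration, not part of the example itself.

```python
import math

# Four of the toy 3-D vectors from the example: (age, gender, size).
word_vectors = {
    "cow": (0.7, 1.0, 0.65),
    "calf": (0.35, 0.0, 0.25),
    "dog": (1.0, 1.0, 0.25),
    "computer": (0.0, 0.0, 0.15),
}

def nearest_words(query, vectors):
    """Return the other words, sorted from closest to farthest from `query`."""
    origin = vectors[query]
    others = [word for word in vectors if word != query]
    return sorted(others, key=lambda word: math.dist(origin, vectors[word]))

print(nearest_words("cow", word_vectors))  # ['dog', 'calf', 'computer']
```

With these hand-picked numbers, the animals land closer to "cow" than "computer" does, which is the kind of structure a search engine can exploit.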

How to create an embedding?

Let’s say we have a table of products with the fields title, price, average rating, number of reviews, brand, characters, categories, and search tags.

Title: “Melissa & Doug Abacus – Classic Wooden Educational Counting Toy With 100 Beads”
Price / Avg Rating / No. of Reviews: $134.816998
Brand: Melissa & Doug
Characters: Character A
Categories: Educational Counting Toy
Search Tags: “abacus”, “wooden”, “100”, “beads”

An aggregation method, such as averaging or another form of pooling, can be used to construct product embeddings.
For each product, we average the word embeddings of its title, brand, characters, categories, and search tags.

Words / Phrases | Assumed Embedding Vector
Product Title: “Melissa & Doug Giant Giraffe – Lifelike Stuffed Animal (over 4 feet tall)” | [0.2, 0.4, -0.1, 0.6]
Brand: Melissa & Doug | [-0.3, 0.1, 0.5, -0.2]
Character: Giant Giraffe | [0.5, -0.3, 0.2, 0.7]
Category: Stuffed Animal | [0.1, -0.2, -0.5, 0.3]
Tag 1: Stuffed Toys | [0.4, -0.1, 0.2, 0.6]
Tag 2: Stuffed Animal Toy | [-0.2, 0.5, 0.1, 0.4]
Tag 3: Giraffe | [-0.5, -0.3, -0.2, 0.1]
Tag 4: Building Toys | [-0.3, 0.2, 0.6, -0.4]

(title_embedding + brand_embedding + character_embedding + category_embedding + tag1_embedding + tag2_embedding + tag3_embedding) / 7
= ([0.2, 0.4, -0.1, 0.6] + [-0.3, 0.1, 0.5, -0.2] + [0.5, -0.3, 0.2, 0.7] + [0.1, -0.2, -0.5, 0.3] + [0.4, -0.1, 0.2, 0.6] + [-0.2, 0.5, 0.1, 0.4] + [-0.5, -0.3, -0.2, 0.1]) / 7
= [0.2, 0.1, 0.2, 2.5] / 7
≈ [0.0286, 0.0143, 0.0286, 0.3571]

This average gives us a single vector representation for “Melissa & Doug Giant Giraffe – Lifelike Stuffed Animal (over 4 feet tall)”.
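
The averaging step can be sketched in a few lines of plain Python. The function name `average_embeddings` and the dictionary keys are hypothetical; the vectors are the assumed 4-dimensional embeddings from the table, and in a real system they would come from a trained model rather than being typed in by hand.

```python
def average_embeddings(vectors):
    """Element-wise mean of equally sized vectors (average pooling)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dims = len(vectors[0])
    if any(len(v) != dims for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

# Assumed embeddings for the seven fields that go into the product vector.
field_embeddings = {
    "title": [0.2, 0.4, -0.1, 0.6],
    "brand": [-0.3, 0.1, 0.5, -0.2],
    "character": [0.5, -0.3, 0.2, 0.7],
    "category": [0.1, -0.2, -0.5, 0.3],
    "tag1": [0.4, -0.1, 0.2, 0.6],
    "tag2": [-0.2, 0.5, 0.1, 0.4],
    "tag3": [-0.5, -0.3, -0.2, 0.1],
}

product_embedding = average_embeddings(list(field_embeddings.values()))
print([round(x, 4) for x in product_embedding])
```

A plain average treats every field equally. If the title should count for more than a single tag, a weighted average would be a natural variation, but that goes beyond what this example does.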

How to find a user’s likes/dislikes through embeddings?

Once we have the embeddings for each product, we can calculate the similarity between user preferences and product embeddings using techniques like cosine similarity.

Let’s assume the user likes the brand Melissa & Doug, the category Stuffed Animal, and tag 1 (Stuffed Toys), and dislikes tag 4 (Building Toys) and the character Spiderman. The table has no Spiderman vector, so the calculation below reuses the character vector [0.5, -0.3, 0.2, 0.7] as the disliked character’s embedding.

User’s likes embedding: (brand_embedding + category_embedding + tag1_embedding) / 3
= ([-0.3, 0.1, 0.5, -0.2] + [0.1, -0.2, -0.5, 0.3] + [0.4, -0.1, 0.2, 0.6]) / 3
= [0.2, -0.2, 0.2, 0.7] / 3
≈ [0.0667, -0.0667, 0.0667, 0.2333]

User’s dislikes embedding: (tag4_embedding + character_embedding) / 2
= ([-0.3, 0.2, 0.6, -0.4] + [0.5, -0.3, 0.2, 0.7]) / 2
= [0.2, -0.1, 0.8, 0.3] / 2
= [0.1, -0.05, 0.4, 0.15]
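
As a rough sketch, the two preference vectors can be built by looking up the embeddings of the attributes a user likes or dislikes and averaging them. The helper `preference_vector` and the dictionary keys are hypothetical names for this illustration; the vectors are the assumed embeddings from the table.

```python
# Assumed 4-D embeddings copied from the table.
attribute_embeddings = {
    "brand": [-0.3, 0.1, 0.5, -0.2],     # Melissa & Doug
    "character": [0.5, -0.3, 0.2, 0.7],  # character vector from the table
    "category": [0.1, -0.2, -0.5, 0.3],  # Stuffed Animal
    "tag1": [0.4, -0.1, 0.2, 0.6],       # Stuffed Toys
    "tag4": [-0.3, 0.2, 0.6, -0.4],      # Building Toys
}

def preference_vector(attribute_names, embeddings):
    """Average the embeddings of the named attributes into one vector."""
    chosen = [embeddings[name] for name in attribute_names]
    return [sum(component) / len(chosen) for component in zip(*chosen)]

likes_vector = preference_vector(["brand", "category", "tag1"], attribute_embeddings)
dislikes_vector = preference_vector(["tag4", "character"], attribute_embeddings)

print([round(x, 4) for x in likes_vector])
print([round(x, 4) for x in dislikes_vector])
```

In practice the liked and disliked attributes would be inferred from behaviour such as clicks, purchases, or ratings; here they are simply listed by hand.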
Now, we shall calculate the similarity between the user’s preferences and the product embeddings using cosine similarity.


Cosine Similarity (Product 1, User's likes)

To find the cosine similarity between the two vectors, let’s denote the product vector as Vector A and the likes vector as Vector B:
Product vector (A), the average of the seven product embeddings, [0.2, 0.1, 0.2, 2.5] / 7: [0.0286, 0.0143, 0.0286, 0.3571]
Likes vector (B), the average of the three liked embeddings, [0.2, -0.2, 0.2, 0.7] / 3: [0.0667, -0.0667, 0.0667, 0.2333]
Dot Product = (0.0286 * 0.0667) + (0.0143 * -0.0667) + (0.0286 * 0.0667) + (0.3571 * 0.2333) ≈ 0.0862
Magnitude of A = sqrt((0.0286)^2 + (0.0143)^2 + (0.0286)^2 + (0.3571)^2) ≈ 0.3597
Magnitude of B = sqrt((0.0667)^2 + (-0.0667)^2 + (0.0667)^2 + (0.2333)^2) ≈ 0.2603
Cosine Similarity = Dot Product / (Magnitude of A * Magnitude of B)
Cosine Similarity ≈ 0.0862 / (0.3597 * 0.2603) ≈ 0.0862 / 0.0936 ≈ 0.92
Therefore, the cosine similarity between the product vector and the likes vector is approximately 0.92.

Cosine Similarity (Product 1, User's dislikes)

Product vector (A): [0.0286, 0.0143, 0.0286, 0.3571]
Dislikes vector (B): [0.1, -0.05, 0.4, 0.15]
Dot Product = (0.0286 * 0.1) + (0.0143 * -0.05) + (0.0286 * 0.4) + (0.3571 * 0.15) ≈ 0.0671
Magnitude of B = sqrt((0.1)^2 + (-0.05)^2 + (0.4)^2 + (0.15)^2) ≈ 0.4416
Cosine Similarity ≈ 0.0671 / (0.3597 * 0.4416) ≈ 0.0671 / 0.1588 ≈ 0.42
Therefore, the cosine similarity between the product vector and the dislikes vector is approximately 0.42.
Since the cosine similarity between the product and the user's likes (0.92) is greater than the cosine similarity between the product and the user's dislikes (0.42), we recommend this product to the customer.
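
The whole comparison can be sketched in plain Python. The function names `cosine_similarity` and `should_recommend`, and the simple "likes score beats dislikes score" rule, are our illustration of the decision described above rather than a production recommender. All vectors are the assumed embeddings from the table.

```python
import math

def mean(vectors):
    """Element-wise average of equally sized vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def should_recommend(product, likes, dislikes):
    """Recommend when the product is closer to the likes than to the dislikes."""
    return cosine_similarity(product, likes) > cosine_similarity(product, dislikes)

# Assumed 4-D embeddings from the table.
title = [0.2, 0.4, -0.1, 0.6]
brand = [-0.3, 0.1, 0.5, -0.2]
character = [0.5, -0.3, 0.2, 0.7]
category = [0.1, -0.2, -0.5, 0.3]
tag1 = [0.4, -0.1, 0.2, 0.6]
tag2 = [-0.2, 0.5, 0.1, 0.4]
tag3 = [-0.5, -0.3, -0.2, 0.1]
tag4 = [-0.3, 0.2, 0.6, -0.4]

product = mean([title, brand, character, category, tag1, tag2, tag3])
likes = mean([brand, category, tag1])
dislikes = mean([tag4, character])

print(round(cosine_similarity(product, likes), 4))
print(round(cosine_similarity(product, dislikes), 4))
print(should_recommend(product, likes, dislikes))
```

Cosine similarity depends only on the direction of the vectors, not their length, so dividing by 7, 3, or 2 when averaging does not change the scores. Only the mix of embeddings that goes into each vector matters.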

