Dev.to · 1 min read

Cosine Similarity vs Dot Product in Attention...

For comparing the hidden states between the encoder and decoder, we need a similarity score. Two common approaches to calculate this are:

- Cosine similarity
- Dot product

Cosine Similarity

It takes the dot product of the two vectors and then divides by the product of their magnitudes to normalize the result.

Example

Encoder output: [-0.76, 0.75]
Decoder output: [0.91, 0.38]
Cosine similarity ≈ -0.39

- Close to 1 → very similar → strong attention
- Close to 0 → not related
- Negative → opposite → low attention

This is useful when values can vary a lot.
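The example numbers above can be reproduced with a minimal sketch in plain Python; the only inputs assumed are the two example vectors from the text:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    # Normalize by the product of the vectors' magnitudes
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

encoder = [-0.76, 0.75]  # example encoder hidden state
decoder = [0.91, 0.38]   # example decoder hidden state
print(round(cosine_similarity(encoder, decoder), 2))  # -0.39
```

The raw dot product here is about -0.41; dividing by the magnitudes maps the score into [-1, 1], which is what makes cosine similarity robust when vector lengths vary a lot.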