Dev.to · 6d ago · 1 min read

Cosine Similarity vs Dot Product in Attention...

To compare the hidden states of the encoder and decoder, we need a similarity score. Two common approaches to calculate this are:

- Cosine similarity
- Dot product

Cosine Similarity

It takes the dot product of the two vectors and then normalizes the result by dividing by the product of their magnitudes, so the score always falls between -1 and 1.

Example

Encoder output: [-0.76, 0.75]
Decoder output: [0.91, 0.38]
Cosine similarity ≈ -0.39

- Close to 1 → very similar → strong attention
- Close to 0 → not related
- Negative → opposite → low attention

This is useful when:

- Values can vary a lot
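
The example numbers above can be checked with a short sketch. It uses only the Python standard library; the function names `dot_product` and `cosine_similarity` are illustrative choices and do not come from the original post.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector magnitudes."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


# Vectors taken from the example in the text
encoder_output = [-0.76, 0.75]
decoder_output = [0.91, 0.38]

raw_score = dot_product(encoder_output, decoder_output)
cos_score = cosine_similarity(encoder_output, decoder_output)

print(f"Dot product:       {raw_score:.4f}")  # -0.4066
print(f"Cosine similarity: {cos_score:.2f}")  # -0.39
```

Both scores are negative here, so they agree on the direction. The difference is that the cosine score is divided by the vector magnitudes, which keeps it within -1 and 1 even when the raw values vary a lot.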
