All you need to know about positional encodings in Transformers

In RNNs and LSTMs, words are fed in one at a time, so the model inherently captures their order. However, this recurrence requires a number of sequential operations that grows with the length of the sentence. A transformer instead processes all the words in parallel, which helps decrease training time…
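To make this trade-off concrete, here is a minimal sketch under toy assumptions. It uses plain Python, random 4-dimensional vectors standing in for word embeddings, and self-attention with no learned projections (queries, keys and values are all the raw vectors). The helpers `self_attention` and `rnn_final_state` are hypothetical illustrations, and the recurrence is a one-line element-wise update, not a real LSTM. The sketch shows that shuffling the words only shuffles the attention outputs (attention by itself is blind to word order), while the recurrent state changes. That blindness is what positional encodings are meant to fix.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Toy scaled dot-product self-attention with Q = K = V = x (no learned weights).

    Every position is handled identically and could be computed in parallel;
    nothing in the computation looks at a token's index.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def rnn_final_state(x, w=0.5):
    """Toy element-wise recurrence h_t = tanh(w * h_{t-1} + x_t), NOT a real LSTM.

    Step t cannot start until step t-1 has finished, so the number of
    sequential operations grows with the sentence length.
    """
    h = [0.0] * len(x[0])
    for token in x:
        h = [math.tanh(w * hi + xi) for hi, xi in zip(h, token)]
    return h


def rows_close(a, b):
    """Compare two vectors with a small floating-point tolerance."""
    return all(math.isclose(u, v, rel_tol=1e-9, abs_tol=1e-12) for u, v in zip(a, b))


random.seed(0)
tokens = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # 5 "words", dim 4
order = [4, 2, 0, 3, 1]
shuffled = [tokens[i] for i in order]

att, att_shuffled = self_attention(tokens), self_attention(shuffled)
# Each word's attention output is unchanged by the shuffle; only its row moves.
attention_order_blind = all(rows_close(att_shuffled[i], att[j]) for i, j in enumerate(order))
# The recurrent summary changes when the words are reordered.
rnn_order_blind = rows_close(rnn_final_state(tokens), rnn_final_state(shuffled))
print(attention_order_blind, rnn_order_blind)  # True False
```

The recurrence gets word order for free but must run step by step, while attention can compute every position at once but treats the sentence as an unordered set of vectors unless position information is added to the inputs.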

This blog post gets into the nitty-gritty details of the attention mechanism and builds one from scratch using Python.

Before reading this post, I highly recommend my earlier post giving an overview of transformers. To get the most out of this blog, please read my previous posts in the following order.

