Sequence to Sequence Model

  • We use sequence to sequence models in our daily lives through applications such as Google Translate, voice-enabled devices and online chatbots.
  • Introduced by Google, a sequence to sequence model aims to map a fixed-length input to a fixed-length output, where the lengths of the input and output may differ.
  • For example, translating “What are you doing today?” from English to Chinese has an input of 5 words and an output of 7 symbols (今天你在做什麼?). Clearly, we can’t use a regular LSTM network to map each word from the English sentence to the Chinese sentence.

Working of Sequence to Sequence Model

[Figure: encoder, encoder vector and decoder of a sequence to sequence model]

A sequence to sequence model consists of three layers:

  • Encoder Layer:

    • The encoder is made up of several recurrent units (LSTM or GRU cells), each of which accepts a single element of the input sequence, collects information for that element and propagates it forward.
    • When translating from English to another language, each English word is the input to the encoder. Each word is represented as x_i where i is the order of that word.
    • We now have to calculate the hidden state, which can be done using the following formula:

      h_t = f(W(hh) * h_(t-1) + W(hx) * x_t)

      • In the formula above, we calculate the hidden state by multiplying the previous hidden state by its weight, adding the product of the input and its weight, and then applying the activation function (see the encoder sketch after this list).
  • Encoder Vector:

    • This is the final hidden state produced from the encoder part of the model. It is calculated using the formula above.
    • This vector aims to encapsulate the information for all input elements in order to help the decoder make accurate predictions.
    • It acts as the initial hidden state for the decoder.
  • Decoder Layer:

    • A stack of several recurrent units is used, each producing an output y_t at time step t.
    • Each recurrent unit accepts a hidden state from the previous unit and produces an output as well as its own hidden state.
    • In the language translation problem, the output sequence is a collection of words from the target-language vocabulary. Each word is represented as y_i where i is the order of that word.
    • Here the hidden state is computed as:

      h_t = f(W(hh) * h_(t-1))

    • We are just using the previous hidden state to compute the next one.

    • The output y_t at time step t is computed using the formula:

      y_t = softmax(W(S) * h_t)

    • We calculate the output using the hidden state at the current time step together with the respective weight W(S). Softmax is used to create a probability vector which helps us determine the final output (e.g. the translated word in language translation); see the decoder sketch after this list.
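
To make the encoder recurrence concrete, here is a minimal NumPy sketch of the hidden-state update h_t = f(W(hh) * h_(t-1) + W(hx) * x_t). The dimensions, random weights and random word vectors below are illustrative assumptions only; a real encoder would learn these weights and typically use LSTM or GRU cells instead of this plain RNN update.

```python
import numpy as np

np.random.seed(0)
embed_dim, hidden_dim, seq_len = 8, 16, 5        # assumed sizes, e.g. 5 input words

W_hh = np.random.randn(hidden_dim, hidden_dim) * 0.1   # hidden-to-hidden weights W(hh)
W_hx = np.random.randn(hidden_dim, embed_dim) * 0.1    # input-to-hidden weights W(hx)

inputs = np.random.randn(seq_len, embed_dim)     # x_1 ... x_5, stand-ins for word vectors

h = np.zeros(hidden_dim)                         # initial hidden state h_0
for x_t in inputs:
    h = np.tanh(W_hh @ h + W_hx @ x_t)           # h_t = f(W(hh) h_(t-1) + W(hx) x_t)

encoder_vector = h    # final hidden state: the encoder vector handed to the decoder
print(encoder_vector.shape)                      # (16,)
```

The final value of h is the encoder vector described above: a single fixed-size summary of the whole input sequence.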
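
Similarly, here is a sketch of the decoder step, combining the hidden-state update h_t = f(W(hh) * h_(t-1)) with the output y_t = softmax(W(S) * h_t). The vocabulary size, the random weights and the greedy argmax loop are assumptions for illustration; in practice the decoder is trained, and it usually also consumes the previously generated word at each step.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

np.random.seed(1)
hidden_dim, vocab_size, max_steps = 16, 1000, 7   # assumed sizes, e.g. 7 output symbols

W_hh = np.random.randn(hidden_dim, hidden_dim) * 0.1   # decoder hidden-to-hidden weights
W_S  = np.random.randn(vocab_size, hidden_dim) * 0.1   # output weights W(S)

h = np.random.randn(hidden_dim)   # in practice this would be the encoder vector
outputs = []
for t in range(max_steps):
    h = np.tanh(W_hh @ h)                  # h_t = f(W(hh) h_(t-1))
    y_t = softmax(W_S @ h)                 # probability vector over the vocabulary
    outputs.append(int(y_t.argmax()))      # greedily pick the most likely word index

print(outputs)                             # 7 predicted word indices
```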

Summary

In this blog post we have covered the basics of the sequence to sequence model and how it works. In the next blog we will learn about the implementation of the model.