Why RNN?

Although there are already many advanced algorithms in the field of deep learning, like the feed-forward neural network and the convolutional neural network, we need a recurrent neural network to train on text data. In image data we don't need sequence information, but for textual data sequence information is essential; this is where the recurrent neural network comes in, keeping the sequential information intact.

This property makes the algorithm suitable for tasks like machine translation, time series forecasting, and many others.

Forward Propagation in RNN

  • Architecture of RNN

[Figure: RNN architecture]

Let's learn forward propagation in an RNN step by step.

  • Step 1: Input
    • The input is a sequence of words, so let's take a sequence of four words as <x11, x12, x13, x14>.
  • Step 2: Feeding Input to the Network
    • At timestep t = 1, we feed the input x11 to the network by multiplying it with the weight w. Once the network is fed with this input, it generates the output o1, which is fed back into the same network at the next timestep; this is how the RNN keeps track of sequence information.
  • Step 3: Output
    • Repeating this for timesteps t = 2, 3, and 4, the outputs are given as:
      • o1 = f(x11*w)
      • o2 = f(x12*w + o1*w1)
      • o3 = f(x13*w + o2*w1)
      • o4 = f(x14*w + o3*w1)
    • Now, after getting the final output, we pass it through a sigmoid or softmax layer, depending on our requirements, to get the final prediction y_hat (a minimal code sketch of this forward pass follows this list).
  • Step 4: Calculating Loss
    • Loss is the error between our prediction and the expected output.
      • Loss(L) = (y_hat - y)
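To make the equations above concrete, here is a minimal NumPy sketch of the same forward pass. The scalar inputs and weights, tanh as the activation f, and the sigmoid applied to the last output are illustrative assumptions, not details taken from the original figures.

```python
import numpy as np

def rnn_forward(x, w, w1, f=np.tanh):
    """Minimal forward pass matching o_t = f(x_t*w + o_{t-1}*w1)."""
    o_prev = 0.0                               # no previous output at t = 1
    outputs = []
    for x_t in x:
        o_t = f(x_t * w + o_prev * w1)         # current output depends on the previous one
        outputs.append(o_t)
        o_prev = o_t
    return outputs

# toy sequence of four "words" encoded as numbers: <x11, x12, x13, x14>
x = [0.5, 0.1, 0.9, 0.3]
outputs = rnn_forward(x, w=0.4, w1=0.7)
y_hat = 1.0 / (1.0 + np.exp(-outputs[-1]))     # final sigmoid layer gives y_hat
print(outputs, y_hat)
```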

Backward Propagation in Recurrent Neural Network

  • We perform backward propagation using the chain rule of differentiation. First we calculate the derivative of the loss with respect to the output of the network, and then we perform the weight update of w11 as below:

[Figure: weight update equation for w11]

  • After updating the weight w11, we update the weight w as below:

    [Figure: weight update equation for w]

  • All the other weights are updated in the same fashion. Once we reach the global minimum after a number of iterations, backpropagation stops.

  • In the above update equations, alpha is the learning rate, a hyperparameter we can tune as needed. Depending on whether the slope of the loss function is positive or negative, the sign of the correction in the weight update equation changes (a sketch of the general update rule follows below).
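As a hedged reconstruction of the rule the missing figures illustrate, the standard gradient-descent step can be written as below, reusing the notation already introduced (L, y_hat, o_t, alpha). Since the exact role of w11 is not spelled out in the text, the rule is stated for a generic weight w_i.

```latex
% generic gradient-descent update for any weight w_i, with learning rate \alpha
w_i \;\leftarrow\; w_i - \alpha \, \frac{\partial L}{\partial w_i}

% for the input weight w, the chain rule runs back through every timestep:
\frac{\partial L}{\partial w}
  = \sum_{t=1}^{4}
    \frac{\partial L}{\partial \hat{y}} \cdot
    \frac{\partial \hat{y}}{\partial o_4} \cdot
    \frac{\partial o_4}{\partial o_t} \cdot
    \frac{\partial o_t}{\partial w}
```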

Problems in Recurrent Neural Network

RNNs suffer from the vanishing and exploding gradient problems. In the hidden layers we use an activation function such as sigmoid or ReLU.

  • If we use the sigmoid function as our activation function, the gradient tends to vanish, since the derivative of the sigmoid function lies between 0 and 0.25. During backpropagation through many timesteps we multiply many such small derivatives and small weight values together, which produces a gradient so small that it is negligible, so the weights barely change and the network never converges to the global minimum.

  • If we use the ReLU function as our activation function, the gradient can explode. The derivative of ReLU is 1 for positive inputs, so it does not shrink the gradient; if the weights are larger than 1, multiplying these large values together across many timesteps produces a very large gradient, the weight updates overshoot, and the network never converges to the global minimum (a small numeric demonstration follows).
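As a toy numeric illustration of both effects (the numbers below are made up for demonstration and are not from the original post): repeatedly multiplying per-timestep factors smaller than 1 drives the gradient toward zero, while factors larger than 1 blow it up.

```python
import numpy as np

def sigmoid_derivative(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)          # at most 0.25

timesteps = 50

# Vanishing: sigmoid derivative (<= 0.25) times a modest weight, per timestep
grad_vanish = 1.0
for _ in range(timesteps):
    grad_vanish *= sigmoid_derivative(0.0) * 0.9   # 0.25 * 0.9 each step

# Exploding: ReLU derivative (1 for positive inputs) times a weight > 1, per timestep
grad_explode = 1.0
for _ in range(timesteps):
    grad_explode *= 1.0 * 1.5                      # 1 * 1.5 each step

print(f"after {timesteps} steps: vanishing ~ {grad_vanish:.3e}, exploding ~ {grad_explode:.3e}")
```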

Solution to the above problems

Using an LSTM RNN is the solution to these problems, which I will explain in my next blog.