What is ANN?

  • An artificial neural network is a computational network/model that mimics a biological neural network. For example, when a hot object is placed in one hand, the neurons associated with that hand are activated and trigger the output required to prevent a burn, while the neurons associated with the other hand are not activated.
  • Artificial neural networks use a learning algorithm to learn the patterns in the data by adjusting the weights and biases of the network; this adjustment process is called backpropagation.
  • An artificial neural network has three kinds of layers: input, hidden, and output layers.

(Image ann2. Photo credit: Berkeley Scientific Journal)

  • The input layer is the layer that takes in the input data.
  • The hidden layer takes the input from the input layer, performs complex operations to find patterns in the data, and passes its output to the output layer.
  • The output layer takes the output from the hidden layer and produces the final output.
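As a quick illustration, a network with these three kinds of layers could be declared as follows in Keras. This is only a sketch; the layer sizes and activations are arbitrary choices for the example, not values from this article.

```python
from tensorflow import keras

# A minimal three-layer network: input, one hidden layer, output.
# The sizes (4 features, 8 hidden units, 1 output) are illustrative only.
model = keras.Sequential([
    keras.Input(shape=(4,)),                      # input layer: 4 features
    keras.layers.Dense(8, activation="relu"),     # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
model.summary()  # prints the layers and their parameter counts
```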

How does ANN work?

I will show the working in a stepwise fashion.

(Image ann3. Photo credit: Data Flair)

  • STEP 1: The input layer takes the input data.
  • STEP 2: The hidden layer takes the input from the input layer, performs a complex operation to find patterns in the data, and passes its output to the output layer. In this layer, the sum of the products of the inputs from the input layer and the weights of the hidden layer is computed, and an activation function is applied to it:

    $z = \sum_{i} w_i x_i + b, \qquad a = f(z)$

  • STEP 3: The output layer takes the output from the hidden layer and produces the final output. The output expression is given as:

    $\hat{y} = f\Big(\sum_{j} w_j a_j + b\Big)$

    • STEP 4: Now the loss is calculated in the output layer. The loss measures the difference between the actual value and the predicted value.
      • There are different types of loss functions for classification and regression problems.
      • For Classification:

        - Binary Cross-Entropy
        - Categorical Cross-Entropy
        - Sparse Categorical Cross-Entropy
        - Hinge Loss
        - Squared Hinge Loss
        - Kullback Leibler Divergence Loss.
        
      • For Regression:

        - Mean Squared Error
        - Mean Squared Logarithmic Error
        - Mean Absolute Error.
        
    • STEP 5: Update the weights of the nodes that contributed most to the error, with the help of backpropagation.
    • STEP 6: Repeat steps 2 to 5 until we reach the global minimum of the loss, which means our model has generalised well. This step is also called optimization, and it is done with the help of an optimization function (see the code sketch after this list).
    • Different types of optimization functions are:
      • Gradient Descent
      • Stochastic Gradient Descent
      • Mini-batch Gradient Descent
      • Momentum
      • AdaGrad
      • RMSProp
      • Adam (the most used optimization function)
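To make steps 1 to 6 concrete, here is a minimal NumPy sketch of the whole loop: forward propagation, a mean squared error loss, backpropagation, and a plain gradient-descent weight update. The toy data, network sizes, and hyperparameters are illustrative assumptions, not part of the original article.

```python
import numpy as np

# Toy data: 4 samples with 2 features each (an XOR-like problem).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # input  -> hidden weights/biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # hidden -> output weights/biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5  # learning rate
for epoch in range(5000):
    # STEPS 1-3: forward propagation.
    z1 = X @ W1 + b1      # weighted sum in the hidden layer
    a1 = sigmoid(z1)      # activation function applied to it
    z2 = a1 @ W2 + b2
    y_hat = sigmoid(z2)   # output of the network

    # STEP 4: loss (mean squared error here; binary cross-entropy is a
    # common alternative for classification).
    loss = np.mean((y_hat - y) ** 2)

    # STEP 5: backpropagation via the chain rule...
    d_z2 = (y_hat - y) * y_hat * (1 - y_hat) * (2 / len(X))
    d_W2, d_b2 = a1.T @ d_z2, d_z2.sum(axis=0)
    d_z1 = (d_z2 @ W2.T) * a1 * (1 - a1)
    d_W1, d_b1 = X.T @ d_z1, d_z1.sum(axis=0)

    # ...followed by a plain gradient-descent update (STEP 6 repeats all of this).
    W2 -= lr * d_W2; b2 -= lr * d_b2
    W1 -= lr * d_W1; b1 -= lr * d_b1

print(f"final loss: {loss:.4f}")  # approaches 0 as the network fits the toy data
```

Swapping in a different optimizer from the list above, such as Adam, would only change the two update lines at the end of the loop; the forward and backward passes stay the same.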

INTERVIEW TOPICS

1. What is the difference between Deep Learning and Artificial Neural Networks?

  • When researchers began to create large artificial neural networks, they started to use the word "deep" to refer to them.
  • As the term "deep learning" came into use, it became generally understood to mean artificial neural networks that are deep, as opposed to shallow ones.
  • Deep artificial neural networks and deep learning are generally the same thing, and the terms are mostly used interchangeably.

2. What is a shallow neural network?

  • A network with only one hidden layer is called a shallow neural network.

3. What is the difference between Forward Propagation and Backward Propagation?

  • Forward Propagation is the process of taking the input and passing it through the network to get the output. Each hidden layer accepts the input data, processes it as per the activation function, and passes it to the successive layer.
  • Backpropagation is the process of fine-tuning the weights of the neural network based on the error rate obtained in the previous epoch. Proper tuning of the weights ensures low error rates, making the model more reliable.
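As a worked illustration of what backpropagation computes, consider a single output neuron with the notation from the steps above, where $z = \sum_i w_i x_i + b$ and $\hat{y} = f(z)$. By the chain rule, the gradient of the loss $L$ with respect to one weight $w_i$ is:

$$\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w_i} = \frac{\partial L}{\partial \hat{y}} \cdot f'(z) \cdot x_i$$

Each weight is then updated against its gradient, $w_i \leftarrow w_i - \eta \, \partial L / \partial w_i$, where $\eta$ is the learning rate.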

4. How would you prevent Overfitting when designing an Artificial Neural Network?

  • Training the model on more data. Given more and more examples, the model's performance plateaus at what the capacity of the network is capable of learning, instead of overfitting to a small sample.
  • Changing the network structure (number of weights). Pruning, i.e. removing nodes from the network, counteracts overfitting by reducing capacity.
  • Changing the network parameters (values of weights). Large weights cause sharp transitions in the activation functions, so small changes in inputs produce large changes in output. Keeping the weights small in this way is called regularization.
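As a sketch of how the last two ideas are commonly expressed in Keras: dropout randomly disables nodes during training (related in spirit to pruning), and an L2 penalty keeps the weights small. The layer sizes and coefficients below are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),  # 20 input features (arbitrary)
    # A modest hidden layer limits network capacity (fewer weights),
    # and the L2 regularizer penalizes large weight values.
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),
    # Dropout randomly zeroes half the activations during training.
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
```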

5. What are some similarities between SVMs and Neural Networks?

  • Parametric: SVMs and neural networks are both parametric, but for different reasons.
    • For an SVM, the typical parameters are the soft-margin parameter (C) and the parameter of the kernel function (gamma).
    • Neural networks also have parameters, but many more than SVMs. Some NN parameters are the number of layers and their sizes, the number of training epochs, and the learning rate.
  • Embedding Non-Linearity: Both methods can embed non-linear functions. In SVMs, the non-linearity is the kernel function; in NNs, it is the activation function.

  • Comparable Accuracy:
    • If an SVM and a neural network are trained on the same dataset, given the same training time and the same computation power, they have comparable accuracy.
    • If neural networks are given as much computation power and training time as possible, they outperform SVMs.
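A minimal scikit-learn sketch of such a side-by-side comparison; the synthetic dataset and every hyperparameter below are arbitrary assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVM: non-linearity comes from the kernel; C and gamma are its key parameters.
svm = SVC(C=1.0, kernel="rbf", gamma="scale").fit(X_train, y_train)

# NN: non-linearity comes from the activation; many more knobs to tune.
nn = MLPClassifier(hidden_layer_sizes=(64,), activation="relu",
                   learning_rate_init=0.001, max_iter=500,
                   random_state=0).fit(X_train, y_train)

print("SVM accuracy:", svm.score(X_test, y_test))
print("NN  accuracy:", nn.score(X_test, y_test))
```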

6. What is an epoch?

  • An epoch is one complete pass of the training data through the network, and it consists of the following steps (the weights are initialized once, before the first epoch):

    • Forward Propagation.
    • Backward Propagation.
    • Update the weights.
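In code, one epoch is one full pass over the training data; with mini-batches, a single epoch consists of many weight updates. A runnable skeleton of this idea follows, using a plain linear model to keep it short; the data and numbers are illustrative assumptions only.

```python
import numpy as np

# Toy linear-regression data so the loop runs end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr, n_epochs, batch_size = 0.1, 20, 16

for epoch in range(n_epochs):         # one epoch = one full pass over the data
    order = rng.permutation(len(X))   # reshuffle the samples each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        y_hat = X[idx] @ w                                  # forward propagation
        grad = 2 * X[idx].T @ (y_hat - y[idx]) / len(idx)   # backward propagation
        w -= lr * grad                                      # update the weights

print("learned weights:", w)  # should approach [1.0, -2.0, 0.5]
```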

7. Why do we use activation functions in neural networks?

  • It is used to bring the following properties:
    • Non-Linearity
      • The change in the output is not proportional to the change in the input.
    • Differentiability
      • A differentiable function of one real variable is one whose derivative exists at each point in its domain.
      • The gradient of the loss function is calculated during backpropagation using the gradient descent method, so the activation function must be differentiable with respect to its input.
    • Continuity
      • Continuous variation of the argument causes continuous variation of the function value. This means there are no abrupt changes in value.
    • Boundedness
      • Bounded functions are limited by some form of boundary condition: a bounded function's range has both a lower and an upper limit.
      • This is relevant for neural networks because the activation function is responsible for keeping the output values within a certain range; otherwise, the values may grow beyond reasonable limits.
    • Zero-centering
      • A function is called zero-centered when its range contains both positive and negative values.
      • If a function is not zero-centered, like the sigmoid function, the output of a layer is always shifted toward either positive or negative values. As a result, the weight matrix requires more updates to be adequately trained, increasing the number of epochs needed to train the network. This is why the zero-centered property is useful, even if it isn't required.
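A small NumPy check of two of these properties, boundedness and zero-centering: sigmoid is bounded to (0, 1) and therefore not zero-centered, while tanh is bounded to (-1, 1) and zero-centered. This is only an illustrative sketch.

```python
import numpy as np

z = np.linspace(-10, 10, 1000)  # sample inputs across a wide range

sigmoid = 1 / (1 + np.exp(-z))  # bounded to (0, 1): outputs are always positive
tanh = np.tanh(z)               # bounded to (-1, 1): range has both signs

print("sigmoid range:", sigmoid.min(), "to", sigmoid.max())  # not zero-centered
print("tanh range:   ", tanh.min(), "to", tanh.max())        # zero-centered
```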