Sequence to Sequence Model Implementation (Machine Translation)
All about Practical Implementation of Sequence to Sequence Model
What is Machine Translation?
- Machine translation is the process of automatically translating content from one language (the source) to another (the target) without any human input.
- Here, the input and output sequences have different lengths, and the entire input sequence is required before we can start predicting the target.
- This is an example of a Sequence to Sequence model. Let's see how to implement it.
- To understand the following code, you should have a good grasp of what RNNs, LSTMs, and Sequence to Sequence models are. If you don't, go through my previous post first.
Data
- In this project, we are going to translate English text to French text, and we are using this data (the English-French sentence pairs in fra.txt).
Data Preprocessing
- Vectorizing Data
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense
import tensorflow as tf
import numpy as np
batch_size = 64 # Batch size for training.
epochs = 130 # Number of epochs to train for.
latent_dim = 256 # Latent dimensionality of the encoding space.
num_samples = 10000 # Number of samples to train on.
#Path to the data txt file on disk.
data_path = './lan_data/fra.txt'
input_texts = []
target_texts = []
input_charecters = set()
target_charecters = set()
def vectorize_text(text_path):
    with open(text_path, 'r', encoding='utf-8') as f:
        lines = f.read().split('\n')
    for line in lines[: min(num_samples, len(lines) - 1)]:
        input_text, target_text, _ = line.split('\t')
        # We use "tab" as the "start sequence" character
        # for the targets, and "\n" as "end sequence" character.
        target_text = '\t' + target_text + '\n'
        input_texts.append(input_text)
        target_texts.append(target_text)
        for char in input_text:
            if char not in input_charecters:
                input_charecters.add(char)
        for char in target_text:
            if char not in target_charecters:
                target_charecters.add(char)
    return input_charecters, target_charecters
input_charecters, target_charecters = vectorize_text(data_path)
input_charecters = sorted(list(input_charecters))
target_charecters = sorted(list(target_charecters))
num_encoder_tokens = len(input_charecters)
num_decoder_tokens = len(target_charecters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])
print("Number of samples:", len(input_texts))
print("Number of unique input tokens:", num_encoder_tokens)
print("Number of unique output tokens:", num_decoder_tokens)
print("Max sequence length for inputs:", max_encoder_seq_length)
print("Max sequence length for outputs:", max_decoder_seq_length)
input_token_index = dict([(char, i) for i, char in enumerate(input_charecters)])
target_token_index = dict([(char, i) for i, char in enumerate(target_charecters)])
Now turn the sentences into three NumPy arrays: encoder_input_data, decoder_input_data, and decoder_target_data.
- encoder_input_data is a 3D array of shape (num_pairs, max_english_sentence_length, num_english_characters) containing a one-hot vectorization of the English sentences.
- decoder_input_data is a 3D array of shape (num_pairs, max_french_sentence_length, num_french_characters) containing a one-hot vectorization of the French sentences.
- decoder_target_data is the same as decoder_input_data but offset by one timestep: decoder_target_data[:, t, :] will be the same as decoder_input_data[:, t + 1, :] (a toy illustration of this offset follows the list).
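To make the one-timestep offset concrete, here is a toy illustration (my own sketch, not part of the pipeline; the target string "\tVa !\n" is just a made-up example): at each step t the decoder is fed the character at position t and is trained to predict the character at position t + 1.
# Toy illustration of the teacher-forcing offset (hypothetical target "\tVa !\n").
toy_target = '\tVa !\n'
decoder_in = list(toy_target)       # ['\t', 'V', 'a', ' ', '!', '\n']
decoder_out = list(toy_target[1:])  # ['V', 'a', ' ', '!', '\n'] -- shifted left by one step
for t, (cin, cout) in enumerate(zip(decoder_in, decoder_out)):
    print(t, repr(cin), '->', repr(cout))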
encoder_input_data = np.zeros((len(input_texts), max_encoder_seq_length, num_encoder_tokens), dtype='float32')
decoder_input_data = np.zeros((len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')
decoder_target_data = np.zeros((len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')
The following code creates the one-hot encodings of the English and French sentences.
for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.
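As an optional sanity check (a minimal sketch of my own, not part of the original pipeline), you can reconstruct a sample from the one-hot arrays and confirm it matches the raw text. It relies on the fact that padding positions are left as all-zero rows here.
# Decode a one-hot matrix back to a string; padding rows are all zeros, so stop there.
def one_hot_to_text(matrix, index_to_char):
    chars = []
    for row in matrix:
        if row.max() == 0:
            break
        chars.append(index_to_char[int(row.argmax())])
    return ''.join(chars)

rev_input = {i: c for c, i in input_token_index.items()}
rev_target = {i: c for c, i in target_token_index.items()}
print(repr(one_hot_to_text(encoder_input_data[0], rev_input)))   # should equal repr(input_texts[0])
print(repr(one_hot_to_text(decoder_input_data[0], rev_target)))  # should equal repr(target_texts[0])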
Building the Model
- The first step is to define an input sequence for the encoder.
- Because this is a character-level translation model, the input is fed into the encoder character by character.
- Next, you need the encoder's final internal states as the initial state of the decoder.
- So, for the encoder LSTM, set return_state=True. With this, you can get the encoder's internal state at the end of the input sequence: state_h denotes the hidden state and state_c denotes the cell state. A quick standalone shape check is sketched right after this list.
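The following illustrative check (my own snippet, run on random data only) shows that an LSTM with return_state=True returns three tensors: the last output plus the final hidden and cell states, each of shape (batch, latent_dim).
# Illustrative only: call an LSTM with return_state=True on random data and inspect shapes.
dummy = tf.random.uniform((2, 7, num_encoder_tokens))   # (batch, timesteps, features)
out, h, c = LSTM(latent_dim, return_state=True)(dummy)
print(out.shape, h.shape, c.shape)                      # (2, 256) (2, 256) (2, 256)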
def encoder(encoder_tokens):
    """This function returns the encoder inputs, output and final states."""
    encoder_inputs = Input(shape=(None, encoder_tokens))
    encoder_lstm = LSTM(latent_dim, return_state=True)
    encoder_output, state_h, state_c = encoder_lstm(encoder_inputs)
    encoder_states = [state_h, state_c]
    return encoder_inputs, encoder_output, encoder_states

def decoder(decoder_tokens, encoder_states):
    """This function returns the decoder inputs and output."""
    # Set up the decoder, using `encoder_states` as initial state.
    decoder_inputs = Input(shape=(None, decoder_tokens))
    # We set up our decoder to return full output sequences,
    # and to return internal states as well. We don't use the
    # return states in the training model, but we will use them in inference.
    decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
    decoder_dense = Dense(decoder_tokens, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)
    return decoder_inputs, decoder_outputs
encoder_inputs, encoder_outputs, encoder_states = encoder(num_encoder_tokens)
decoder_inputs, decoder_outputs = decoder(num_decoder_tokens, encoder_states)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
model.save("./trained_data/E2F")
# Restore the model and construct the encoder and decoder.
model = tf.keras.models.load_model("./trained_data/E2F/")
encoder_inputs = model.input[0] # input_1
encoder_outputs, state_h_enc, state_c_enc = model.layers[2].output # lstm_1
encoder_states = [state_h_enc, state_c_enc]
encoder_model = Model(encoder_inputs, encoder_states)
decoder_inputs = model.input[1] # input_2
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_lstm = model.layers[3]
decoder_outputs, state_h_dec, state_c_dec = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs
)
decoder_states = [state_h_dec, state_c_dec]
decoder_dense = model.layers[4]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states
)
Create two reverse-lookup token indexes to decode the predicted sequences back into readable text.
reverse_input_char_index = dict((i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict((i, char) for char, i in target_token_index.items())
Next, create a prediction function named decode_sequence. The decoder needs to know when to start and stop generating text: it starts from a target sequence of length 1 that contains only the start character \t, and it stops either when it samples the stop character \n or when the maximum target sentence length is reached. At each step, the sampled character is appended to the decoded sentence, the length-1 target sequence is replaced with that character, and the decoder states are updated.
def decode_sequence(input_seq):
    """This function returns the decoded sequence"""
    # Encode the input sequence to get the initial decoder states.
    states_value = encoder_model.predict(input_seq)
    # Start with a target sequence of length 1 containing only the start character.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index["\t"]] = 1.0
    stop_condition = False
    decoded_sentence = ""
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)
        # Sample the most likely next character.
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char
        # Stop at the end-of-sequence character or when the sentence gets too long.
        if sampled_char == "\n" or len(decoded_sentence) > max_decoder_seq_length:
            stop_condition = True
        # Update the length-1 target sequence and the decoder states.
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.0
        states_value = [h, c]
    return decoded_sentence
Running the cell below picks a random sentence from the training data and prints its translation. The sentences are basic, but learning a new foreign language is always a nice addition to your skills, and it will come in handy when you visit France.
i = np.random.choice(len(input_texts))
input_seq = encoder_input_data[i:i+1]
translation = decode_sequence(input_seq)
print('-')
print('Input:', input_texts[i])
print('Translation:', translation)
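If you want to translate a sentence that is not already in encoder_input_data, a small helper like the one below will do. This is a minimal sketch of my own: it assumes every character of the new sentence appears in the training input characters and that the sentence is no longer than max_encoder_seq_length; otherwise it will fail.
# Hypothetical helper: one-hot encode a new English sentence and decode it.
def translate(sentence):
    seq = np.zeros((1, max_encoder_seq_length, num_encoder_tokens), dtype='float32')
    for t, char in enumerate(sentence):
        seq[0, t, input_token_index[char]] = 1.0   # raises KeyError for unseen characters
    return decode_sequence(seq)

print(translate('Run!'))   # e.g. a short sentence whose characters were all seen in training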