Basic Attention Code

This example focuses on a simple sequence-to-sequence task of translating English sentences into French using an LSTM-based encoder-decoder architecture with attention.

import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense, Dot, Concatenate
from tensorflow.keras.models import Model

# Define the encoder
def build_encoder(input_shape, latent_dim):
    encoder_inputs = Input(shape=input_shape)
    encoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
    encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
    encoder_states = [state_h, state_c]
    return encoder_inputs, encoder_outputs, encoder_states

# Define the decoder
def build_decoder(input_shape, latent_dim, output_vocab_size, encoder_outputs, encoder_states):
    decoder_inputs = Input(shape=input_shape)
    decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
    # Luong-style dot-product attention scores between decoder and encoder outputs
    attention = Dot(axes=[2, 2])([decoder_outputs, encoder_outputs])
    # Normalize over the encoder time steps (last axis)
    attention_weights = tf.nn.softmax(attention, axis=-1)
    # Context vector: attention-weighted sum of the encoder outputs
    context_vector = Dot(axes=[2, 1])([attention_weights, encoder_outputs])
    decoder_combined_context = Concatenate(axis=-1)([context_vector, decoder_outputs])
    decoder_dense = Dense(output_vocab_size, activation='softmax')
    decoder_outputs = decoder_dense(decoder_combined_context)
    return decoder_inputs, decoder_outputs

# Build the model
latent_dim = 256
encoder_input_shape = (None, 100)   # (timesteps, features): variable-length English sequences, 100-dim vectors per step
decoder_input_shape = (None, 120)   # (timesteps, features): variable-length French sequences, 120-dim vectors per step
output_vocab_size = 5000            # Number of words in the French vocabulary

encoder_inputs, encoder_outputs, encoder_states = build_encoder(encoder_input_shape, latent_dim)
decoder_inputs, decoder_outputs = build_decoder(decoder_input_shape, latent_dim, output_vocab_size,
                                                encoder_outputs, encoder_states)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Train the model
# …

# Generate predictions
# …

In this example, the encoder takes English sentences as input and produces per-timestep encoder outputs along with the final hidden and cell states of its LSTM layer. The decoder takes French sentences as input, uses the encoder's final states to initialize its own LSTM layer, and applies attention by computing attention weights between the decoder outputs and the encoder outputs. The context vector is then calculated as the attention-weighted sum of the encoder outputs and concatenated with the decoder outputs before the final softmax layer predicts the next French word at each step. Finally, the model is compiled, trained, and can be used to generate predictions.
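
To make the training and prediction steps concrete, here is a minimal usage sketch with randomly generated arrays standing in for real translation data. The batch size, sequence lengths, and the names english_batch, french_batch, and target_batch are assumptions made for illustration, not part of the original example.

import numpy as np

# Hypothetical dummy batch: 32 sentence pairs, 20 English steps and 25 French steps,
# with 100-/120-dim feature vectors per step as declared in the Input shapes above
english_batch = np.random.rand(32, 20, 100).astype("float32")
french_batch = np.random.rand(32, 25, 120).astype("float32")

# Integer word indices for sparse_categorical_crossentropy: one target id per decoder step
target_batch = np.random.randint(0, output_vocab_size, size=(32, 25, 1))

# Teacher forcing: the decoder receives the French inputs and predicts the target ids
model.fit([english_batch, french_batch], target_batch, batch_size=8, epochs=1)

# Predictions: a (32, 25, 5000) array of per-step probabilities over the French vocabulary
probs = model.predict([english_batch, french_batch])
predicted_ids = probs.argmax(axis=-1)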

Please note that this is a simplified example, and there may be additional pre-processing steps, data handling, and modifications required depending on your specific task.
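
As one illustration of such preprocessing, the sketch below tokenizes and pads parallel English/French sentence lists with Keras utilities. The sentence lists and maximum lengths are placeholders, and the resulting integer sequences would normally be passed through Embedding layers, which is one of the modifications to the model above since it currently expects dense per-timestep feature vectors.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical parallel corpora (placeholders for a real dataset)
english_sentences = ["the cat sits on the mat", "good morning"]
french_sentences = ["le chat est assis sur le tapis", "bonjour"]

en_tokenizer = Tokenizer()
en_tokenizer.fit_on_texts(english_sentences)
en_sequences = pad_sequences(en_tokenizer.texts_to_sequences(english_sentences),
                             padding='post', maxlen=20)

fr_tokenizer = Tokenizer(num_words=output_vocab_size)
fr_tokenizer.fit_on_texts(french_sentences)
fr_sequences = pad_sequences(fr_tokenizer.texts_to_sequences(french_sentences),
                             padding='post', maxlen=25)

# fr_sequences shifted by one position would serve as the decoder targets (teacher forcing);
# the integer ids would typically be mapped to vectors via Embedding layers before the LSTMs.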
