
Fixes to the "English-to-Spanish Translation with a Sequence-to-Sequence Transformer" Code Example #1997

Merged
merged 5 commits into keras-team:master on Nov 29, 2024

Conversation

tomjelen
Contributor

This pull request addresses several issues in the "English-to-Spanish translation with a sequence-to-sequence Transformer" code example:

  1. Swapped input tensors:

    • Issue: The encoder and decoder inputs were swapped, causing the model either to be trained in the wrong translation direction or to run inference in the wrong direction.
    • Fix: Aligned the encoder and decoder inputs correctly, so the model is trained and used to translate from English to Spanish (see the first sketch after this list). This results in meaningful translations when trained to convergence.
  2. Padding handling:

    • Issue: The padding masks were computed but never used.
    • Fix: Propagated the masks out of the embedding layer.
    • Fix: Updated the decoder to handle multiple masks, accommodating padding in both the Spanish and English sequences.
    • Fix: Modified the loss computation to exclude padding tokens, preventing them from affecting training (see the loss sketch after this list).
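
For reference, here is a minimal sketch of the corrected dataset wiring. It follows the example's conventions (`eng_vectorization`, `spa_vectorization`, padding token id 0), but treat it as an illustration rather than the exact diff:

```python
def format_dataset(eng, spa):
    eng = eng_vectorization(eng)
    spa = spa_vectorization(spa)
    # English feeds the encoder; the decoder receives the Spanish target
    # shifted right and is trained to predict it shifted left.
    return (
        {"encoder_inputs": eng, "decoder_inputs": spa[:, :-1]},
        spa[:, 1:],
    )
```

And a sketch of a padding-aware loss, assuming padding tokens have id 0 (the helper name `masked_sparse_categorical_crossentropy` is mine, not from the diff):

```python
import keras
from keras import ops

def masked_sparse_categorical_crossentropy(y_true, y_pred):
    # Per-token cross-entropy, without reduction.
    loss = keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    # Zero out positions that correspond to padding (token id 0).
    mask = ops.cast(ops.not_equal(y_true, 0), loss.dtype)
    loss = loss * mask
    # Average over real tokens only.
    return ops.sum(loss) / ops.maximum(ops.sum(mask), 1.0)
```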

I identified these issues while using the example to build my own Keras-based Transformer models, after noticing that the model wasn't producing meaningful translations. Fixing them ensures that the example functions properly, allowing others to use it as a reliable foundation or learning resource for their projects.

It seems these issues may have arisen due to regression errors over time, and this update aims to restore the example to its intended functionality.

Given 30 epochs of training, the model never ended up producing sensible output. These are examples:
1) Tom didn't like Mary. → [start] ha estoy qué
2) Tom called Mary and canceled their date. → [start] sola qué yo pasatiempo visto campo

When fitting the model, the following relevant warning was emitted:
```
UserWarning: The structure of `inputs` doesn't match the expected structure: ['encoder_inputs', 'decoder_inputs']. Received: the structure of inputs={'encoder_inputs': '*', 'decoder_inputs': '*'}
```

After the fix the model now outputs sentences that are close to proper Spanish:
1) That's what Tom told me. → [start] eso es lo que tom me dijo [end]
2) Does Tom like cheeseburgers? → [start] a tom le gustan las queso de queso [end]
The check essentially disables the mask calculation: since the layer is the first one to receive the input, it never has a previous mask.

With this change mask is now passed on to the encoder.
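
In code, the regression and the fix look roughly like this (a sketch of the example's `PositionalEmbedding.compute_mask`, assuming padding token id 0):

```python
import keras
from keras import ops

class PositionalEmbedding(keras.layers.Layer):
    # ... constructor and call() elided ...

    def compute_mask(self, inputs, mask=None):
        # Regressed version (masking silently disabled): this layer sits
        # directly after the Input, so the incoming `mask` is always None
        # and the early return meant no padding mask was ever produced:
        #
        #     if mask is None:
        #         return None
        #
        # Fixed version: always derive the padding mask from the token ids.
        return ops.not_equal(inputs, 0)
```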

Looks like a regression error. The initial commit looks very similar to this.
As per https://github.com/tensorflow/tensorflow/blob/6550e4bd80223cdb8be6c3afd1f81e86a4d433c3/tensorflow/python/keras/engine/base_layer.py#L965, the inputs should be a list, not kwargs. When this is done, both masks are received as a tuple in the `mask` argument.
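
A sketch of the difference in the decoder call (variable names follow the example; the decoder's `call()` is updated accordingly to unpack the list and the mask tuple):

```python
# Before: encoder_outputs is passed as a separate positional argument,
# so Keras only propagates the padding mask attached to `x`.
x = TransformerDecoder(embed_dim, latent_dim, num_heads)(x, encoder_outputs)

# After: both tensors are passed as a single list, so Keras delivers both
# padding masks to the layer as a tuple in its `mask` argument.
x = TransformerDecoder(embed_dim, latent_dim, num_heads)([x, encoder_outputs])
```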
google-cla bot commented Nov 27, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).


Contributor

@fchollet fchollet left a comment


Thanks for the fixes! The code looks good. Please add the generated files as well.

@tomjelen
Contributor Author

Perfect, here you go.

The generated markdown file contained thousands of lines of progress-bar updates, so I removed them by hand and checked that the result was readable.

Contributor

@fchollet fchollet left a comment


LGTM, thank you. Good call on cleaning up the logs.

@fchollet fchollet merged commit 2158c91 into keras-team:master Nov 29, 2024
1 check passed
@jorcisai

jorcisai commented Dec 4, 2024

Many thanks, Tom! As soon as I can, I will test it.
