
Fixes to the "English-to-Spanish Translation with a Sequence-to-Sequence Transformer" Code Example #1997

Merged
merged 5 commits into keras-team:master on Nov 29, 2024

Conversation

tomjelen
Contributor

This pull request addresses several issues in the "English-to-Spanish translation with a sequence-to-sequence Transformer" code example:

  1. Swapped input tensors:

    • Issue: The encoder and decoder inputs were swapped, causing the model either to be trained in the wrong translation direction or to run inference in the wrong direction.
    • Fix: Aligned the encoder and decoder inputs correctly, so the model is trained and used to translate from English to Spanish (see the first sketch after this list). This results in meaningful translations when trained to convergence.
  2. Padding handling:

    • Issue: The padding masks were computed but never used.
    • Fix: Propagated the masks out of the embedding layer.
    • Fix: Updated the decoder to handle multiple masks, accommodating padding in both the Spanish and English sequences.
    • Fix: Modified the loss computation to exclude padding tokens, preventing them from affecting training (see the loss sketch after this list).
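
For reference, here is a minimal sketch of the corrected dataset wiring. It follows the example's conventions (`eng_vectorization`, `spa_vectorization`, padding token id 0), but treat it as an illustration rather than the exact diff:

```python
def format_dataset(eng, spa):
    eng = eng_vectorization(eng)
    spa = spa_vectorization(spa)
    # English feeds the encoder; the decoder receives the Spanish target
    # shifted right and is trained to predict it shifted left.
    return (
        {"encoder_inputs": eng, "decoder_inputs": spa[:, :-1]},
        spa[:, 1:],
    )
```

And a sketch of a padding-aware loss, assuming padding tokens have id 0 (the helper name `masked_sparse_categorical_crossentropy` is mine, not from the diff):

```python
import keras
from keras import ops

def masked_sparse_categorical_crossentropy(y_true, y_pred):
    # Per-token cross-entropy, without reduction.
    loss = keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    # Zero out positions that correspond to padding (token id 0).
    mask = ops.cast(ops.not_equal(y_true, 0), loss.dtype)
    loss = loss * mask
    # Average over real tokens only.
    return ops.sum(loss) / ops.maximum(ops.sum(mask), 1.0)
```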

I identified these issues while using the example to build my own Keras-based Transformer models, after noticing that the model wasn't producing meaningful translations. Fixing them ensures that the example functions properly, allowing others to use it as a reliable foundation or learning resource for their projects.

It seems these issues may have arisen due to regression errors over time, and this update aims to restore the example to its intended functionality.

Given 30 epochs of training, the model never ended up producing sensible output. These are examples:
1) Tom didn't like Mary. → [start] ha estoy qué
2) Tom called Mary and canceled their date. → [start] sola qué yo pasatiempo visto campo

When fitting the model, the following relevant warning was emitted:
```
UserWarning: The structure of `inputs` doesn't match the expected structure: ['encoder_inputs', 'decoder_inputs']. Received: the structure of inputs={'encoder_inputs': '*', 'decoder_inputs': '*'}
```

After the fix the model now outputs sentences that are close to proper Spanish:
1) That's what Tom told me. → [start] eso es lo que tom me dijo [end]
2) Does Tom like cheeseburgers? → [start] a tom le gustan las queso de queso [end]
The check essentially disables the mask calculation: since the layer is the first one to receive the input, it never has a previous mask.

With this change mask is now passed on to the encoder.
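
In code, the regression and the fix look roughly like this (a sketch of the example's `PositionalEmbedding.compute_mask`, assuming padding token id 0):

```python
import keras
from keras import ops

class PositionalEmbedding(keras.layers.Layer):
    # ... constructor and call() elided ...

    def compute_mask(self, inputs, mask=None):
        # Regressed version (masking silently disabled): this layer sits
        # directly after the Input, so the incoming `mask` is always None
        # and the early return meant no padding mask was ever produced:
        #
        #     if mask is None:
        #         return None
        #
        # Fixed version: always derive the padding mask from the token ids.
        return ops.not_equal(inputs, 0)
```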

Looks like a regression error. The initial commit looks very similar to this.
As per https://github.com/tensorflow/tensorflow/blob/6550e4bd80223cdb8be6c3afd1f81e86a4d433c3/tensorflow/python/keras/engine/base_layer.py#L965, the inputs should be a list, not kwargs. When this is done, both masks are received as a tuple in the `mask` argument.
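
A sketch of the difference in the decoder call (variable names follow the example; the decoder's `call()` is updated accordingly to unpack the list and the mask tuple):

```python
# Before: encoder_outputs is passed as a separate positional argument,
# so Keras only propagates the padding mask attached to `x`.
x = TransformerDecoder(embed_dim, latent_dim, num_heads)(x, encoder_outputs)

# After: both tensors are passed as a single list, so Keras delivers both
# padding masks to the layer as a tuple in its `mask` argument.
x = TransformerDecoder(embed_dim, latent_dim, num_heads)([x, encoder_outputs])
```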
google-cla bot commented Nov 27, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).


Contributor

@fchollet fchollet left a comment


Thanks for the fixes! The code looks good. Please add the generated files as well.

@tomjelen
Contributor Author

Perfect, here you go.

The generated markdown file contained thousands of lines of progress-bar updates, so I removed them by hand and checked that the result was readable.

Contributor

@fchollet fchollet left a comment


LGTM, thank you. Good call on cleaning up the logs.

@fchollet fchollet merged commit 2158c91 into keras-team:master Nov 29, 2024
1 check passed
@jorcisai

jorcisai commented Dec 4, 2024

Many thanks, Tom! As soon as I can, I will test it.
