The batch_size value is incorrect #26

Open
liyuzhan opened this issue Oct 20, 2024 · 1 comment

Comments

@liyuzhan

Hello author, I would like to ask what an appropriate batch_size value is when H+W at this point is greater than or equal to 64.

@YuanYuan98
Collaborator

When $H+W>64$, it is recommended to reduce the batch size accordingly. Larger $H$ and $W$ values produce longer sequences after patching, and the cost of the transformer's self-attention grows quadratically with sequence length. As a result, GPU memory usage can increase sharply and may cause an out-of-memory error.
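To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch. It assumes the $H \times W$ grid is split into non-overlapping square patches of size `patch_size` (padding partial border patches), ignores any temporal dimension the model may also patch, and counts only the entries of the self-attention score matrices. The helper names, `num_heads=8`, and `batch_size=32` are hypothetical illustration values, not taken from the project's code.

```python
import math


def num_patches(height: int, width: int, patch_size: int) -> int:
    """Tokens produced by splitting an H x W grid into square patches.

    Assumption: partial patches at the border are padded, hence the ceiling.
    """
    return math.ceil(height / patch_size) * math.ceil(width / patch_size)


def attention_score_elements(batch_size: int, num_heads: int, seq_len: int) -> int:
    """Entries in the seq_len x seq_len score matrices of one self-attention layer."""
    return batch_size * num_heads * seq_len ** 2


# Hypothetical settings: same batch size and head count, grid doubled in each direction.
small = attention_score_elements(batch_size=32, num_heads=8, seq_len=num_patches(32, 32, 2))
large = attention_score_elements(batch_size=32, num_heads=8, seq_len=num_patches(64, 64, 2))
print(large // small)  # 16: 4x as many tokens -> 16x as many attention entries
```

Going from a 32×32 to a 64×64 grid quadruples the token count (256 to 1024 with `patch_size=2`) but multiplies the attention score memory by 16, which is why the batch size usually has to shrink.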

In addition to reducing the batch size, you can also increase the patch size to reduce the sequence length, which will help lower both the computational and memory requirements.
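The trade-off between patch size and batch size can be sketched the same way. This minimal example assumes a hypothetical memory budget expressed as a number of attention score entries (`budget_elements`) and again counts only the attention score matrices, ignoring activations, parameters, and optimizer state. So it indicates relative scaling, not an actual safe batch size for any GPU.

```python
import math


def num_patches(height: int, width: int, patch_size: int) -> int:
    """Tokens from non-overlapping square patches (border patches padded)."""
    return math.ceil(height / patch_size) * math.ceil(width / patch_size)


def max_batch_size(budget_elements: int, num_heads: int, seq_len: int) -> int:
    """Largest batch whose attention score matrices fit in a hypothetical budget.

    Returns 0 if even a single sample exceeds the budget.
    """
    per_sample = num_heads * seq_len ** 2
    return budget_elements // per_sample


if __name__ == "__main__":
    budget = 2 ** 30  # hypothetical budget of score entries for one attention layer
    heads = 8         # hypothetical number of attention heads
    for p in (2, 4):
        n = num_patches(64, 64, p)
        print(f"patch_size={p}: seq_len={n}, max batch={max_batch_size(budget, heads, n)}")
    # patch_size=2: seq_len=1024, max batch=128
    # patch_size=4: seq_len=256, max batch=2048
```

Doubling the patch size cuts the sequence length by 4x and the per-sample attention cost by 16x, so under this simplified model the affordable batch size grows by the same factor. The trade-off is coarser spatial resolution per token.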
