When $H+W>64$, it is recommended to reduce the batch size appropriately. This is because larger $H$ and $W$ values lead to longer sequence lengths after patching, and the computational complexity of the transformer increases quadratically with the sequence length. As a result, GPU memory usage may increase significantly, potentially causing an out-of-memory error.
In addition to reducing the batch size, you can also increase the patch size to reduce the sequence length, which will help lower both the computational and memory requirements.
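As a rough illustration of the answer above, here is a minimal sketch (assuming square non-overlapping patches, which the original thread does not specify) of how the post-patching sequence length, and hence the size of the quadratic attention score matrix, grows with $H$ and $W$ and shrinks as the patch size increases:

```python
def seq_len(h, w, patch):
    """Number of patch tokens for an H x W input split into square patches."""
    return (h // patch) * (w // patch)

def attn_entries(h, w, patch):
    """Entries in one self-attention score matrix: quadratic in seq_len."""
    n = seq_len(h, w, patch)
    return n * n

if __name__ == "__main__":
    # Doubling H and W quadruples the token count (16x the attention cost);
    # doubling the patch size quarters the token count (1/16 the cost).
    for h, w, p in [(32, 32, 4), (64, 64, 4), (64, 64, 8)]:
        n = seq_len(h, w, p)
        print(f"H={h} W={w} patch={p}: seq_len={n}, attn entries={n * n}")
```

This is why there is no single correct `batch_size`: the usable value depends on the token count above and on available GPU memory, so in practice one halves the batch size (or enlarges the patch size) until the out-of-memory error disappears.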
Hello author, I would like to ask: what is an appropriate value for `batch_size` when $H+W$ is greater than or equal to 64 here?