When $H+W>64$, it is recommended to reduce the batch size appropriately. This is because larger $H$ and $W$ values lead to longer sequence lengths after patching, and the computational complexity of the transformer increases quadratically with the sequence length. As a result, GPU memory usage may increase significantly, potentially causing an out-of-memory error.
In addition to reducing the batch size, you can also increase the patch size to reduce the sequence length, which will help lower both the computational and memory requirements.
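As a rough illustration of the answer above, here is a minimal sketch (assuming square non-overlapping patches, which the original thread does not specify) of how the post-patching sequence length, and hence the size of the quadratic attention score matrix, grows with $H$ and $W$ and shrinks as the patch size increases:

```python
def seq_len(h, w, patch):
    """Number of patch tokens for an H x W input split into square patches."""
    return (h // patch) * (w // patch)

def attn_entries(h, w, patch):
    """Entries in one self-attention score matrix: quadratic in seq_len."""
    n = seq_len(h, w, patch)
    return n * n

if __name__ == "__main__":
    # Doubling H and W quadruples the token count (16x the attention cost);
    # doubling the patch size quarters the token count (1/16 the cost).
    for h, w, p in [(32, 32, 4), (64, 64, 4), (64, 64, 8)]:
        n = seq_len(h, w, p)
        print(f"H={h} W={w} patch={p}: seq_len={n}, attn entries={n * n}")
```

This is why there is no single correct `batch_size`: the usable value depends on the token count above and on available GPU memory, so in practice one halves the batch size (or enlarges the patch size) until the out-of-memory error disappears.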
Hello author, I would like to ask: what is an appropriate value for `batch_size` when $H+W$ is greater than or equal to 64 here?