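Why does each GPU still have to load the entire T5 language model on its own even with sequence parallelism enabled? Even with 8 GPUs, it still runs out of GPU memory.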