Training loss differs greatly between GPU and CPU #2

Open
@cqcracked

Description

Training with 1-pretrain-vlm.py on the GPU gives the following:
Trainable model parameters: 109.34016 million = 0.10934016 B (Billion)
Epoch:0/19 loss:8.766 lr:0.0004000 epoch_Time:3503.0min: 0/24808
Epoch:0/19 loss:6.576 lr:0.0004000 epoch_Time:513.0min: 100/24808
Epoch:0/19 loss:6.067 lr:0.0004000 epoch_Time:522.0min: 200/24808
Epoch:0/19 loss:5.930 lr:0.0004000 epoch_Time:522.0min: 300/24808
Training on the CPU gives the following:
Epoch:[0/19]0|24808 loss:5.749 lr:0.0004000 epoch_Time:10788.0min: 0/24808
Epoch:0/19 loss:2.958 lr:0.0004000 epoch_Time:6120.0min: 100/24808
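On the CPU, the loss is already down to 2.958 after 100 batches, while the GPU run is still at 6.576 at the same step.
What is going on?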
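
One way to make the comparison concrete is to line up the two logs step by step and convert each loss to a perplexity. The sketch below is only a sanity check, not part of 1-pretrain-vlm.py: it assumes the printed loss is a mean cross-entropy in nats (which is what `torch.nn.CrossEntropyLoss` returns by default), and the helper name `parse` and the regular expression are made up for this example.

```python
import math
import re

# Log lines copied verbatim from the two runs in this issue.
gpu_log = """\
Epoch:0/19 loss:8.766 lr:0.0004000 epoch_Time:3503.0min: 0/24808
Epoch:0/19 loss:6.576 lr:0.0004000 epoch_Time:513.0min: 100/24808
"""
cpu_log = """\
Epoch:[0/19]0|24808 loss:5.749 lr:0.0004000 epoch_Time:10788.0min: 0/24808
Epoch:0/19 loss:2.958 lr:0.0004000 epoch_Time:6120.0min: 100/24808
"""

# Hypothetical pattern: the loss value, then the trailing "step/total" field.
LINE = re.compile(r"loss:(\d+\.\d+).*\s(\d+)/\d+$")


def parse(log):
    """Return {step: loss} for every line that matches the log format."""
    losses = {}
    for line in log.splitlines():
        match = LINE.search(line.strip())
        if match:
            losses[int(match.group(2))] = float(match.group(1))
    return losses


gpu, cpu = parse(gpu_log), parse(cpu_log)

for step in sorted(gpu.keys() & cpu.keys()):
    # Assumption: loss is in nats, so exp(loss) is the perplexity, i.e. the
    # size of a uniform guess that would produce the same loss.
    print(
        f"step {step}: GPU loss {gpu[step]} (ppl {math.exp(gpu[step]):.0f}), "
        f"CPU loss {cpu[step]} (ppl {math.exp(cpu[step]):.0f})"
    )
```

Under that assumption, the GPU run starts at a perplexity of roughly 6,400 and the CPU run at roughly 314, so the two runs already disagree at step 0, before any meaningful training has happened. That would suggest checking whether both runs start from the same weights and compute the loss over the same data and mask, rather than looking at optimization speed; this is a hypothesis to verify, not a diagnosis.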
The configuration is as follows:
dim: int = 768,
n_layers: int = 16,
n_heads: int = 16,
n_kv_heads: int = 8,
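
The configuration above can be sanity-checked independently of the device. The sketch below is a minimal example that assumes the four fields have their usual grouped-query attention meaning (`dim` split evenly across `n_heads`, and query heads shared across `n_kv_heads`); the `Config` class and `derived` helper are hypothetical names for this example, not the repository's own code.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Values copied from the issue; the class itself is illustrative.
    dim: int = 768
    n_layers: int = 16
    n_heads: int = 16
    n_kv_heads: int = 8


def derived(cfg):
    """Compute per-head sizes, assuming standard grouped-query attention."""
    # The hidden size must split evenly across the query heads.
    assert cfg.dim % cfg.n_heads == 0, "dim must be divisible by n_heads"
    # Each key/value head must serve a whole number of query heads.
    assert cfg.n_heads % cfg.n_kv_heads == 0, "n_heads must be divisible by n_kv_heads"
    head_dim = cfg.dim // cfg.n_heads
    return {
        "head_dim": head_dim,
        "queries_per_kv_head": cfg.n_heads // cfg.n_kv_heads,
        "kv_proj_dim": cfg.n_kv_heads * head_dim,
    }


info = derived(Config())
print(info)
```

With these values each head is 48 wide and every key/value head is shared by 2 query heads. None of this depends on the device, so the configuration alone is unlikely to explain a GPU/CPU gap as long as both runs really use the same values.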
