Training with 1-pretrain-vlm.py on the GPU gives the following output:
Trainable model parameters: 109.34016 million = 0.10934016 B (Billion)
Epoch:0/19 loss:8.766 lr:0.0004000 epoch_Time:3503.0min: 0/24808
Epoch:0/19 loss:6.576 lr:0.0004000 epoch_Time:513.0min: 100/24808
Epoch:0/19 loss:6.067 lr:0.0004000 epoch_Time:522.0min: 200/24808
Epoch:0/19 loss:5.930 lr:0.0004000 epoch_Time:522.0min: 300/24808
Training on the CPU gives the following output:
Epoch:[0/19]0|24808 loss:5.749 lr:0.0004000 epoch_Time:10788.0min: 0/24808
Epoch:0/19 loss:2.958 lr:0.0004000 epoch_Time:6120.0min: 100/24808
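On the CPU, the loss is already down to about 2.95 after 100 batches, while on the GPU it is still 6.576 at the same step.
What is causing this?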
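To make the comparison concrete, here is a minimal sketch that parses the log lines pasted above and prints the GPU-minus-CPU loss gap at each step both runs share. `parse_losses` and `loss_gap` are hypothetical helper names, not part of the repository, and the regex assumes the log format shown in this issue. The gap is already about 3.0 at step 0, which may point to a difference in setup (initial weights, data, or how the loss is computed) rather than in training speed, but that is only a guess.

```python
import re

# Log lines copied verbatim from the two runs in this issue.
GPU_LOG = """\
Epoch:0/19 loss:8.766 lr:0.0004000 epoch_Time:3503.0min: 0/24808
Epoch:0/19 loss:6.576 lr:0.0004000 epoch_Time:513.0min: 100/24808
Epoch:0/19 loss:6.067 lr:0.0004000 epoch_Time:522.0min: 200/24808
Epoch:0/19 loss:5.930 lr:0.0004000 epoch_Time:522.0min: 300/24808
"""
CPU_LOG = """\
Epoch:[0/19]0|24808 loss:5.749 lr:0.0004000 epoch_Time:10788.0min: 0/24808
Epoch:0/19 loss:2.958 lr:0.0004000 epoch_Time:6120.0min: 100/24808
"""

# Assumed line layout: "... loss:<float> lr:<float> epoch_Time:<float>min: <step>/<total>"
LINE_RE = re.compile(
    r"loss:([\d.]+)\s+lr:[\d.]+\s+epoch_Time:[\d.]+min:\s*(\d+)/\d+"
)


def parse_losses(log: str) -> dict[int, float]:
    """Map each logged step number to its reported loss."""
    losses: dict[int, float] = {}
    for line in log.splitlines():
        match = LINE_RE.search(line)
        if match:
            losses[int(match.group(2))] = float(match.group(1))
    return losses


def loss_gap(gpu: dict[int, float], cpu: dict[int, float]) -> dict[int, float]:
    """GPU loss minus CPU loss for every step present in both runs."""
    return {step: round(gpu[step] - cpu[step], 3) for step in sorted(gpu.keys() & cpu.keys())}


if __name__ == "__main__":
    for step, gap in loss_gap(parse_losses(GPU_LOG), parse_losses(CPU_LOG)).items():
        print(f"step {step}: GPU loss is higher by {gap}")
```

If the gap at step 0 persists with the same seed and the same first batch on both devices, the difference is unlikely to come from the optimizer.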
The configuration is as follows:
dim: int = 768,
n_layers: int = 16,
n_heads: int = 16,
n_kv_heads: int = 8,
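For reference, a small sketch of the sizes these four values imply, assuming the model uses LLaMA-style grouped-query attention (suggested by the `n_kv_heads` field, but not confirmed here). `ModelConfig`, `head_dim`, and `queries_per_kv_head` are hypothetical names used only for illustration. Printing the same values from both the GPU and the CPU run is a quick way to rule out a config mismatch.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical stand-in holding only the four fields listed in this issue."""
    dim: int = 768
    n_layers: int = 16
    n_heads: int = 16
    n_kv_heads: int = 8

    @property
    def head_dim(self) -> int:
        # Width of a single attention head.
        return self.dim // self.n_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Assumption: grouped-query attention, where several query heads share one KV head.
        return self.n_heads // self.n_kv_heads


if __name__ == "__main__":
    cfg = ModelConfig()
    print(f"head_dim={cfg.head_dim}, queries_per_kv_head={cfg.queries_per_kv_head}")
```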