I'm trying to fine-tune the AMR3.0 structured-BART large checkpoint on another dataset, but during training I get the following warnings:
```
2023-04-29 00:02:05 | WARNING | tensorboardX.x2num | NaN or Inf found in input tensor.
2023-04-29 00:02:05 | WARNING | tensorboardX.x2num | NaN or Inf found in input tensor.
2023-04-29 00:02:05 | WARNING | tensorboardX.x2num | NaN or Inf found in input tensor.
2023-04-29 00:02:05 | WARNING | tensorboardX.x2num | NaN or Inf found in input tensor.
2023-04-29 00:02:05 | INFO | train | {"epoch": 1, "train_loss": "inf", "train_nll_loss": "inf", "train_loss_seq": "inf", "train_nll_loss_seq": "inf", "train_loss_pos": "0.710562", "train_nll_loss_pos": "0.710562", "train_wps": "687.9", "train_ups": "0.51", "train_wpb": "1354.7", "train_bsz": "55.2", "train_num_updates": "71", "train_lr": "1.87323e-06", "train_gnorm": "17.868", "train_loss_scale": "8", "train_train_wall": "45", "train_wall": "158"}
```
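For what it's worth, if I'm reading the log right, the inf losses coincide with fairseq's dynamic fp16 loss scaler backing off: assuming the default initial scale of 128 (my train args below don't set `--fp16-init-scale`), a `train_loss_scale` of 8 would mean four overflow-triggered halvings within the first 71 updates:

```bash
# My reading of train_loss_scale = 8 (assuming fairseq's default
# --fp16-init-scale of 128, which I have not overridden): the scaler
# halves its scale on each gradient overflow, so four overflows
# brings it from 128 down to 8.
python -c 'print(128 / 2**4)'   # -> 8.0
```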
In my config I set the fairseq-preprocess arguments as:
```bash
FAIRSEQ_PREPROCESS_FINETUNE_ARGS="--srcdict /content/DATA/AMR3.0/models/amr3.0-structured-bart-large-neur-al/seed42/dict.en.txt --tgtdict /content/DATA/AMR3.0/models/amr3.0-structured-bart-large-neur-al/seed42/dict.actions_nopos.txt"
```
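For context, my understanding is that the wrapper scripts expand these into a fairseq-preprocess call along these lines (a sketch only: the `--trainpref`/`--validpref`/`--destdir` paths are placeholders, and the source/target language names are inferred from the `dict.en.txt` / `dict.actions_nopos.txt` filenames):

```bash
# Sketch of the preprocess invocation as I understand it ($ORACLE_DIR
# and $DATA_DIR are placeholder paths, not my real ones):
fairseq-preprocess \
    --source-lang en \
    --target-lang actions_nopos \
    --trainpref $ORACLE_DIR/train \
    --validpref $ORACLE_DIR/dev \
    --destdir $DATA_DIR \
    --workers 1 \
    $FAIRSEQ_PREPROCESS_FINETUNE_ARGS
```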
The fairseq-train arguments are set as:
```bash
FAIRSEQ_TRAIN_FINETUNE_ARGS="--finetune-from-model /content/DATA/AMR3.0/models/amr3.0-structured-bart-large-neur-al/seed42/checkpoint_wiki.smatch_top5-avg.pt --memory-efficient-fp16 --batch-size 16 --max-tokens 512 --patience 10"
```
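As far as I can tell, these end up in a train call roughly like this (again a sketch: `$DATA_DIR` and `$SAVE_DIR` are placeholders, and the repo's own config supplies the task/arch/optimizer options, which I've left out):

```bash
# Sketch of the train invocation (placeholder paths; task/arch/optimizer
# options coming from the repo's config are omitted here):
fairseq-train $DATA_DIR \
    --save-dir $SAVE_DIR \
    $FAIRSEQ_TRAIN_FINETUNE_ARGS
```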
Any ideas as to what I'm doing wrong?
Thanks in advance.