nohup.out
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:04<00:08, 4.49s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:08<00:04, 4.49s/it]/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/cuda/memory.py:303: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
warnings.warn(
--> Model lmsys/vicuna-13b-v1.5-16k
--> lmsys/vicuna-13b-v1.5-16k has 328.09472 Million params
trainable params: 6,553,600 || all params: 13,022,417,920 || trainable%: 0.05032552357220002
--> Training Set Length = 200
--> Validation Set Length = 50
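The trainable-parameter line above is the usual PEFT printout for a LoRA adapter attached to an 8-bit base model. Below is a minimal sketch of a setup that produces that kind of count; the LoRA rank, alpha, dropout and target modules are assumptions (they are not printed in this log), though r=8 on q_proj and v_proj of a 13B Llama model does yield exactly 6,553,600 trainable parameters.

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Base model loaded in 8-bit via bitsandbytes, as in the run above.
    model = AutoModelForCausalLM.from_pretrained(
        "lmsys/vicuna-13b-v1.5-16k",
        load_in_8bit=True,
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)

    # Illustrative LoRA settings (assumed, not taken from this log).
    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
    # -> trainable params: 6,553,600 || all params: ... || trainable%: ...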
/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
evaluating Epoch: 100%|██████████| 50/50 [04:52<00:00, 5.79s/it]
Number of tokens in the example: 2609
Number of tokens in the example: 2416
Number of tokens in the example: 3098
Number of tokens in the example: 5148
Number of tokens in the example: 2243
Number of tokens in the example: 2973
Number of tokens in the example: 1823
Number of tokens in the example: 2652
Number of tokens in the example: 3725
Number of tokens in the example: 2306
Number of tokens in the example: 3866
Number of tokens in the example: 3906
Number of tokens in the example: 2912
Number of tokens in the example: 3901
Number of tokens in the example: 2799
Number of tokens in the example: 3804
Number of tokens in the example: 2601
Number of tokens in the example: 3474
Number of tokens in the example: 2322
Number of tokens in the example: 3776
Number of tokens in the example: 2793
Number of tokens in the example: 2844
Number of tokens in the example: 3203
Number of tokens in the example: 2372
Number of tokens in the example: 2456
Number of tokens in the example: 3333
Number of tokens in the example: 2513
Number of tokens in the example: 2733
Number of tokens in the example: 2692
Number of tokens in the example: 1843
Number of tokens in the example: 2805
Number of tokens in the example: 3050
Number of tokens in the example: 2846
Number of tokens in the example: 2476
Number of tokens in the example: 2573
Number of tokens in the example: 3750
Number of tokens in the example: 2651
Number of tokens in the example: 2776
Number of tokens in the example: 3344
Number of tokens in the example: 2910
Number of tokens in the example: 4373
Number of tokens in the example: 3128
Number of tokens in the example: 2312
Number of tokens in the example: 2353
Number of tokens in the example: 3772
Number of tokens in the example: 2715
Number of tokens in the example: 2773
Number of tokens in the example: 2906
Number of tokens in the example: 2889
Number of tokens in the example: 2629
evaluating Epoch: 100%|██████████| 50/50 [04:52<00:00, 5.85s/it]
eval_ppl=tensor(1.7953, device='cuda:0') eval_epoch_loss=tensor(0.5852, device='cuda:0')
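The two numbers on this line are linked: the reported perplexity is simply the exponential of the mean evaluation loss. A quick check with the rounded values printed above:

    import math

    eval_epoch_loss = 0.5852
    eval_ppl = math.exp(eval_epoch_loss)  # ~1.7953, matching eval_ppl above
    print(eval_ppl)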
Training Epoch0: 0%| | 0/200 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
Training Epoch0: 75%|███████▌ | 150/200 [1:08:53<22:57, 27.55s/it]
step 0 is completed and loss is 0.6250191330909729
step 1 is completed and loss is 0.5554953813552856
step 2 is completed and loss is 0.5113803148269653
step 3 is completed and loss is 0.47984254360198975
step 4 is completed and loss is 0.5095067024230957
step 5 is completed and loss is 0.657611072063446
step 6 is completed and loss is 0.798925518989563
step 7 is completed and loss is 0.9212873578071594
step 8 is completed and loss is 0.5055791735649109
step 9 is completed and loss is 0.5641234517097473
step 10 is completed and loss is 0.7621738910675049
step 11 is completed and loss is 0.6242639422416687
step 12 is completed and loss is 0.45724597573280334
step 13 is completed and loss is 0.40364477038383484
step 14 is completed and loss is 0.4145205616950989
step 15 is completed and loss is 0.6699330806732178
step 16 is completed and loss is 0.5529826879501343
step 17 is completed and loss is 0.44825437664985657
step 18 is completed and loss is 0.5366003513336182
step 19 is completed and loss is 0.5394286513328552
step 20 is completed and loss is 0.5539140105247498
step 21 is completed and loss is 0.5224546790122986
step 22 is completed and loss is 0.33682045340538025
step 23 is completed and loss is 0.5308464765548706
step 24 is completed and loss is 0.47635382413864136
step 25 is completed and loss is 0.36415642499923706
step 26 is completed and loss is 0.410065233707428
step 27 is completed and loss is 0.701794445514679
step 28 is completed and loss is 0.786088228225708
step 29 is completed and loss is 0.3611423969268799
step 30 is completed and loss is 0.8073915839195251
step 31 is completed and loss is 0.3231208026409149
step 32 is completed and loss is 0.5962550044059753
step 33 is completed and loss is 0.3546735942363739
step 34 is completed and loss is 0.3259352445602417
step 35 is completed and loss is 0.4307865500450134
step 36 is completed and loss is 0.3867531418800354
step 37 is completed and loss is 0.47947242856025696
step 38 is completed and loss is 0.2885544002056122
step 39 is completed and loss is 0.7178091406822205
step 40 is completed and loss is 0.49836665391921997
step 41 is completed and loss is 0.5450732707977295
step 42 is completed and loss is 0.5057857632637024
step 43 is completed and loss is 0.4220871925354004
step 44 is completed and loss is 0.46608904004096985
step 45 is completed and loss is 0.6253067851066589
step 46 is completed and loss is 0.6189549565315247
step 47 is completed and loss is 0.485623836517334
step 48 is completed and loss is 0.412734717130661
step 49 is completed and loss is 0.46035242080688477
step 50 is completed and loss is 0.4364945590496063
step 51 is completed and loss is 0.36005815863609314
step 52 is completed and loss is 0.5584291815757751
step 53 is completed and loss is 0.42429718375205994
step 54 is completed and loss is 0.5556947588920593
step 55 is completed and loss is 0.6173613667488098
step 56 is completed and loss is 0.28500840067863464
step 57 is completed and loss is 0.5731761455535889
step 58 is completed and loss is 0.2577945291996002
step 59 is completed and loss is 0.32212626934051514
step 60 is completed and loss is 0.3991536498069763
step 61 is completed and loss is 0.45093005895614624
step 62 is completed and loss is 0.3394198417663574
step 63 is completed and loss is 0.6754112839698792
step 64 is completed and loss is 0.36514154076576233
step 65 is completed and loss is 0.3622404634952545
step 66 is completed and loss is 0.42477038502693176
step 67 is completed and loss is 0.5095375776290894
step 68 is completed and loss is 0.46517670154571533
step 69 is completed and loss is 0.3783566951751709
step 70 is completed and loss is 0.4111236035823822
step 71 is completed and loss is 0.43066808581352234
step 72 is completed and loss is 0.5348895788192749
step 73 is completed and loss is 0.4956885576248169
step 74 is completed and loss is 0.5247679352760315
step 75 is completed and loss is 0.6096778512001038
step 76 is completed and loss is 0.40839439630508423
step 77 is completed and loss is 0.4552842974662781
step 78 is completed and loss is 0.3436572253704071
step 79 is completed and loss is 0.21521404385566711
step 80 is completed and loss is 0.4831559360027313
step 81 is completed and loss is 0.6429739594459534
step 82 is completed and loss is 0.3906168043613434
step 83 is completed and loss is 0.7419795989990234
step 84 is completed and loss is 0.3272210657596588
step 85 is completed and loss is 0.3324776589870453
step 86 is completed and loss is 0.42303216457366943
step 87 is completed and loss is 0.6022685766220093
step 88 is completed and loss is 0.4502815306186676
step 89 is completed and loss is 0.28357505798339844
step 90 is completed and loss is 0.4070228338241577
step 91 is completed and loss is 0.5331108570098877
step 92 is completed and loss is 0.6394806504249573
step 93 is completed and loss is 0.2802354395389557
step 94 is completed and loss is 0.30952689051628113
step 95 is completed and loss is 0.41465920209884644
step 96 is completed and loss is 0.3368547856807709
step 97 is completed and loss is 0.3200959265232086
step 98 is completed and loss is 0.7304431796073914
step 99 is completed and loss is 0.7395025491714478
step 100 is completed and loss is 0.314739853143692
step 101 is completed and loss is 0.4538938105106354
step 102 is completed and loss is 0.36950770020484924
step 103 is completed and loss is 0.4940294623374939
step 104 is completed and loss is 0.5958256721496582
step 105 is completed and loss is 0.4647957980632782
step 106 is completed and loss is 0.6770808696746826
step 107 is completed and loss is 0.42914003133773804
step 108 is completed and loss is 0.8305107355117798
step 109 is completed and loss is 0.3427654504776001
step 110 is completed and loss is 0.7463263273239136
step 111 is completed and loss is 0.5293580889701843
step 112 is completed and loss is 0.4033021032810211
step 113 is completed and loss is 0.4478365480899811
step 114 is completed and loss is 0.7052018046379089
step 115 is completed and loss is 0.2819420099258423
step 116 is completed and loss is 0.3327101469039917
step 117 is completed and loss is 0.5378854274749756
step 118 is completed and loss is 0.33610737323760986
step 119 is completed and loss is 0.4828667938709259
step 120 is completed and loss is 0.5126594305038452
step 121 is completed and loss is 0.4066528081893921
step 122 is completed and loss is 0.44863107800483704
step 123 is completed and loss is 0.2988591194152832
step 124 is completed and loss is 0.6256280541419983
step 125 is completed and loss is 0.764495849609375
step 126 is completed and loss is 0.39525049924850464
step 127 is completed and loss is 0.48502588272094727
step 128 is completed and loss is 0.44303399324417114
step 129 is completed and loss is 0.4531609117984772
step 130 is completed and loss is 0.39435404539108276
step 131 is completed and loss is 0.5459421277046204
step 132 is completed and loss is 0.3750486671924591
step 133 is completed and loss is 0.6193696856498718
step 134 is completed and loss is 0.5769856572151184
step 135 is completed and loss is 0.437613308429718
step 136 is completed and loss is 0.4931134283542633
step 137 is completed and loss is 0.48926153779029846
step 138 is completed and loss is 0.4128901958465576
step 139 is completed and loss is 0.27938660979270935
step 140 is completed and loss is 0.4067152142524719
step 141 is completed and loss is 0.663224458694458
step 142 is completed and loss is 0.35807809233665466
step 143 is completed and loss is 0.30169665813446045
step 144 is completed and loss is 0.41366511583328247
step 145 is completed and loss is 0.38099291920661926
step 146 is completed and loss is 0.5524359345436096
step 147 is completed and loss is 0.5115877389907837
step 148 is completed and loss is 0.7252401113510132
step 149 is completed and loss is 0.23235322535037994
Training Epoch0: 100%|██████████| 200/200 [1:31:49<00:00, 27.52s/it]
Number of tokens in the example: 2609
Number of tokens in the example: 2416
Number of tokens in the example: 3098
Number of tokens in the example: 5148
Number of tokens in the example: 2243
Number of tokens in the example: 2973
Number of tokens in the example: 1823
Number of tokens in the example: 2652
Number of tokens in the example: 3725
Number of tokens in the example: 2306
Number of tokens in the example: 3866
Number of tokens in the example: 3906
Number of tokens in the example: 2912
Number of tokens in the example: 3901
Number of tokens in the example: 2799
Number of tokens in the example: 3804
Number of tokens in the example: 2601
Number of tokens in the example: 3474
Number of tokens in the example: 2322
Number of tokens in the example: 3776
Number of tokens in the example: 2793
Number of tokens in the example: 2844
Number of tokens in the example: 3203
Number of tokens in the example: 2372
Number of tokens in the example: 2456
Number of tokens in the example: 3333
Number of tokens in the example: 2513
Number of tokens in the example: 2733
Number of tokens in the example: 2692
Number of tokens in the example: 1843
Number of tokens in the example: 2805
Number of tokens in the example: 3050
Number of tokens in the example: 2846
Number of tokens in the example: 2476
Number of tokens in the example: 2573
Number of tokens in the example: 3750
Number of tokens in the example: 2651
Number of tokens in the example: 2776
Number of tokens in the example: 3344
Number of tokens in the example: 2910
Number of tokens in the example: 4373
Number of tokens in the example: 3128
Number of tokens in the example: 2312
Number of tokens in the example: 2353
Number of tokens in the example: 3772
Number of tokens in the example: 2715
Number of tokens in the example: 2773
Number of tokens in the example: 2906
Number of tokens in the example: 2889
Number of tokens in the example: 2629
Number of tokens in the example: 2655
Number of tokens in the example: 3133
Number of tokens in the example: 2986
Number of tokens in the example: 2850
Number of tokens in the example: 3345
Number of tokens in the example: 3795
Number of tokens in the example: 2494
Number of tokens in the example: 3712
Number of tokens in the example: 4269
Number of tokens in the example: 3141
Number of tokens in the example: 4540
Number of tokens in the example: 3709
Number of tokens in the example: 3046
Number of tokens in the example: 2070
Number of tokens in the example: 2915
Number of tokens in the example: 3952
Number of tokens in the example: 2912
Number of tokens in the example: 2616
Number of tokens in the example: 1706
Number of tokens in the example: 3587
Number of tokens in the example: 2642
Number of tokens in the example: 2813
Number of tokens in the example: 2267
Number of tokens in the example: 2896
Number of tokens in the example: 2703
Number of tokens in the example: 1968
Number of tokens in the example: 2984
Number of tokens in the example: 2658
Number of tokens in the example: 1858
Number of tokens in the example: 3478
Number of tokens in the example: 3866
Number of tokens in the example: 2113
Number of tokens in the example: 3416
Number of tokens in the example: 2708
Number of tokens in the example: 3852
Number of tokens in the example: 2778
Number of tokens in the example: 3759
Number of tokens in the example: 2673
Number of tokens in the example: 2388
Number of tokens in the example: 1994
Number of tokens in the example: 2308
Number of tokens in the example: 3366
Number of tokens in the example: 3848
Number of tokens in the example: 3882
Number of tokens in the example: 4099
Number of tokens in the example: 3427
Number of tokens in the example: 3558
Number of tokens in the example: 3053
Number of tokens in the example: 3578
Number of tokens in the example: 3168
Number of tokens in the example: 4070
Number of tokens in the example: 2731
Number of tokens in the example: 2974
Number of tokens in the example: 3161
Number of tokens in the example: 3073
Number of tokens in the example: 3130
Number of tokens in the example: 2181
Number of tokens in the example: 2694
Number of tokens in the example: 3758
Number of tokens in the example: 4118
Number of tokens in the example: 2252
Number of tokens in the example: 2907
Number of tokens in the example: 2489
Number of tokens in the example: 3309
Number of tokens in the example: 4103
Number of tokens in the example: 2796
Number of tokens in the example: 4178
Number of tokens in the example: 2321
Number of tokens in the example: 2869
Number of tokens in the example: 2948
Number of tokens in the example: 3093
Number of tokens in the example: 2132
Number of tokens in the example: 2432
Number of tokens in the example: 2236
Number of tokens in the example: 4499
Number of tokens in the example: 2767
Number of tokens in the example: 2775
Number of tokens in the example: 2739
Number of tokens in the example: 4840
Number of tokens in the example: 2622
Number of tokens in the example: 2613
Number of tokens in the example: 3159
Number of tokens in the example: 3145
Number of tokens in the example: 2724
Number of tokens in the example: 3173
Number of tokens in the example: 4642
Number of tokens in the example: 2747
Number of tokens in the example: 2293
Number of tokens in the example: 2937
Number of tokens in the example: 2398
Number of tokens in the example: 4227
Number of tokens in the example: 3903
Number of tokens in the example: 2393
Number of tokens in the example: 2805
Number of tokens in the example: 2741
Number of tokens in the example: 2494
Number of tokens in the example: 3204
Number of tokens in the example: 2506
Number of tokens in the example: 2885
Number of tokens in the example: 3888
Number of tokens in the example: 3169
Number of tokens in the example: 2665
Number of tokens in the example: 2252
Number of tokens in the example: 2359
Number of tokens in the example: 5263
Number of tokens in the example: 2544
Number of tokens in the example: 2496
Number of tokens in the example: 2913
Number of tokens in the example: 2534
Number of tokens in the example: 3422
Number of tokens in the example: 2206
Number of tokens in the example: 4598
Number of tokens in the example: 1930
Number of tokens in the example: 3667
Number of tokens in the example: 3256
Number of tokens in the example: 3320
Number of tokens in the example: 2754
Number of tokens in the example: 3372
Number of tokens in the example: 3456
Number of tokens in the example: 1848
Number of tokens in the example: 3195
Number of tokens in the example: 2565
Number of tokens in the example: 1643
Number of tokens in the example: 3571
Number of tokens in the example: 3622
Number of tokens in the example: 3410
Number of tokens in the example: 2259
Number of tokens in the example: 2430
Number of tokens in the example: 2527
Number of tokens in the example: 3666
Number of tokens in the example: 3921
Number of tokens in the example: 3336
Number of tokens in the example: 3191
Number of tokens in the example: 3577
Number of tokens in the example: 2267
Number of tokens in the example: 2165
Number of tokens in the example: 2532
Number of tokens in the example: 2603
Number of tokens in the example: 6585
Number of tokens in the example: 2872
Number of tokens in the example: 3749
Number of tokens in the example: 3644
Number of tokens in the example: 2422
Number of tokens in the example: 3188
Number of tokens in the example: 2794
Number of tokens in the example: 2868
Number of tokens in the example: 3179
Number of tokens in the example: 2883
Number of tokens in the example: 2125
Number of tokens in the example: 2527
Training Epoch0: 100%|██████████| 200/200 [1:31:49<00:00, 27.55s/it]
step 150 is completed and loss is 0.5302682518959045
step 151 is completed and loss is 0.34032824635505676
step 152 is completed and loss is 0.40080389380455017
step 153 is completed and loss is 0.4567677676677704
step 154 is completed and loss is 0.47923365235328674
step 155 is completed and loss is 0.7013326287269592
step 156 is completed and loss is 0.40404069423675537
step 157 is completed and loss is 0.7576054930686951
step 158 is completed and loss is 0.5325603485107422
step 159 is completed and loss is 0.3279130756855011
step 160 is completed and loss is 0.40353286266326904
step 161 is completed and loss is 0.4903396666049957
step 162 is completed and loss is 0.5380808115005493
step 163 is completed and loss is 0.40743619203567505
step 164 is completed and loss is 0.34665772318840027
step 165 is completed and loss is 0.7498384118080139
step 166 is completed and loss is 0.4348480701446533
step 167 is completed and loss is 0.41689738631248474
step 168 is completed and loss is 0.37320438027381897
step 169 is completed and loss is 0.3259231746196747
step 170 is completed and loss is 0.7268881797790527
step 171 is completed and loss is 0.38324007391929626
step 172 is completed and loss is 0.3050919771194458
step 173 is completed and loss is 0.3997267484664917
step 174 is completed and loss is 0.3384326696395874
step 175 is completed and loss is 0.3962850570678711
step 176 is completed and loss is 0.4805029332637787
step 177 is completed and loss is 0.6313626766204834
step 178 is completed and loss is 0.3804188370704651
step 179 is completed and loss is 0.43282651901245117
step 180 is completed and loss is 0.49822109937667847
step 181 is completed and loss is 0.3005482852458954
step 182 is completed and loss is 0.4508446455001831
step 183 is completed and loss is 0.414814293384552
step 184 is completed and loss is 0.502862274646759
step 185 is completed and loss is 0.4567048251628876
step 186 is completed and loss is 0.5379153490066528
step 187 is completed and loss is 0.45657268166542053
step 188 is completed and loss is 0.37376996874809265
step 189 is completed and loss is 0.4772945046424866
step 190 is completed and loss is 0.5071732997894287
step 191 is completed and loss is 0.5304340720176697
step 192 is completed and loss is 0.3814086616039276
step 193 is completed and loss is 0.34316983819007874
step 194 is completed and loss is 0.37132182717323303
step 195 is completed and loss is 0.3427174389362335
step 196 is completed and loss is 0.4796711504459381
step 197 is completed and loss is 0.5609081983566284
step 198 is completed and loss is 0.31236982345581055
step 199 is completed and loss is 0.34784114360809326
Max CUDA memory allocated was 40 GB
Max CUDA memory reserved was 43 GB
Peak active CUDA memory was 40 GB
Cuda Malloc retries : 0
CPU Total Peak Memory consumed during the train (max): 6 GB
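These summary lines report the standard CUDA allocator statistics converted to GB. A minimal sketch of how such a report can be assembled from torch.cuda is shown below; the exact wording and rounding used by the training script may differ, and the CPU peak figure would need a separate tracker (e.g. psutil) not shown here.

    import torch

    gib = 1024 ** 3
    stats = torch.cuda.memory_stats()
    print(f"Max CUDA memory allocated was {torch.cuda.max_memory_allocated() / gib:.0f} GB")
    print(f"Max CUDA memory reserved was {torch.cuda.max_memory_reserved() / gib:.0f} GB")
    print(f"Peak active CUDA memory was {stats['active_bytes.all.peak'] / gib:.0f} GB")
    print(f"Cuda Malloc retries : {stats['num_alloc_retries']}")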
evaluating Epoch: 100%|██████████| 50/50 [04:35<00:00, 5.51s/it]
Number of tokens in the example: 2609
Number of tokens in the example: 2416
Number of tokens in the example: 3098
Number of tokens in the example: 5148
Number of tokens in the example: 2243
Number of tokens in the example: 2973
Number of tokens in the example: 1823
Number of tokens in the example: 2652
Number of tokens in the example: 3725
Number of tokens in the example: 2306
Number of tokens in the example: 3866
Number of tokens in the example: 3906
Number of tokens in the example: 2912
Number of tokens in the example: 3901
Number of tokens in the example: 2799
Number of tokens in the example: 3804
Number of tokens in the example: 2601
Number of tokens in the example: 3474
Number of tokens in the example: 2322
Number of tokens in the example: 3776
Number of tokens in the example: 2793
Number of tokens in the example: 2844
Number of tokens in the example: 3203
Number of tokens in the example: 2372
Number of tokens in the example: 2456
Number of tokens in the example: 3333
Number of tokens in the example: 2513
Number of tokens in the example: 2733
Number of tokens in the example: 2692
Number of tokens in the example: 1843
Number of tokens in the example: 2805
Number of tokens in the example: 3050
Number of tokens in the example: 2846
Number of tokens in the example: 2476
Number of tokens in the example: 2573
Number of tokens in the example: 3750
Number of tokens in the example: 2651
Number of tokens in the example: 2776
Number of tokens in the example: 3344
Number of tokens in the example: 2910
Number of tokens in the example: 4373
Number of tokens in the example: 3128
Number of tokens in the example: 2312
Number of tokens in the example: 2353
Number of tokens in the example: 3772
Number of tokens in the example: 2715
Number of tokens in the example: 2773
Number of tokens in the example: 2906
Number of tokens in the example: 2889
Number of tokens in the example: 2629
evaluating Epoch: 100%|██████████| 50/50 [04:36<00:00, 5.52s/it]
eval_ppl=tensor(1.5013, device='cuda:0') eval_epoch_loss=tensor(0.4063, device='cuda:0')
we are about to save the PEFT modules
PEFT modules are saved in FT-vicuna-13b-v1.5-16k directory
best eval loss on epoch 0 is 0.4063194990158081
Epoch 1: train_perplexity=1.6099, train_epoch_loss=0.4761, epoch time 5509.670249344s
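As with evaluation, train_perplexity is just exp(train_epoch_loss) (exp(0.4761) ≈ 1.610). The "PEFT modules are saved" message above corresponds to writing out only the LoRA adapter weights, not the 13B base model. A sketch of that save step, assuming the PEFT-wrapped model from earlier and the directory name shown in the log:

    # Only the adapter weights and config are written (a few tens of MB).
    output_dir = "FT-vicuna-13b-v1.5-16k"
    model.save_pretrained(output_dir)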
Training Epoch1: 8%|▊ | 17/200 [07:46<1:23:40, 27.44s/it]
Number of tokens in the example: 2609
Number of tokens in the example: 2416
Number of tokens in the example: 3098
Number of tokens in the example: 5148
Number of tokens in the example: 2243
Number of tokens in the example: 2973
Number of tokens in the example: 1823
Number of tokens in the example: 2652
Number of tokens in the example: 3725
Number of tokens in the example: 2306
Number of tokens in the example: 3866
Number of tokens in the example: 3906
Number of tokens in the example: 2912
Number of tokens in the example: 3901
Number of tokens in the example: 2799
Number of tokens in the example: 3804
Number of tokens in the example: 2601
Number of tokens in the example: 3474
Number of tokens in the example: 2322
Number of tokens in the example: 3776
Training Epoch1: 8%|▊ | 17/200 [07:49<1:24:10, 27.60s/it]
step 0 is completed and loss is 0.3852332830429077
step 1 is completed and loss is 0.3781697750091553
step 2 is completed and loss is 0.34786373376846313
step 3 is completed and loss is 0.31446903944015503
step 4 is completed and loss is 0.3743688762187958
step 5 is completed and loss is 0.5165154933929443
step 6 is completed and loss is 0.6102421879768372
step 7 is completed and loss is 0.6805212497711182
step 8 is completed and loss is 0.39271870255470276
step 9 is completed and loss is 0.40273287892341614
step 10 is completed and loss is 0.45219162106513977
step 11 is completed and loss is 0.4930455684661865
step 12 is completed and loss is 0.31844985485076904
step 13 is completed and loss is 0.3095240294933319
step 14 is completed and loss is 0.3343360424041748
step 15 is completed and loss is 0.56951904296875
step 16 is completed and loss is 0.4611594080924988
Traceback (most recent call last):
File "/root/llama-recipes/llama_finetuning.py", line 253, in <module>
fire.Fire(main)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/root/llama-recipes/llama_finetuning.py", line 236, in main
results = train(
File "/root/llama-recipes/utils/train_utils.py", line 93, in train
loss = model(**batch).loss
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/peft/peft_model.py", line 931, in forward
return self.base_model(
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
return self.model.forward(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 809, in forward
outputs = self.model(
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 690, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 686, in custom_forward
return module(*inputs, past_key_value, output_attentions)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 413, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 368, in forward
attn_output = self.o_proj(attn_output)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 303, in forward
CA, CAt, SCA, SCAt, coo_tensorA = F.double_quant(A.to(torch.float16), threshold=state.threshold)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1634, in double_quant
nnz = nnz_row_ptr[-1].item()
KeyboardInterrupt
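The run ends here with a manual interruption (KeyboardInterrupt) partway through epoch 1, caught inside the 8-bit matmul of the o_proj layer. Because the best adapter from epoch 0 was already saved, one possible way to continue is to reload the base model and attach that saved adapter; a sketch under that assumption, using the save directory from the log:

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        "lmsys/vicuna-13b-v1.5-16k",
        load_in_8bit=True,
        device_map="auto",
    )
    # Attach the adapter saved after epoch 0; is_trainable=True keeps it updatable.
    model = PeftModel.from_pretrained(base, "FT-vicuna-13b-v1.5-16k", is_trainable=True)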