nohup.out
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:04<00:08, 4.49s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:08<00:04, 4.49s/it]/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/cuda/memory.py:303: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
warnings.warn(
--> Model lmsys/vicuna-13b-v1.5-16k
--> lmsys/vicuna-13b-v1.5-16k has 328.09472 Million params
trainable params: 6,553,600 || all params: 13,022,417,920 || trainable%: 0.05032552357220002
--> Training Set Length = 200
--> Validation Set Length = 50
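The trainable-parameter line above is the usual PEFT printout for a LoRA adapter attached to an 8-bit base model. Below is a minimal sketch of a setup that produces that kind of count; the LoRA rank, alpha, dropout and target modules are assumptions (they are not printed in this log), though r=8 on q_proj and v_proj of a 13B Llama model does yield exactly 6,553,600 trainable parameters.

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Base model loaded in 8-bit via bitsandbytes, as in the run above.
    model = AutoModelForCausalLM.from_pretrained(
        "lmsys/vicuna-13b-v1.5-16k",
        load_in_8bit=True,
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)

    # Illustrative LoRA settings (assumed, not taken from this log).
    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
    # -> trainable params: 6,553,600 || all params: ... || trainable%: ...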
/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
evaluating Epoch: 100%|██████████| 50/50 [04:52<00:00, 5.79s/it]
Number of tokens in the example: 2609
Number of tokens in the example: 2416
Number of tokens in the example: 3098
Number of tokens in the example: 5148
Number of tokens in the example: 2243
Number of tokens in the example: 2973
Number of tokens in the example: 1823
Number of tokens in the example: 2652
Number of tokens in the example: 3725
Number of tokens in the example: 2306
Number of tokens in the example: 3866
Number of tokens in the example: 3906
Number of tokens in the example: 2912
Number of tokens in the example: 3901
Number of tokens in the example: 2799
Number of tokens in the example: 3804
Number of tokens in the example: 2601
Number of tokens in the example: 3474
Number of tokens in the example: 2322
Number of tokens in the example: 3776
Number of tokens in the example: 2793
Number of tokens in the example: 2844
Number of tokens in the example: 3203
Number of tokens in the example: 2372
Number of tokens in the example: 2456
Number of tokens in the example: 3333
Number of tokens in the example: 2513
Number of tokens in the example: 2733
Number of tokens in the example: 2692
Number of tokens in the example: 1843
Number of tokens in the example: 2805
Number of tokens in the example: 3050
Number of tokens in the example: 2846
Number of tokens in the example: 2476
Number of tokens in the example: 2573
Number of tokens in the example: 3750
Number of tokens in the example: 2651
Number of tokens in the example: 2776
Number of tokens in the example: 3344
Number of tokens in the example: 2910
Number of tokens in the example: 4373
Number of tokens in the example: 3128
Number of tokens in the example: 2312
Number of tokens in the example: 2353
Number of tokens in the example: 3772
Number of tokens in the example: 2715
Number of tokens in the example: 2773
Number of tokens in the example: 2906
Number of tokens in the example: 2889
Number of tokens in the example: 2629
evaluating Epoch: 100%|██████████| 50/50 [04:52<00:00, 5.85s/it]
eval_ppl=tensor(1.7953, device='cuda:0') eval_epoch_loss=tensor(0.5852, device='cuda:0')
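The two numbers on this line are linked: the reported perplexity is simply the exponential of the mean evaluation loss. A quick check with the rounded values printed above:

    import math

    eval_epoch_loss = 0.5852
    eval_ppl = math.exp(eval_epoch_loss)  # ~1.7953, matching eval_ppl above
    print(eval_ppl)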
Training Epoch0: 0%| | 0/200 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
Training Epoch0: 75%|███████▌ | 150/200 [1:08:53<22:57, 27.55s/it]
step 0 is completed and loss is 0.6250191330909729
step 1 is completed and loss is 0.5554953813552856
step 2 is completed and loss is 0.5113803148269653
step 3 is completed and loss is 0.47984254360198975
step 4 is completed and loss is 0.5095067024230957
step 5 is completed and loss is 0.657611072063446
step 6 is completed and loss is 0.798925518989563
step 7 is completed and loss is 0.9212873578071594
step 8 is completed and loss is 0.5055791735649109
step 9 is completed and loss is 0.5641234517097473
step 10 is completed and loss is 0.7621738910675049
step 11 is completed and loss is 0.6242639422416687
step 12 is completed and loss is 0.45724597573280334
step 13 is completed and loss is 0.40364477038383484
step 14 is completed and loss is 0.4145205616950989
step 15 is completed and loss is 0.6699330806732178
step 16 is completed and loss is 0.5529826879501343
step 17 is completed and loss is 0.44825437664985657
step 18 is completed and loss is 0.5366003513336182
step 19 is completed and loss is 0.5394286513328552
step 20 is completed and loss is 0.5539140105247498
step 21 is completed and loss is 0.5224546790122986
step 22 is completed and loss is 0.33682045340538025
step 23 is completed and loss is 0.5308464765548706
step 24 is completed and loss is 0.47635382413864136
step 25 is completed and loss is 0.36415642499923706
step 26 is completed and loss is 0.410065233707428
step 27 is completed and loss is 0.701794445514679
step 28 is completed and loss is 0.786088228225708
step 29 is completed and loss is 0.3611423969268799
step 30 is completed and loss is 0.8073915839195251
step 31 is completed and loss is 0.3231208026409149
step 32 is completed and loss is 0.5962550044059753
step 33 is completed and loss is 0.3546735942363739
step 34 is completed and loss is 0.3259352445602417
step 35 is completed and loss is 0.4307865500450134
step 36 is completed and loss is 0.3867531418800354
step 37 is completed and loss is 0.47947242856025696
step 38 is completed and loss is 0.2885544002056122
step 39 is completed and loss is 0.7178091406822205
step 40 is completed and loss is 0.49836665391921997
step 41 is completed and loss is 0.5450732707977295
step 42 is completed and loss is 0.5057857632637024
step 43 is completed and loss is 0.4220871925354004
step 44 is completed and loss is 0.46608904004096985
step 45 is completed and loss is 0.6253067851066589
step 46 is completed and loss is 0.6189549565315247
step 47 is completed and loss is 0.485623836517334
step 48 is completed and loss is 0.412734717130661
step 49 is completed and loss is 0.46035242080688477
step 50 is completed and loss is 0.4364945590496063
step 51 is completed and loss is 0.36005815863609314
step 52 is completed and loss is 0.5584291815757751
step 53 is completed and loss is 0.42429718375205994
step 54 is completed and loss is 0.5556947588920593
step 55 is completed and loss is 0.6173613667488098
step 56 is completed and loss is 0.28500840067863464
step 57 is completed and loss is 0.5731761455535889
step 58 is completed and loss is 0.2577945291996002
step 59 is completed and loss is 0.32212626934051514
step 60 is completed and loss is 0.3991536498069763
step 61 is completed and loss is 0.45093005895614624
step 62 is completed and loss is 0.3394198417663574
step 63 is completed and loss is 0.6754112839698792
step 64 is completed and loss is 0.36514154076576233
step 65 is completed and loss is 0.3622404634952545
step 66 is completed and loss is 0.42477038502693176
step 67 is completed and loss is 0.5095375776290894
step 68 is completed and loss is 0.46517670154571533
step 69 is completed and loss is 0.3783566951751709
step 70 is completed and loss is 0.4111236035823822
step 71 is completed and loss is 0.43066808581352234
step 72 is completed and loss is 0.5348895788192749
step 73 is completed and loss is 0.4956885576248169
step 74 is completed and loss is 0.5247679352760315
step 75 is completed and loss is 0.6096778512001038
step 76 is completed and loss is 0.40839439630508423
step 77 is completed and loss is 0.4552842974662781
step 78 is completed and loss is 0.3436572253704071
step 79 is completed and loss is 0.21521404385566711
step 80 is completed and loss is 0.4831559360027313
step 81 is completed and loss is 0.6429739594459534
step 82 is completed and loss is 0.3906168043613434
step 83 is completed and loss is 0.7419795989990234
step 84 is completed and loss is 0.3272210657596588
step 85 is completed and loss is 0.3324776589870453
step 86 is completed and loss is 0.42303216457366943
step 87 is completed and loss is 0.6022685766220093
step 88 is completed and loss is 0.4502815306186676
step 89 is completed and loss is 0.28357505798339844
step 90 is completed and loss is 0.4070228338241577
step 91 is completed and loss is 0.5331108570098877
step 92 is completed and loss is 0.6394806504249573
step 93 is completed and loss is 0.2802354395389557
step 94 is completed and loss is 0.30952689051628113
step 95 is completed and loss is 0.41465920209884644
step 96 is completed and loss is 0.3368547856807709
step 97 is completed and loss is 0.3200959265232086
step 98 is completed and loss is 0.7304431796073914
step 99 is completed and loss is 0.7395025491714478
step 100 is completed and loss is 0.314739853143692
step 101 is completed and loss is 0.4538938105106354
step 102 is completed and loss is 0.36950770020484924
step 103 is completed and loss is 0.4940294623374939
step 104 is completed and loss is 0.5958256721496582
step 105 is completed and loss is 0.4647957980632782
step 106 is completed and loss is 0.6770808696746826
step 107 is completed and loss is 0.42914003133773804
step 108 is completed and loss is 0.8305107355117798
step 109 is completed and loss is 0.3427654504776001
step 110 is completed and loss is 0.7463263273239136
step 111 is completed and loss is 0.5293580889701843
step 112 is completed and loss is 0.4033021032810211
step 113 is completed and loss is 0.4478365480899811
step 114 is completed and loss is 0.7052018046379089
step 115 is completed and loss is 0.2819420099258423
step 116 is completed and loss is 0.3327101469039917
step 117 is completed and loss is 0.5378854274749756
step 118 is completed and loss is 0.33610737323760986
step 119 is completed and loss is 0.4828667938709259
step 120 is completed and loss is 0.5126594305038452
step 121 is completed and loss is 0.4066528081893921
step 122 is completed and loss is 0.44863107800483704
step 123 is completed and loss is 0.2988591194152832
step 124 is completed and loss is 0.6256280541419983
step 125 is completed and loss is 0.764495849609375
step 126 is completed and loss is 0.39525049924850464
step 127 is completed and loss is 0.48502588272094727
step 128 is completed and loss is 0.44303399324417114
step 129 is completed and loss is 0.4531609117984772
step 130 is completed and loss is 0.39435404539108276
step 131 is completed and loss is 0.5459421277046204
step 132 is completed and loss is 0.3750486671924591
step 133 is completed and loss is 0.6193696856498718
step 134 is completed and loss is 0.5769856572151184
step 135 is completed and loss is 0.437613308429718
step 136 is completed and loss is 0.4931134283542633
step 137 is completed and loss is 0.48926153779029846
step 138 is completed and loss is 0.4128901958465576
step 139 is completed and loss is 0.27938660979270935
step 140 is completed and loss is 0.4067152142524719
step 141 is completed and loss is 0.663224458694458
step 142 is completed and loss is 0.35807809233665466
step 143 is completed and loss is 0.30169665813446045
step 144 is completed and loss is 0.41366511583328247
step 145 is completed and loss is 0.38099291920661926
step 146 is completed and loss is 0.5524359345436096
step 147 is completed and loss is 0.5115877389907837
step 148 is completed and loss is 0.7252401113510132
step 149 is completed and loss is 0.23235322535037994
Training Epoch0: 100%|██████████| 200/200 [1:31:49<00:00, 27.52s/it]
Number of tokens in the example: 2609
Number of tokens in the example: 2416
Number of tokens in the example: 3098
Number of tokens in the example: 5148
Number of tokens in the example: 2243
Number of tokens in the example: 2973
Number of tokens in the example: 1823
Number of tokens in the example: 2652
Number of tokens in the example: 3725
Number of tokens in the example: 2306
Number of tokens in the example: 3866
Number of tokens in the example: 3906
Number of tokens in the example: 2912
Number of tokens in the example: 3901
Number of tokens in the example: 2799
Number of tokens in the example: 3804
Number of tokens in the example: 2601
Number of tokens in the example: 3474
Number of tokens in the example: 2322
Number of tokens in the example: 3776
Number of tokens in the example: 2793
Number of tokens in the example: 2844
Number of tokens in the example: 3203
Number of tokens in the example: 2372
Number of tokens in the example: 2456
Number of tokens in the example: 3333
Number of tokens in the example: 2513
Number of tokens in the example: 2733
Number of tokens in the example: 2692
Number of tokens in the example: 1843
Number of tokens in the example: 2805
Number of tokens in the example: 3050
Number of tokens in the example: 2846
Number of tokens in the example: 2476
Number of tokens in the example: 2573
Number of tokens in the example: 3750
Number of tokens in the example: 2651
Number of tokens in the example: 2776
Number of tokens in the example: 3344
Number of tokens in the example: 2910
Number of tokens in the example: 4373
Number of tokens in the example: 3128
Number of tokens in the example: 2312
Number of tokens in the example: 2353
Number of tokens in the example: 3772
Number of tokens in the example: 2715
Number of tokens in the example: 2773
Number of tokens in the example: 2906
Number of tokens in the example: 2889
Number of tokens in the example: 2629
Number of tokens in the example: 2655
Number of tokens in the example: 3133
Number of tokens in the example: 2986
Number of tokens in the example: 2850
Number of tokens in the example: 3345
Number of tokens in the example: 3795
Number of tokens in the example: 2494
Number of tokens in the example: 3712
Number of tokens in the example: 4269
Number of tokens in the example: 3141
Number of tokens in the example: 4540
Number of tokens in the example: 3709
Number of tokens in the example: 3046
Number of tokens in the example: 2070
Number of tokens in the example: 2915
Number of tokens in the example: 3952
Number of tokens in the example: 2912
Number of tokens in the example: 2616
Number of tokens in the example: 1706
Number of tokens in the example: 3587
Number of tokens in the example: 2642
Number of tokens in the example: 2813
Number of tokens in the example: 2267
Number of tokens in the example: 2896
Number of tokens in the example: 2703
Number of tokens in the example: 1968
Number of tokens in the example: 2984
Number of tokens in the example: 2658
Number of tokens in the example: 1858
Number of tokens in the example: 3478
Number of tokens in the example: 3866
Number of tokens in the example: 2113
Number of tokens in the example: 3416
Number of tokens in the example: 2708
Number of tokens in the example: 3852
Number of tokens in the example: 2778
Number of tokens in the example: 3759
Number of tokens in the example: 2673
Number of tokens in the example: 2388
Number of tokens in the example: 1994
Number of tokens in the example: 2308
Number of tokens in the example: 3366
Number of tokens in the example: 3848
Number of tokens in the example: 3882
Number of tokens in the example: 4099
Number of tokens in the example: 3427
Number of tokens in the example: 3558
Number of tokens in the example: 3053
Number of tokens in the example: 3578
Number of tokens in the example: 3168
Number of tokens in the example: 4070
Number of tokens in the example: 2731
Number of tokens in the example: 2974
Number of tokens in the example: 3161
Number of tokens in the example: 3073
Number of tokens in the example: 3130
Number of tokens in the example: 2181
Number of tokens in the example: 2694
Number of tokens in the example: 3758
Number of tokens in the example: 4118
Number of tokens in the example: 2252
Number of tokens in the example: 2907
Number of tokens in the example: 2489
Number of tokens in the example: 3309
Number of tokens in the example: 4103
Number of tokens in the example: 2796
Number of tokens in the example: 4178
Number of tokens in the example: 2321
Number of tokens in the example: 2869
Number of tokens in the example: 2948
Number of tokens in the example: 3093
Number of tokens in the example: 2132
Number of tokens in the example: 2432
Number of tokens in the example: 2236
Number of tokens in the example: 4499
Number of tokens in the example: 2767
Number of tokens in the example: 2775
Number of tokens in the example: 2739
Number of tokens in the example: 4840
Number of tokens in the example: 2622
Number of tokens in the example: 2613
Number of tokens in the example: 3159
Number of tokens in the example: 3145
Number of tokens in the example: 2724
Number of tokens in the example: 3173
Number of tokens in the example: 4642
Number of tokens in the example: 2747
Number of tokens in the example: 2293
Number of tokens in the example: 2937
Number of tokens in the example: 2398
Number of tokens in the example: 4227
Number of tokens in the example: 3903
Number of tokens in the example: 2393
Number of tokens in the example: 2805
Number of tokens in the example: 2741
Number of tokens in the example: 2494
Number of tokens in the example: 3204
Number of tokens in the example: 2506
Number of tokens in the example: 2885
Number of tokens in the example: 3888
Number of tokens in the example: 3169
Number of tokens in the example: 2665
Number of tokens in the example: 2252
Number of tokens in the example: 2359
Number of tokens in the example: 5263
Number of tokens in the example: 2544
Number of tokens in the example: 2496
Number of tokens in the example: 2913
Number of tokens in the example: 2534
Number of tokens in the example: 3422
Number of tokens in the example: 2206
Number of tokens in the example: 4598
Number of tokens in the example: 1930
Number of tokens in the example: 3667
Number of tokens in the example: 3256
Number of tokens in the example: 3320
Number of tokens in the example: 2754
Number of tokens in the example: 3372
Number of tokens in the example: 3456
Number of tokens in the example: 1848
Number of tokens in the example: 3195
Number of tokens in the example: 2565
Number of tokens in the example: 1643
Number of tokens in the example: 3571
Number of tokens in the example: 3622
Number of tokens in the example: 3410
Number of tokens in the example: 2259
Number of tokens in the example: 2430
Number of tokens in the example: 2527
Number of tokens in the example: 3666
Number of tokens in the example: 3921
Number of tokens in the example: 3336
Number of tokens in the example: 3191
Number of tokens in the example: 3577
Number of tokens in the example: 2267
Number of tokens in the example: 2165
Number of tokens in the example: 2532
Number of tokens in the example: 2603
Number of tokens in the example: 6585
Number of tokens in the example: 2872
Number of tokens in the example: 3749
Number of tokens in the example: 3644
Number of tokens in the example: 2422
Number of tokens in the example: 3188
Number of tokens in the example: 2794
Number of tokens in the example: 2868
Number of tokens in the example: 3179
Number of tokens in the example: 2883
Number of tokens in the example: 2125
Number of tokens in the example: 2527
Training Epoch0: 100%|██████████| 200/200 [1:31:49<00:00, 27.55s/it]
step 150 is completed and loss is 0.5302682518959045
step 151 is completed and loss is 0.34032824635505676
step 152 is completed and loss is 0.40080389380455017
step 153 is completed and loss is 0.4567677676677704
step 154 is completed and loss is 0.47923365235328674
step 155 is completed and loss is 0.7013326287269592
step 156 is completed and loss is 0.40404069423675537
step 157 is completed and loss is 0.7576054930686951
step 158 is completed and loss is 0.5325603485107422
step 159 is completed and loss is 0.3279130756855011
step 160 is completed and loss is 0.40353286266326904
step 161 is completed and loss is 0.4903396666049957
step 162 is completed and loss is 0.5380808115005493
step 163 is completed and loss is 0.40743619203567505
step 164 is completed and loss is 0.34665772318840027
step 165 is completed and loss is 0.7498384118080139
step 166 is completed and loss is 0.4348480701446533
step 167 is completed and loss is 0.41689738631248474
step 168 is completed and loss is 0.37320438027381897
step 169 is completed and loss is 0.3259231746196747
step 170 is completed and loss is 0.7268881797790527
step 171 is completed and loss is 0.38324007391929626
step 172 is completed and loss is 0.3050919771194458
step 173 is completed and loss is 0.3997267484664917
step 174 is completed and loss is 0.3384326696395874
step 175 is completed and loss is 0.3962850570678711
step 176 is completed and loss is 0.4805029332637787
step 177 is completed and loss is 0.6313626766204834
step 178 is completed and loss is 0.3804188370704651
step 179 is completed and loss is 0.43282651901245117
step 180 is completed and loss is 0.49822109937667847
step 181 is completed and loss is 0.3005482852458954
step 182 is completed and loss is 0.4508446455001831
step 183 is completed and loss is 0.414814293384552
step 184 is completed and loss is 0.502862274646759
step 185 is completed and loss is 0.4567048251628876
step 186 is completed and loss is 0.5379153490066528
step 187 is completed and loss is 0.45657268166542053
step 188 is completed and loss is 0.37376996874809265
step 189 is completed and loss is 0.4772945046424866
step 190 is completed and loss is 0.5071732997894287
step 191 is completed and loss is 0.5304340720176697
step 192 is completed and loss is 0.3814086616039276
step 193 is completed and loss is 0.34316983819007874
step 194 is completed and loss is 0.37132182717323303
step 195 is completed and loss is 0.3427174389362335
step 196 is completed and loss is 0.4796711504459381
step 197 is completed and loss is 0.5609081983566284
step 198 is completed and loss is 0.31236982345581055
step 199 is completed and loss is 0.34784114360809326
Max CUDA memory allocated was 40 GB
Max CUDA memory reserved was 43 GB
Peak active CUDA memory was 40 GB
Cuda Malloc retries : 0
CPU Total Peak Memory consumed during the train (max): 6 GB
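These summary lines report the standard CUDA allocator statistics converted to GB. A minimal sketch of how such a report can be assembled from torch.cuda is shown below; the exact wording and rounding used by the training script may differ, and the CPU peak figure would need a separate tracker (e.g. psutil) not shown here.

    import torch

    gib = 1024 ** 3
    stats = torch.cuda.memory_stats()
    print(f"Max CUDA memory allocated was {torch.cuda.max_memory_allocated() / gib:.0f} GB")
    print(f"Max CUDA memory reserved was {torch.cuda.max_memory_reserved() / gib:.0f} GB")
    print(f"Peak active CUDA memory was {stats['active_bytes.all.peak'] / gib:.0f} GB")
    print(f"Cuda Malloc retries : {stats['num_alloc_retries']}")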
evaluating Epoch: 100%|██████████| 50/50 [04:35<00:00, 5.51s/it]
Number of tokens in the example: 2609
Number of tokens in the example: 2416
Number of tokens in the example: 3098
Number of tokens in the example: 5148
Number of tokens in the example: 2243
Number of tokens in the example: 2973
Number of tokens in the example: 1823
Number of tokens in the example: 2652
Number of tokens in the example: 3725
Number of tokens in the example: 2306
Number of tokens in the example: 3866
Number of tokens in the example: 3906
Number of tokens in the example: 2912
Number of tokens in the example: 3901
Number of tokens in the example: 2799
Number of tokens in the example: 3804
Number of tokens in the example: 2601
Number of tokens in the example: 3474
Number of tokens in the example: 2322
Number of tokens in the example: 3776
Number of tokens in the example: 2793
Number of tokens in the example: 2844
Number of tokens in the example: 3203
Number of tokens in the example: 2372
Number of tokens in the example: 2456
Number of tokens in the example: 3333
Number of tokens in the example: 2513
Number of tokens in the example: 2733
Number of tokens in the example: 2692
Number of tokens in the example: 1843
Number of tokens in the example: 2805
Number of tokens in the example: 3050
Number of tokens in the example: 2846
Number of tokens in the example: 2476
Number of tokens in the example: 2573
Number of tokens in the example: 3750
Number of tokens in the example: 2651
Number of tokens in the example: 2776
Number of tokens in the example: 3344
Number of tokens in the example: 2910
Number of tokens in the example: 4373
Number of tokens in the example: 3128
Number of tokens in the example: 2312
Number of tokens in the example: 2353
Number of tokens in the example: 3772
Number of tokens in the example: 2715
Number of tokens in the example: 2773
Number of tokens in the example: 2906
Number of tokens in the example: 2889
Number of tokens in the example: 2629
evaluating Epoch: 100%|██████████| 50/50 [04:36<00:00, 5.52s/it]
eval_ppl=tensor(1.5013, device='cuda:0') eval_epoch_loss=tensor(0.4063, device='cuda:0')
we are about to save the PEFT modules
PEFT modules are saved in FT-vicuna-13b-v1.5-16k directory
best eval loss on epoch 0 is 0.4063194990158081
Epoch 1: train_perplexity=1.6099, train_epoch_loss=0.4761, epoch time 5509.670249344s
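As with evaluation, train_perplexity is just exp(train_epoch_loss) (exp(0.4761) ≈ 1.610). The "PEFT modules are saved" message above corresponds to writing out only the LoRA adapter weights, not the 13B base model. A sketch of that save step, assuming the PEFT-wrapped model from earlier and the directory name shown in the log:

    # Only the adapter weights and config are written (a few tens of MB).
    output_dir = "FT-vicuna-13b-v1.5-16k"
    model.save_pretrained(output_dir)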
Training Epoch1: 8%|▊ | 17/200 [07:46<1:23:40, 27.44s/it]
Number of tokens in the example: 2609
Number of tokens in the example: 2416
Number of tokens in the example: 3098
Number of tokens in the example: 5148
Number of tokens in the example: 2243
Number of tokens in the example: 2973
Number of tokens in the example: 1823
Number of tokens in the example: 2652
Number of tokens in the example: 3725
Number of tokens in the example: 2306
Number of tokens in the example: 3866
Number of tokens in the example: 3906
Number of tokens in the example: 2912
Number of tokens in the example: 3901
Number of tokens in the example: 2799
Number of tokens in the example: 3804
Number of tokens in the example: 2601
Number of tokens in the example: 3474
Number of tokens in the example: 2322
Number of tokens in the example: 3776
Training Epoch1: 8%|▊ | 17/200 [07:49<1:24:10, 27.60s/it]
step 0 is completed and loss is 0.3852332830429077
step 1 is completed and loss is 0.3781697750091553
step 2 is completed and loss is 0.34786373376846313
step 3 is completed and loss is 0.31446903944015503
step 4 is completed and loss is 0.3743688762187958
step 5 is completed and loss is 0.5165154933929443
step 6 is completed and loss is 0.6102421879768372
step 7 is completed and loss is 0.6805212497711182
step 8 is completed and loss is 0.39271870255470276
step 9 is completed and loss is 0.40273287892341614
step 10 is completed and loss is 0.45219162106513977
step 11 is completed and loss is 0.4930455684661865
step 12 is completed and loss is 0.31844985485076904
step 13 is completed and loss is 0.3095240294933319
step 14 is completed and loss is 0.3343360424041748
step 15 is completed and loss is 0.56951904296875
step 16 is completed and loss is 0.4611594080924988
Traceback (most recent call last):
File "/root/llama-recipes/llama_finetuning.py", line 253, in <module>
fire.Fire(main)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/root/llama-recipes/llama_finetuning.py", line 236, in main
results = train(
File "/root/llama-recipes/utils/train_utils.py", line 93, in train
loss = model(**batch).loss
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/peft/peft_model.py", line 931, in forward
return self.base_model(
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
return self.model.forward(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 809, in forward
outputs = self.model(
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 690, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 686, in custom_forward
return module(*inputs, past_key_value, output_attentions)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 413, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 368, in forward
attn_output = self.o_proj(attn_output)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 303, in forward
CA, CAt, SCA, SCAt, coo_tensorA = F.double_quant(A.to(torch.float16), threshold=state.threshold)
File "/root/miniconda3/envs/llama_ft/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1634, in double_quant
nnz = nnz_row_ptr[-1].item()
KeyboardInterrupt
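The run ends here with a manual interruption (KeyboardInterrupt) partway through epoch 1, caught inside the 8-bit matmul of the o_proj layer. Because the best adapter from epoch 0 was already saved, one possible way to continue is to reload the base model and attach that saved adapter; a sketch under that assumption, using the save directory from the log:

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        "lmsys/vicuna-13b-v1.5-16k",
        load_in_8bit=True,
        device_map="auto",
    )
    # Attach the adapter saved after epoch 0; is_trainable=True keeps it updatable.
    model = PeftModel.from_pretrained(base, "FT-vicuna-13b-v1.5-16k", is_trainable=True)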