Skip to content

Commit

Permalink
ok i tried bringing back original init again and this time it makes a…
Browse files Browse the repository at this point in the history
… ton of difference and works much better than default. i'm not sure what was different with my earlier experiment where i saw a slight regression. may try to dissect commits later, for now merged the original mingpt init (following gpt-2 paper) as default.
  • Loading branch information
karpathy committed Jan 27, 2023
1 parent 23a0bfa commit f29a9ff
Showing 1 changed file with 0 additions and 1 deletion.
1 change: 0 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,6 @@ Features / APIs

Suspiciousness

- Current initialization (PyTorch default) departs from GPT-2. In a very quick experiment I found it to be superior to the one suggested in the papers, but that can't be right?
- I am still not 100% confident that my GPT-2 small reproduction hyperparameters are good, if someone has reproduced GPT-2 I'd be eager to exchange notes ty
- I keep seeing different values cited for weight decay and AdamW betas, look into
- I can't exactly reproduce Chinchilla paper results, see [scaling_laws.ipynb](scaling_laws.ipynb) notebook
Expand Down

0 comments on commit f29a9ff

Please sign in to comment.