
Pretraining Hyperparameters #3

Open · wormyu opened this issue Jul 24, 2023 · 11 comments

Comments
@wormyu commented Jul 24, 2023

Hi, thanks for the nice work.

I'm trying to reproduce the paper's results, but I noticed that the hyperparameters provided in this repository (in the pretraining script and config.json) differ slightly from those in the paper (e.g., learning rate, gradient accumulation steps). I'm wondering which version should be used to reproduce the paper's results, and which hyperparameters were used to train the checkpoint you provide?

Thanks for reading!
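
For anyone else cross-checking, here is a minimal sketch of dumping the repo's config against the paper's table; the key names are assumptions and may not match the actual config.json schema.

```python
import json

# Load the pretraining config shipped in the repo; the key names below are
# assumptions and may differ from the actual config.json schema.
with open("config.json") as f:
    cfg = json.load(f)

# Values to cross-check against the paper's hyperparameter table.
for key in ("learning_rate", "gradient_accumulation_steps",
            "train_batch_size", "max_train_steps"):
    print(f"{key}: {cfg.get(key, '<not in config>')}")
```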

@wormyu changed the title from "Pretraining Hyperparameter" to "Pretraining Hyperparameters" on Jul 24, 2023
@Hannibal046 (Owner)

You could try the hyperparameters in this repo.

@wormyu (Author) commented Jul 25, 2023

Thank you for your response!

I also wanted to confirm whether the pre-training in this work follows the two-phase approach used by the original BERT paper and NVIDIA/BERT, where 90% of the training steps use a sequence length of 128 (phase 1) and the remaining 10% use a sequence length of 512 (phase 2). In the pre-training script provided in the PlugLM repository, I only see a phase 2 pre-training with max_train_step=8000 and no explicit mention of phase 1 pre-training.

Could you please clarify whether phase 1 pre-training is conducted in this work, and how long the total pre-training process takes? I appreciate your assistance!
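
For reference, a minimal sketch of the standard two-phase schedule described above; the numbers follow the original BERT / NVIDIA recipe, not the PlugLM scripts.

```python
# Illustrative only: the two-phase BERT recipe referenced above, where roughly
# 90% of optimizer steps run at seq_len=128 and the final 10% at seq_len=512.
# The total step budget here is hypothetical, not taken from the PlugLM repo.
total_steps = 10_000
phase1 = {"seq_len": 128, "steps": int(total_steps * 0.9)}          # phase 1: 9,000 steps
phase2 = {"seq_len": 512, "steps": total_steps - phase1["steps"]}   # phase 2: 1,000 steps
print(phase1, phase2)
```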

@Hannibal046 (Owner)

All the baselines and PlugLM are pre-trained with only stage-2.

@wormyu (Author) commented Aug 2, 2023

Thanks for your kind reply. I have another question: do you remember how long the pre-training stage took on 8 A100 GPUs?

@wormyu (Author) commented Aug 2, 2023

Sorry to bother you again. I want to make sure I'm using the right knowledge corpus for Amazon reviews. According to your README.md, the Amazon reviews dataset should be downloaded using Hugging Face datasets, but there are several datasets related to Amazon reviews on the Hub. Is this the one you used for the domain adaptation task, or did you download it from https://cseweb.ucsd.edu/~jmcauley/datasets/amazon/links.html ?

Again, thank you very much for taking the time to answer my questions.
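
For illustration, loading an Amazon-review dataset through Hugging Face datasets looks like the sketch below; the dataset id is only an example and is not confirmed to be the corpus used in the paper (several Amazon-review datasets on the Hub have also been deprecated or removed over time).

```python
from datasets import load_dataset

# Example only: "amazon_polarity" is one of several Amazon-review datasets on
# the Hugging Face Hub. It is NOT confirmed to be the knowledge corpus used
# for the domain adaptation experiments in the paper.
reviews = load_dataset("amazon_polarity", split="train")
print(reviews[0]["title"])
print(reviews[0]["content"][:200])
```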

@Hannibal046 (Owner)

Hi,

  • If I remember correctly, the pre-training stage takes around 20 days on a single node.
  • For the domain corpus, please follow the instructions from the Don't Stop Pretraining paper.

@wormyu (Author) commented Aug 6, 2023

Thanks a lot for your reply! According to this issue, it seems all the corpus data should be downloaded from their original sources.

Sorry to bother you, but I have another question. The PubMed dataset link you provided points to their GitHub page, which offers three options for the PubMed dataset. Could you kindly specify which of those links was used as the knowledge base for the in-domain pretraining task? Furthermore, I'm curious whether any preprocessing was applied to the downloaded raw data.

Thanks for clarifying all this for me!

@Hannibal046 (Owner)

Hi, sorry for the late reply. I've been busy recently. If I remember correctly:

  • There are some licensing issues with the DAPT datasets, so we just used the publicly available data, not the in-house data.
  • We used this version: PubMed Central Full Texts.
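
In case it helps, here is a rough sketch of the kind of preprocessing one might apply to raw PubMed Central full texts before indexing them as knowledge-base passages; the directory layout and passage length are assumptions, not the PlugLM pipeline.

```python
import os

# Rough sketch (NOT the PlugLM preprocessing): split raw PubMed Central
# full-text files into fixed-size word chunks that can serve as
# knowledge-base passages. Paths and passage length are assumptions.
PASSAGE_WORDS = 128
RAW_DIR = "pubmed_central_raw"  # hypothetical directory of .txt files

def iter_passages(path):
    """Yield passages of roughly PASSAGE_WORDS whitespace-separated words."""
    with open(path, encoding="utf-8") as f:
        words = f.read().split()
    for i in range(0, len(words), PASSAGE_WORDS):
        yield " ".join(words[i:i + PASSAGE_WORDS])

passages = []
for fname in os.listdir(RAW_DIR):
    if fname.endswith(".txt"):
        passages.extend(iter_passages(os.path.join(RAW_DIR, fname)))
print(f"collected {len(passages)} passages")
```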

@wormyu (Author) commented Aug 21, 2023

Hi, thanks again for replying; that answers my question.

I'm wondering what fine-tuning steps you take for all the downstream tasks. I can only find that run_classification.py runs for 10 epochs in the script you provide, but for the other tasks I can't find relevant information in the README.md or the paper. Can you give me some hints about this? Maybe I missed some parts of the code.

Thanks again for helping me!

@Hannibal046 (Owner)

Hi, for tasks other than classification, you could write your own scripts, because you can simply treat PlugLM as a BERT with the same interface for downstream tasks. For biomedical tasks, you could refer to this: https://github.com/dmis-lab/biobert
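
Since PlugLM exposes a BERT-like interface, a downstream fine-tuning script can follow the usual BERT recipe. The sketch below uses the Hugging Face Trainer with a placeholder checkpoint path and illustrative hyperparameters; it is not the repo's own script.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Sketch only: fine-tune a BERT-compatible checkpoint on a classification task.
# "path/to/pluglm-checkpoint" is a placeholder, the IMDB task is just an
# example, and the hyperparameters are illustrative rather than the paper's.
checkpoint = "path/to/pluglm-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="ft_out",
    num_train_epochs=10,             # run_classification.py uses 10 epochs
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,             # enables dynamic padding via the default collator
)
trainer.train()
```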

@wormyu (Author) commented Aug 27, 2023

Hi,
Thanks for replying, and sorry for my misleading question. I meant to ask "how many" fine-tuning steps you take, not "what" fine-tuning steps, because I'm trying to compare the model performance reported in your paper with mine, and the comparison only makes sense under the same training parameters.
I know you have kindly shared Python files for the other downstream tasks, and thanks for clarifying the source for the biomedical tasks. I appreciate it a lot!
