Pretraining Hyperparameters #3
You could try the hyperparameters in this repo.
Thank you for your response! I also wanted to confirm whether the pre-training in this work follows the two-step approach of the original BERT paper and NVIDIA/BERT. In those approaches, 90% of the training steps are done with a sequence length of 128 (phase 1), and the remaining 10% with a sequence length of 512 (phase 2). However, in the pre-training script provided in the PlugLM repository, I only noticed phase 2 pre-training. Could you please clarify whether phase 1 pre-training was conducted in this work, and what the total pre-training time cost was? I appreciate your assistance!
All the baselines and PlugLM are pre-trained with only stage-2.
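For anyone else reading along, here is a minimal sketch of what a stage-2-only setup (sequence length 512 throughout) could look like, loosely following the NVIDIA/BERT phase-2 conventions mentioned above. The keys and values are illustrative placeholders, not the actual PlugLM script flags; the repository's pretraining script and config.json remain the authoritative source.

```python
# Illustrative stage-2-only (sequence length 512) pretraining settings.
# These names and values are placeholders, NOT the actual PlugLM flags;
# defer to the repo's pretraining script and config.json for real values.
stage2_only_config = {
    "max_seq_length": 512,               # single phase at full sequence length
    "max_predictions_per_seq": 80,       # MLM masking budget commonly used at length 512
    "per_gpu_batch_size": 8,             # placeholder; scale via gradient accumulation
    "gradient_accumulation_steps": 32,   # placeholder to reach a large effective batch
    "learning_rate": 1e-4,               # placeholder; check the repo/paper for the real value
    "warmup_proportion": 0.1,
    "num_gpus": 8,                       # the setup discussed in this thread (8x A100)
}

# Effective batch size across GPUs and accumulation steps.
effective_batch = (stage2_only_config["per_gpu_batch_size"]
                   * stage2_only_config["gradient_accumulation_steps"]
                   * stage2_only_config["num_gpus"])
print(f"Effective batch size: {effective_batch}")
```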
Thanks for your kind reply. I have another question: do you remember the training time of the pre-training stage using 8 A100 GPUs?
Sorry for bothering you again, but I want to make sure I'm using the right knowledge corpus for the Amazon reviews domain. Again, thanks very much for taking the time to answer my questions.
Hi,
Thanks so much for your reply! According to this issue, it seems all the corpus data should be downloaded from its original source. Sorry to bother you with yet another question. The PubMed dataset link you provided points to a GitHub page that offers three options for the PubMed dataset. Could you kindly specify which of those links was used as the knowledge base for the in-domain pretraining task? Furthermore, was any preprocessing conducted on the downloaded raw data? Thanks for clarifying all this for me!
Hi, sorry for the late reply. Been busy recently. If I remember it correctly:
Hi, thanks again for replying; that answers my question. I'm wondering what fine-tuning steps you take for the downstream tasks, as I can only find the classification fine-tuning script. Thanks again for helping me!
Hi, for tasks other than classification, you can write your own fine-tuning code, because PlugLM can simply be treated as a BERT with the same interface for downstream tasks. For biomedical tasks, you could refer to this: https://github.com/dmis-lab/biobert
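For what it's worth, here is a minimal fine-tuning sketch for a non-classification task (token classification), assuming PlugLM exposes the same forward interface as a HuggingFace BertModel, as described above. BertModel is used below purely as a stand-in for the PlugLM encoder, and the label count and dropout rate are placeholder choices.

```python
# Minimal token-classification head on top of a BERT-style encoder.
# Assumption: the PlugLM encoder can be swapped in for BertModel because it
# shares the same forward interface for downstream tasks.
import torch
import torch.nn as nn
from transformers import BertModel


class TokenClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", num_labels=3):
        super().__init__()
        # Replace with the PlugLM encoder checkpoint; interface assumed identical.
        self.encoder = BertModel.from_pretrained(encoder_name)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, labels=None):
        # Per-token hidden states from the encoder.
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        logits = self.classifier(self.dropout(hidden))
        loss = None
        if labels is not None:
            # -100 marks padding / special tokens to be ignored in the loss.
            loss = nn.CrossEntropyLoss(ignore_index=-100)(
                logits.view(-1, logits.size(-1)), labels.view(-1))
        return loss, logits
```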
Hi,
Hi, thanks for the nice work.
I'm trying to reproduce the paper's results, but I notice that the hyperparameters provided in this repository (in the pretraining script and config.json) differ slightly from those in the paper (e.g., learning rate, gradient accumulation steps). I'm wondering which version should be used to reproduce the paper's results, and which hyperparameters were used to produce the checkpoint you provide?
Thanks for reading!
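As a small aid for checking such mismatches, here is a sketch that diffs the repository's config.json against hyperparameter values transcribed from the paper. The keys and numbers in the usage example are placeholders, not the actual values from either source.

```python
# Compare hyperparameters in the repo's config.json against values copied
# from the paper, printing any key where the two disagree.
import json


def diff_hyperparameters(config_path, paper_values):
    with open(config_path) as f:
        repo_config = json.load(f)
    for key, paper_val in paper_values.items():
        repo_val = repo_config.get(key, "<missing>")
        if repo_val != paper_val:
            print(f"{key}: repo={repo_val}  paper={paper_val}")


# Example usage; keys and values here are illustrative placeholders,
# not taken from the actual config.json or the paper.
diff_hyperparameters("config.json", {
    "learning_rate": 1e-4,
    "gradient_accumulation_steps": 32,
})
```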