
Pretraining Hyperparameters #3

Open · wormyu opened this issue Jul 24, 2023 · 11 comments

Comments
@wormyu commented Jul 24, 2023

Hi, thanks for the nice work.

I'm trying to reproduce the paper's results, but I noticed that the hyperparameters provided in this repository (in the pretraining script and config.json) differ slightly from those in the paper (e.g., learning rate, gradient accumulation steps). I'm wondering which version should be used to reproduce the paper's results, and which hyperparameters were used to train the checkpoint you provide?

Thanks for reading!
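
For anyone else cross-checking, here is a minimal sketch of dumping the repo's config against the paper's table; the key names are assumptions and may not match the actual config.json schema.

```python
import json

# Load the pretraining config shipped in the repo; the key names below are
# assumptions and may differ from the actual config.json schema.
with open("config.json") as f:
    cfg = json.load(f)

# Values to cross-check against the paper's hyperparameter table.
for key in ("learning_rate", "gradient_accumulation_steps",
            "train_batch_size", "max_train_steps"):
    print(f"{key}: {cfg.get(key, '<not in config>')}")
```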

@wormyu changed the title from "Pretraining Hyperparameter" to "Pretraining Hyperparameters" on Jul 24, 2023
@Hannibal046 (Owner)

You could try the hyperparameters in this repo.

@wormyu (Author) commented Jul 25, 2023

Thank you for your response!

I also wanted to confirm whether the pre-training in this work follows the two-phase approach used by the original BERT paper and NVIDIA/BERT, where 90% of the training steps use a sequence length of 128 (phase 1) and the remaining 10% use a sequence length of 512 (phase 2). In the pre-training script provided in the PlugLM repository, I only see a phase 2 pre-training with max_train_step=8000 and no explicit mention of phase 1 pre-training.

Could you please clarify whether phase 1 pre-training is conducted in this work, and how long the total pre-training process takes? I appreciate your assistance!
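
For reference, a minimal sketch of the standard two-phase schedule described above; the numbers follow the original BERT / NVIDIA recipe, not the PlugLM scripts.

```python
# Illustrative only: the two-phase BERT recipe referenced above, where roughly
# 90% of optimizer steps run at seq_len=128 and the final 10% at seq_len=512.
# The total step budget here is hypothetical, not taken from the PlugLM repo.
total_steps = 10_000
phase1 = {"seq_len": 128, "steps": int(total_steps * 0.9)}          # phase 1: 9,000 steps
phase2 = {"seq_len": 512, "steps": total_steps - phase1["steps"]}   # phase 2: 1,000 steps
print(phase1, phase2)
```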

@Hannibal046 (Owner)

All the baselines and PlugLM are pre-trained with only stage-2.

@wormyu (Author) commented Aug 2, 2023

Thanks for your kind reply. I have another question: do you remember how long the pre-training stage took on 8 A100 GPUs?

@wormyu (Author) commented Aug 2, 2023

Sorry to bother you again. I want to make sure I'm using the right knowledge corpus for Amazon reviews. According to your README.md, the Amazon reviews dataset should be downloaded using Hugging Face datasets, but there are several datasets related to Amazon reviews on the Hub. Is this the one you used for the domain adaptation task, or did you download it from https://cseweb.ucsd.edu/~jmcauley/datasets/amazon/links.html ?

Again, thank you very much for taking the time to answer my questions.
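
For illustration, loading an Amazon-review dataset through Hugging Face datasets looks like the sketch below; the dataset id is only an example and is not confirmed to be the corpus used in the paper (several Amazon-review datasets on the Hub have also been deprecated or removed over time).

```python
from datasets import load_dataset

# Example only: "amazon_polarity" is one of several Amazon-review datasets on
# the Hugging Face Hub. It is NOT confirmed to be the knowledge corpus used
# for the domain adaptation experiments in the paper.
reviews = load_dataset("amazon_polarity", split="train")
print(reviews[0]["title"])
print(reviews[0]["content"][:200])
```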

@Hannibal046 (Owner)

Hi,

  • If I remember correctly, the pre-training stage takes around 20 days on a single node.
  • For the domain corpus, please follow the instructions from the Don't Stop Pretraining paper.

@wormyu (Author) commented Aug 6, 2023

Thanks a lot for your reply! According to this issue, it seems all the corpus data should be downloaded from their original sources.

Sorry to bother you, but I have another question. The PubMed dataset link you provided points to their GitHub page, which offers three options for the PubMed dataset. Could you kindly specify which of those links was used as the knowledge base for the in-domain pretraining task? Furthermore, I'm curious whether any preprocessing was applied to the downloaded raw data.

Thanks for clarifying all this for me!

@Hannibal046 (Owner)

Hi, sorry for the late reply. I've been busy recently. If I remember correctly:

  • There are some licensing issues with the DAPT datasets, so we just used the publicly available data, not the in-house data.
  • We used this version: PubMed Central Full Texts.
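
In case it helps, here is a rough sketch of the kind of preprocessing one might apply to raw PubMed Central full texts before indexing them as knowledge-base passages; the directory layout and passage length are assumptions, not the PlugLM pipeline.

```python
import os

# Rough sketch (NOT the PlugLM preprocessing): split raw PubMed Central
# full-text files into fixed-size word chunks that can serve as
# knowledge-base passages. Paths and passage length are assumptions.
PASSAGE_WORDS = 128
RAW_DIR = "pubmed_central_raw"  # hypothetical directory of .txt files

def iter_passages(path):
    """Yield passages of roughly PASSAGE_WORDS whitespace-separated words."""
    with open(path, encoding="utf-8") as f:
        words = f.read().split()
    for i in range(0, len(words), PASSAGE_WORDS):
        yield " ".join(words[i:i + PASSAGE_WORDS])

passages = []
for fname in os.listdir(RAW_DIR):
    if fname.endswith(".txt"):
        passages.extend(iter_passages(os.path.join(RAW_DIR, fname)))
print(f"collected {len(passages)} passages")
```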

@wormyu (Author) commented Aug 21, 2023

Hi, thanks again for replying; that answers my question.

I'm wondering what fine-tuning steps you take for all the downstream tasks. I can only find that run_classification.py runs for 10 epochs in the script you provide, but for the other tasks I can't find relevant information in the README.md or the paper. Can you give me some hints about this? Maybe I missed some parts of the code.

Thanks again for helping me!

@Hannibal046 (Owner)

Hi, for tasks other than classification, you could write your own scripts, because you can simply treat PlugLM as a BERT with the same interface for downstream tasks. For biomedical tasks, you could refer to this: https://github.com/dmis-lab/biobert
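
Since PlugLM exposes a BERT-like interface, a downstream fine-tuning script can follow the usual BERT recipe. The sketch below uses the Hugging Face Trainer with a placeholder checkpoint path and illustrative hyperparameters; it is not the repo's own script.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Sketch only: fine-tune a BERT-compatible checkpoint on a classification task.
# "path/to/pluglm-checkpoint" is a placeholder, the IMDB task is just an
# example, and the hyperparameters are illustrative rather than the paper's.
checkpoint = "path/to/pluglm-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="ft_out",
    num_train_epochs=10,             # run_classification.py uses 10 epochs
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,             # enables dynamic padding via the default collator
)
trainer.train()
```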

@wormyu (Author) commented Aug 27, 2023

Hi,
Thanks for replying, and sorry for my misleading question. I meant to ask "how many" fine-tuning steps you take, not "what" fine-tuning steps, because I'm trying to compare the model performance reported in your paper with mine, and the comparison only makes sense under the same training parameters.
I know you have kindly shared Python files for the other downstream tasks, and thanks for clarifying the source for the biomedical tasks. I appreciate it a lot!
