[v2 QUESTION]: test dataset of ensemble #1130

jia-huang · 2024-12-19T15:10:01Z

When using the following command for training, is the test set the same for each model? If so, how can we adjust the settings so that each model's test set is different?

chemprop train --data-path FDA-smiles.csv --task-type classification --output-dir FDA_checkpoints_rdkit_2d-hpopt-20ensembl-8 --molecule-featurizers v1_rdkit_2d_normalized --no-descriptor-scaling --ensemble-size 20 --config-path best_config_zidong.toml --epochs 30

Additionally, how can we configure the settings to implement 20-fold cross-validation during training?

shihchengli · 2024-12-19T15:57:07Z

With --ensemble-size 20, Chemprop trains 20 models on the same data split, so every model is evaluated on the same test set. To run 20-fold cross-validation, you need to create a JSON file with 20 splits yourself (an example can be found here).
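For illustration, here is a minimal sketch of how such a 20-split file could be generated with the Python standard library. It rests on assumptions that are not confirmed in this thread: that the file is a JSON list with one dictionary per split, that the keys are "train", "val", and "test", and that the values are 0-based row indices into the data file. The helper names `make_kfold_splits` and `write_splits` are hypothetical, so check the layout against the linked example before relying on it.

```python
import json
import random


def make_kfold_splits(n_rows, k=20, seed=0):
    """Return k split dicts; every row index is a test index in exactly one split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val_fold = (i + 1) % k  # the next fold is used for validation
        train = [idx for j, fold in enumerate(folds)
                 if j != i and j != val_fold for idx in fold]
        # Key names are an assumption about the expected file layout.
        splits.append({
            "train": sorted(train),
            "val": sorted(folds[val_fold]),
            "test": sorted(folds[i]),
        })
    return splits


def write_splits(path, splits):
    """Write the list of split dicts to a JSON file."""
    with open(path, "w") as fh:
        json.dump(splits, fh)
```

Calling `write_splits("splits_20fold.json", make_kfold_splits(n_rows, k=20))`, with `n_rows` set to the number of data rows in FDA-smiles.csv (header excluded), would produce the file. I believe it is then passed to `chemprop train` through a `--splits-file` option, but please confirm the exact flag with `chemprop train --help`.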

jia-huang added the question (Further information is requested) label Dec 19, 2024

shihchengli closed this as completed Dec 19, 2024
