Working with the Stability cluster

We currently run our large-scale experiments on the Stability AI HPC cluster. This subdirectory contains a few scripts to help you get up and running on the cluster.

If you believe you need access to the cluster for your work, please reach out to the core team on Discord.

  1. Install Miniconda - installs miniconda for your cluster environment.

  2. Create Environment - creates a basic conda environment for experiments.

    • Creates a conda environment at the prefix CONDA_ENV_PATH, set from the positional argument passed into the script.

    • Clones chemnlp into your personal cluster USER directory.
    • Installs the current revision of the chemnlp repository from your personal directory, together with its dependencies, into the conda environment.
    # general case
    source experiments/scripts/stability-cluster/env_creation.sh where/to/store/conda where/to/build/conda/from/
    
    # for creating a personal environment
    source experiments/scripts/stability-cluster/env_creation.sh jack/ jack/
  3. Running Experiment - runs a GPT-NeoX training pipeline.

    • Creates a conda environment using the env_creation.sh script.
    • Runs the GPT-NeoX train.py script using the user configuration.

      GPT-NeoX configurations can be combined, so the PEFT configurations are kept separate from the full model training and cluster configurations.

    # general case
    sbatch experiments/scripts/stability-cluster/sbatch_run.sh where/to/store/conda where/to/build/conda/from/ <cluster-config-name.yml> <training-config-names.yml>
    
    # for typical small model finetuning experiments
    sbatch experiments/scripts/stability-cluster/sbatch_run.sh experiments/my-experiment jack cluster_setup.yml 160M.yml
    
    # for typical small model soft-prompt experiments
    sbatch experiments/scripts/stability-cluster/sbatch_run.sh experiments/my-experiment jack cluster_setup.yml 160M.yml soft_prompt.yml

    To interact with WandB services, you need to authenticate as described in the Stability HPC guidelines by appending your username and password to your .netrc file.
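
The .netrc step can be sketched as a small shell helper. This is a minimal illustration, not the procedure from the Stability HPC guidelines: the function name `append_netrc_entry`, the machine name `api.wandb.ai`, and the `WANDB_USERNAME` / `WANDB_PASSWORD` variables are assumptions for the example, so check the guidelines for the values to use.

```shell
#!/usr/bin/env bash
# Sketch only: helper name, machine name and credential variables are
# assumptions; follow the Stability HPC guidelines for the real values.

# append_netrc_entry FILE MACHINE LOGIN PASSWORD  (hypothetical helper)
append_netrc_entry() {
    local netrc_file="$1" machine="$2" login="$3" password="$4"

    # Create the file if it is missing and keep it readable by the owner only.
    touch "$netrc_file"
    chmod 600 "$netrc_file"

    # Do nothing if an entry for this machine is already present.
    if grep -qF "machine ${machine}" "$netrc_file"; then
        return 0
    fi

    # Append the entry in standard .netrc syntax.
    printf 'machine %s\n  login %s\n  password %s\n' \
        "$machine" "$login" "$password" >> "$netrc_file"
}

# Example call (assumed values, not run here):
#   append_netrc_entry "$HOME/.netrc" api.wandb.ai "$WANDB_USERNAME" "$WANDB_PASSWORD"
```

The helper skips the append when an entry for the same machine already exists, so re-running it does not duplicate credentials.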