Layer integration #83

Merged: 5 commits into hpcaitech:main, Dec 27, 2021
Conversation

@kurisusnowdeng (Member) commented Dec 21, 2021

  • Integrated parallel layers for ease of building models
  • Renamed a few APIs
  • Reworked metrics (e.g. accuracy) and moved them into colossalai.nn
  • Added a throughput hook and a log-metric-by-step hook
  • Reworked initialization
  • Reworked model_zoo, so each model now has a single implementation
  • Added benchmarks, including cifar10 and imagenet100 training scripts
  • Updated related unit tests by replacing models built from configs with models from model_zoo
  • Tested all parallel layers and cifar10 convergence

@FrankLeeeee FrankLeeeee self-requested a review December 21, 2021 05:19
@FrankLeeeee (Contributor) commented:
@kurisusnowdeng The checks are not passing.

@FrankLeeeee FrankLeeeee added the enhancement (New feature or request) label Dec 21, 2021
@kurisusnowdeng kurisusnowdeng requested review from FrankLeeeee and removed request for FrankLeeeee December 21, 2021 13:33
@kurisusnowdeng kurisusnowdeng changed the title from Model integration to Layer integration Dec 21, 2021
@FrankLeeeee FrankLeeeee requested review from FrankLeeeee and removed request for FrankLeeeee December 22, 2021 05:33
@kurisusnowdeng kurisusnowdeng requested review from FrankLeeeee and removed request for FrankLeeeee December 22, 2021 07:04
@FrankLeeeee (Contributor) commented Dec 24, 2021

Hi @kurisusnowdeng, I have reviewed your code and some issues remain.

benchmark/cifar/configs/vit_1d.py

  1. The correct field name for gradient clipping is clip_grad_norm; this applies to all the other config files as well (see the sketch below).
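
A minimal sketch of what that would look like in a config file; the value 1.0 is a hypothetical choice, not taken from the PR:

# benchmark/cifar/configs/vit_1d.py (sketch, hypothetical value)
clip_grad_norm = 1.0  # max gradient norm; replaces the misnamed gradient_clipping field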

colossalai/nn/init.py

  1. It seems fan_in and fan_out can be obtained directly from the weight shape, so they should be hidden from the user (see the sketch after the snippet below).
  2. fan_in should not default to None, because the assert statement in the LeCun initializers then fails whenever the default is used.
import math

import torch.nn as nn
from torch import Tensor


def lecun_uniform_():
    # adapted from jax.nn.initializers
    def initializer(tensor: Tensor, fan_in: int = None, fan_out: int = None):
        # NOTE: the None default is rejected here, which is the issue flagged above.
        assert fan_in is not None, 'Fan_in is not provided.'

        # LeCun uniform: variance 1/fan_in, sampled from U(-bound, bound)
        var = 1.0 / fan_in
        bound = math.sqrt(3 * var)
        return nn.init.uniform_(tensor, -bound, bound)

    return initializer


def lecun_normal_():
    # adapted from jax.nn.initializers
    def initializer(tensor: Tensor, fan_in: int = None, fan_out: int = None):
        # NOTE: same problematic None default as above.
        assert fan_in is not None, 'Fan_in is not provided.'

        # LeCun normal: std 1/sqrt(fan_in), truncated normal corrected for truncation
        std = math.sqrt(1.0 / fan_in)
        return nn.init.trunc_normal_(tensor, std=std / .87962566103423978)

    return initializer
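
For item 1, a minimal sketch of deriving both fans from the weight shape, along the lines of what torch.nn.init does internally; _calc_fans is a hypothetical helper name, not part of the PR:

from torch import Tensor


def _calc_fans(weight: Tensor):
    # Linear weights are (out_features, in_features); conv weights additionally
    # carry kernel dimensions, which scale both fans by the receptive field size.
    fan_out, fan_in = weight.shape[0], weight.shape[1]
    receptive_field = 1
    for dim in weight.shape[2:]:
        receptive_field *= dim
    return fan_in * receptive_field, fan_out * receptive_field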

colossalai/nn/layer/__init__.py

  1. From line 6 to line 24, why are all the modules imported?
  2. Are there reasons for setting default initializers (e.g. Kaiming for weights and Xavier for biases)? The user is not aware of our choice; my suggestion would be to fall back to the default torch initialization (see the sketch below).
  3. I would suggest putting the unified layers into a single file (e.g. unified_layer.py) instead of __init__.py, as __init__.py serves an initialization/configuration purpose.
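
For context on item 2, falling back to the torch default for a linear layer would give roughly the following; this mirrors nn.Linear.reset_parameters in PyTorch (note it relies on a private torch helper):

import math

import torch.nn as nn
from torch.nn import init


def torch_default_linear_init(linear: nn.Linear) -> None:
    # Kaiming-uniform weights with a=sqrt(5), plus a fan_in-scaled uniform bias,
    # as nn.Linear.reset_parameters does.
    init.kaiming_uniform_(linear.weight, a=math.sqrt(5))
    if linear.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(linear.weight)
        bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
        init.uniform_(linear.bias, -bound, bound)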

colossalai/nn/layer/parallel_1d/_utils.py

  1. The utility functions here are for general use, not specific to 1D; I don't think they should be placed under the 1D utils.

colossalai/nn/metric/accuracy_2d.py

  1. accuracy is still implemented as an autograd function, and this applies to the other metrics as well; metrics are never backpropagated through, so they do not need autograd (see the sketch below).
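
A minimal sketch of a graph-free accuracy using plain tensor ops; calc_acc is a hypothetical name, not the PR's API:

import torch


@torch.no_grad()
def calc_acc(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # argmax over the class dimension; no autograd function is needed since
    # accuracy is a metric, not a loss.
    preds = logits.argmax(dim=-1)
    return (preds == targets).float().mean()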

@kurisusnowdeng (Member, Author) commented:
Fixed. Please check again. @FrankLeeeee

@kurisusnowdeng kurisusnowdeng merged commit 0fedef4 into hpcaitech:main Dec 27, 2021