
Proper Batch Normalization with learnable gamma and beta #126

Open

Hitomamacs (Contributor) opened this issue Jan 5, 2025 · 0 comments

Feature Description

A fully-fledged Batch Normalization (BN) layer that follows the standard approach in deep learning. This layer should include learnable parameters (gamma, beta), running statistics (running mean, running variance), and the correct forward/backward passes with mean/variance computation.

Problem Statement

Currently, we only have a min-max normalization layer, which is insufficient for most neural network architectures that rely on BN’s ability to stabilize training by normalizing activations to zero mean and unit variance. Min-max normalization addresses neither per-batch statistics nor learnable scale/shift parameters, which are crucial in many deep-learning models.

Suggested Solution

Implement a Batch Normalization layer that:

1. Maintains Learnable Parameters:
   - Gamma ($\gamma$): a scale parameter.
   - Beta ($\beta$): a shift parameter.
2. Tracks Running Statistics:
   - Running Mean: an exponential moving average of each feature’s mean during training.
   - Running Variance: an exponential moving average of each feature’s variance.
3. Computes Forward Pass:
   - Training Mode: compute the current batch’s mean and variance, normalize the input, and update the running statistics.
   - Inference Mode: normalize using the running mean and variance.
4. Implements Backward Pass:
   - Correctly compute gradients w.r.t. the inputs and the learnable parameters ($\gamma$ and $\beta$), typically via the chain rule through the partial derivatives of the normalization formula (a minimal sketch follows this list).

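As a rough illustration of the points above (not the project’s existing API: the class name `BatchNorm`, the `momentum`/`eps` defaults, and the `(batch, num_features)` layout are assumptions), a minimal NumPy sketch of the forward and backward passes could look like:

```python
import numpy as np

class BatchNorm:
    """Illustrative BN layer; names and signature are assumptions, not the project's API."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)          # learnable scale
        self.beta = np.zeros(num_features)          # learnable shift
        self.running_mean = np.zeros(num_features)  # EMA of batch means
        self.running_var = np.ones(num_features)    # EMA of batch variances
        self.momentum = momentum
        self.eps = eps
        self._cache = None

    def forward(self, x, is_training=True):
        # x is assumed to have shape (batch, num_features).
        if is_training:
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # Exponential moving average of the batch statistics.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var

        x_hat = (x - mean) / np.sqrt(var + self.eps)
        self._cache = (x_hat, var)
        return self.gamma * x_hat + self.beta

    def backward(self, dout):
        # Chain rule through y = gamma * x_hat + beta and
        # x_hat = (x - mean) / sqrt(var + eps), using the training-mode cache.
        x_hat, var = self._cache
        n = dout.shape[0]

        dgamma = np.sum(dout * x_hat, axis=0)
        dbeta = np.sum(dout, axis=0)

        dx_hat = dout * self.gamma
        # Standard simplified expression for dL/dx:
        # dx = (N * dx_hat - sum(dx_hat) - x_hat * sum(dx_hat * x_hat)) / (N * sqrt(var + eps))
        dx = (dx_hat * n - dx_hat.sum(axis=0) - x_hat * (dx_hat * x_hat).sum(axis=0)) / (
            n * np.sqrt(var + self.eps)
        )
        return dx, dgamma, dbeta
```

The simplified `dx` expression folds the mean and variance gradient terms into a single pass; an equivalent formulation computes the gradients w.r.t. the batch mean and variance explicitly and then combines them.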
Additional Context

- BN often accelerates convergence and improves the stability of deep networks.
- Proper dimension checks should be enforced, ensuring the last dimension (or a specified dimension) matches the number of features.
- We may use an additional flag (e.g., is_training) to switch between training and inference modes (see the short usage sketch below).
- The layer’s interface should be consistent with existing layers, supporting init, forward, backward, and possibly separate functions or arguments for updating $\gamma$ and $\beta$.

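A short usage sketch (again assuming the hypothetical names from the snippet above) of how the training/inference switch could behave:

```python
import numpy as np

rng = np.random.default_rng(0)
bn = BatchNorm(num_features=64)  # hypothetical layer from the sketch above

# Training step: batch statistics are used and the running stats are updated.
x_batch = rng.normal(size=(32, 64))
y = bn.forward(x_batch, is_training=True)
dx, dgamma, dbeta = bn.backward(np.ones_like(y))

# Inference: the stored running mean/variance are used instead of batch statistics.
x_eval = rng.normal(size=(8, 64))
y_eval = bn.forward(x_eval, is_training=False)
```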
Huly®: X_PI-158
