
Added smooth_initialization option to NaiveFourierKANLayer #4

Merged: 1 commit into GistNoesis:main on May 24, 2024

Conversation

@JeremyIV

With the default initialization scheme for fouriercoeffs, all frequencies draw their coefficients from the same distribution. This means that as gridsize becomes large, more and more of the contribution comes from the high frequencies, making the KAN's initial scalar functions very high-frequency. In these high-frequency functions, the output values for nearby inputs are uncorrelated, so the initial KAN function is highly "scrambled" and cannot "unscramble" itself during training.
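A minimal sketch of the two schemes (illustrative only; the exact tensor shapes and normalization in NaiveFourierKANLayer may differ): the default divides every frequency's coefficients by the same gridsize-dependent factor, while smooth_initialization damps frequency k by k^2, giving a ~1/f^2 (Brownian) spectrum.

```python
import torch

def init_fourier_coeffs(outdim, inputdim, gridsize, smooth_initialization=False):
    # Coefficients for the cos/sin terms at frequencies k = 1..gridsize.
    if smooth_initialization:
        # Damp frequency k by k^2: the resulting random function has a ~1/f^2
        # (Brownian) power spectrum, so nearby inputs give correlated outputs.
        grid_norm_factor = (torch.arange(gridsize) + 1.0) ** 2
    else:
        # Default: every frequency shares the same distribution; only a global
        # sqrt(gridsize) factor keeps the overall variance independent of gridsize.
        grid_norm_factor = torch.full((gridsize,), float(gridsize)).sqrt()
    coeffs = torch.randn(2, outdim, inputdim, gridsize) / (inputdim ** 0.5 * grid_norm_factor)
    return torch.nn.Parameter(coeffs)
```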

For example, here is a KAN with 3 layers, 10 hidden units, and a grid size of 120, trained to encode an image using the coordinate-network paradigm (see e.g. SIREN).

Target image

[image]

With the default initialization,

Before training:

[image]

After training:

[image]

With smooth initialization,

Before training:

[image]

After training:
[image]
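For reference, here is a rough sketch of the coordinate-network setup used above: map normalized (x, y) pixel coordinates to RGB values and fit with plain MSE, SIREN-style. The import path and the NaiveFourierKANLayer constructor signature are assumptions; check the repository for the exact API.

```python
import torch
from fourierkan import NaiveFourierKANLayer  # import path is an assumption

class CoordinateKAN(torch.nn.Module):
    # 3 layers, 10 hidden units, gridsize 120, as in the experiment above.
    def __init__(self, hidden=10, gridsize=120, smooth_initialization=True):
        super().__init__()
        # Constructor arguments are assumed; see the repository for the exact signature.
        self.layers = torch.nn.ModuleList([
            NaiveFourierKANLayer(2, hidden, gridsize, smooth_initialization=smooth_initialization),
            NaiveFourierKANLayer(hidden, hidden, gridsize, smooth_initialization=smooth_initialization),
            NaiveFourierKANLayer(hidden, 3, gridsize, smooth_initialization=smooth_initialization),
        ])

    def forward(self, xy):
        h = xy
        for layer in self.layers:
            h = layer(h)
        return h

def fit_image(model, coords, rgb, steps=2000, lr=1e-3):
    # coords: (N, 2) in [-1, 1]^2, rgb: (N, 3) in [0, 1]
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(coords), rgb)
        loss.backward()
        opt.step()
    return model
```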

@unrealwill (Collaborator)

Thanks for the Pull Request.
A few thoughts/notes for myself:

  • Which camel-case / naming convention policy should new options follow?
  • Enforce grid_norm_factor to be a 4D tensor with shape (1,1,1,gridsize) to avoid errors when shuffling dimensions.
  • Which type of noise is best for initialization: smooth_initialization is Brownian noise; maybe try various 1/f^alpha noises (a generalized sketch follows this list).
  • Are unit mean and variance preserved for the various types of input noise, or are there missing constants?
  • Should smooth_initialization=True become the default? (breaking change for existing users vs. good default for new users)
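
A hedged sketch of a generalized 1/f^alpha normalization factor (kept 4D as suggested above), plus a quick empirical probe of the resulting random function's mean and variance; no constants are worked out here, this is just a way to check the questions in the list.

```python
import torch

def spectral_norm_factor(gridsize, alpha):
    # Divide randn coefficients by this factor: alpha=0 is flat (white-ish) scaling,
    # alpha=2 matches the Brownian-noise smooth_initialization.
    # Shaped (1, 1, 1, gridsize) so it broadcasts against a (2, outdim, inputdim, gridsize) tensor.
    return ((torch.arange(gridsize) + 1.0) ** alpha).reshape(1, 1, 1, gridsize)

def empirical_moments(gridsize=120, alpha=2.0, n_funcs=512, n_points=1024):
    # Sample random 1D Fourier functions with 1/f^alpha-scaled coefficients and
    # report their empirical mean and variance over the input range.
    k = torch.arange(1, gridsize + 1).float()
    x = torch.linspace(-1.0, 1.0, n_points)
    coeffs = torch.randn(n_funcs, 2, gridsize) / (k ** alpha)
    basis = torch.stack([torch.cos(k[:, None] * x),   # (2, gridsize, n_points)
                         torch.sin(k[:, None] * x)])
    y = torch.einsum('fcg,cgp->fp', coeffs, basis)
    return y.mean().item(), y.var().item()

print(empirical_moments(alpha=2.0))
```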

I am merging the pull request.
I'll add a line to the README to explain this new parameter.

One usual way of dealing with the higher-frequency Fourier terms is to add a regularization term that penalizes the high frequencies in whatever way you want. The merit of that approach is that smoothness is enforced as training progresses, not just at initialization.
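
A minimal sketch of such a penalty, assuming the layer's fouriercoeffs tensor indexes frequencies along its last dimension (the weight and alpha values are arbitrary placeholders):

```python
import torch

def frequency_weighted_l2(fouriercoeffs, alpha=1.5, weight=1e-4):
    # Penalize the coefficients of frequency k by k^alpha, so high-frequency
    # power keeps being pushed down throughout training, not only at init.
    gridsize = fouriercoeffs.shape[-1]
    k = torch.arange(1, gridsize + 1, device=fouriercoeffs.device,
                     dtype=fouriercoeffs.dtype)
    return weight * ((k ** alpha) * fouriercoeffs.pow(2)).sum()

# Usage inside a training step (model/attribute names are placeholders):
#   loss = task_loss + sum(frequency_weighted_l2(layer.fouriercoeffs)
#                          for layer in model.layers)
```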

One thing to study is probably how well the frequency noise type is preserved or changed during training.
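
One simple way to probe this: compute the average per-frequency power of each layer's coefficients before and after training and compare the curves (again assuming frequencies sit on the last dimension of fouriercoeffs).

```python
import torch

def coefficient_power_spectrum(fouriercoeffs):
    # Average power (cos^2 + sin^2) per frequency, over all input/output pairs.
    # Plot this at init and after training to see whether the 1/f^alpha shape survives.
    return fouriercoeffs.detach().pow(2).sum(dim=0).mean(dim=(0, 1))  # -> (gridsize,)
```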

unrealwill merged commit 3646754 into GistNoesis:main on May 24, 2024
@JeremyIV (Author)

Thanks for merging! Here are some quick sloppy experiments in response to your comments:

  • Re: different noise spectra: here's a tiny hyperparameter sweep of different alpha values for my coordinate-network problem. It suggests alpha=1.5 may be slightly better than 2, but many more experiments would be needed to make a conclusive choice:

[image]

  • For the current smooth initialization with alpha=2, you can verify experimentally that the initial layer appears to preserve mean 0 and variance 1. I have not tried to confirm this mathematically.
  • With smooth_initialization, for any alpha value, the spectral power density does not appear to change much during training (initial and final plots overlap):
    [image]
  • Whereas with the default initialization, the last layer learns to reduce the power of the high frequencies, but the earlier layers do not:
    [image]

Regularization

I tried the default initialization with L2 regularization of the Fourier coefficients, weighted by f^alpha, for alpha = 0, 0.5, 1, 1.5, 2, 2.5.

[image]

And here are the power spectra before and after training with alpha=1.5:
[image]

@unrealwill (Collaborator)

Thanks a lot for doing some experiments.

In the KAN paper, they mention running their experiments with LBFGS, hinting at a second-order method.

FourierKAN uses cos and sin (C∞ functions), so it can probably benefit from a second-order optimizer that takes advantage of the curvature.

Something like Hessian-free optimization (e.g. https://github.com/fmeirinhos/pytorch-hessianfree, whose author warns "Not fully tested, use with caution!") should do the trick, and would help distinguish optimization issues from limits of model expressiveness.
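
For the built-in option, a minimal sketch of switching to torch.optim.LBFGS, which needs a closure that re-evaluates the loss; model, coords and rgb stand in for whatever task is being fit.

```python
import torch

optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20,
                              history_size=50, line_search_fn='strong_wolfe')

def closure():
    # LBFGS calls this several times per step to re-evaluate loss and gradients.
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(coords), rgb)
    loss.backward()
    return loss

for _ in range(100):
    optimizer.step(closure)
```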

Standard neural-network architecture tricks like residual connections and normalization should also help.
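
For instance, a small sketch of a resnet-style wrapper with normalization around any dim-to-dim KAN layer:

```python
import torch

class ResidualKANBlock(torch.nn.Module):
    # Pre-norm residual block: x + kan(LayerNorm(x)).
    def __init__(self, kan_layer, dim):
        super().__init__()
        self.norm = torch.nn.LayerNorm(dim)
        self.kan = kan_layer

    def forward(self, x):
        return x + self.kan(self.norm(x))
```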
