
Improve KimCNN results #183

Open
daemon opened this issue Mar 10, 2019 · 1 comment

Comments

@daemon
Member

daemon commented Mar 10, 2019

One and a half years later, I'm finally getting better results on KimCNN using the original hyperparameters from the paper. There are a few discrepancies between Kim's original implementation and the PyTorch/Castor implementation:

  • Kim used an Adadelta rho of 0.95 instead of 0.9. The paper does not mention this.
  • Kim used Xavier uniform initialization for the convolution layers. The paper does not mention this either.
  • Kim did not use the equivalent of torchtext's BucketIterator. This is a difference in Castor.
  • Kim used the dev loss as the criterion for model selection. This is also a difference in Castor.
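The first two discrepancies can be sketched in PyTorch. This is a minimal illustration, not the actual Castor code; the layer shapes (filter windows of 3/4/5, 100 feature maps, 300-dim embeddings) follow the KimCNN paper, but the variable names are hypothetical.

```python
import torch
import torch.nn as nn

# Convolution layers mirroring KimCNN's filter windows over 300-dim
# word embeddings (shapes illustrative, per the paper's SST-2 setup).
convs = nn.ModuleList(nn.Conv2d(1, 100, (k, 300)) for k in (3, 4, 5))

# Kim used Xavier (Glorot) uniform initialization for the conv weights;
# PyTorch does not do this by default.
for conv in convs:
    nn.init.xavier_uniform_(conv.weight)
    nn.init.zeros_(conv.bias)

# Kim's Adadelta used rho=0.95; torch.optim.Adadelta defaults to rho=0.9.
optimizer = torch.optim.Adadelta(convs.parameters(), rho=0.95)
```

Passing `rho=0.95` explicitly matters because the paper never states the value, so an implementation that relies on the framework default silently diverges from Kim's setup.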

After these changes, the original hyperparameters in the paper work quite well. I'm getting 87.8 for SST-2 multichannel now, which is an improvement over the current 87.4. It's still a bit off from the paper result of 88.1, though.

Reference: https://github.com/yoonkim/CNN_sentence/blob/master/conv_net_sentence.py

@daemon
Member Author

daemon commented Mar 10, 2019

Seems like I spoke too soon. Results fluctuate between the high 85s and the 87s across runs.
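Given that run-to-run variance, a single number is hard to interpret; reporting a mean and standard deviation over several seeds is one way to summarize it. A minimal sketch, where the accuracy values are placeholders and not actual results:

```python
import statistics

# Hypothetical dev accuracies from repeated runs with different random
# seeds; the numbers are placeholders, not measured results.
accuracies = [85.9, 86.4, 87.1, 86.8, 87.0]

mean = statistics.mean(accuracies)
std = statistics.stdev(accuracies)  # sample standard deviation
print(f"{mean:.1f} +/- {std:.1f}")
```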
