One and a half years later, I'm finally getting better results on KimCNN using the original hyperparameters from the paper. There are a few discrepancies between Kim's original implementation and the PyTorch/Castor one (a sketch of the fixes follows the list):
- Kim used an Adadelta rho of 0.95 instead of 0.9; the paper does not mention this.
- Kim used Xavier uniform initialization for the convolution layers; the paper does not mention this either.
- Kim did not use the equivalent of torchtext's BucketIterator; this is a difference in Castor.
- Kim used the dev loss as the criterion for model selection; this is also a difference in Castor.
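For reference, here's a minimal PyTorch sketch of fixes 1, 2, and 4 (fix 3 is a data-loading change that doesn't fit in a few lines). This is not Castor's actual code: `TinyKimCNN` and the toy data are placeholders I made up to keep the example self-contained.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class TinyKimCNN(nn.Module):
    """Toy single-filter-size, single-channel stand-in for KimCNN."""
    def __init__(self, embed_dim=300, num_filters=100, num_classes=2):
        super().__init__()
        self.conv = nn.Conv2d(1, num_filters, (3, embed_dim))
        # Fix 2: Xavier uniform initialization for the convolution layer
        nn.init.xavier_uniform_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, x):                        # x: (batch, 1, seq_len, embed_dim)
        h = torch.relu(self.conv(x)).squeeze(3)  # (batch, num_filters, seq_len - 2)
        h = h.max(dim=2).values                  # max-over-time pooling
        return self.fc(h)

model = TinyKimCNN()
criterion = nn.CrossEntropyLoss()

# Fix 1: Adadelta with rho=0.95 (PyTorch's default is rho=0.9)
optimizer = optim.Adadelta(model.parameters(), rho=0.95)

# Toy stand-in data: 8 "sentences" of 20 tokens with 300-dim embeddings
x = torch.randn(8, 1, 20, 300)
y = torch.randint(0, 2, (8,))

best_dev_loss = float("inf")
for epoch in range(5):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    # Fix 4: keep the checkpoint with the lowest dev loss, not the
    # highest dev accuracy (here the toy batch doubles as the "dev set")
    model.eval()
    with torch.no_grad():
        dev_loss = criterion(model(x), y).item()
    model.train()
    if dev_loss < best_dev_loss:
        best_dev_loss = dev_loss
        torch.save(model.state_dict(), "kimcnn_best.pt")
```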
After these changes, the original hyperparameters in the paper work quite well. I'm getting 87.8 for SST-2 multichannel now, which is an improvement over the current 87.4. It's still a bit off from the paper result of 88.1, though.
Reference: https://github.com/yoonkim/CNN_sentence/blob/master/conv_net_sentence.py