Unofficial implementation of the paper Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning (Tamborrino et al., ACL 2020).

The idea is to use the generation probability of a pre-trained language model to score answer candidates for commonsense reasoning: each candidate is ranked by how likely the model finds the surrounding context when conditioned on that candidate.
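A minimal sketch of the zero-shot scoring idea, assuming a `roberta-large` masked LM from Hugging Face `transformers`. The helper `sequence_score` and the example premise/candidates are illustrative, not this repo's actual API: each premise token is masked in turn and the masked-LM log-probabilities of the true tokens are summed, conditioned on the candidate.

```python
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForMaskedLM.from_pretrained("roberta-large")
model.eval()

def sequence_score(premise: str, hypothesis: str) -> float:
    """Length-normalised sum of masked-LM log-probabilities of the
    premise tokens, conditioned on the candidate hypothesis."""
    enc = tokenizer(premise, hypothesis, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    # Premise tokens occupy positions 1 .. n (position 0 is <s>).
    n = len(tokenizer(premise)["input_ids"]) - 2  # drop <s> and </s>
    total = 0.0
    for pos in range(1, 1 + n):
        masked = input_ids.clone()
        true_id = masked[pos].item()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits
        total += torch.log_softmax(logits[0, pos], -1)[true_id].item()
    return total / n

# Pick the candidate under which the premise is most probable.
premise = "The man broke his toe."
candidates = ["He got a hole in his sock.",
              "He dropped a hammer on his foot."]
best = max(candidates, key=lambda h: sequence_score(premise, h))
print(best)
```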
Results (accuracy, %):
- No finetuning (zero-shot scoring): 75.0
- Finetuned with margin loss (margin = 0.5): 91.6
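The finetuned result above uses a margin-based hinge loss over candidate scores. A minimal sketch of that objective, assuming `score_correct` and `scores_wrong` are differentiable candidate scores produced by the model (the function name and example values are illustrative):

```python
import torch

def margin_loss(score_correct: torch.Tensor,
                scores_wrong: torch.Tensor,
                margin: float = 0.5) -> torch.Tensor:
    """Hinge loss: push the correct candidate's score above every
    wrong candidate's score by at least `margin`."""
    return torch.clamp(margin - score_correct + scores_wrong, min=0.0).mean()

# Example: only pairs that violate the margin contribute to the loss.
s_correct = torch.tensor(-1.2, requires_grad=True)
s_wrong = torch.tensor([-1.5, -0.9])
loss = margin_loss(s_correct, s_wrong)
loss.backward()
```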