Open
Hi, thanks for sharing this excellent work. I would like to understand how feature reuse is performed. As mentioned in your paper, feature reuse brings a huge gain, but I cannot find the corresponding code for this part. From the paper, I also cannot tell whether feature reuse is applied just before self-attention or just before the FFN.