Local Softmax parallelizes the softmax computation by splitting a tensor into smaller sub-tensors and applying the softmax function to each of them independently. In other words, it computes a "local" softmax on each chunk of the tensor instead of a single softmax over the entire tensor.
- Lucidrains
- Agorians
pip install local-sfmx
import torch
from local_sfmx import local_softmax
tensor = torch.rand(10, 5)
result = local_softmax(tensor, 2)
print(result)
function LocalSoftmax(tensor, num_chunks):
    split tensor into num_chunks smaller tensors
    for each smaller tensor:
        apply standard softmax
    concatenate the results
    return concatenated tensor
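The pseudocode above can be sketched in plain Python to make each step concrete. This is only an illustration under stated assumptions, not the package's actual implementation: it works on a flat list of floats rather than a torch tensor, and the names `softmax` and `local_softmax_sketch` are hypothetical helpers introduced here. How the real `local_softmax` picks the split dimension and handles uneven chunks is not specified in this document.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def local_softmax_sketch(values, num_chunks):
    """Hypothetical helper: apply softmax to each chunk independently."""
    if num_chunks < 1 or not values:
        raise ValueError("need a non-empty input and num_chunks >= 1")
    # Assumption: chunks are contiguous and near-equal in size; if the length
    # does not divide evenly, the last chunk is shorter and there may be
    # fewer than num_chunks chunks.
    chunk_size = math.ceil(len(values) / num_chunks)
    result = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        result.extend(softmax(chunk))  # concatenate per-chunk results
    return result


scores = [1.0, 2.0, 3.0, 4.0]
out = local_softmax_sketch(scores, 2)
print(out)
```

Because each chunk is normalized on its own, the outputs within a chunk sum to 1, so the full result sums to the number of chunks rather than to 1. The chunks do not depend on one another, which is what makes the per-chunk work parallelizable.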
MIT