Skip to content

ristea/septr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SepTr: Separable Transformer for Audio Spectrogram Processing @ INTERSPEECH 2022 (official repository)

We propose the Separable Transformer (SepTr), an architecture that employs two transformer blocks in a sequential manner, the first attending to tokens within the same frequency bin, and the second attending to tokens within the same time interval.

The original paper is available at: https://www.isca-speech.org/archive/interspeech_2022/ristea22_interspeech.html

The arxiv version is avalable at: https://arxiv.org/pdf/2203.09581.pdf

This code is released under the CC BY-SA 4.0 license.


map


Information

Our architecture does not impose a certain axis (time or frequency) for the first transformer block, being flexible in this regard. Without loss of generality, in the above Figure, we illustrate a model that separates the tokens along the time axis first. Our separable transformer block can be repeated L times to increase the depth of the architecture. The final prediction of our model is made by the MLP block.

Implementation

We implemented the model in PyTorch and provide all scripts to run our architecture.

In order to make it work properly, we recommend a python version newer than 3.6

We used the python 3.6.8 version.

Cite us

@inproceedings{Ristea-INTERSPEECH-2022,
  title={SepTr: Separable Transformer for Audio Spectrogram Processing},
  author={Ristea, Nicolae-Catalin and Ionescu, Radu Tudor and Khan, Fahad Shahbaz},
  year={2022},
  booktitle={Proceedings of INTERSPEECH},
  pages={4103--4107},
  doi={10.21437/Interspeech.2022-249}
}

Related Projects

pytorch-vit

You can send your questions or suggestions to:

r.catalin196@yahoo.ro, raducu.ionescu@gmail.com

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages