
wa3dbk/Barcha


Barcha

Open source speech and natural language processing resources for the Tunisian Arabic dialect (work in progress).


Resources

The data and resources collected within this project are multi-purpose: named entity recognition, machine translation, language modelling, and more.

Named entities

List of named entities:

  • People
  • Institutions, associations and companies
  • Places
  • Institutions

Todo:

  • Collect more raw text data in Tunisian Arabic.
  • Develop cleaning / spelling correction scripts for Tunisian Arabic.
  • Develop CODA-compatible normalization scripts for Tunisian Arabic.
  • Develop Arabizi / Arabic conversion scripts.
  • Develop scrapers for Tunisian news and forum websites.
  • Build parallel datasets for machine translation between Tunisian <-> English / MSA.
  • Develop translation systems for Tunisian <-> English and Tunisian <-> MSA.
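
As a rough illustration of what a first cleaning script might look like, here is a minimal Python sketch. It is not part of the repository: the function name `clean_text` and the choice of steps (dropping tatweel, dropping short-vowel diacritics, collapsing whitespace) are assumptions made for the example, and a CODA-compatible normalizer would need considerably more than this.

```python
import re

# Assumption: "cleaning" here only means removing tatweel (kashida),
# removing the common Arabic diacritics, and collapsing whitespace.
TATWEEL = "\u0640"
DIACRITICS = re.compile("[\u064B-\u0652]")  # fathatan through sukun
WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Return text without tatweel or diacritics, with single spaces."""
    text = text.replace(TATWEEL, "")
    text = DIACRITICS.sub("", text)
    return WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    # Elongated and vocalized input, written with the characters the
    # function is meant to strip.
    print(clean_text("بـــرشا   مَرْحَبا"))
```

Keeping each step as its own line makes it easy to switch one off, for example to preserve diacritics when building speech resources.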

References

CODA: Habash, Nizar, Mona T. Diab, and Owen Rambow. "Conventional Orthography for Dialectal Arabic." LREC. 2012.

Zribi, Inès, et al. "A Conventional Orthography for Tunisian Arabic." LREC. 2014.

Turki, Houcemeddine, et al. "A Conventional Orthography for Maghrebi Arabic." Proceedings of the International Conference on Language Resources and Evaluation (LREC), Portoroz, Slovenia. 2016.

Arabizi: Darwish, Kareem. "Arabizi Detection and Conversion to Arabic." arXiv preprint arXiv:1306.6755 (2013).

Yaghan, Mohammad Ali. "“Arabizi”: A Contemporary Style of Arabic Slang." Design Issues 24.2 (2008): 39-52.

Masmoudi, Abir, et al. "Transliteration of Arabizi into Arabic Script for Tunisian Dialect." ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 19.2 (2019): 1-21.
