“No one is born hating another person because of the colour of his skin, or his background, or his religion. People must learn to hate, and if they can learn to hate, they can be taught to love, for love comes more naturally to the human heart than its opposite.” — Nelson Mandela, Long Walk to Freedom
Dataset release coming soon! For inquiries and collaboration, feel free to reach out to us. Check out our dataset paper: https://arxiv.org/pdf/2501.08284.
Online hate is a growing problem worldwide, causing harm to users who are exposed to it, polluting and disrupting online communities, and leading to psychological harm and offline violence. Social media platforms facilitate the propagation of hate and offensive speech by allowing users to rapidly create and spread hateful content.
Social media organizations have taken various steps to protect their users from the spread of hate speech in different parts of the world. However, in Africa, efforts to tackle hateful content have primarily focused on high-profile individuals and rely on time-intensive human labour. This approach is not scalable and fails to effectively moderate the vast majority of content directed at less prominent individuals. Moreover, African languages are under-served in NLP, with few or no assistive machine learning tools to help with the moderation process. African users are therefore subject to restrictive interventions, such as the removal of social media content based on certain keywords regardless of context or intent.
African languages have been comparatively low-resource in NLP research, mainly due to the lack of labeled datasets. To address this, we will introduce AfriHate, the first high-quality labeled Twitter dataset collection for detecting hate speech and abusive language in 18 African languages.
# | Language | Country | Language Coordinators |
---|---|---|---|
1. | Hausa | Nigeria | Saminu Mohammad Aliyu |
2. | Yoruba | Nigeria | David Ifeoluwa Adelani |
3. | Igbo | Nigeria | Chiamaka Ijeoma Chukwuneke |
4. | Nigerian-Pidgin | Nigeria | Saminu Mohammad Aliyu |
5. | Amharic | Ethiopia | Ebrahim Chekol Jibril |
6. | Tigrinya | Ethiopia | Hagos Tesfahun Gebremichael |
7. | Oromo | Ethiopia | Tadesse Kebede Guge |
8. | Somali | Ethiopia | Elyas Abdi Ismail |
9. | Twi | Ghana | Abigail Oppong |
10. | Swahili | Kenya | Lilian D. A. Wanzare |
11. | Moroccan Arabic | Morocco | Oumaima Hourrane |
12. | Kinyarwanda | Rwanda | Samuel Rutunda |
13. | isiZulu | South Africa | Rooweither Mabuya |
14. | isiXhosa | South Africa | Andiswa Bukula |
15. | Algerian Arabic | Algeria | Nedjma Ousidhoum |
This is a collaborative project with team members from different universities, institutions, and industry. Team members include:
Name | Affiliation |
---|---|
Shamsuddeen Hassan Muhammad | Bayero University, Kano, Nigeria; MasaKhane |
Esubalew Alemneh | Bahir Dar University, Bahir Dar, Ethiopia |
Seid Muhie Yimam | University of Hamburg; MasaKhane, EthioNLP |
Idris Abdulmumin | Ahmadu Bello University, Zaria, Nigeria |
Ibrahim Sa’id Ahmad | Bayero University, Kano; MasaKhane |
Abinew Ali | Bahir Dar University, EthioNLP |
David Ifeoluwa Adelani | MasaKhane; Saarland University |
Saminu Mohammad Aliyu | Bayero University, Kano; MasaKhane |
Nedjma Ousidhoum | University of Cambridge |
Paul Röttger | University of Oxford |
We acknowledge that current hate speech detection models have a limited ability to classify subtle content and tend to generate false positives and false negatives. We do not claim that systems trained on our datasets will be free of these shortcomings, and we do not intend to deploy any of our systems for automated content removal, surveillance, censorship, profiling, or law enforcement. Our goal is to study the overlooked underlying socio-linguistic phenomena in African languages to avoid false generalizations, educate people on unconscious biases, and build useful assistive moderation technologies in the future.
This work was carried out with support from Lacuna Fund, an initiative co-founded by The Rockefeller Foundation, Google.org, and Canada’s International Development Research Centre. The views expressed herein do not necessarily represent those of Lacuna Fund, its Steering Committee, its funders, or Meridian Institute. We also express our profound gratitude to the Nigerian Artificial Intelligence Research Scheme (NAIRS) for supporting this project with a grant.