Paper Thread - LLM Alignment for the Arabs: A Homogenous Culture or Diverse Ones?
My position paper “LLM Alignment for the Arabs: A Homogenous Culture or Diverse Ones?” was accepted to the C3NLP workshop, co-located with NAACL 2025.
In it, I share concerns about the opportunities being missed as Arabic-specific LLMs rise.
- Arabic speakers have substantial cultural similarities (see map below). This does not imply they have one single homogenous culture!
- The views of Arabic speakers tend to be ignored, even in otherwise diverse alignment datasets (e.g., PRISM; Kirk et al., 2024).


- The NLP community acknowledges the rich diversity of the Arabic dialects, which are a manifestation of cultural differences across the region.
- Yet while Arabic-specific LLMs are still marketed as serving all Arabs, our alignment datasets and benchmarks are scarce and not inclusive enough!


I share some preliminary thoughts on four steps that could help in building culturally representative models.
Lastly, I hope this will spark discussions within the Arabic NLP community and the broader NLP community interested in serving marginalized speech communities!



