🦋 Thread 📜 on arXiv

My position paper “LLM Alignment for the Arabs: A Homogenous Culture or Diverse Ones?” has been accepted to the C3NLP workshop, co-located with NAACL 2025.

In it, I raise concerns about the opportunities being missed as Arabic-specific LLMs are on the rise.


  • Arabic speakers have substantial cultural similarities (see map below). This does not imply they have one single homogenous culture!

  • Their views tend to be overlooked, even in otherwise diverse alignment datasets (e.g., PRISM, Kirk et al., 2024).


  • The NLP community acknowledges the rich diversity of the Arabic dialects, which are a manifestation of cultural differences across the region.

  • While Arabic-specific LLMs are still marketed as serving all Arabs, our alignment data/benchmarks are scarce and not inclusive enough!


I share some preliminary thoughts on four steps that could help in building culturally representative models.

Lastly, I hope this will spark discussions within the Arabic NLP community and across the broader NLP community interested in serving marginalized speech communities!