Amr Keleg

Affiliation

The University of Edinburgh

Edinburgh, Scotland

Hello (أهلًا وسهلًا)! 👋👋

My name is Amr Keleg عمرو قلج (/ʕamr/ /kɯˈɫɯtʃ/). I am a PhD student (CDT in NLP) at the University of Edinburgh, working under the supervision of Walid Magdy and Sharon Goldwater. I am currently studying the variation across Arabic dialects, their mutual intelligibility, and the implications of this variation for the creation of multi-dialect Arabic datasets.

Additionally, I am interested in Arabizi (the Romanized form of Arabic). I developed a rule-based tool for transliterating Arabizi into Arabic script. Ping me if you are interested in sharing ideas related to Arabizi (identification/transliteration/…) and/or collaborating on it!

Multilingualism is another field/cause that I am becoming more and more interested in!

As an undergraduate student, I was a competitive programming addict (lots of fun experiences 😄). I am also an advocate of open-sourcing data/models/projects (twice a Google Summer of Code student, for Apertium and GNU Octave, and a contributor to other projects like Facebook/Duckling).

News

Aug 14, 2024 Our paper “Estimating the Level of Dialectness Predicts Inter-annotator Agreement in Multi-dialect Arabic Datasets” got an Outstanding Paper Award 🎖️🎖️🎖️
Jul 1, 2024 Gave an online talk to the ARBml community titled “Distinguishing between the Varieties of Arabic: Dialect Identification is neither Solved nor the Solution”. Check the slides: here.
May 15, 2024 Had a short paper “Estimating the Level of Dialectness Predicts Inter-annotator Agreement in Multi-dialect Arabic Datasets” accepted to ACL 2024 🎉🎉 See you in Thailand!
Feb 1, 2024 Co-organizing the NADI 2024 shared task as part of the ArabicNLP 2024 conference.
Dec 7, 2023 Presented two papers in Singapore: my EMNLP 2023 paper “ALDi: Quantifying the Arabic Level of Dialectness of Text” and my ArabicNLP 2023 paper “Arabic Dialect Identification under Scrutiny: Limitations of Single-label Classification” 🎉🎉

Selected Publications

  1. Estimating the Level of Dialectness Predicts Inter-annotator Agreement in Multi-dialect Arabic Datasets
    Keleg, Amr, Magdy, Walid, and Goldwater, Sharon
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2024
    ACL 2024 Outstanding Paper Award
  2. NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task
    Abdul-Mageed, Muhammad, Keleg, Amr, Elmadany, AbdelRahim, Zhang, Chiyu, Hamed, Injy, Magdy, Walid, Bouamor, Houda, and Habash, Nizar
    In Proceedings of The Second Arabic Natural Language Processing Conference 2024
  3. ALDi: Quantifying the Arabic Level of Dialectness of Text
    Keleg, Amr, Goldwater, Sharon, and Magdy, Walid
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023
  4. Arabic Dialect Identification under Scrutiny: Limitations of Single-label Classification
    Keleg, Amr, and Magdy, Walid
    In Proceedings of ArabicNLP 2023 2023
  5. DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models
    Keleg, Amr, and Magdy, Walid
    In Findings of the Association for Computational Linguistics: ACL 2023 2023
  6. SMASH at Qur’an QA 2022: Creating Better Faithful Data Splits for Low-resourced Question Answering Scenarios
    Keleg, Amr, and Magdy, Walid
    In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur’an QA and Fine-Grained Hate Speech Detection 2022