Abstract: The third wave started in 2010, when research focused on Arabic NLP returned to the Arab world. This period witnessed a proliferation of Arab researchers and graduate students interested in Arabic NLP and an increase in publications from the Arab world in top conferences. Active universities include New York University Abu Dhabi (NYUAD), American University of Beirut (AUB), Carnegie Mellon University in Qatar (CMUQ), King Saud University (KSU), Birzeit University (BZU), Cairo University, and others. Active research centers include Qatar Computing Research Institute (QCRI), King Abdulaziz City for Science and Technology (KACST), and more. Many actively contributing researchers also work in smaller groups across the Arab world. This period overlapped with two major independent developments: the rise of deep learning and neural models, and the rise of social media. The first development affected the direction of research, pushing it further into the ML space; the second led to an increase in social media data, which introduced many new challenges at a larger scale: more dialects and more noise. This period also witnessed a welcome increase in Arabic language resources and processing tools, and a heightened awareness of the importance of AI for the future of the region; for example, the UAE now has a ministry specifically for AI. Finally, young and ambitious companies such as Mawdoo3 are competing to serve a growing market and meet rising expectations in the Arab world.

In the Arab world, efforts to create annotated corpora are relatively limited. Examples include BZU's Curras, the Palestinian Arabic annotated corpus; NYUAD's Gumar, the Emirati Arabic annotated corpus; and the Al-Mus'haf Quranic Arabic corpus. Another annotation effort, focused on MSA spelling and grammar correction, is the Qatar Arabic Language Bank (QALB), a project involving Columbia and CMUQ.

Other specialized annotated corpora developed in the Arab world include NYUAD's parallel gender corpus, with sentences in masculine and feminine forms for research on countering gender bias; the Arab-Acquis corpus, which pairs Arabic with all of Europe's languages for a portion of European parliamentary proceedings; and the MADAR corpus of parallel dialects, created in collaboration with CMUQ.

Although some progress has been made for both first-language (L1) and second-language (L2) pedagogical applications (PAs), the dearth of resources compared with English remains the bottleneck for future progress. Resource-building efforts have focused on L1 readers, with particular emphasis on grade school curricula. There is a push to inform the enhancement of curricula using pedagogical tools and to compare curricula across Arab countries. The L2 PAs are even more constrained, with limited corpora and a disproportionate focus on beginners. There is a definite need to augment these corpora in a reasoned way: taking into consideration different text features and learners, both young and old; filling out the sparsely populated levels with authentic material; and exploiting technologies such as text simplification and text error analysis and correction. Learner corpora, which as the name suggests are produced by learners of Arabic, can inform the creation of tools and corpora. A recent effort developed a large-scale Arabic readability lexicon compatible with an existing morphological analysis system.

Another information retrieval-related problem is question answering, which comes in many flavors, the most common of which is attempting to identify a passage or a sentence that answers a question. Performing such a task may employ a large set of NLP tools such as parsing, NER, co-reference resolution, and text semantic representation. There has been limited research on this problem, and existing commercial solutions such as Ujeeb.com are rudimentary.
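To make the most common flavor of question answering concrete, the following is a minimal sketch of passage selection: it ranks candidate passages by TF-IDF cosine similarity to the question. It is an illustration only, not a method described in the survey. The function names (`tokenize`, `rank_passages`), the smoothed IDF formula, and the three toy passages are assumptions made for this example, and the whitespace-level tokenization skips the parsing, NER, co-reference resolution, and semantic representation steps that a realistic Arabic system would likely need.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # In Python 3, \w matches Arabic letters; punctuation such as "؟" is dropped.
    # Assumption: no morphological or orthographic normalization is applied.
    return re.findall(r"\w+", text)


def rank_passages(question, passages):
    """Return (passage indices sorted best-first, per-passage cosine scores)."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)

    # Document frequency: each term is counted once per passage.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed inverse document frequency (one common variant, chosen here).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def weigh(counts):
        # Terms never seen in any passage carry no weight.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = weigh(Counter(tokenize(question)))
    scores = [cosine(q_vec, weigh(doc)) for doc in docs]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical toy data, invented for this illustration.
passages = [
    "القاهرة هي عاصمة مصر.",
    "بيروت هي عاصمة لبنان.",
    "النيل نهر طويل.",
]
question = "ما هي عاصمة مصر؟"

order, scores = rank_passages(question, passages)
best_passage = passages[order[0]]
```

On this toy data the first passage ranks highest because it shares the most, and the rarest, terms with the question. Surface-level matching of this kind breaks down quickly on the dialectal and noisy text discussed above, which is one reason fuller NLP pipelines are used in practice.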
