Abstract: In recent years, both research and industry have shown increasing interest in developing reliable information retrieval (IR) systems that can effectively address the growing demands of users worldwide. Despite the relative success of IR systems in meeting users' needs and even adapting to their environments, many problems remain unresolved. One main problem is lexical ambiguity, which negatively affects the performance and reliability of IR systems. To date, lexical ambiguity has been one of the most frequently reported problems in Arabic IR systems, despite the development of different word sense disambiguation (WSD) techniques. This is largely attributed to the limitations of such techniques in handling the linguistic peculiarities of Arabic. Hence, this study addresses these limitations by exploring the reasons for lexical ambiguity in Arabic IR applications as one step towards reliable and practical solutions. For this purpose, the performance of six search engines (Google, Bing, Baidu, Yahoo, Yandex, and Ask) is evaluated. Results indicate that lexical ambiguities in Arabic IR applications are mainly due to the unique morphological and orthographic system of the Arabic language, in addition to its diglossia and its multiple colloquial dialects, which are sometimes not mutually intelligible. For better disambiguation and IR performance in Arabic, this study proposes that clustering models based on supervised machine learning theory be trained to address the morphological diversity of Arabic and its unique orthographic system. Search engines should also be adapted to the geographic location of users in order to address the vernacular dialects of Arabic, and they should be trained to identify the different dialects automatically. Finally, search engines should consider all varieties of Arabic and be able to interpret queries regardless of the particular variety adopted by the user.

Context-aware Urdu Information Retrieval System

Lexical Ambiguity in Arabic Information Retrieval: The Case of Six Web-Based Search Engines

End-to-end vertical web search pseudo relevance feedback queries recommendation software

Semantic Information Retrieval Using Ontology In University Domain

CURE: Collection for Urdu Information Retrieval Evaluation and Ranking

An Efficient Indexing and Searching Technique for Information Retrieval for Urdu Language

Urdu Speech and Text Based Sentiment Analyzer

NERWS: Towards Improving Information Retrieval of Digital Library Management System Using Named Entity Recognition and Word Sense

UTSA: Urdu Text Sentiment Analysis Using Deep Learning Methods

UQA: Corpus for Urdu Question Answering

End-to-end pseudo relevance feedback based vertical web search queries recommendation

Personalized and context-aware retrieval based on fuzzy ontology profiling

A framework for contextual information retrieval from the WWW

A machine learning approach for Urdu text sentiment analysis

Building a Multilevel Inflection Handling Stemmer to Improve Search Effectiveness for Urdu Language

Developing a Large Benchmark Corpus for Urdu Semantic Word Similarity

A Literature Review of Keyword Spotting Technologies for Urdu

Sentiment analysis techniques, challenges, and opportunities: Urdu language-based analytical study

WER We Stand: Benchmarking Urdu ASR Models

Information Retrieval System for College Search

Distributed and Cooperative Information Retrieval on the World Wide Web