Embedding Search for Quranic Texts based on Large Language Models

Mohammed Alqarni
DOI: https://doi.org/10.34028/21/2/7
2024-01-01
The International Arab Journal of Information Technology
Abstract: Semantic search is the process of retrieving relevant information from a large corpus of texts based on the meaning and context of the query. This paper explores the use of large language models for semantic search of Quranic texts. The Quran, the central religious text of Islam, contains rich and complex linguistic and semantic features that pose challenges for traditional keyword-based search methods. This study investigates a semantic search approach utilizing Large Language Model (LLM) embeddings and assesses their performance against a baseline embedding-based search method, using a set of queries that represent different semantic search levels. The study also discusses the limitations and implications of using large language models for semantic search of Quranic texts and suggests directions for future research. A significant finding is the consistent effectiveness of the LLM embeddings across varying semantic complexities, which suggests that LLM-based embeddings can capture deep semantic connections effectively. A second finding is that the state-of-the-art transformer AraT5 outperforms LLM embeddings in low-level semantic searches, indicating potential for further LLM fine-tuning on Arabic text corpora.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
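
The embedding-based search described in the abstract can be illustrated with a minimal sketch: embed the query and every passage as vectors, then rank passages by cosine similarity to the query. This is not the paper's implementation. The function `toy_embed` is a hypothetical stand-in (hashed character trigrams) for a real LLM or AraT5 embedding model, and the corpus strings are placeholder topic phrases, not Quranic text.

```python
import math
import zlib


def toy_embed(text, dim=64):
    """Hypothetical stand-in for an LLM embedding (assumption, not the
    paper's model): counts of character trigrams hashed into `dim` buckets."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(trigram.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, corpus, embed=toy_embed, top_k=3):
    """Return the top_k (score, passage) pairs, most similar first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(passage)), passage) for passage in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Placeholder passages for illustration only.
    corpus = ["charity to the poor", "patience in hardship", "prayer at dawn"]
    for score, passage in search("patience during hardship", corpus):
        print(f"{score:.3f}  {passage}")
```

To compare embedding models in the way the abstract describes, the same `search` function could be called with a different `embed` argument for each model and the resulting rankings evaluated on a shared query set.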