Writing Style Matters: An Examination of Bias and Fairness in Information Retrieval Systems

Hongliu Cao
DOI: https://doi.org/10.1145/3701551.3703514
2024-11-20
Abstract: The rapid advancement of Language Model technologies has opened new opportunities, but also introduced new challenges related to bias and fairness. This paper explores the uncharted territory of potential biases in state-of-the-art universal text embedding models towards specific document and query writing styles within Information Retrieval (IR) systems. Our investigation reveals that different embedding models exhibit different preferences of document writing style, while more informal and emotive styles are less favored by most embedding models. In terms of query writing styles, many embedding models tend to match the style of the query with the style of the retrieved documents, but some show a consistent preference for specific styles. Text embedding models fine-tuned on synthetic data generated by LLMs display a consistent preference for certain style of generated data. These biases in text embedding based IR systems can inadvertently silence or marginalize certain communication styles, thereby posing a significant threat to fairness in information retrieval. Finally, we also compare the answer styles of Retrieval Augmented Generation (RAG) systems based on different LLMs and find out that most text embedding models are biased towards LLM's answer styles when used as evaluation metrics for answer correctness. This study sheds light on the critical issue of writing style based bias in IR systems, offering valuable insights for the development of more fair and robust models.
Information Retrieval, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the bias and fairness issues that text embedding models exhibit in information retrieval (IR) systems when handling documents and queries written in different styles. Specifically, it focuses on:

1. **Preference of text embedding models for different writing styles**: Do different text embedding models prefer specific writing styles? For example, some models may be more inclined towards formal and concise writing and perform poorly on informal or emotional writing.
2. **Influence of query writing style on retrieval results**: Can the writing style of a query introduce bias? For example, a system trained mainly on queries of one writing style may perform poorly on queries of other styles.
3. **Fairness of IR systems based on text embedding**: How do these biases affect the fairness of information retrieval systems? In particular, are certain writing styles inadvertently marginalized or ignored by the system?
4. **Influence of LLM-generated data on text embedding models**: Does fine-tuning text embedding models on synthetic data generated by large language models (LLMs) introduce specific writing style preferences?

### Specific problems

- **Document writing style bias**: The paper explores how different text embedding models perform on documents of different writing styles. For example, some models may be more inclined towards formal and detailed writing and perform poorly on informal or emotional writing.
- **Query writing style bias**: The paper also studies how queries of different writing styles influence retrieval results. For example, some models may retrieve documents whose writing style matches that of the query, which affects the diversity of retrieval results.
- **Influence of LLM-generated data**: The paper analyzes the effect of LLM-generated synthetic data on text embedding models, especially whether models fine-tuned on such data prefer the writing style of LLM-generated text.

### Research methods

To study these problems, the paper designs a series of experiments that compare multiple state-of-the-art text embedding models on documents and queries of different writing styles. The experiments include:

- **Document writing style experiment**: Rewrite each original document into 9 different writing styles and calculate the average ranking of each writing style to evaluate the model's preference among them.
- **Query writing style experiment**: Rewrite each original query into 9 different writing styles and evaluate how these differently styled queries influence retrieval results.

### Conclusions

The paper finds that most text embedding models have a clear preference for certain writing styles, which may lead to unfairness in information retrieval systems. For example, some models are more inclined towards formal and concise writing and perform poorly on informal or emotional writing. The writing style of a query also affects retrieval results: many models tend to retrieve documents whose style matches that of the query, while some show a consistent preference for specific styles. Overall, the paper reveals the biases that text embedding models exhibit across writing styles and emphasizes the importance of developing fairer and more robust information retrieval systems.
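
The document writing style experiment described under Research methods (rewrite each document into several styles, then average the rank each style receives) can be illustrated with a small sketch. This is not the paper's implementation: it assumes that the style variants of one document are ranked against a query by cosine similarity, and the function names, the style labels, and the two-dimensional toy vectors are all hypothetical stand-ins for real embedding model outputs.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_style_rank(query_vecs, doc_variants):
    """Mean rank of each writing style across all queries.

    query_vecs:   one query embedding per original document.
    doc_variants: one dict per query (same order), mapping a style name
                  to the embedding of the document rewritten in that style.
    Rank 1 means the variant most similar to the query (assumed metric: cosine).
    """
    rank_sums = defaultdict(float)
    for query, variants in zip(query_vecs, doc_variants):
        # Order the style variants of this document from most to least similar.
        ordered = sorted(
            variants, key=lambda style: cosine(query, variants[style]), reverse=True
        )
        for rank, style in enumerate(ordered, start=1):
            rank_sums[style] += rank
    n_queries = len(query_vecs)
    return {style: total / n_queries for style, total in rank_sums.items()}


# Hypothetical toy embeddings; a real experiment would use model outputs
# and the 9 styles from the paper rather than these 3 made-up labels.
queries = [[1.0, 0.0], [0.0, 1.0]]
variants = [
    {"formal": [1.0, 0.1], "casual": [1.0, 0.5], "emotive": [1.0, 1.0]},
    {"formal": [0.1, 1.0], "casual": [1.0, 1.0], "emotive": [0.5, 1.0]},
]

avg_ranks = average_style_rank(queries, variants)
print(avg_ranks)
```

In this toy data the "formal" variant is closest to the query both times, so its average rank is 1.0, while "casual" and "emotive" each average 2.5. Under the stated assumption, a lower average rank indicates a style that the embedding model favors.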