Abstract: The rapid advancement of Language Model technologies has opened new opportunities, but also introduced new challenges related to bias and fairness. This paper explores the uncharted territory of potential biases in state-of-the-art universal text embedding models towards specific document and query writing styles within Information Retrieval (IR) systems. Our investigation reveals that different embedding models exhibit different preferences for document writing styles, while more informal and emotive styles are less favored by most embedding models. In terms of query writing styles, many embedding models tend to match the style of the query with the style of the retrieved documents, but some show a consistent preference for specific styles. Text embedding models fine-tuned on synthetic data generated by LLMs display a consistent preference for certain styles of generated data. These biases in text-embedding-based IR systems can inadvertently silence or marginalize certain communication styles, thereby posing a significant threat to fairness in information retrieval. Finally, we also compare the answer styles of Retrieval Augmented Generation (RAG) systems based on different LLMs and find that most text embedding models are biased towards LLMs' answer styles when used as evaluation metrics for answer correctness. This study sheds light on the critical issue of writing-style-based bias in IR systems, offering valuable insights for the development of fairer and more robust models.

What problem does this paper attempt to address?

The main problem this paper addresses is the bias and fairness of text embedding models in information retrieval (IR) systems when they handle documents and queries written in different styles. Specifically, the paper focuses on:

1. **Preference of text embedding models for different writing styles**: Do different text embedding models prefer specific writing styles? For example, some models may favor formal and concise writing and perform poorly on informal or emotional writing.
2. **Influence of query writing style on retrieval results**: Can the writing style of a query introduce bias? For example, a system trained mainly on queries of one writing style may perform poorly on queries of other styles.
3. **Fairness of IR systems based on text embedding**: How do these biases affect the fairness of information retrieval systems? In particular, will certain writing styles be inadvertently marginalized or ignored by the system?
4. **Influence of LLM-generated data on text embedding models**: Does fine-tuning text embedding models on synthetic data generated by large language models (LLMs) introduce specific writing style preferences?

### Specific problems

- **Document writing style bias**: The paper explores how different text embedding models perform on documents of different writing styles. For example, some models may favor formal and detailed writing and perform poorly on informal or emotional writing.
- **Query writing style bias**: The paper also studies how queries of different writing styles influence retrieval results. For example, some models may retrieve documents whose writing style matches that of the query, which affects the diversity of retrieval results.
- **Influence of LLM-generated data**: The paper analyzes how synthetic data generated by LLMs influences text embedding models, especially whether models fine-tuned on it prefer LLM-generated writing styles.

### Research methods

To study these problems, the paper designs a series of experiments that compare multiple state-of-the-art text embedding models on documents and queries of different writing styles. The experiments include:

- **Document writing style experiment**: Rewrite each original document into 9 different writing styles and calculate the average ranking of each writing style to evaluate the model's preference among them.
- **Query writing style experiment**: Rewrite each original query into 9 different writing styles and evaluate how these differently styled queries influence retrieval results.

### Conclusions

The paper finds that most text embedding models have a clear preference for certain writing styles, which may lead to unfairness in information retrieval systems. For example, some models favor formal and concise writing and perform poorly on informal or emotional writing. The writing style of a query also affects retrieval results, and some models retrieve documents whose writing style matches that of the query. Overall, the paper reveals the biases of text embedding models across writing styles and emphasizes the importance of developing fairer and more robust information retrieval systems.
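The document writing style experiment described under Research methods can be illustrated with a minimal sketch. It assumes that each styled rewrite of a document is ranked by cosine similarity to the query embedding and that ranks are averaged over all query-document pairs; the summary does not state the paper's exact scoring procedure, so this is an assumption. The function names, the style labels, and the toy 2-dimensional vectors are all hypothetical stand-ins for real embedding model output.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_style_rank(examples):
    """Return the mean rank (1 = most similar to the query) of each style.

    `examples` is a list of (query_vector, {style_name: document_vector})
    pairs, where each document_vector embeds the same document rewritten
    in one style. Every example is assumed to contain the same styles.
    """
    rank_sums = defaultdict(float)
    for query_vec, styled_docs in examples:
        # Order the style variants from most to least similar to the query.
        ordered = sorted(
            styled_docs,
            key=lambda style: cosine(query_vec, styled_docs[style]),
            reverse=True,
        )
        for rank, style in enumerate(ordered, start=1):
            rank_sums[style] += rank
    return {style: total / len(examples) for style, total in rank_sums.items()}


# Toy vectors invented for illustration; real inputs would come from an
# embedding model applied to the query and to each styled rewrite.
toy_examples = [
    ([1.0, 0.0], {"formal": [0.9, 0.1], "casual": [0.5, 0.5], "emotive": [0.1, 0.9]}),
    ([0.0, 1.0], {"formal": [0.2, 0.8], "casual": [0.5, 0.5], "emotive": [0.6, 0.4]}),
]
style_ranks = average_style_rank(toy_examples)
print(style_ranks)
```

A lower average rank means the model tends to place that style closer to the query. Under this reading, a style whose average rank stays high across many queries would be the kind of disfavored style the summary describes.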

Writing Style Matters: An Examination of Bias and Fairness in Information Retrieval Systems
