Zia Qi, Brian E. Perron, Miao Wang, Cao Fang, Sitao Chen, Bryan G. Victor
Abstract: Objective: This study examines how well leading Chinese and Western large language models understand and apply Chinese social work principles, focusing on their foundational knowledge within a non-Western professional setting. We test whether the cultural context in the developing country influences model reasoning and accuracy.
Method: Using a published self-study version of the Chinese National Social Work Examination (160 questions) covering jurisprudence and applied knowledge, we administered three testing conditions to eight cloud-based large language models (four Chinese and four Western). We examined their responses following official guidelines and evaluated the reasoning quality of their explanations.
Results: Seven models exceeded the 60-point passing threshold in both sections. Chinese models performed better in jurisprudence (median = 77.0 vs. 70.3) but slightly lower in applied knowledge (median = 65.5 vs. 67.0). Both groups showed cultural biases, particularly regarding gender equality and family dynamics. Models demonstrated strong professional terminology knowledge but struggled with culturally specific interventions. Valid reasoning in incorrect answers ranged from 16.4% to 45.0%.
Conclusions: While both Chinese and Western models show foundational knowledge of Chinese social work principles, technical language proficiency does not ensure cultural competence. Chinese models demonstrate advantages in regulatory content, yet both Chinese and Western models struggle with culturally nuanced practice scenarios. These findings contribute to informing responsible AI integration into cross-cultural social work practice.
What problem does this paper attempt to address?
This paper evaluates how large language models (LLMs) perform in non-Western contexts, specifically against the professional standards of social work in China. The research aims to answer the following questions:
1. **Evaluating the breadth and depth of social work knowledge**: Can these models understand and apply the core concepts, ethics, and policies of Chinese social work?
2. **Evaluating the models' reasoning and knowledge-application ability**: When handling social work concepts, do these models draw on genuine understanding or merely rely on pattern matching? Can they apply what they have learned appropriately in a professional setting?
### Research Background
As artificial intelligence has advanced, and large language models in particular have come into use, the field of social work has begun to explore how to integrate these technologies into practice. However, most existing research focuses on Western contexts, and the performance of these models in social work practice in non-Western countries such as China has not been fully evaluated.
### Research Methods
To evaluate the performance of LLMs in Chinese social work systematically, the researchers used the "China National Social Worker Professional Level Examination" (CNSWE) as the evaluation tool. The CNSWE covers social work theories, ethical principles, case management, community development, and knowledge in specific fields such as child welfare, mental health, and gerontology. The examination is divided into three parts: a regulation test (the jurisprudence section referred to in the abstract), an applied-knowledge test, and a scenario assessment. This research focuses mainly on the regulation test and the applied-knowledge test.
### Main Findings
- **Overall performance**: Seven out of eight models exceeded the passing score of 60 on both the regulation test and the applied-knowledge test. Only one model (DeepSeek) fell slightly below the passing score, with 59.5 on the applied-knowledge test.
- **Cultural differences**: Chinese models performed better on the regulation test (median score of 77.0 vs. 70.3 for Western models) but slightly worse on the applied-knowledge test (median score of 65.5 vs. 67.0 for Western models).
- **Quality of reasoning**: Although most models were able to provide correct answers, their explanations often showed cultural biases, especially in scenarios involving gender equality and family dynamics.
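
The group comparison reported above (a median score per group, checked against the 60-point passing score) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' analysis code: the function name `summarize`, the group and model labels, and every score in `example_scores` are hypothetical placeholders and are not taken from the paper.

```python
from statistics import median

PASSING_SCORE = 60.0  # passing threshold stated in the summary


def summarize(scores_by_group):
    """Return, per group, the median score and the models below the passing score."""
    summary = {}
    for group, scores in scores_by_group.items():
        summary[group] = {
            "median": median(scores.values()),
            "below_pass": sorted(
                model for model, score in scores.items() if score < PASSING_SCORE
            ),
        }
    return summary


# Hypothetical placeholder scores for illustration only; NOT the paper's data.
example_scores = {
    "group_a": {"model_1": 58.0, "model_2": 63.0, "model_3": 69.0, "model_4": 74.0},
    "group_b": {"model_5": 61.0, "model_6": 65.0, "model_7": 70.0, "model_8": 75.0},
}

result = summarize(example_scores)
for group, stats in result.items():
    print(group, stats["median"], stats["below_pass"])
```

With four models per group, the median is the mean of the two middle scores, so a single low-scoring model can sit below the passing line without pulling its group's median below it.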
### Conclusion
This research shows that technical language ability in large language models does not by itself imply the cultural sensitivity and professional competence needed for complex cross-cultural social work practice. The results underline the importance of evaluating AI tools in different cultural contexts and provide a reference point for future research and practice.
Through this research, the authors hope to encourage further discussion of the responsible application of AI in social work, so that these technologies can be applied effectively and fairly across cultural contexts.