Abstract: As an indispensable ingredient of intelligence, commonsense reasoning is crucial for large language models (LLMs) in real-world scenarios. In this paper, we propose CORECODE, a dataset of commonsense knowledge manually annotated on dyadic dialogues, to evaluate the commonsense reasoning and commonsense conflict detection capabilities of Chinese LLMs. We categorize commonsense knowledge in everyday conversations into three dimensions: entity, event, and social interaction. For easy and consistent annotation, we standardize the form of commonsense knowledge annotation in open-domain dialogues as "domain: slot = value". A total of 9 domains and 37 slots are defined to capture diverse commonsense knowledge. With these pre-defined domains and slots, we collect 76,787 commonsense knowledge annotations from 19,700 dialogues through crowdsourcing. To evaluate and enhance the commonsense reasoning capability of LLMs on the curated dataset, we establish a series of dialogue-level reasoning and detection tasks: commonsense knowledge filling, commonsense knowledge generation, commonsense conflict phrase detection, domain identification, slot identification, and event causal inference. A wide variety of existing open-source Chinese LLMs are evaluated on these tasks with our dataset. Experimental results show that these models struggle to predict CORECODE's rich reasoning content; even ChatGPT achieves only 0.275 and 0.084 accuracy on the domain identification and slot identification tasks, respectively, under the zero-shot setting. We release the data and code of CORECODE at https://github.com/danshi777/CORECODE to promote commonsense reasoning evaluation and the study of LLMs in the context of daily conversations.
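To make the "domain: slot = value" annotation form concrete, the following is a minimal sketch of how one such annotation string might be parsed and serialized. It is not taken from the CORECODE release: the `Annotation` class, the `parse_annotation` and `format_annotation` helpers, and the example domain and slot names are all hypothetical, and the actual dataset files may store annotations in a different layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One commonsense annotation in the 'domain: slot = value' form (hypothetical container)."""
    domain: str
    slot: str
    value: str


def parse_annotation(text: str) -> Annotation:
    """Split 'domain: slot = value' on the first ':' and then on the first '='."""
    domain, colon, rest = text.partition(":")
    if not colon:
        raise ValueError(f"missing ':' in annotation: {text!r}")
    slot, equals, value = rest.partition("=")
    if not equals:
        raise ValueError(f"missing '=' in annotation: {text!r}")
    domain, slot, value = domain.strip(), slot.strip(), value.strip()
    if not (domain and slot and value):
        raise ValueError(f"empty domain, slot, or value in annotation: {text!r}")
    return Annotation(domain, slot, value)


def format_annotation(annotation: Annotation) -> str:
    """Serialize an Annotation back into the 'domain: slot = value' form."""
    return f"{annotation.domain}: {annotation.slot} = {annotation.value}"


# Illustrative domain and slot names only; the real inventory has 9 domains and 37 slots.
example = parse_annotation("event: cause = it was raining")
```

Splitting on the first ":" and the first "=" keeps any later occurrences of those characters inside the value, which seems the safer choice for free-text values.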

A Manually Annotated Chinese Corpus for Non-task-oriented Dialogue Systems

A corpus-based approach for cooperative response generation in a dialog system

CNAMD Corpus: A Chinese Natural Audiovisual Multimodal Database of Conversations for Social Interactive Agents

Building Context-Related Dialogue Systems Based on Chinese-Script-Dialogue Corpus

Manually Crafted Chinese Text Corpus for Text Emotion Recognition

The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service

Annotating the Contemporary Chinese Corpus

A large synchronous corpus as monitoring corpus: Some comparative content analysis of Chinese and Japanese language developments

A Pilot Study on Dialogue-Level Dependency Parsing for Chinese

Dialog Act Annotation for Chinese Daily Conversation

Opinion Annotation in On-line Chinese Product Reviews

CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation

Chinese Emotional Dialogue Response Generation via Reinforcement Learning

Automatically Annotate TV Series Subtitles for Dialogue Corpus Construction

SocialDial: A Benchmark for Socially-Aware Dialogue Systems

The Moral Foundations Weibo Corpus

Building a Non-native Speech Corpus Featuring Chinese-English Bilingual Children: Compilation and Rationale

NewsDialogues: Towards Proactive News Grounded Conversation

An Expressive Mandarin Speech Corpus

CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models