Abstract: The Gaokao, China's national college entrance exam, is a high-stakes exam for nearly all Chinese students. English has long been one of its three most important subjects, and listening plays an important role in the Gaokao English test. However, relatively little research has been conducted on local versions of the Gaokao English listening test. This study analyzed the linguistic features and corresponding functional dimensions of the three text types in the Gaokao listening test, and investigated whether the papers used in three major regions of China differed in the co-occurrence patterns of lexicogrammatical features and in the dimensions of their transcripts. A corpus of 170 sets of test papers (134,913 words), covering 31 provinces and cities from 2000 to 2022, was analyzed using multidimensional analysis, from which six exclusive dimensions were extracted. The results showed meaningful differences across short conversations, long conversations, and monologues in scores on the six dimensions. Regions also differed significantly on three dimensions: Syntactic and Clausal Complexity, Oral versus Literate Discourse, and Procedural Discourse. Time period was not associated with any differences. Implications for language teaching and assessment are discussed.

A Study on the Appropriate Size of the Mongolian General Corpus

Score Regulation Based on GMM Token Ratio Similarity for Speaker Recognition

How Will Text Size Influence the Length of Its Linguistic Constituents?

How Many is Enough?—Statistical Principles for Lexicostatistics

A Diachronic Study of Chinese Word Length Distribution

How much is said in a microblog? A multilingual inquiry based on Weibo and Twitter

Character Usage in Chinese Short Message Service (SMS): a Real-World Study in Mainland China

A Statistic Study of Three-character Unknown Words in Chinese

Heaps' Law in GPT-Neo Large Language Model Emulated Corpora

MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs

A Multidimensional Analysis of a High-Stakes English Listening Test: A Corpus-Based Approach

MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset

Chinese Word Frequency Approximation Based on Multitype Corpora

Statistical Analyses on Chinese Ancient Books for Information Retrieval

A Miniature Chinese TTS System Based on Tailored Corpus

Research on Evaluation of Token Imbalance Degree in NMT Corpus

A corpus based analysis of lexical richness of Beijing Mandarin speakers: variable identification and model construction

A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages

Language resource construction for Mongolian

A large synchronous corpus as monitoring corpus: Some comparative content analysis of Chinese and Japanese language developments

Tokenization Falling Short: On Subword Robustness in Large Language Models