A corpus-based analysis of lexical richness of Beijing Mandarin speakers: variable identification and model construction

Yanhui Zhang
DOI: https://doi.org/10.1016/j.langsci.2013.12.003
IF: 0.816
2014-07-01
Language Sciences
Abstract: This work concerns the lexical richness of Beijing Mandarin speakers, measured by entropy. The data come from the Beijing Mandarin Spoken Corpora, a corpus of spontaneous conversational speech from contemporary Beijing Mandarin speakers. Drawing on sociovariational linguistic hypotheses and data analysis, the study attempts to identify and explain the key demographic and socioeconomic parameters that affect the entropy of each subject’s spoken texts. Both one-dimensional and multi-dimensional statistical models are proposed to quantify the relationships between this measure of lexical richness and the prominent indicative variables, including age, level of education, and profession premium. A multi-dimensional nonlinear model encompassing these findings is designed and calibrated with statistical estimation methods. Possible future directions and applications in relevant fields of applied linguistics are provided.
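To make the entropy-based notion of lexical richness concrete, the following is a minimal sketch, not the paper's actual procedure. It assumes that the entropy in question is the Shannon entropy of a speaker's word-frequency distribution, which the abstract does not state explicitly. The function name `lexical_entropy` and the pre-segmented token list are hypothetical. Real Mandarin text would first need word segmentation, which is outside this sketch.

```python
import math
from collections import Counter


def lexical_entropy(tokens):
    """Shannon entropy, in bits, of the word-frequency distribution of `tokens`.

    Assumption: `tokens` is an already-segmented list of words for one speaker.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        # An empty transcript carries no lexical information.
        return 0.0
    # H = -sum(p * log2(p)) over the relative frequency p of each word type.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A speaker who repeats one word has zero entropy; a speaker who uses
# four distinct words equally often has entropy log2(4) = 2 bits.
repetitive = ["好", "好", "好", "好"]
varied = ["今天", "天气", "很", "好"]
print(lexical_entropy(repetitive), lexical_entropy(varied))
```

Under this assumption, a higher value means a speaker's vocabulary use is more evenly spread across more word types. A per-speaker score of this kind could then be related to variables such as age or level of education.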
linguistics