Revisiting Keyword Analysis in a Specialized Corpus: Religious Terminology Extraction

Hsin-Yi Lien
DOI: https://doi.org/10.1080/09296174.2020.1865668
2021-01-01
Journal of Quantitative Linguistics
Abstract: This study investigates keyword extraction from a compiled Buddhist corpus. It outlines a basic procedure for generating keywords with statistical measures and refining them through manual screening against specific criteria. The resulting Buddhist Word List contains 1244 keywords, including 375 Pali words, drawn from Buddhist literature. We compared keyword analyses based on occurrence frequency, log-likelihood (LL), and odds ratio (OR), and each measure produced a different keyword ranking. Our results show that statistical measures are useful for identifying keywords specific to a particular field, and that OR is more effective at identifying technical terms. We demonstrate that a multilevel keyword analysis is more effective at identifying high-frequency technical words than any of these methods used alone. Multilevel methods are therefore recommended for creating future domain-specific vocabulary lists, since they overcome the inherent flaws of individual analytic methods.
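To make the contrast between the three measures concrete, here is a minimal sketch that ranks words by raw frequency, by log-likelihood, and by odds ratio. It assumes the widely used two-corpus formulations (a target corpus compared against a reference corpus) and is not necessarily the exact computation used in the study. Adding 0.5 to every cell of the odds ratio is an assumption that avoids division by zero. The function names, corpus sizes, and word counts are hypothetical toy values invented for illustration, not data from the paper.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood (G2) keyness score.

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    # A zero observed count contributes nothing (0 * log 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def odds_ratio(a, b, c, d):
    """Odds of the word in the target corpus over its odds in the reference.

    ASSUMPTION: 0.5 is added to every cell so that words absent from the
    reference corpus (b == 0) get a large finite score instead of a
    division by zero.
    """
    return ((a + 0.5) / (c - a + 0.5)) / ((b + 0.5) / (d - b + 0.5))


# HYPOTHETICAL toy counts and corpus sizes, invented for illustration only.
TARGET_SIZE = 10_000      # tokens in the specialized (Buddhist) corpus
REFERENCE_SIZE = 100_000  # tokens in a general reference corpus
counts = {                # word: (target frequency, reference frequency)
    "the": (600, 6000),
    "mind": (400, 1000),
    "nibbana": (20, 0),
}


def rank(score):
    """Return the words sorted from highest to lowest score."""
    return sorted(counts, key=score, reverse=True)


by_freq = rank(lambda w: counts[w][0])
by_ll = rank(lambda w: log_likelihood(*counts[w], TARGET_SIZE, REFERENCE_SIZE))
by_or = rank(lambda w: odds_ratio(*counts[w], TARGET_SIZE, REFERENCE_SIZE))

print("frequency:", by_freq)  # ['the', 'mind', 'nibbana']
print("LL:       ", by_ll)    # ['mind', 'nibbana', 'the']
print("OR:       ", by_or)    # ['nibbana', 'mind', 'the']
```

With these invented numbers, the three measures give three different orderings. Frequency favors the function word. LL favors the frequent, moderately overused term. OR favors the term that never occurs in the reference corpus. This is a small-scale echo of the abstract's observation that OR tends to surface technical terms and that no single measure suffices on its own.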
linguistics