Corpus Analysis with spaCy

Megan S. Kane
DOI: https://doi.org/10.46430/phen0113
2023-11-02
Abstract: This lesson demonstrates how to use the Python library spaCy to analyze large collections of texts. It details the process of using spaCy to enrich a corpus via lemmatization, part-of-speech tagging, dependency parsing, and named entity recognition. Readers will learn how to analyze the linguistic annotations spaCy produces in order to explore meaningful trends in language patterns across a set of texts.
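As a rough illustration of the four kinds of enrichment named in the abstract, the sketch below collects lemmas, part-of-speech tags, dependency labels, and named entity counts from a spaCy Doc. It is not taken from the lesson itself: the function names `summarize_annotations` and `annotate_text` are hypothetical, and the choice of the small English model `en_core_web_sm` is an assumption.

```python
from collections import Counter


def summarize_annotations(doc):
    """Collect per-token annotations and entity label counts from a spaCy Doc.

    Hypothetical helper for illustration; it only reads standard Token and
    Span attributes (text, lemma_, pos_, dep_, head, label_).
    """
    rows = [
        {
            "text": token.text,        # the token as it appears in the text
            "lemma": token.lemma_,     # lemmatization
            "pos": token.pos_,         # coarse part-of-speech tag
            "dep": token.dep_,         # dependency relation to the head
            "head": token.head.text,   # the token this one depends on
        }
        for token in doc
    ]
    # Named entity recognition: count how often each entity label occurs
    entity_labels = Counter(ent.label_ for ent in doc.ents)
    return rows, entity_labels


def annotate_text(text, model_name="en_core_web_sm"):
    """Run a spaCy pipeline over one text and summarize its annotations.

    Assumes the named model is installed; the default model is an assumption.
    """
    import spacy  # imported here so the helper above works without spaCy

    nlp = spacy.load(model_name)
    return summarize_annotations(nlp(text))
```

With spaCy and the model installed (for example via `python -m spacy download en_core_web_sm`), calling `rows, entity_labels = annotate_text("Some sentence from the corpus.")` would return one dictionary per token plus a count of entity labels. Applying the same function to every text in a corpus is one simple way to build tables that can then be compared across documents.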