Abstract: Complex networks offer a powerful framework for modeling linguistic phenomena. This study compares five distinct methods for representing sentences as networks, each with unique edge definitions: (1) a lines approach, where edges represent token (e.g., word) adjacency; (2) a close-range co-occurrence approach, where edges are based on the probability of tokens co-occurring at distance one or two; (3) a cliques approach, where edges connect tokens co-occurring within the same sentence; (4) a dependency-based approach, where edges are defined by syntactic dependencies extracted by a parser; (5) an IF-trimmed-subgraphs approach, where edges are determined by the Incidence-Fidelity (IF) Index. While the first four approaches are well established in the literature, the last one is a novel proposal. We also examined the effects of limiting the vertices to lemmas (i.e., words with inflections removed) and to lexical lemmas (i.e., nouns, adjectives, verbs, and adverbs) as opposed to the unaltered words. Our results reveal that these approaches yield networks with varying average minimal path lengths and degrees, influencing the interpretation of results. While small-world behavior remains consistent across networks, scale-free behavior analysis is affected. Notably, excluding functional words significantly alters degree distributions. We suggest, in order of relevance and according to the resources available, the dependency-based, the close-range co-occurrence, and the lines approaches for cases in which syntactic relations are central, and the IF-trimmed-subgraphs and the cliques approaches for cases in which semantic relations are central.
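To make the edge definitions concrete, the following minimal Python sketch builds edge sets for three of the five approaches on a toy corpus. It is an illustration under stated assumptions, not the authors' implementation: the function names (`window_edges`, `clique_edges`, `mean_degree`) and the example sentences are hypothetical, and the close-range variant shown here simply links tokens at distance one or two, omitting the probability-based criterion mentioned in the abstract. The dependency-based and IF-trimmed-subgraphs approaches are left out because they require a parser and the IF Index definition, neither of which is given here.

```python
from itertools import combinations


def window_edges(sentences, max_dist=1):
    """Undirected edges between tokens at most max_dist positions apart.

    max_dist=1 corresponds to the lines approach (token adjacency).
    max_dist=2 is a simplified stand-in for close-range co-occurrence;
    the probability-based filtering is NOT modeled here (assumption).
    """
    edges = set()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            last = min(i + max_dist, len(tokens) - 1)
            for j in range(i + 1, last + 1):
                if tok != tokens[j]:  # skip self-loops
                    edges.add(frozenset((tok, tokens[j])))
    return edges


def clique_edges(sentences):
    """Cliques approach: link every pair of distinct tokens in a sentence."""
    edges = set()
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            edges.add(frozenset((a, b)))
    return edges


def mean_degree(edges):
    """Average vertex degree of an undirected graph given as an edge set."""
    nodes = set().union(*edges) if edges else set()
    return 2 * len(edges) / len(nodes) if nodes else 0.0


# Hypothetical toy corpus, already tokenized.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]

lines = window_edges(sentences, max_dist=1)
close = window_edges(sentences, max_dist=2)
cliques = clique_edges(sentences)

for name, edges in [("lines", lines), ("close", close), ("cliques", cliques)]:
    print(name, len(edges), round(mean_degree(edges), 2))
```

On this toy corpus the three edge sets are nested (lines within close-range within cliques), so the mean degree grows from one approach to the next, which is consistent with the abstract's remark that the choice of representation changes average degree.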

Comparison study of using semantic and syntactic network characteristics to do text clustering

Statistical Properties of Chinese Semantic Networks

The Complexity of Chinese Syntactic Dependency Networks

Central Nodes of the Chinese Syntactic Networks

Classifying Syntactic Categories in the Chinese Dependency Network

Semantic Correlation Network Based Text Clustering

Language Clustering with Word Co-Occurrence Networks Based on Parallel Texts

Can syntactic networks indicate morphological complexity of a language?

Language clusters based on linguistic complex networks

Application of Quantitative Characteristics of Chinese Genres in Text Clustering

Chinese Writing of Deaf or Hard-of-hearing Students and Normal-hearing Peers from Complex Network Approach

Word Class, Syntactic Function and Style: A Comparative Study Based on Annotated Corpora

Modeling texts with networks: comparing five approaches to sentence representation

Chinese Word Similarity Computing Based on Semantic Tree

A Survey Of Semantic Similarity And Its Application To Social Network Analysis

Entropy in Different Text Types

How Do Local Syntactic Structures Influence Global Properties in Language Networks?

Chinese Syntactic and Typological Properties Based on Dependency Syntactic Treebanks

Authorship Identification Based on Semantic Analysis

Stylistic Syntactic Structure Extraction and Semantic Clustering for Different Registers

Semantic Roles or Syntactic Functions: the Effects of Annotation Scheme on the Results of Dependency Measures