Descriptive vs. inferential community detection in networks: pitfalls, myths, and half-truths

Tiago P. Peixoto
DOI: https://doi.org/10.1017/9781009118897
2023-07-06
Abstract: Community detection is one of the most important methodological fields of network science, and one which has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is arguably the state-of-the-art and the methods that are actually used in practice in a variety of fields. Here we attempt to address this discrepancy by dividing existing methods according to whether they have a "descriptive" or an "inferential" goal. While descriptive methods find patterns in networks based on context-dependent notions of community structure, inferential methods articulate generative models, and attempt to fit them to data. In this way, they are able to provide insights into the mechanisms of network formation, and separate structure from randomness in a manner supported by statistical evidence. We review how employing descriptive methods with inferential aims is riddled with pitfalls and misleading answers, and thus should be in general avoided. We argue that inferential methods are more typically aligned with clearer scientific questions, yield more robust results, and should be in many cases preferred. We attempt to dispel some myths and half-truths often believed when community detection is employed in practice, in an effort to improve both the use of such methods as well as the interpretation of their results.
Physics and Society, Social and Information Networks, Data Analysis, Statistics and Probability, Methodology, Machine Learning
What problem does this paper attempt to address?
This paper addresses common misunderstandings in how community detection methods are selected and applied in network science. Specifically:

1. **The difference between descriptive and inferential community detection**: The paper first distinguishes descriptive from inferential methods. Descriptive methods divide a network into communities according to context-dependent notions of structure, while inferential methods articulate generative models that explain how such patterns form, fit them to data, and statistically separate structure from randomness.
2. **The shortcomings of existing methods**: Although inferential methods are more rigorous, many practitioners still rely on earlier descriptive methods, which have serious flaws when used with inferential aims, such as over-fitting and the resolution limit.
3. **The guiding principle for method selection**: The paper provides a simple "litmus test" to help researchers decide between a descriptive and an inferential method for a specific task. The test asks whether the usefulness of our conclusions would change if we learned that the analyzed network is maximally random. If the answer is yes, an inferential method is required; otherwise, a descriptive method may be sufficient.
4. **Common myths and half-truths**: The paper discusses in detail several beliefs that are common in practice, such as whether modularity maximization is equivalent to generative model inference, whether consensus clustering can eliminate over-fitting, and whether tuning the resolution parameter of modularity maximization removes the resolution limit. The author addresses these views one by one and offers better-founded explanations and recommendations.
In short, the paper aims to improve researchers' understanding of community detection methods and to help them avoid errors and misleading conclusions caused by poor method selection. By clarifying these misunderstandings, it hopes to promote more effective and scientifically sound use of community detection.
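
The over-fitting concern raised in point 2 can be illustrated with a small experiment. The following is a minimal sketch, not code from the paper: it builds a fully random graph, runs a simplified local-moving heuristic for modularity (similar in spirit to the first phase of the Louvain method, used here only as a stand-in for modularity maximization), and reports the modularity of the partition it finds. The function names, graph size, and seeds are all illustrative assumptions.

```python
import random
from collections import defaultdict


def random_graph(n, m, seed):
    """Random simple graph with n nodes and m edges, as adjacency sets."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj, len(edges)


def modularity(adj, m, label):
    """Modularity Q = sum_c [ e_c / m - (d_c / (2m))^2 ]."""
    internal = defaultdict(int)  # e_c: number of edges inside community c
    degree = defaultdict(int)    # d_c: sum of node degrees in community c
    for u, nbrs in adj.items():
        degree[label[u]] += len(nbrs)
        for v in nbrs:
            if u < v and label[u] == label[v]:
                internal[label[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(adj, m, n, seed):
    """Greedily move single nodes between communities while Q increases."""
    rng = random.Random(seed)
    label = {u: u for u in range(n)}               # start from singletons
    comm_deg = {u: len(adj[u]) for u in range(n)}  # total degree per community
    nodes = list(range(n))
    improved = True
    while improved:
        improved = False
        rng.shuffle(nodes)
        for u in nodes:
            k_u = len(adj[u])
            if k_u == 0:
                continue
            old = label[u]
            comm_deg[old] -= k_u  # temporarily remove u from its community
            links = defaultdict(int)
            for v in adj[u]:
                links[label[v]] += 1
            # Change in Q from placing u in community c:
            #   links[c] / m - k_u * d_c / (2 m^2)
            best = old
            best_gain = links[old] / m - k_u * comm_deg[old] / (2 * m * m)
            for c, l in links.items():
                gain = l / m - k_u * comm_deg[c] / (2 * m * m)
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            label[u] = best
            comm_deg[best] += k_u
            if best != old:
                improved = True
    return label


n_nodes = 200
adj, m = random_graph(n=n_nodes, m=600, seed=1)
label = local_moving(adj, m, n=n_nodes, seed=1)
Q = modularity(adj, m, label)
n_comms = len({label[u] for u in adj if adj[u]})
print(f"communities found: {n_comms}, modularity Q = {Q:.3f}")
```

The graph here has no planted structure, yet the heuristic still returns several groups with positive modularity. A descriptive method reports such a partition without indicating whether it is distinguishable from noise, which is the situation the litmus test in point 3 is meant to flag.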