Community detection in networks: A user guide

Santo Fortunato, Darko Hric
DOI: https://doi.org/10.1016/j.physrep.2016.09.002
IF: 30.51
2016-11-01
Physics Reports
Abstract: Community detection in networks is one of the most popular topics of modern network science. Communities, or clusters, are usually groups of vertices having higher probability of being connected to each other than to members of other groups, though other patterns are possible. Identifying communities is an ill-defined problem. There are no universal protocols on the fundamental ingredients, like the definition of community itself, nor on other crucial issues, like the validation of algorithms and the comparison of their performances. This has generated a number of confusions and misconceptions, which undermine the progress in the field. We offer a guided tour through the main aspects of the problem. We also point out strengths and weaknesses of popular methods, and give directions to their use.
What problem does this paper attempt to address?
This paper addresses community detection in networks, one of the most popular topics in network science. A community, or cluster, is usually a group of vertices that are more likely to be connected to each other than to members of other groups, though other patterns are possible. Identifying communities is nevertheless an ill-defined problem: there is no agreed standard for the definition of a community itself, for validating algorithms, or for comparing their performance. The resulting confusion and misconceptions have hindered progress in the field. The paper therefore offers a guided tour through the main aspects of the problem:

1. **The concept of community**: Traces the development of the community concept, from the traditional subgraph-based view to the modern statistical interpretation.
2. **The validation problem**: Emphasizes the role of artificial benchmarks, the choice of partition similarity scores, the conditions under which communities are detectable, the usefulness of metadata, and the distinctive features of community structure in real-world networks.
3. **Methodological discussion**: Critically reviews some popular clustering methods, and covers how to determine the number of clusters, how combining multiple partitions can yield more robust solutions, the main approaches to finding dynamic communities, and methods for assessing the significance of clusterings.
4. **Software resources**: Points out where useful software tools can be found.

Through these discussions, the authors aim to provide a comprehensive guide for practitioners that is also accessible to readers with only basic knowledge of network science.
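The working definition of a community used above (vertices more likely to be linked within a group than across groups) can be checked numerically on a small example. The sketch below is only an illustration and is not taken from the paper: the helper `edge_densities`, the toy graph, and the group labels are all hypothetical. It compares the fraction of linked vertex pairs inside groups (`p_in`) with the fraction of linked pairs between groups (`p_out`).

```python
from itertools import combinations


def edge_densities(edges, membership):
    """Return (p_in, p_out) for an undirected simple graph.

    p_in  = fraction of same-group vertex pairs that are linked
    p_out = fraction of different-group vertex pairs that are linked
    """
    edge_set = {frozenset(e) for e in edges}
    in_pairs = in_edges = out_pairs = out_edges = 0
    for u, v in combinations(sorted(membership), 2):
        linked = frozenset((u, v)) in edge_set
        if membership[u] == membership[v]:
            in_pairs += 1
            in_edges += linked
        else:
            out_pairs += 1
            out_edges += linked
    p_in = in_edges / in_pairs if in_pairs else 0.0
    p_out = out_edges / out_pairs if out_pairs else 0.0
    return p_in, p_out


# Hypothetical toy graph: two triangles joined by one bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

p_in, p_out = edge_densities(edges, membership)
print(f"p_in={p_in:.2f}, p_out={p_out:.2f}")  # p_in=1.00, p_out=0.11
```

Here `p_in` is much larger than `p_out`, so the two triangles match the usual picture of communities. As the abstract notes, other patterns are possible, so a comparison like this is a sanity check rather than a definition.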