The Paradox of Second-Order Homophily in Networks

Anna Evtushenko, Jon Kleinberg
DOI: https://doi.org/10.1038/s41598-021-92719-6
2021-07-16
Abstract: Homophily -- the tendency of nodes to connect to others of the same type -- is a central issue in the study of networks. Here we take a local view of homophily, defining notions of first-order homophily of a node (its individual tendency to link to similar others) and second-order homophily of a node (the aggregate first-order homophily of its neighbors). Through this view, we find a surprising result for homophily values that applies with only minimal assumptions on the graph topology. It can be phrased most simply as "in a graph of red and blue nodes, red friends of red nodes are on average more homophilous than red friends of blue nodes." This gap in averages defies simple intuitive explanations, applies to globally heterophilous and homophilous networks and is reminiscent of but structurally distinct from the Friendship Paradox. The existence of this gap suggests intrinsic biases in homophily measurements between groups, and hence is relevant to empirical studies of homophily in networks.
Social and Information Networks, Discrete Mathematics, Combinatorics
What problem does this paper attempt to address?
This paper addresses a counterintuitive phenomenon in how homophily is measured at the level of individual nodes, the "Second-Order Homophily Paradox". Specifically, in a network with two types of nodes (for example, red and blue), the red friends of red nodes are on average more homophilous than the red friends of blue nodes. This finding suggests an intrinsic bias in homophily measurements between groups. It is reminiscent of the Friendship Paradox but structurally distinct from it.

### Main problems

1. **Definition of homophily**:
   - **First-Order Homophily**: The proportion of a node's neighbors that are of the same type as the node.
   - **Second-Order Homophily**: The list, or the mean, of the first-order homophily values of a node's neighbors.
2. **Research background**:
   - Homophily is a core concept in network research, referring to the tendency of nodes to connect to other nodes of the same type.
   - Traditional homophily research focuses mainly on global properties, whereas this paper takes a local view and examines the homophily of each node.
3. **Main findings**:
   - In a network of red and blue nodes, the average homophily of red friends of red nodes is strictly greater than the average homophily of red friends of blue nodes.
   - This result holds for undirected networks with only minimal assumptions on the topology, whether the network is globally homophilous or heterophilous.

### Mathematical representation

- **First-Order Homophily**: The first-order homophily \(h_i\) of node \(i\) is the proportion of its neighbors that are of the same type as \(i\).
- **Second-Order Homophily**: The second-order homophily \(s_i^{(R)}\) of node \(i\) is the list, or the mean, of the first-order homophily values of its red neighbors.
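The two definitions above can be made concrete with a minimal Python sketch. The toy graph, the names `adj`, `color`, `first_order`, and `second_order_red`, and the choice to represent the graph as adjacency lists are illustrative assumptions, not taken from the paper. The sketch also assumes every node has at least one neighbor, so that the proportion is well defined.

```python
from statistics import mean

# Hypothetical toy graph (an assumption for illustration):
# undirected adjacency lists and one color per node.
adj = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "e"],
    "d": ["a", "e"],
    "e": ["c", "d"],
}
color = {"a": "R", "b": "R", "c": "R", "d": "B", "e": "B"}


def first_order(node):
    """h_i: fraction of a node's neighbors that share its color."""
    nbrs = adj[node]
    same = sum(1 for v in nbrs if color[v] == color[node])
    return same / len(nbrs)


def second_order_red(node):
    """s_i^(R) in list form: first-order homophily of each red neighbor."""
    return [first_order(v) for v in adj[node] if color[v] == "R"]


for node in adj:
    values = second_order_red(node)
    # The mean is the alternative, single-number form of s_i^(R).
    print(node, color[node], first_order(node), values, mean(values))
```

In this toy graph every node has at least one red neighbor, so `mean(values)` is always defined. A real dataset would need a guard for nodes with no red neighbors.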
### Main results

- The mean second-order homophily of red nodes, \(\mu_R^{(R)}\), is greater than the mean second-order homophily of blue nodes, \(\mu_B^{(R)}\).
- The "red gap" \(g^{(R)}=\mu_R^{(R)}-\mu_B^{(R)}\) is positive.

### Proof ideas

- **List form**: Compute the means of the second-order homophily lists of red nodes and of blue nodes, and prove that \(g^{(R)} > 0\).
- **Single form**: Consider the alternative aggregation in which each node contributes the mean of its neighbors' first-order homophily values, and prove that \(g^{(R,\text{sing})}>0\) in a specific case.

### Empirical analysis

- **Dataset**: The 100 anonymized social networks of the Facebook100 dataset.
- **Results**: In all 97 co-educational schools, the second-order homophily gaps computed for female and male nodes are positive, and the size of the gap is highly correlated with the diversity of first-order homophily values.

### Significance

- **Theoretical significance**: Reveals an inherent bias in homophily measurement and offers a new perspective on the concept of homophily in network research.
- **Application value**: Helps analyze and interpret homophily in networks more accurately, for example in social and biological networks.

The paper thus proposes a new theoretical result and supports its generality and relevance with empirical data.
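As a rough illustration of the list-form red gap \(g^{(R)}=\mu_R^{(R)}-\mu_B^{(R)}\), the following Python sketch pools the first-order homophily values of red neighbors across all red nodes and across all blue nodes, then compares the two means. Treating the list form as one pooled list per color is an assumed reading of the summary. The toy graph and the names `adj`, `color`, and `red_gap` are hypothetical.

```python
from statistics import mean

# Hypothetical toy graph (an assumption for illustration):
# undirected adjacency lists and one color per node.
adj = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "e"],
    "d": ["a", "e"],
    "e": ["c", "d"],
}
color = {"a": "R", "b": "R", "c": "R", "d": "B", "e": "B"}


def red_gap(adj, color):
    """List-form red gap: mean h over red friends of red nodes minus
    mean h over red friends of blue nodes (pooled over all nodes)."""

    def h(u):
        # First-order homophily: share of u's neighbors with u's color.
        return sum(color[v] == color[u] for v in adj[u]) / len(adj[u])

    red_side, blue_side = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            if color[v] != "R":
                continue  # only red friends enter either list
            target = red_side if color[u] == "R" else blue_side
            target.append(h(v))
    return mean(red_side) - mean(blue_side)


gap = red_gap(adj, color)
print(gap)
```

On this toy graph the gap is 1/9. Under the pooled reading, a red node with degree \(d_j\) and \(r_j\) red neighbors appears \(r_j\) times in the red-side list and \(d_j - r_j\) times in the blue-side list, so red nodes with higher \(h_j\) are over-represented on the red side.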