Abstract: In recent years, the concepts of "diversity" and "inclusion" have attracted considerable attention across a range of fields, encompassing both social and biological disciplines. To fully understand these concepts, it is critical to not only examine the number of categories but also the similarities and relationships among them. In this study, I introduce a novel index for diversity and inclusion that considers similarities and network connections. I analyzed the properties of these indices and investigated their mathematical relationships using established measures of diversity and networks. Moreover, I developed a methodology for estimating similarities based on the utility of diversity. I also created a method for visualizing proportions, similarities, and network connections. Finally, I evaluated the correlation with external metrics using real-world data, confirming that both the proposed indices and our index can be effectively utilized. This study contributes to a more nuanced understanding of diversity and inclusion analysis.
What problem does this paper attempt to address?
This paper asks how similarity and network relationships can be integrated into a single diversity and inclusion index, so that diversity and inclusion can be measured more accurately across fields such as the social and biological sciences. Specifically, it poses two key questions:
1. **How can similarity and network relationships be incorporated into a composite index?**
2. **In this context, how should similarity be defined and quantified?**
To answer these questions, the author proposes a new index, the Diversity and Inclusion Index with Networks and Similarity (DSN), which accounts not only for the proportions of categories but also for the similarities and network connections between them. The index is intended to overcome a limitation of existing diversity indices, which do not fully account for similarity and network interactions between categories.
In addition, the author develops a method for estimating similarity based on the utility of diversity and proposes a visualization method that helps interpret the proportions, similarities, and network connections in the data. Finally, the author evaluates the correlation of the indices with external metrics on real-world data and reports that they can be used effectively.
### Formula Summary
The diversity index \(D_q(p)\) of order \(q\) used in the paper (the Hill number) is defined as follows:
\[
D_q(p) =
\begin{cases}
\left( \sum_{i = 1}^n p_i^q \right)^{\frac{1}{1 - q}}, & \text{if } q \neq 1, \\
\exp \left( - \sum_{i = 1}^n p_i \ln p_i \right), & \text{if } q = 1.
\end{cases}
\]
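As a minimal sketch of how \(D_q(p)\) can be evaluated numerically, the following Python function implements the two cases of the definition above. The function name `hill_number`, the handling of zero proportions, and the \(q = \infty\) branch (the standard limit \(1 / \max_i p_i\), which is not part of the formula as summarized here) are assumptions added for illustration.

```python
import math


def hill_number(p, q):
    """Diversity D_q(p) of a proportion vector p (illustrative sketch)."""
    # Categories with zero proportion are dropped (0 * ln 0 is taken as 0).
    support = [x for x in p if x > 0]
    if q == 1:
        # q = 1 case: exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in support))
    if math.isinf(q):
        # Assumed limit for q -> infinity; not shown in the definition above.
        return 1.0 / max(support)
    # General case q != 1.
    return sum(x ** q for x in support) ** (1.0 / (1.0 - q))


# Four equally common categories count as four "effective" categories
# for every order q.
uniform = [0.25, 0.25, 0.25, 0.25]
d0 = hill_number(uniform, 0)
d1 = hill_number(uniform, 1)
d2 = hill_number(uniform, 2)
```

For uneven proportions the value falls as \(q\) grows, because higher orders give more weight to the most common categories.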
The proposed diversity and inclusion index \(D_{\bar{Z}}^q(p, E)\) is defined as:
\[
D_{\bar{Z}}^q(p, E) =
\begin{cases}
\left( \sum_{i = 1}^n p_i \left( (L - \bar{Z} \circ E) p \right)_i^{q - 1} \right)^{\frac{1}{1 - q}}, & \text{if } q \neq 1, \infty, \\
\prod_{i = 1}^n \left( (L - \bar{Z} \circ E) p \right)_i^{-p_i}, & \text{if } q = 1, \\
\left( \max_{i \in \{1, \ldots, n\}} \left( (L - \bar{Z} \circ E) p \right)_i \right)^{-1}, & \text{if } q = \infty,
\end{cases}
\]
where:
- \(p=(p_1, \ldots, p_n)^T\) is the proportion vector,
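- \(Z=(Z_{i,j})_{1 \leq i,j \leq n} \in [0,1]^{n \times n}\) is the similarity matrix,
- \(\bar{Z}=L - Z\) is the dissimilarity matrix (distance matrix); here \(L\) is presumably the \(n \times n\) matrix of ones, so that \(\bar{Z}_{i,j} = 1 - Z_{i,j}\),
- \(E=(E_{i,j})_{1 \leq i,j \leq n} \in [0,1]^{n \times n}\) is the adjacency matrix representing the network,
- \(q \in [0, \infty]\) is the order (type) of diversity, which controls how strongly common categories are weighted relative to rare ones,
- \((L - \bar{Z} \circ E) p\) is the similarity-weighted proportion vector adjusted by the network, where \(\circ\) denotes the elementwise (Hadamard) product.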
These formulas are used to measure and analyze the influence of the similarity and network relationships between different categories on diversity.
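The three cases of \(D_{\bar{Z}}^q(p, E)\) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the author's implementation: \(L\) is taken to be the all-ones matrix, \(Z\) is assumed to have ones on its diagonal, and the name `dsn_index` is hypothetical.

```python
import math


def dsn_index(p, Z, E, q):
    """Sketch of D^q_{Zbar}(p, E), assuming L is the n x n all-ones matrix."""
    n = len(p)
    # K = L - (L - Z) ∘ E, i.e. entrywise 1 - (1 - Z_ij) * E_ij.
    K = [[1.0 - (1.0 - Z[i][j]) * E[i][j] for j in range(n)] for i in range(n)]
    # (K p)_i: network-adjusted, similarity-weighted proportion for category i.
    Kp = [sum(K[i][j] * p[j] for j in range(n)) for i in range(n)]
    if math.isinf(q):
        # q = infinity case: maximum over all categories, as in the formula.
        return 1.0 / max(Kp)
    # Terms with p_i = 0 contribute nothing, so they are skipped; this also
    # avoids evaluating a negative power of zero.
    # Assumes Z_ii = 1, which gives (K p)_i >= p_i > 0 for the remaining terms.
    present = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return math.prod(Kp[i] ** (-p[i]) for i in present)
    total = sum(p[i] * Kp[i] ** (q - 1.0) for i in present)
    return total ** (1.0 / (1.0 - q))


p = [0.5, 0.3, 0.2]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
ones = [[1.0] * 3 for _ in range(3)]
zeros = [[0.0] * 3 for _ in range(3)]

# Z = I and E all ones: K reduces to I, so the index equals D_q(p).
fully_connected = dsn_index(p, identity, ones, 2)
# E all zeros: K is all ones, every (K p)_i equals 1, and the index is 1.
no_connections = dsn_index(p, identity, zeros, 2)
```

Under these assumptions, the two calls illustrate the extremes: with distinct categories and a fully connected network the index coincides with \(D_q(p)\), while with no connections it collapses to 1 regardless of the proportions.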