Understanding Higher-Order Correlations Among Semantic Components in Embeddings

Momose Oyama, Hiroaki Yamagiwa, Hidetoshi Shimodaira
2024-10-09
Abstract: Independent Component Analysis (ICA) offers interpretable semantic components of embeddings. While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies that ICA cannot eliminate remain between the estimated components. We quantified these non-independencies using higher-order correlations and demonstrated that a large higher-order correlation between two components indicates a strong semantic association between them, with many words sharing common meanings with both components. The entire structure of non-independencies was visualized using a maximum spanning tree of semantic components. These findings provide deeper insights into embeddings through ICA.
Computation and Language
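The pipeline described in the abstract (score pairs of estimated components by a higher-order correlation, then summarize the pairwise structure with a maximum spanning tree) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the statistic, so the sketch assumes the higher-order correlation of standardized components S_i and S_j is E[S_i^2 S_j^2] - 1, and it replaces real ICA output with toy data in which two components share a hidden scale factor. All function and variable names are hypothetical.

```python
import random
from itertools import combinations


def standardize(xs):
    """Shift and scale a sequence to zero mean and unit variance."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / std for x in xs]


def higher_order_corr(a, b):
    """Assumed statistic: E[a^2 b^2] - 1 for standardized a and b.

    It is close to 0 for independent components and positive when the
    two components tend to be large in magnitude on the same samples.
    """
    return sum((x * x) * (y * y) for x, y in zip(a, b)) / len(a) - 1.0


def maximum_spanning_tree(n_nodes, weights):
    """Kruskal's algorithm over edges sorted by descending weight."""
    parent = list(range(n_nodes))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two separate subtrees
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


# Toy stand-in for ICA output (an assumption, not real embedding data):
# components 0 and 1 are uncorrelated but share a hidden scale, so they
# are not independent; component 2 is independent of both.
rng = random.Random(0)
n_samples = 5000
scale = [rng.uniform(0.2, 2.0) for _ in range(n_samples)]
raw = [
    [v * rng.gauss(0.0, 1.0) for v in scale],
    [v * rng.gauss(0.0, 1.0) for v in scale],
    [rng.gauss(0.0, 1.0) for _ in range(n_samples)],
]
components = [standardize(c) for c in raw]

weights = {
    (i, j): higher_order_corr(components[i], components[j])
    for i, j in combinations(range(len(components)), 2)
}
tree = maximum_spanning_tree(len(components), weights)

for i, j, w in tree:
    print(f"component {i} -- component {j}: higher-order correlation {w:.3f}")
```

In this toy setting the tree has two edges, and the heaviest one links the two components that share the hidden scale, mirroring the abstract's claim that a large higher-order correlation flags a strong association between components. With real embeddings, the components would instead come from an ICA fit, and the tree would span all semantic components.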