Towards a rigorous analysis of mutual information in contrastive learning

Kyungeun Lee,Jaeill Kim,Suhyun Kang,Wonjong Rhee
DOI: https://doi.org/10.1016/j.neunet.2024.106584
Abstract:Contrastive learning has emerged as a cornerstone in unsupervised representation learning. Its primary paradigm involves an instance discrimination task utilizing InfoNCE loss where the loss has been proven to be a form of mutual information. Consequently, it has become a common practice to analyze contrastive learning using mutual information as a measure. Yet, this analysis approach presents difficulties due to the necessity of estimating mutual information for real-world applications. This creates a gap between the elegance of its mathematical foundation and the complexity of its estimation, thereby hampering the ability to derive solid and meaningful insights from mutual information analysis. In this study, we introduce three novel methods and a few related theorems, aimed at enhancing the rigor of mutual information analysis. Despite their simplicity, these methods can carry substantial utility. Leveraging these approaches, we reassess three instances of contrastive learning analysis, illustrating the capacity of the proposed methods to facilitate deeper comprehension or to rectify pre-existing misconceptions. The main results can be summarized as follows: (1) While small batch sizes influence the range of training loss, they do not inherently limit learned representation's information content or affect downstream performance adversely; (2) Mutual information, with careful selection of positive pairings and post-training estimation, proves to be a superior measure for evaluating practical networks; and (3) Distinguishing between task-relevant and irrelevant information presents challenges, yet irrelevant information sources do not necessarily compromise the generalization of downstream tasks.
What problem does this paper attempt to address?