A Hierarchical Information Compression Approach for Knowledge Discovery from Social Multimedia

Zheng Liu, Yu Weng, Ruiyang Xu, Chaomurilige, Honghao Gao
DOI: https://doi.org/10.1109/tcss.2024.3440997
2024-01-01
IEEE Transactions on Computational Social Systems
Abstract: Knowledge discovery is an ongoing research endeavor aimed at uncovering valuable insights and patterns from large volumes of data in massive social systems (MSSs). Although recent advances in deep learning have brought significant progress in knowledge discovery, the “data dimensionality reduction” problem still poses practical challenges. To address this, we introduce a hierarchical information compression (IC) approach that eliminates redundant and irrelevant features and generates high-quality knowledge representations, with the aim of increasing the information density of the knowledge discovery process. Our approach compresses data in two stages, coarse-grained and fine-grained. In the coarse-grained stage, a key feature distiller based on a Siamese network identifies irrelevant features and latent redundancies within coarse-grained data blocks. In the fine-grained stage, the model further compresses the internal features of the data, extracting the most crucial knowledge and compressing the data through cross-block learning. Together, the two stages achieve both inter-block and inner-block IC while preserving essential knowledge. To validate the proposed model, we conducted several experiments on WikiSum, a large knowledge corpus based on English Wikipedia in MSSs. The model achieved a 2.38% increase on recall-oriented understudy for gisting evaluation (ROUGE)-2 and an improvement of over 7% on the informativeness and conciseness metrics, as evidenced by improved scores in both automatic and human evaluations. These results indicate that our model can effectively select the most pertinent and meaningful content and reduce redundancy to generate better knowledge representations.
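For readers unfamiliar with the ROUGE-2 metric cited in the abstract, the following is a minimal sketch of how a bigram-overlap recall score can be computed. It is not the paper's evaluation code: the function names `bigrams` and `rouge2_recall` are hypothetical, tokenization is plain lowercase whitespace splitting (an assumption), and the abstract does not state whether the reported figure is a recall, precision, or F1 variant.

```python
from collections import Counter


def bigrams(tokens):
    """Count adjacent token pairs in a token list."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate, reference):
    """Fraction of reference bigrams that also appear in the candidate.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: a reference bigram is matched at most as many
    # times as it occurs in the candidate.
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total


score = rouge2_recall("the cat sat on a mat", "the cat sat on the mat")
```

Here the reference contains five bigrams, three of which also occur in the candidate, so `score` is 0.6. Published evaluations typically rely on an established ROUGE implementation with its own tokenization and stemming rules, so absolute values from this sketch are not comparable to reported numbers.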