Text Fingerprint Extraction Based on Latent Semantic Analysis

Tongtong CUI, Rongyi CUI
DOI: https://doi.org/10.3969/j.issn.1003-0077.2018.05.010
2018-01-01
Abstract: The arrival of the era of networks and big data has enriched the information resources in cyberspace. However, the diversity and rapid growth of data put pressure on the storage and effective utilization of these resources. This paper presents a text fingerprint extraction method based on latent semantic analysis. The proposed method is a compressed representation of document information, and it addresses the lack of semantic information in current fingerprint extraction methods. In this method, the latent semantic features of the documents are obtained using singular value decomposition, and the original document vector space is transformed into the corresponding latent semantic space. Finally, according to the random hyperplane principle, each document in that space is transformed into a binary digital fingerprint, and the difference between fingerprints is measured by Hamming distance. The proposed method was verified by similarity experiments and clustering experiments on academic literature from CNKI. The experimental results show that the method characterizes the semantic information of a document better, with an accurate and effective compressed representation.
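The last two steps of the pipeline (random hyperplane projection and Hamming distance) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the documents have already been mapped into the latent semantic space by singular value decomposition, and the function names, the 64-bit fingerprint length, the Gaussian hyperplane normals, and the toy vectors are all hypothetical choices made for illustration.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplane normals in a dim-dimensional space.

    Gaussian components are an assumption of this sketch; the abstract
    only names the random hyperplane principle.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def fingerprint(vector, hyperplanes):
    """Return an integer whose bits record on which side of each
    hyperplane the (latent-space) document vector lies."""
    bits = 0
    for normal in hyperplanes:
        dot = sum(v * w for v, w in zip(vector, normal))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Toy latent-space vectors (hypothetical; in the paper's pipeline these
# would come from the SVD-based projection of the document vectors).
planes = make_hyperplanes(n_bits=64, dim=3)
doc_a = [0.9, 0.1, 0.0]
doc_b = [0.8, 0.2, 0.1]    # points in nearly the same direction as doc_a
doc_c = [-0.7, 0.5, -0.3]  # points in a very different direction

fp_a = fingerprint(doc_a, planes)
fp_b = fingerprint(doc_b, planes)
fp_c = fingerprint(doc_c, planes)

print("distance a-b:", hamming(fp_a, fp_b))
print("distance a-c:", hamming(fp_a, fp_c))
```

Because each bit depends only on the sign of a dot product, the fingerprint is unchanged when a vector is scaled by a positive constant, and vectors separated by a small angle tend to differ in few bits. This is why the Hamming distance between fingerprints can stand in for the angular difference between documents in the latent space.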