Application of cosine similarity in university comprehensive information system

Hao Zhu, Defu Lian, Zhihong Zuo, Kai Yan
DOI: https://doi.org/10.3969/j.issn.1001-0505.2017.S1.024
2017-01-01
Abstract: To address data-quality problems in the academic paper records entered by teachers in the comprehensive information system of the University of Electronic Science and Technology of China, a solution is presented that finds the standard journal or conference name by calculating cosine similarity. First, the entered names are preprocessed and the names crawled from the Internet are cleaned to generate the test names. Following the classic TF-IDF method, all test names and standard journal names are split into words, stop words are removed, and the remaining words are kept as terms. After the TF-IDF value of every word is calculated, each test name and each standard journal name is converted into a multidimensional vector consisting of the TF-IDF values of its words. The correct standard journal name is then identified by calculating the cosine similarity between the test names and the standard journal names. The identification results show that the cosine similarity calculation can improve the quality of the entered academic paper data.
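The pipeline described in the abstract (tokenize, remove stop words, weight by TF-IDF, compare by cosine similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the smoothed IDF formula, the choice to drop words that do not occur in any standard name, and all function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `best_match`) are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify the one used.
STOP_WORDS = {"of", "the", "and", "on", "in", "for", "a", "an"}


def tokenize(name):
    """Lowercase a name, split it into alphanumeric words, drop stop words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_idf(token_lists):
    """Compute a smoothed IDF per word over the standard names (assumed formula)."""
    n = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {w: math.log((1 + n) / (1 + d)) + 1.0 for w, d in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Build a sparse TF-IDF vector; words without an IDF entry are dropped."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (c / total) * idf[w] for w, c in counts.items() if w in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(test_name, standard_names):
    """Return the standard name most similar to test_name, with its score."""
    token_lists = [tokenize(s) for s in standard_names]
    idf = build_idf(token_lists)
    vectors = [tfidf_vector(t, idf) for t in token_lists]
    query = tfidf_vector(tokenize(test_name), idf)
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return standard_names[best], scores[best]


if __name__ == "__main__":
    # Illustrative names only; these are not data from the paper.
    standards = [
        "Journal of Software",
        "IEEE Transactions on Knowledge and Data Engineering",
        "Chinese Journal of Computers",
    ]
    print(best_match("IEEE Trans. Knowledge & Data Engineering", standards))
```

In practice a minimum similarity threshold would probably be needed so that entries with no plausible standard name are left unmatched rather than forced onto the highest-scoring candidate. The abstract does not say whether such a threshold was used.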