Power Law Based Foundation for the Measurement of Discrimination Information for Human Knowledge Representation

Zheng Xu,Xiangfeng Luo,Yunhuai Liu,Lin Mei,Chuanping Hu
DOI: https://doi.org/10.1016/j.future.2016.10.021
IF: 7.307
2016-01-01
Future Generation Computer Systems
Abstract:The discrimination information (DI) of keyword plays an important role in information retrieval and data mining. However, the measurement of DI is still a challenge because the existing methods cannot leverage the contradiction between accuracy and complexity. In this paper, a new model is proposed, does not need any prior knowledge and the computing complexity is O(nm) for a collection of m documents with n keywords. Firstly, we define three types of keywords according to the document frequency spectrum, which divides the spectrum of keywords into two monotonically spectrums that can give a qualitative analysis of DI. Secondly, in order to decrease the complexity, the power law function of keywords’ document frequencies is built. Thirdly, we propose an algorithm to classify keywords by using the distances between the adjacent points on the linear regression line. Finally, a piecewise function is used for computing DI according to the monotonically spectrums, which transforms DI into a scalable value to be used directly, thereby reducing the computing complexity of DI significantly. Moreover, a new weighting scheme of keywords based on DI is employed for document clustering, which shows that DI has a good prospect on the information retrieval area.
What problem does this paper attempt to address?