Rapid hierarchical document querying method

Chen Ke, Wang Weidi, Hu Tianlei, Chen Gang, Wu Sai, Shou Lidan
2017-01-01
Abstract: The invention discloses a rapid hierarchical document querying method. The method comprises the following steps: establishing a data model for the documents of each document set and formatting the documents to obtain a document centroid vector and a document label for each document; treating each document centroid vector as a point in a high-dimensional vector space and constructing an in-memory hash index structure for every document set using locality-sensitive hashing (LSH); retrieving a candidate document set from the hash index structure with an LSH-based query method, using the document centroid vector of the query text; and finding the nearest-neighbor document under the Word Mover's Distance (WMD) metric within the candidate document set using a filter-and-refine hierarchical framework, based on the document label of the query text. When applied to document classification and retrieval, the hierarchical querying method balances efficiency and effectiveness well: when a user queries documents under the Word Mover's Distance metric, the target document is retrieved quickly while accuracy is preserved.
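The steps in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the patent. The LSH family here is random hyperplanes, the filter stage ranks candidates by centroid distance, and the refine stage uses a relaxed Word Mover's Distance (a lower bound on the exact WMD) rather than the exact distance. The abstract does not say how the document label is built, so the sketch does not model it. All names (`HyperplaneLSH`, `relaxed_wmd`, `query`, the toy embeddings) are hypothetical.

```python
import math
import random
from collections import defaultdict

def centroid(doc, emb):
    """Weighted mean of word vectors; doc maps word -> normalized weight."""
    dim = len(next(iter(emb.values())))
    c = [0.0] * dim
    for word, weight in doc.items():
        for i, x in enumerate(emb[word]):
            c[i] += weight * x
    return c

class HyperplaneLSH:
    """Random-hyperplane LSH over centroid vectors (an assumed LSH family)."""
    def __init__(self, dim, n_bits=4, n_tables=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(n_bits)] for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, planes, v):
        # One bit per hyperplane: which side of the plane v falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes)

    def add(self, doc_id, v):
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, v)].append(doc_id)

    def candidates(self, v):
        # Union of the buckets the query centroid hashes to in each table.
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(planes, v), []))
        return found

def relaxed_wmd(a, b, emb):
    """Relaxed WMD: each word moves entirely to its nearest word in the other doc."""
    def one_way(x, y):
        return sum(w * min(math.dist(emb[s], emb[t]) for t in y) for s, w in x.items())
    return max(one_way(a, b), one_way(b, a))

def query(q, docs, cents, index, emb, keep=10, k=1):
    qc = centroid(q, emb)
    cands = index.candidates(qc)
    # Filter: keep the candidates whose centroids are closest to the query centroid.
    shortlist = sorted(cands, key=lambda d: math.dist(qc, cents[d]))[:keep]
    # Refine: re-rank the shortlist with the more expensive word-level distance.
    return sorted(shortlist, key=lambda d: relaxed_wmd(q, docs[d], emb))[:k]

# Toy 2-D embeddings and documents, for illustration only.
emb = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}
docs = {"pets": {"cat": 0.5, "dog": 0.5}, "traffic": {"car": 0.5, "bus": 0.5}}
cents = {d: centroid(doc, emb) for d, doc in docs.items()}
index = HyperplaneLSH(dim=2)
for d, c in cents.items():
    index.add(d, c)

print(query({"cat": 0.5, "dog": 0.5}, docs, cents, index, emb))
```

The design point the sketch tries to show is the layering: the hash index narrows the whole collection to a few buckets, the cheap centroid comparison trims those candidates, and only the surviving shortlist pays for the word-level distance.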