Research of an improved algorithm for Chinese word segmentation dictionary based on double-array Trie-tree

Yang Wen-chua
Abstract: Chinese word segmentation dictionaries based on the double-array Trie-tree offer high search efficiency, but dynamic insertion is time-consuming. Therefore, an improved algorithm (iDAT) based on the double-array Trie-tree is proposed for Chinese word segmentation dictionaries. Nodes with more branches are handled while the original dictionary is being initialized. After initialization, a hash process is performed on the index values of the empty sequences in the base array. The final hash table stores, for each empty sequence, the sum of the empty sequences preceding it. After that, iDAT is used to carry out dynamic insertion. The algorithm adopts the jump strategy of the Sunday single-pattern matching algorithm. With a reasonable increase in space, it reduces the average time complexity of dynamic insertion in the Trie-tree. Practical results show that it has good operational performance.
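For readers unfamiliar with the Sunday algorithm mentioned in the abstract, the following is a minimal sketch of the standard single-pattern Sunday search in Python. It only illustrates the jump rule (shifting by the character just past the current window); how iDAT applies this jump to the empty sequences of the base array is not described in the abstract, so nothing here should be read as the paper's implementation. The function name `sunday_search` is hypothetical.

```python
def sunday_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Illustrative sketch of the textbook Sunday algorithm, not the iDAT code.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0

    # For each character in the pattern, record how far the window must move
    # so that the character's last occurrence lines up with the position
    # just past the current window. Later occurrences overwrite earlier ones.
    shift = {ch: m - i for i, ch in enumerate(pattern)}

    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        nxt = pos + m  # index of the character just past the window
        if nxt >= n:
            break
        # Characters absent from the pattern allow a jump of m + 1.
        pos += shift.get(text[nxt], m + 1)
    return -1


if __name__ == "__main__":
    print(sunday_search("here is a simple example", "example"))
    print(sunday_search("中文分词词典", "词典"))
```

The appeal of the jump rule is that a mismatch can skip up to m + 1 positions at once, which is presumably what makes it attractive for scanning long runs of slots during insertion.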
Computer Science