REBDT: A regular expression boundary-based decision tree model for Chinese logistics address segmentation

Guangming Ling, Aiping Xu, Chao Wang, Jie Wu
DOI: https://doi.org/10.1007/s10489-022-03511-6
IF: 5.3
2022-07-12
Applied Intelligence
Abstract: Chinese logistics address segmentation is a specific domain of address resolution that is very challenging due to factors such as language, culture, user privacy, and business value. Although deep learning can effectively overcome the over-reliance of traditional segmentation methods on domain knowledge, it faces the dilemma of costly manual labeling. In this context, we propose a decision tree model based on regular expression boundaries that requires no additional data or manual labeling. First, unlike traditional methods that describe entire address elements, we construct a regular expression rule library (RERL) that describes only the boundaries of address elements. Second, we define a binary split attribute according to a boundary matching algorithm based on RERL. A decision tree model is then constructed based on the distribution pattern of address element types to segment addresses and evaluate the segmentation. The experimental results demonstrate the improvement achieved by our model and further substantiate that our proposal can provide a high-quality labeled training set for deep learning models without any professional domain knowledge, even in low-resource scenarios.
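The core idea in the abstract, rules that describe only where an address element ends rather than its whole content, can be illustrated with a toy sketch. Everything below is an assumption for illustration: the rule list, the element labels, and the functions `BOUNDARY_RULES`, `split_at_boundary`, and `segment` are hypothetical and are not the paper's RERL. The fixed hierarchical walk merely stands in for the learned decision tree, whose actual split criteria the abstract does not specify.

```python
import re

# Hypothetical boundary rule library (NOT the paper's RERL): each rule matches
# only the closing suffix of an address element type, listed in a fixed
# coarse-to-fine order (an illustrative assumption).
BOUNDARY_RULES = [
    ("province", re.compile(r"省|自治区")),
    ("city", re.compile(r"市")),
    ("district", re.compile(r"区|县")),
    ("town", re.compile(r"街道|镇|乡")),
    ("road", re.compile(r"路|街|大道")),
    ("number", re.compile(r"\d+号")),
]


def split_at_boundary(text, pattern):
    """Binary test on `text`: if `pattern` matches, cut right after the first
    match and return (element, rest); otherwise return None."""
    m = pattern.search(text)
    if m is None:
        return None
    return text[: m.end()], text[m.end():]


def segment(address):
    """Hypothetical stand-in for the decision tree: at each node, test one
    boundary rule; on a match, emit the element and continue on the rest,
    otherwise fall through to the next element type."""
    segments = []
    rest = address
    for label, pattern in BOUNDARY_RULES:
        result = split_at_boundary(rest, pattern)
        if result is not None:
            element, rest = result
            segments.append((label, element))
    if rest:
        # Whatever no boundary rule consumed is kept as trailing detail.
        segments.append(("detail", rest))
    return segments


if __name__ == "__main__":
    print(segment("北京市海淀区中关村大街1号院"))
```

For `"北京市海淀区中关村大街1号院"` this sketch yields city `北京市`, district `海淀区`, road `中关村大街`, number `1号`, and detail `院`. Because each rule takes the first match anywhere in the remaining text, a string such as `小区` inside a later element would wrongly trigger the district rule when no real district is present; a fixed hand-written order cannot resolve that kind of ambiguity, which presumably is where a tree built from the distribution of element types helps.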
computer science, artificial intelligence