TUHTC: Two-Stage Unsupervised Hierarchical Text Classification of Power Grid Data Assets

Shenglong Liu, Yixin Li, Honggang Wang, Bing An, Jiasong Chen, Weining Shi
DOI: https://doi.org/10.1142/s0129156424401244
2024-10-01
International Journal of High Speed Electronics and Systems
Abstract: Data classification and grading provide the foundation and guidance for information security management, and the construction of a data security system is inseparable from this cornerstone. In recent years, the high-frequency use and complex interactions of power grid data have left data security management increasingly in need of automated, efficient, and credible data classification and grading. The characteristics of power grid data assets, such as sensitivity, complexity, and multidimensionality, exacerbate the challenges of the task. Because existing data span scattered business lines and complex categories, experts in a single field have limited knowledge and cannot label data across all fields, which makes data labeling very difficult. To address these issues, we present a two-stage unsupervised methodology for hierarchical text classification (TUHTC). First, we leverage the semantic information inherent in hierarchical labels for data augmentation of annotated datasets. Second, we enhance the semantic embedding of labels to facilitate effective classification of the data. We verified the methodology experimentally using database information from authentic business scenarios, validating its efficacy.