Character Embedding-Based Bi-LSTM for Zircon Similarity Calculation with Clustering.

Hu Xiangben, Hu Zhichen, Jiang Jielin, Xue Weiwei, Hu Xiumian, Xu Xiaolong
DOI: https://doi.org/10.1007/s12145-022-00847-y
2022-01-01
Earth Science Informatics
Abstract: Similarity calculations for zircons are vital to topical issues in sedimentology, such as provenance analysis, sediment dating, and the identification of geotectonic effects. In general, zircon data is stored in tables in which each column represents a key-value pair. According to the semantics of the keys, multiple tables are merged to extract data for analyzing the variability of a single feature. However, sedimentation causes conflicts between the different indicators, which makes the similarity estimates inaccurate. Moreover, unknown and semantically ambiguous keys are not recognized by the knowledge base, which makes the aggregation of key-value pairs inefficient. Therefore, this paper proposes a Fast Much zircon (FM-zircon) framework that combines natural language processing (NLP) and multidimensional scaling (MDS) to calculate the similarity of zircons. First, NLP classifies keys by extracting their semantic features. After the key-value pairs with the same key are fused, MDS is applied to compute similarity over multiple features. Finally, the results are presented visually. Experiments on zircon tables showed that FM-zircon performs well.
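The fuse-then-MDS pipeline described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state which MDS variant or distance FM-zircon uses, so classical (Torgerson) MDS on Euclidean distances between standardized features is swapped in; the `KEY_ALIASES` lookup is a hypothetical stand-in for the paper's learned key classifier; and the sample names and values are invented, not zircon measurements from the paper.

```python
import numpy as np

# Hypothetical alias map standing in for the learned key classifier:
# differently spelled column keys are mapped to one canonical key.
KEY_ALIASES = {
    "Age (Ma)": "age", "age_ma": "age",
    "U (ppm)": "u", "U_ppm": "u",
    "Th/U": "th_u", "th_u_ratio": "th_u",
}


def fuse_tables(tables):
    """Merge per-sample records, fusing keys that share a canonical name."""
    fused = {}
    for table in tables:
        for sample, record in table.items():
            row = fused.setdefault(sample, {})
            for key, value in record.items():
                row[KEY_ALIASES[key]] = value
    return fused


def classical_mds(features, n_components=2):
    """Classical MDS on Euclidean distances between standardized rows."""
    x = np.asarray(features, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)       # put features on one scale
    diff = x[:, None, :] - x[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)                  # squared pairwise distances
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    b = -0.5 * j @ d2 @ j                          # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Invented example tables; keys differ between tables on purpose.
table_a = {
    "s1": {"Age (Ma)": 120.0, "U (ppm)": 310.0},
    "s2": {"Age (Ma)": 125.0, "U (ppm)": 290.0},
    "s3": {"Age (Ma)": 480.0, "U (ppm)": 150.0},
    "s4": {"Age (Ma)": 510.0, "U (ppm)": 170.0},
}
table_b = {
    "s1": {"th_u_ratio": 0.45},
    "s2": {"th_u_ratio": 0.50},
    "s3": {"th_u_ratio": 0.90},
    "s4": {"th_u_ratio": 0.80},
}

fused = fuse_tables([table_a, table_b])
samples = sorted(fused)
features = [[fused[s][k] for k in ("age", "u", "th_u")] for s in samples]
coords = classical_mds(features)                   # one 2-D point per sample
```

In such a sketch, samples that lie close together in `coords` are similar across all fused features at once, which is the kind of low-dimensional picture the abstract refers to as a visual representation.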