Using Big Data From The Web To Train Chinese Traffic Word Representation Model In Vector Space

Wei Li, Xudong Xie, Jianming Hu, Zuo Zhang, Yi Zhang
DOI: https://doi.org/10.1109/WCICA.2016.7578483
2016-01-01
Abstract: Words and texts are particularly important big data sources for intelligent transportation systems. Traffic conditions are correlated with the text content that people publish on the internet over a given period of time. To predict traffic conditions from this text content, we need to analyze these words and texts by various means. Much traditional research in natural language processing has provided a variety of methods for processing such words and texts. In this paper we propose a novel method for generating a word representation model focused on the transportation domain. We collected a big data set of 83.9 G words from the web using a web spider. After data mining, we first perform word segmentation and then use the segmented words to train a Chinese transportation word model. The resulting representation model helps convert Chinese words into a vector space that preserves the semantic-syntactic relationships between words. Experiments demonstrate that the accuracy and speed of word clustering are improved considerably by the word vectors we generated.
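The pipeline the abstract outlines (segment Chinese text, then learn word vectors from the segmented words) can be illustrated with a minimal sketch. The abstract does not name the segmenter or the training algorithm, so this example substitutes two simple stand-ins: forward maximum matching for segmentation and windowed co-occurrence counts for the vectors. The dictionary `VOCAB`, the sentences in `CORPUS`, and all function names are hypothetical and invented for illustration; they are not the authors' data or method.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy dictionary and corpus, invented for illustration only.
VOCAB = {"高速", "公路", "高速公路", "拥堵", "畅通", "今天", "地铁", "严重"}
CORPUS = ["今天高速公路拥堵", "今天地铁畅通", "高速公路严重拥堵", "今天高速公路畅通"]


def segment(text, vocab, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def cooccurrence_vectors(sentences, window=2):
    """Count neighbours within `window` tokens; each word's counts form its sparse vector."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


segmented = [segment(s, VOCAB) for s in CORPUS]
vectors = cooccurrence_vectors(segmented)
similarity = cosine(vectors["拥堵"], vectors["畅通"])
```

Words that appear in similar contexts, such as the two traffic-state words compared in `similarity`, end up with overlapping vectors, which is the property that clustering relies on. A real system at the scale described would more plausibly use a trained segmenter and a learned dense embedding; that choice is an assumption here, not something the abstract states.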