On-line supervised theme-modeling and evolution-analyzing method

Jian Shao, Yin Zhang, Hongkai Ren, Fei Wu
2012-01-01
Abstract: The invention discloses an on-line supervised theme-modeling and evolution-analyzing method. The method comprises the following steps: (1) news texts are downloaded from news media websites and divided according to a given time granularity; (2) word segmentation is performed on the news texts in each time period, and the vocabulary is selected and updated according to word frequencies; (3) text features are extracted to form a relational matrix between words and texts, which serves as the input of an on-line supervised theme model; (4) the on-line supervised theme model is established, and the on-line supervised theme-modeling method is used to detect the themes of the data set in each time period, yielding a distribution matrix of words over themes and a distribution matrix of themes over texts; and (5) the Jensen-Shannon divergence is used to carry out evolution analysis on the themes acquired in step (4) and to calculate the attributes of each theme, in order to obtain the evolution process of each theme. The method makes full use of the time and classification information carried by the data itself, improves the accuracy of theme mining, and effectively analyzes the evolution of themes by combining classification information.
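Step (5) can be illustrated with a minimal Python sketch of the Jensen-Shannon divergence between two theme-word distributions. The abstract does not specify how the divergence is applied, so the matching rule below (linking a theme to its nearest theme in the previous time period) is an assumption, as are the function names and the toy distributions. The sketch also assumes both distributions are aligned to the same vocabulary, which would require extra handling because step (2) updates the vocabulary over time.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions (base 2, range [0, 1])."""
    # The mixture m is positive wherever p or q is, so the KL terms never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def closest_previous_theme(theme, previous_themes):
    """Hypothetical matching rule: return (index, divergence) of the nearest earlier theme."""
    scores = [js_divergence(theme, prev) for prev in previous_themes]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy theme-word distributions over a shared 4-word vocabulary (illustrative only).
prev_themes = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
new_theme = [0.6, 0.2, 0.1, 0.1]
index, score = closest_previous_theme(new_theme, prev_themes)
```

A small divergence suggests that a theme in the current period continues one from the previous period, while a large divergence to every earlier theme suggests a newly emerging theme. Any threshold separating the two cases would have to be chosen empirically.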