Abstract: Chinese domain term extraction is an important part of text knowledge mining. Chinese domain terms were traditionally extracted by hand, which is time-consuming and laborious. Automated methods for Chinese domain term extraction currently fall into three categories: dictionary-based, rule-based, and statistics-based methods. Because Chinese natural language is complex, these automatic methods have limitations. Domain-specific user dictionaries and rules are updated slowly, and text features are not sufficiently considered, so extraction performance is poor. To address these problems, this paper presents a Chinese domain term extraction method that combines text features with statistics. After a coarse-grained screening of the Chinese words in a document, the method considers the part-of-speech, word-length, and boundary features of the candidate terms, constructs information-entropy and TF-IDF statistics, and calculates a comprehensive weight. Candidates whose weight exceeds a set threshold are extracted as the final domain terms. Experimental results on the test corpus show that the method achieves good precision, recall, and F-measure.
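The abstract names the ingredients of the comprehensive weight (part of speech, word length, boundary features, information entropy, TF-IDF) and the final threshold test, but not the formulas. The following is a minimal sketch under stated assumptions, not the paper's method: documents are assumed to be already segmented and POS-tagged (jieba-style tags); coarse screening is assumed to keep noun-like words of at least two characters; the boundary/entropy feature is assumed to be the smaller of the left and right adjacency entropies; IDF uses a smoothed variant; each feature is normalised by its maximum and combined linearly. `POS_SCORE`, `WEIGHTS`, `THRESHOLD`, `extract_terms` and `prf` are hypothetical names with placeholder values, and `prf` computes the precision, recall and F-measure used in the evaluation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical placeholders: the abstract gives no feature weights or threshold.
POS_SCORE = {"n": 1.0, "nz": 1.0, "vn": 0.6}
WEIGHTS = {"pos": 0.2, "length": 0.1, "entropy": 0.3, "tfidf": 0.4}
THRESHOLD = 0.5


def entropy(counter):
    """Shannon entropy (bits) of a neighbour-frequency distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def extract_terms(corpus, doc_index):
    """Score candidates in corpus[doc_index]; each doc is a list of (word, pos)."""
    doc = corpus[doc_index]
    # 1. Coarse screening (assumed rule): noun-like words of 2+ characters.
    cands = [w for w, p in doc if p in POS_SCORE and len(w) >= 2]
    tf = Counter(cands)
    pos = defaultdict(float)
    left, right = defaultdict(Counter), defaultdict(Counter)
    for i, (w, p) in enumerate(doc):
        if w in tf:
            pos[w] = max(pos[w], POS_SCORE.get(p, 0.0))
            left[w][doc[i - 1][0] if i > 0 else "<s>"] += 1
            right[w][doc[i + 1][0] if i + 1 < len(doc) else "</s>"] += 1
    # 2. Statistics: boundary (adjacency) entropy and smoothed TF-IDF.
    n_docs = len(corpus)
    raw = {}
    for w, count in tf.items():
        df = sum(1 for d in corpus if any(x == w for x, _ in d))
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        raw[w] = {
            "pos": pos[w],
            "length": min(len(w), 4) / 4,  # assumed: favour longer compounds
            "entropy": min(entropy(left[w]), entropy(right[w])),
            "tfidf": count / len(cands) * idf,
        }
    # 3. Normalise each feature by its maximum, then combine linearly.
    peak = {k: max((f[k] for f in raw.values()), default=0.0) for k in WEIGHTS}
    scores = {
        w: sum(WEIGHTS[k] * (f[k] / peak[k] if peak[k] else 0.0) for k in WEIGHTS)
        for w, f in raw.items()
    }
    # 4. Keep candidates whose comprehensive weight exceeds the threshold.
    return {w: s for w, s in scores.items() if s > THRESHOLD}


def prf(extracted, gold):
    """Precision, recall and F-measure of extracted terms against a gold set."""
    hit = len(set(extracted) & set(gold))
    p = hit / len(extracted) if extracted else 0.0
    r = hit / len(gold) if gold else 0.0
    return p, r, (2 * p * r / (p + r) if p + r else 0.0)


# Toy, hand-made corpus for illustration only (not data from the paper).
corpus = [
    [("领域术语", "n"), ("抽取", "vn"), ("和", "c"),
     ("领域术语", "n"), ("识别", "vn"), ("方法", "n")],
    [("研究", "vn"), ("方法", "n")],
    [("新闻", "n"), ("方法", "n")],
]
terms = extract_terms(corpus, 0)
print(sorted(terms))  # expected: ['领域术语']
```

In this toy run the repeated compound "领域术语" has varied left and right neighbours and occurs in only one document, so it clears the threshold, while the generic word "方法" appears in every document and has a fixed context, so its weight stays below it. A linear combination keeps each feature's contribution easy to inspect; the paper may combine them differently.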

A Method of Web News Extraction Based on Decision Tree

Web News Pages Extraction Method Based on DOM and Decision Tree

An Efficient Method for Extracting Web News Content

Web Information Segmentation Method Based on DOM Structure Tree

Hybrid Method for Automated News Content Extraction from the Web

Automatic Elements Extraction of Chinese Web News Using Prior Information of Content and Structure

Statistics-Based Automatic Web News Text Extraction

Learning to Extract Web News Title in Template Independent Way

A Statistical Approach for Content Extraction from Web Page

A Novel Chinese Web News Source Extraction Algorithm

Content Extraction of Web Pages Based on Characteristic Symbols

Title-Based Extraction of News Contents for Text Mining

Domain Term Extraction Method Based on Hierarchical Combination Strategy for Chinese Web Documents

Web News Extraction Based on Path Pattern Mining

Chinese Web News Source Extraction Algorithm Based on Rules and Region Recognition

Web Key Resource Page Judgment Based on Improved Decision Tree Algorithm

A Template Independent Approach for Web News and Blog Content Extraction

Automatic Web News Extraction with Semantic Features

Keyword Extraction Method Based on Density Clustering for Chinese News Web Pages

Study and Implementation of Dynamic Web Information Extraction Based on Tree Model Algorithm

Adaptive Web Information Extraction Based on DOM Tree