Abstract: Geographical information has grown explosively with the emergence of the Internet, which also brings new ideas for obtaining geospatial data alongside traditional GIS methods. Drawing on the abundant geospatial information on the web, we propose a toponym co-occurrence network model: toponym entities are extracted from web page texts using natural language processing methods and normalized to a uniform form, so that the web pages can be analyzed comprehensively. The network constructed in this paper is a weighted directed graph in which every vertex represents a distinct toponym and each co-occurrence of two toponyms is represented as an edge. The frequency of geographic names determines the weight of each edge and captures both the co-occurrence relationships and the transition characteristics of those toponyms. On this basis, we present a link-analysis method for extracting toponyms from web page texts, which uses the PageRank algorithm to compute the link weight of every toponym in the co-occurrence network and ranks each geographic name by its PageRank score. In this way, the importance of each toponym is calculated, and the core geographic names with salient features or navigational features can be found among large volumes of web resources. A case study on actual data extracted from People's Daily and Sina News Sport web pages verifies the technical solution and shows that it is feasible and practically effective, and that it can also be applied to geographical information retrieval. The results show that the core toponyms of the co-occurrence network differ across web page themes and, when the time sequence is taken into account, may also differ within a single theme.
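
The pipeline described in the abstract (a weighted directed toponym graph ranked with PageRank) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that toponyms have already been extracted and normalized, that an edge points from one toponym to the next toponym mentioned in the same document, and that the edge weight is the number of such transitions. The function names `build_cooccurrence_graph` and `weighted_pagerank`, the damping factor of 0.85, and the sample documents are all hypothetical choices made for this sketch.

```python
from collections import defaultdict


def build_cooccurrence_graph(documents):
    """Build a weighted directed graph from per-document toponym sequences.

    Assumption (not stated in the abstract): an edge u -> v is added whenever
    toponym v directly follows toponym u in the same document, and the edge
    weight counts how often that transition occurs.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for toponyms in documents:
        for name in toponyms:
            graph[name]  # register every toponym as a vertex
        for source, target in zip(toponyms, toponyms[1:]):
            if source != target:  # ignore self-loops from repeated mentions
                graph[source][target] += 1.0
    return {u: dict(neighbours) for u, neighbours in graph.items()}


def weighted_pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration on a dict-of-dicts graph."""
    nodes = sorted(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    out_weight = {u: sum(graph[u].values()) for u in nodes}

    for _ in range(max_iter):
        # Rank held by vertices without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for u in nodes:
            for v, weight in graph[u].items():
                # Each vertex passes rank along its edges in proportion
                # to the edge weights.
                new_rank[v] += damping * rank[u] * weight / out_weight[u]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toponym sequences, one list per web page.
documents = [
    ["Beijing", "Shanghai", "Beijing"],
    ["Guangzhou", "Beijing"],
    ["Shanghai", "Beijing", "Shenzhen"],
]

graph = build_cooccurrence_graph(documents)
ranks = weighted_pagerank(graph)

# List toponyms from most to least important under this sketch.
for name, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

Toponyms at the top of the printed list play the role of the "core toponyms" in this toy setting. Other edge definitions, such as linking every pair of toponyms in a sentence or a sliding window, would fit the same ranking step without changes to `weighted_pagerank`.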

Hyponym Extraction from the Web by Bootstrapping

Exploiting Multiple Sources for Open-Domain Hypernym Discovery.

Verification Based on Hyponymy Hierarchical Characteristics for Web-Based Hyponymy Discovery.

Extracting Hyponymy Relation Between Chinese Terms

Hyponymy acquisition from Chinese text by SVM

Chinese Hypernym-Hyponym Extraction from User Generated Categories.

Motif-Based Hyponym Relation Extraction from Wikipedia Hyperlinks

Self-Supervised Synonym Extraction from the Web

Bootstrapping Large-scale Named Entities Using URL-Text Hybrid Patterns.

Extracting hyponymy relation between Chinese terms based on term types' commonality and sequential patterns

Domain Hyponymy Hierarchy Discovery by Iterative Web Searching and Inferable Semantics Based Concept Selecting

A Hybrid Method for Entity Hyponymy Acquisition in Chinese Complex Sentences.

Predicting Hypernym–hyponym Relations for Chinese Taxonomy Learning

Multi-Distribution Characteristics Based Chinese Entity Synonym Extraction from The Web

Using Synonym Relations in Chinese Collocation Extraction.

MOTIF-RE: Motif-Based Hypernym/Hyponym Relation Extraction from Wikipedia Links

Bootstrapping for Extracting Relations from Large Corpora

Extract Core Toponyms from Web Page Text Based on Link Analysis

Hypernymy Extraction with Hybrid Text Kernel

The Bootstrapping Based Recognition Of Conceptual Relationship For Text Retrieval

Bootstrapping Information Extraction Via Conceptualization