Abstract: With the rapid development of information technologies, ever larger amounts of data are produced in cyberspace. Traditional web search methods cannot satisfy users’ demands in a timely and accurate manner, so developing big search techniques for cyberspace is an urgent task. MDATA (Multi-dimensional Data Association and Intelligent Analysis) is a knowledge representation model with temporal and spatial characteristics. By expressing these characteristics effectively, it supports efficient updating of dynamic knowledge. Pattern matching is often used to extract the knowledge needed to construct the MDATA from massive data. Pattern matching relies on matching rules to extract the required substrings from a string. In practical application scenarios, some matching rules can be divided into several categories: rules in the same category have the same meaning but different expressions. Regular expressions can aggregate matching rules that have a consistent structure and strong regularity. However, in practical scenarios such as cyber security knowledge, such homogeneous matching rules are rare, and most are random and disordered. For random matching rules, manually designing regular expressions to aggregate them is time-consuming and laborious. To address this problem, we apply a word embedding algorithm to classify matching rules automatically. Word embedding is a kind of representation learning algorithm that is commonly adopted in recommendation systems, relation mining, text similarity matching and so on. It converts words into low-dimensional vectors based on neural network models. However, word embedding algorithms take into account the relationship between semantic information and context, which requires a large amount of data. When we consider only the matching rules in pattern matching, the data is insufficient to reflect the context relationship, which leads to inaccurate results. In this chapter, we design an automatic classification method that needs only a small amount of data to meet practical requirements.

Heuristic Learning of Rules for Information Extraction from Web Documents

Automatic Web Information Extraction Based On Rules

Extracting method knowledge elements from scientific literature: A rule‐based approach

Extraction Rule Language for Web Information Extraction and Integration

Reasoning Makes Good Annotators: An Automatic Task-specific Rules Distilling Framework for Low-resource Relation Extraction

A Rule-Based Information Extraction System for Human-Readable Semi-Structured Scientific Documents

KICE: A Knowledge Consolidation and Expansion Framework for Relation Extraction

Domain Term Extraction Method Based on Hierarchical Combination Strategy for Chinese Web Documents

Research on Automated Web Navigation and Data Integration Rules for Web Information Extraction

Iteratively Learning Embeddings and Rules for Knowledge Graph Reasoning

Attribute Value Extraction Based on Rule Matching

DRTE: A Term Extraction Method for K12 Education

Ruleformer: Context-aware Rule Mining over Knowledge Graph

A web text classification rules extraction algorithm

Adaptive Ordered Information Extraction with Deep Reinforcement Learning

Automatic Extraction Rules Generation Based On Xpath Pattern Learning

Rule extraction based on linguistic-valued intuitionistic fuzzy layered concept lattice

Automated Text Data Extraction Based on Unsupervised Small Sample Learning

Rule-based information extraction for mechanical-electrical-plumbing-specific semantic web

Knowledge Extraction: Automatic Classification of Matching Rules

A Hybrid Rule Extraction Method Using Rough Sets and Neural Networks