Abstract: Measurement is a typical way of gathering information about an investigated object, which is generalized by a finite set of characteristic parameters. Each iteration of the measurement yields an instance of the class of the investigated object in the form of a set of values of the characteristic parameters. An ordered set of instances forms a collection whose dimensionality, for a real object, cannot be ignored. Managing the dimensionality of data collections is, along with classification, regression, and clustering, a fundamental problem of machine learning. Compactification is the approximation of the original data collection by an equivalent collection with a reduced number of characteristic parameters, with control over the accompanying loss of information capacity. Closely related to compactification is the procedure for verifying data completeness, which is characteristic of data reliability assessment. If some of the characteristic parameters of the initial data collection are stochastic, the compactification procedure becomes more complicated. To take this into account, this study proposes a model of a structured collection of stochastic data defined in terms of relative entropy. The compactification of such a data model is formalized as an iterative procedure that maximizes the relative entropy of the sequential application of direct and reverse projections of data collections, taking into account estimates of the probability density functions of their attributes. To reduce the computational complexity of compactification, a procedure for approximating its relative entropy function is also proposed. To assess compactification qualitatively, the study undertakes a formal analysis whose metrics are the information capacity of the data collection and the absolute and relative shares of information lost through compactification. Given the semantic connection between compactification and completeness, the proposed metric is also relevant to the task of assessing data reliability. Testing showed that the proposed compactification procedure is both stable and efficient in comparison with previously used analogues such as principal component analysis and random projection.
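The ingredients named in the abstract (direct and reverse projection, attribute density estimates, relative entropy, and the two baseline methods) can be illustrated with a minimal sketch. It is not the procedure proposed in the study. It assumes histogram density estimates, a per-attribute Kullback-Leibler divergence between original and reconstructed attributes as a stand-in loss measure, and synthetic data. All function names and parameters (`relative_entropy`, `pca_round_trip`, `random_round_trip`, `bins`, `k`) are hypothetical.

```python
import numpy as np


def relative_entropy(p_samples, q_samples, bins=20):
    """Histogram estimate of D_KL(P || Q) for a single attribute, in nats."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    # A pseudo-count of 0.5 per bin keeps both estimates strictly positive,
    # so the logarithm below is always finite (an assumption of this sketch).
    p = (p + 0.5) / (p + 0.5).sum()
    q = (q + 0.5) / (q + 0.5).sum()
    return float(np.sum(p * np.log(p / q)))


def pca_round_trip(X, k):
    """Direct projection onto the top-k principal axes, then reverse projection."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    W = vt[:k].T                 # (d, k) orthonormal basis
    Z = Xc @ W                   # compact collection, shape (n, k)
    return Z @ W.T + mean        # reconstruction, shape (n, d)


def random_round_trip(X, k, rng):
    """Gaussian random projection to k dimensions, reversed via the pseudo-inverse."""
    mean = X.mean(axis=0)
    Xc = X - mean
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    Z = Xc @ R                   # compact collection, shape (n, k)
    return Z @ np.linalg.pinv(R) + mean


def mean_attribute_divergence(X, X_hat, bins=20):
    """Average per-attribute relative entropy between original and reconstruction."""
    return float(np.mean([relative_entropy(X[:, j], X_hat[:, j], bins)
                          for j in range(X.shape[1])]))


# Synthetic collection: 6 attributes driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 6))

loss_pca = mean_attribute_divergence(X, pca_round_trip(X, 2))
loss_rp = mean_attribute_divergence(X, random_round_trip(X, 2, rng))
print(f"PCA round trip:               {loss_pca:.4f} nats")
print(f"Random projection round trip: {loss_rp:.4f} nats")
```

A smaller divergence means that the reconstructed attributes follow the original attribute distributions more closely. The histogram estimate depends on the bin count and on the smoothing constant, so the printed values are only indicative.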

Small Stochastic Data Compactification Concept Justified in the Entropy Basis
