Abstract: The correlation among a set of attributes is a crucial statistical value for measuring the prediction potential present in a dataset. The correlation coefficient, which measures the correlation between the values of two attributes, can be used to measure the prediction potential between two-element subsets of a dataset containing a high number of attributes. In this way, two common summary visualizations of prediction potential in datasets are formed: correlation matrices and correlation heatmaps. Both of these visualizations focus on presenting the correlation between pairs of attributes but convey little about the context of correlations in the dataset. The main objective of this article is the design and implementation of graphical models usable in a visual representation of data prediction potential, namely correlation graphs and correlation chains, which emphasize the pseudo-transitivity of prediction potential in a dataset.

What problem does this paper attempt to address?

The problem this paper addresses is the inadequacy of existing methods for visualizing predictive potential in datasets. Traditional correlation matrices and correlation heatmaps show the correlations between attribute pairs, but they are limited in presenting the context of those correlations and in revealing patterns and trends in the dataset. They also become hard to read and interpret for large or high-dimensional datasets. To overcome these problems, the paper designs and implements two new graphical models, correlation graphs and correlation chains, which represent the predictive potential in a dataset more effectively and emphasize its pseudo-transitivity. The models are based on graph theory principles and are designed to help analysts discover patterns and trends when analyzing data.

### Main contributions:

1. **Propose original visualization models**: The models, called correlation graphs and correlation chains, are based on graph theory principles and are suitable for correlation analysis and subsequent predictive analysis of large and multi-dimensional datasets.
2. **Implement the proposed graphical representations**: The graphical representations of predictive potential are implemented as freely available Python code.
3. **Evaluate the proposed graphical models**: The models are evaluated on two datasets of different sizes and structures: the standard Iris dataset and an original graph-attribute dataset containing multiple attributes and records.

### Specific methods:

- **Correlation graph**: Represent the correlations in the dataset by constructing an undirected weighted graph, where each node corresponds to an attribute and the weight of an edge is the correlation coefficient between two attributes. The graph is simplified through a two-stage pruning method (selecting edges with the maximum correlation value and setting a correlation threshold).
- **Correlation chain**: Extract a sub-graph from the correlation graph that contains the edges with correlations greater than the set threshold, further emphasizing the pseudo-transitivity of the predictive potential in the dataset.

### Evaluation results:

- **Iris dataset**: Correlation graphs and correlation chains clearly show the direct and indirect influences between attributes, especially when predicting the type of flower.
- **Cubic-graph-attribute dataset**: On this larger and more complex dataset, correlation graphs and correlation chains remain readable and interpretable and can effectively identify the correlations between attributes.

In conclusion, by designing and implementing new graphical models, this paper addresses the deficiencies of traditional correlation visualization methods on large and high-dimensional datasets and provides more effective tools for data analysis.
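The correlation graph, its two-stage pruning, and the extraction of chains described under "Specific methods" can be illustrated with a small sketch. This is not the paper's published code. It is a minimal pure-Python illustration under stated assumptions: the Pearson coefficient is used as the correlation measure, stage one of the pruning is read as "keep each attribute's strongest edge by absolute correlation", stage two as "drop edges whose absolute correlation does not exceed the threshold", and a chain is approximated as a connected group of attributes in the pruned graph. The function names (`pearson`, `correlation_graph`, `prune`, `chains`) and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_graph(columns):
    """Complete undirected weighted graph: one node per attribute,
    one edge per attribute pair, weighted by the correlation coefficient."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(sorted(columns), 2)
    }


def prune(edges, threshold):
    """Two-stage pruning (an assumed reading of the summary)."""
    # Stage 1: for every node, remember only its strongest edge by |r|.
    strongest = {}
    for (a, b), r in edges.items():
        for node in (a, b):
            best = strongest.get(node)
            if best is None or abs(r) > abs(edges[best]):
                strongest[node] = (a, b)
    kept = set(strongest.values())
    # Stage 2: drop edges whose |r| does not exceed the threshold.
    return {e: edges[e] for e in sorted(kept) if abs(edges[e]) > threshold}


def chains(edges):
    """Group attributes linked by surviving edges (connected components)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        result.append(sorted(component))
    return result


# Toy data, made up for illustration (not taken from the paper).
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 3, 4, 1, 2],
    "d": [1, 0, 1, 0, 1],
}
pruned = prune(correlation_graph(toy), threshold=0.6)
groups = chains(pruned)
```

In this toy example the pruned graph keeps only the edges `a`-`b` and `a`-`c`, so `b` and `c` end up in the same chain through `a` even though their direct edge was pruned. That indirect link is the kind of pseudo-transitive relationship the summary refers to. Attribute `d` falls below the threshold and is left out.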

Graphical representation of data prediction potential: correlation graphs and correlation chains
