Using Spearman's correlation coefficients for exploratory data analysis on big dataset
Chengwei Xiao, Jiaqi Ye, Rui Máximo Esteves, Chunming Rong
DOI: https://doi.org/10.1002/cpe.3745
2015-12-18
Concurrency and Computation: Practice and Experience
Abstract: Correlation analysis is both popular and useful in many areas of social networking research, particularly in exploratory data analysis. In this paper, three well‐known and often‐used correlation coefficients, the Pearson product–moment correlation coefficient and the Spearman and Kendall rank correlation coefficients, are compared, from their definitions to their application domains. Based on the characteristics of the pump's vibration dataset, the nonparametric, distribution‐free Spearman rank correlation coefficient is introduced to analyze the relationship between the pump's working state and each of the 207,880 variables. For states I and II, states I and III, states II and III, and all three states in different files, the percentage of variables with high Spearman's correlation coefficients is obtained, together with tables listing exactly which variables these are. These results are of significant value for future research on the unsupervised machine learning system. Copyright © 2015 John Wiley & Sons, Ltd.
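The following is a minimal stdlib-only sketch of the kind of screening the abstract describes. It is not the authors' code. Spearman's ρ is computed as the Pearson correlation of the ranks, with tied values given their average rank; the shortcut 1 − 6Σd²/(n(n² − 1)) holds only when there are no ties, and ties are unavoidable when one variable is a discrete state label. Several details are assumptions made for illustration: the function names, encoding states I, II, and III as the integers 1, 2, and 3, the cut-off of |ρ| ≥ 0.8 for a "high" coefficient, and handling a pairwise comparison such as "states I and II" by keeping only the samples in those two states. The paper's actual threshold and procedure are not given here.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def restrict_to_states(state, variables, keep):
    """Hypothetical helper: keep only samples whose state label is in `keep`."""
    idx = [i for i, s in enumerate(state) if s in keep]
    return [state[i] for i in idx], [[col[i] for i in idx] for col in variables]


def high_correlation_share(state, variables, threshold=0.8):
    """Fraction and indices of variables with |rho| >= threshold (assumed 0.8)."""
    rhos = [spearman(state, col) for col in variables]
    # abs(nan) >= threshold is False, so constant variables are never "high".
    hits = [k for k, r in enumerate(rhos) if abs(r) >= threshold]
    return len(hits) / len(variables), hits


if __name__ == "__main__":
    # Toy data: 6 samples, state labels 1..3, three candidate variables.
    state = [1, 1, 2, 2, 3, 3]
    variables = [
        [0.1, 0.2, 0.5, 0.6, 0.9, 1.0],  # rises with the state
        [5, 4, 3, 3, 2, 1],              # falls with the state (one tie)
        [0.3, 0.9, 0.1, 0.7, 0.4, 0.2],  # no clear monotone trend
    ]
    print(high_correlation_share(state, variables))
    s12, v12 = restrict_to_states(state, variables, {1, 2})  # "states I and II"
    print(high_correlation_share(s12, v12))
```

On the toy data the first two variables pass the assumed cut-off and the third does not, both for all three states and for the state I/II subset. Because only the ordering of values matters, the same screen is unaffected by any monotone rescaling of a sensor channel, which is one reason a rank coefficient suits data whose distribution is unknown.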