Abstract: A probabilistic method is proposed for detecting outliers in multivariate data. Distortion in the statistics of geo-parameters due to outliers is taken into account. Statistical uncertainty due to the sparsity of site-specific data is quantified by Bayesian machine learning (BML). The proposed approach is illustrated and verified using simulated and real data.

Various uncertainties arising during the acquisition of geoscience data may result in anomalous data instances (i.e., outliers) that do not conform to the expected pattern of regular data instances. With sparse multivariate data obtained from geotechnical site investigation, outliers cannot be identified with certainty, because the outliers distort the statistics of geotechnical parameters and data sparsity adds statistical uncertainty to those statistics. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on the Mahalanobis distance and identifies as outliers those data instances with outlying probabilities greater than 0.5. It tackles the distortion of statistics estimated from a dataset containing outliers by a re-sampling technique and rationally accounts for the statistical uncertainty by Bayesian machine learning. Moreover, the proposed approach provides a dedicated method to determine the outlying components of each outlier. The proposed approach is illustrated and verified using simulated and real-life datasets. The results show that the proposed approach properly identifies outliers among sparse multivariate data, and their corresponding outlying components, in a probabilistic manner. It can significantly reduce the masking effect (i.e., missing some actual outliers due to the distortion of statistics by the outliers and statistical uncertainty).
It is also found that outliers among sparse multivariate data instances significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the necessity of a data cleaning process (e.g., outlier detection) for uncertainty quantification based on geoscience data.
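
The idea of an outlying probability can be illustrated with a minimal sketch. This is not the paper's method: it replaces the Bayesian machine learning step with a plain bootstrap, and it assumes bivariate data so that the chi-square threshold has a closed form. The function name `outlying_probabilities`, the 0.975 quantile level, and the number of resamples are hypothetical choices made for illustration only. Each resample yields a mean and covariance, every data instance is scored by its squared Mahalanobis distance against them, and the outlying probability is taken as the fraction of resamples in which the distance exceeds the threshold.

```python
import math
import numpy as np


def outlying_probabilities(X, n_resamples=2000, level=0.975, seed=0):
    """Bootstrap estimate of each row's outlying probability (2-D data only).

    Assumption: a plain bootstrap stands in for the Bayesian learning step
    described in the abstract; this is an illustration, not the paper's algorithm.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if d != 2:
        raise ValueError("this sketch uses the closed-form chi-square quantile for 2 dof")

    # Chi-square CDF with 2 degrees of freedom is 1 - exp(-x / 2),
    # so its quantile at `level` is -2 * ln(1 - level).
    threshold = -2.0 * math.log(1.0 - level)

    rng = np.random.default_rng(seed)
    flagged = np.zeros(n)
    used = 0
    for _ in range(n_resamples):
        # Resample rows with replacement; some resamples omit the outliers,
        # which limits the distortion they cause in the mean and covariance.
        sample = X[rng.integers(0, n, size=n)]
        cov = np.cov(sample, rowvar=False)
        if np.linalg.matrix_rank(cov) < d:
            continue  # skip degenerate resamples with a singular covariance
        diff = X - sample.mean(axis=0)
        # Squared Mahalanobis distance of every original row.
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        flagged += d2 > threshold
        used += 1
    return flagged / used


if __name__ == "__main__":
    # Hypothetical data: a regular grid of 20 points plus one distant point.
    grid = [(x, y) for x in (-1.5, -0.5, 0.5, 1.5) for y in (-2.0, -1.0, 0.0, 1.0, 2.0)]
    X = np.array(grid + [(10.0, -10.0)])
    p = outlying_probabilities(X)
    print("outlier indices:", np.flatnonzero(p > 0.5))
```

Following the rule stated in the abstract, instances whose outlying probability exceeds 0.5 are reported as outliers. Scoring against resampled statistics, rather than a single estimate from the full dataset, is what lets a distant point be flagged even though it inflates the covariance whenever it is included.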

Markov Boundary-Based Outlier Mining

Robust Subspace Outlier Detection in High Dimensional Space

Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks

Mining Distance-Based Outliers from Large Databases in Any Metric Space

Provable Self-Representation Based Outlier Detection in a Union of Subspaces

MCODE: Multivariate Conditional Outlier Detection

A Novel Outlier Detection Method for Multivariate Data

Outlier Detection and Spatial Analysis Algorithms

Generative Subspace Adversarial Active Learning for Outlier Detection in Multiple Views of High-dimensional Data

Query-Based Outlier Detection in Heterogeneous Information Networks

A neighborhood weighted-based method for the detection of outliers

Probabilistic outlier detection for sparse multivariate geotechnical site investigation data using Bayesian learning

Robust Multi-Kernel Nearest Neighborhood for Outlier Detection

Clustering With Outlier Removal

A fast MST-inspired kNN-based outlier detection method

An outlier map for Support Vector Machine classification

Unsupervised Parameter-free Outlier Detection using HDBSCAN* Outlier Profiles

Community-based Outlier Detection for Edge-attributed Graphs

Data Mining Based Outlier Cluster Detection Algorithm

Cascade Subspace Clustering for Outlier Detection

Multiscale Feature Attribution for Outliers