Abstract: This paper addresses the task of devising a statistical estimation procedure for cases where the volume of initial data available for processing is insufficient to correctly determine the parameters of the response function. The object of research is the technology of statistical processing of a small data sample. The subject of the study is the set of methods for statistical estimation under small-sample conditions. The main aim is to devise a special procedure for statistical processing of a small sample of initial data that provides correct statistical estimation of the parameters of the response function. The problem is solved by selecting the most representative orthogonal replica-like subplan from the plan of a full factorial experiment, which is obtained by artificially orthogonalizing the results of a passive experiment. The proposed procedure is necessary and expedient because the points are distributed unpredictably and unevenly in the phase space of coordinates. The result of implementing the procedure is a truncated orthogonal plan of the full factorial experiment, which makes it possible to estimate all coefficients of the regression polynomial describing the response function independently of one another. Under a severe shortage of measurements, the procedure isolates a representative orthogonal replica from the resulting full factorial plan, and this subplan is sufficient to estimate all coefficients of the regression polynomial that describes the desired response function. The corresponding computational procedure is based on solving the triaxial Boolean assignment problem.
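The role of orthogonality in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the artificial orthogonalization of passive data and the triaxial Boolean assignment step are not reproduced here. The sketch assumes a two-level design with levels coded as -1/+1, a first-order (linear) regression polynomial, and a half-replica chosen by the textbook defining relation I = x1*x2*x3. The function names `full_factorial`, `half_replica`, and `fit_linear`, and the coefficient values, are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def full_factorial(k):
    """Return all 2**k runs of a two-level plan, levels coded as -1/+1."""
    return np.array(list(itertools.product([-1, 1], repeat=k)))

def half_replica(plan):
    """Keep the runs whose product of factor levels equals +1.

    This is the classical half-replica with defining relation
    I = x1*x2*...*xk (an assumption, not the paper's selection rule).
    """
    return plan[np.prod(plan, axis=1) == 1]

def fit_linear(plan, y):
    """Estimate intercept and main-effect coefficients on an orthogonal plan."""
    n = len(plan)
    X = np.column_stack([np.ones(n), plan])
    # Orthogonality check: X^T X must equal n * I, so every coefficient
    # is estimated independently of the others.
    assert np.allclose(X.T @ X, n * np.eye(X.shape[1]))
    return X.T @ y / n

plan = full_factorial(3)   # 8 runs of the full factorial plan
sub = half_replica(plan)   # 4 runs of the orthogonal subplan

# Hypothetical noiseless response: intercept plus three main effects.
true_beta = np.array([2.0, 0.5, -1.0, 3.0])
y = true_beta[0] + sub @ true_beta[1:]

beta_hat = fit_linear(sub, y)
print(beta_hat)
```

Because the columns of the subplan are mutually orthogonal, each coefficient reduces to a simple scaled inner product of its column with the response, which is why four runs are enough here for four coefficients. In this fraction the main effects are aliased with two-factor interactions, so the sketch applies only to the linear model it assumes.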

Subdata selection based on orthogonal array for big data

Orthogonal Subsampling for Big Data Linear Regression

Distributed Privacy-Aware Fast Selection Algorithm for Large-Scale Data

Projection-Uniform Subsampling Methods for Big Data

Information-Based Optimal Subdata Selection for Big Data Linear Regression

Sample size determination for multidimensional parameters and the A-optimal subsampling in a big data linear regression model

A sub-sampling algorithm preventing outliers

Sample Weighting: an Inherent Approach for Outlier Suppressing Discriminant Analysis

A review on design inspired subsampling for big data

Optimal design subsampling from Big Datasets

Group-Orthogonal Subsampling for Hierarchical Data Based on Linear Mixed Models

Statistical processing of a small sample of raw data using artificial orthogonalisation technology

Robust optimal subsampling based on weighted asymmetric least squares

Optimal subdata selection for linear model selection

Subsampling for Big Data Linear Models with Measurement Errors

Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis with Limited Computational Resources

Optimal Subsampling Approaches for Large Sample Linear Regression

Accounting for outliers in optimal subsampling methods

Distributed Successive Measurement Selection Based on Online Sparsity Inference

Independence-Encouraging Subsampling for Nonparametric Additive Models

Subsampling Suffices for Adaptive Data Analysis