Inverse Sampling of Degenerate Datasets from a Linear Regression Line

Albert S. Kim
DOI: https://doi.org/10.48550/arXiv.2108.11477
2021-08-26
Abstract: When linear regression relates a scalar (dependent) response to one or more independent variables, datasets with distinct graphical trends can yield similar fitted relationships because they share the same statistical properties. Advanced statistical approaches, such as neural networks and machine learning methods, are much needed to process, characterize, and analyze these degenerate datasets. At the same time, the accurate creation of purposely degenerate datasets is essential for testing new models in applied-statistics research and education. To this end, the present study characterizes the famous Anscombe datasets and provides a general algorithm for creating multiple paired datasets with identical statistical properties.
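The degeneracy described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; it assumes one simple construction: take any residual "shape", remove the part explained by an intercept and by x, rescale it, and add it to a chosen line. The function names `degenerate_dataset` and `summary`, the residual scale 1.17, and the three shapes are hypothetical choices for illustration; the line y = 3 + 0.5x is the familiar Anscombe fit.

```python
import numpy as np


def degenerate_dataset(x, slope, intercept, resid_std, shape):
    """Return y whose least-squares fit on x is exactly the given line.

    `shape` is any vector that sets the visual pattern of the residuals.
    It must not lie in the span of [1, x], or the rescaling divides by zero.
    """
    x = np.asarray(x, dtype=float)
    shape = np.asarray(shape, dtype=float)
    # Design matrix with an intercept column and x.
    A = np.column_stack([np.ones_like(x), x])
    # Strip the part of `shape` that a straight line could explain,
    # leaving residuals orthogonal to both 1 and x.
    coef, *_ = np.linalg.lstsq(A, shape, rcond=None)
    e = shape - A @ coef
    # Rescale so every dataset has the same residual spread.
    e *= resid_std / e.std(ddof=1)
    return intercept + slope * x + e


def summary(x, y):
    """Mean and variance of y, fitted slope and intercept, and Pearson r."""
    b, a = np.polyfit(x, y, 1)
    return (np.mean(y), np.var(y, ddof=1), b, a, np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    x = np.arange(4, 15, dtype=float)  # x = 4, 5, ..., 14
    shapes = {
        "curved": (x - x.mean()) ** 2,            # parabolic trend
        "outlier": np.eye(x.size)[7],             # one displaced point
        "noisy": np.random.default_rng(0).normal(size=x.size),
    }
    for name, s in shapes.items():
        y = degenerate_dataset(x, slope=0.5, intercept=3.0,
                               resid_std=1.17, shape=s)
        print(name, np.round(summary(x, y), 4))
```

Because the residuals are orthogonal to both the constant and x, the fitted line, the mean and variance of y, and the correlation coefficient are the same for every shape, while a scatter plot of each dataset looks different.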