Bidirectional self-adaptive resampling in internet of things big data learning

Weihong Han, Zhihong Tian, Zizhong Huang, Shudong Li, Yan Jia
DOI: https://doi.org/10.1007/s11042-018-6938-9
IF: 2.577
2018-12-05
Multimedia Tools and Applications
Abstract: This paper addresses the low accuracy of learning algorithms caused by severe class imbalance in Internet of Things big data, and proposes a bidirectional self-adaptive resampling algorithm for imbalanced big data. Based on the data set size and the imbalance ratio supplied by the user, the algorithm processes the data by combining oversampling of the minority class with distribution-sensitive undersampling of the majority class. The resampling is distribution-sensitive: according to the distribution of the samples, the majority and minority samples are divided into different categories, and samples with different distribution characteristics are processed in different ways, so that the resampled set retains the characteristics of the original data set as far as possible. The algorithm emphasizes the importance of boundary samples, that is, samples on the boundary between the majority and minority classes matter more to the learning algorithm than other samples. Boundary minority samples are copied, and boundary majority samples are retained. A real-world application is presented at the end, which shows that, compared with existing resampling algorithms for imbalanced data, this algorithm improves the accuracy of the learning algorithm, especially the accuracy and recall of the minority class.
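The boundary-focused idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how boundary samples are identified or how many copies are made, so the sketch assumes a sample is a boundary sample when at least one of its k nearest neighbours belongs to the other class, copies each boundary minority sample once, keeps every boundary majority sample, and randomly thins the remaining majority samples. The function names and the `target_ratio` and `k` parameters are hypothetical.

```python
import numpy as np

def opposite_neighbour_fraction(X, y, k=5):
    """Fraction of each sample's k nearest neighbours that carry a different label."""
    # Pairwise squared Euclidean distances (only suitable for small illustrative sets).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    return (y[nn] != y[:, None]).mean(axis=1)

def bidirectional_resample(X, y, minority=1, target_ratio=1.0, k=5, seed=0):
    """Oversample boundary minority samples, undersample interior majority samples.

    target_ratio is the desired majority/minority size ratio after resampling
    (an assumed stand-in for the user-supplied imbalance ratio).
    """
    rng = np.random.default_rng(seed)
    # Assumption: "boundary" means at least one opposite-class neighbour.
    boundary = opposite_neighbour_fraction(X, y, k) > 0
    is_min = y == minority

    min_all = np.flatnonzero(is_min)
    min_boundary = np.flatnonzero(is_min & boundary)
    maj_boundary = np.flatnonzero(~is_min & boundary)
    maj_interior = np.flatnonzero(~is_min & ~boundary)

    # Oversampling: keep every minority sample and add one copy of each boundary one.
    min_out = np.concatenate([min_all, min_boundary])

    # Undersampling: keep all boundary majority samples, then fill up to the
    # target size with randomly chosen interior majority samples.
    n_target = int(round(target_ratio * len(min_out)))
    n_interior = max(0, min(len(maj_interior), n_target - len(maj_boundary)))
    kept_interior = rng.permutation(maj_interior)[:n_interior]

    idx = np.concatenate([min_out, maj_boundary, kept_interior])
    return X[idx], y[idx]
```

Calling `bidirectional_resample(X, y)` on a feature matrix `X` and a label vector `y` returns the resampled pair. Because boundary majority samples are never dropped in this sketch, the resulting ratio can exceed `target_ratio` when the classes overlap heavily.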
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering