Improved Inference for Imputation-Based Semisupervised Learning Under Misspecified Setting
Shaogao Lv, Linsen Wei, Qian Zhang, Bin Liu, Zenglin Xu
DOI: https://doi.org/10.1109/tnnls.2021.3077312
IF: 14.255
2021-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Semisupervised learning (SSL) has been studied extensively in the literature. Despite its success, many existing learning algorithms for semisupervised problems rely on specific distributional assumptions, such as the "cluster assumption" and the "low-density assumption," which are often hard to verify in practice. We are interested in quantifying the effect of SSL based on kernel methods under a misspecified setting, meaning that the target function is not contained in the hypothesis space in which a given learning algorithm operates. In practice, this assumption is mild and standard for many kernel-based approaches. Under this misspecified setting, this article attempts to provide a theoretical justification of when and how unlabeled data can be exploited to improve inference for a learning task. The justification is given in terms of the asymptotic variance of the proposed two-step estimator: the proposed pointwise nonparametric estimator is shown to have a smaller asymptotic variance than the supervised estimator that uses the labeled data alone. Several simulation experiments are conducted to support the theoretical results.
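The abstract does not spell out the two-step estimator, so the sketch below is not the authors' kernel-based pointwise method. It illustrates the same phenomenon with a simpler, standard imputation (regression-adjusted) estimator of the mean E[Y]: step 1 fits a misspecified working model on the labeled data, and step 2 averages its predictions over labeled and unlabeled covariates, then adds the mean labeled residual. All names and simulation settings (uniform covariates, E[Y|X] = X², Gaussian noise with standard deviation 0.1, a linear working model, the sample sizes) are illustrative assumptions.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def draw_response(x, rng):
    # Assumed truth: E[Y | X = x] = x**2, which is NOT linear, so the
    # linear working model fitted below is misspecified.
    return x * x + rng.gauss(0.0, 0.1)


def one_replication(n_labeled, n_unlabeled, rng):
    x_lab = [rng.random() for _ in range(n_labeled)]
    y_lab = [draw_response(x, rng) for x in x_lab]
    x_unl = [rng.random() for _ in range(n_unlabeled)]

    # Supervised baseline: sample mean of the labeled responses only.
    supervised = statistics.fmean(y_lab)

    # Step 1: fit the misspecified linear working model on labeled data.
    a, b = ols_fit(x_lab, y_lab)

    # Step 2: impute on all covariates (labeled + unlabeled), then add the
    # mean labeled residual. With an intercept, OLS residuals average to
    # (numerically) zero, so the correction is ~0 here; it is kept to show
    # the general form of the estimator.
    x_all = x_lab + x_unl
    imputed = statistics.fmean(a + b * x for x in x_all)
    correction = statistics.fmean(y - (a + b * x) for x, y in zip(x_lab, y_lab))
    return supervised, imputed + correction


def compare_variances(n_labeled=50, n_unlabeled=2000, reps=400, seed=0):
    """Monte Carlo variances of the supervised and imputation estimators."""
    rng = random.Random(seed)
    sup, semi = [], []
    for _ in range(reps):
        s, t = one_replication(n_labeled, n_unlabeled, rng)
        sup.append(s)
        semi.append(t)
    return statistics.variance(sup), statistics.variance(semi)


if __name__ == "__main__":
    var_sup, var_semi = compare_variances()
    print(f"supervised variance:     {var_sup:.6f}")
    print(f"semisupervised variance: {var_semi:.6f}")
```

In this toy setting the linear fit captures most of the variation in E[Y|X] even though it is misspecified, so averaging its predictions over the many unlabeled covariates removes that explained part from the variance of the estimate, leaving mainly the residual variation measured on the labeled points.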
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture