Ui-Ear: On-face Gesture Recognition Through On-ear Vibration Sensing

Guangrong Zhao,Yiran Shen,Feng Li,Lei Liu,Lizhen Cui,Hongkai Wen
DOI: https://doi.org/10.1109/tmc.2024.3480216
IF: 6.075
2024-01-01
IEEE Transactions on Mobile Computing
Abstract:With the convenient design and prolific functionalities, wireless earbuds are fast penetrating in our daily life and taking over the place of traditional wired earphones. The sensing capabilities of wireless earbuds have attracted great interests of researchers on exploring them as a new interface for human-computer interactions. However, due to its extremely compact size, the interaction on the body of the earbuds is limited and not convenient. In this paper, we propose Ui-Ear, a new on-face gesture recognition system to enrich interaction maneuvers for wireless earbuds. Ui-Ear exploits the sensing capability of Inertial Measurement Units (IMUs) to extend the interaction to the skin of the face near ears. The accelerometer and gyroscope in IMUs perceive dynamic vibration signals induced by on-face touching and moving, which brings rich maneuverability. Since IMUs are provided on most of the budget and high-end wireless earbuds, we believe that Ui-Ear has great potential to be adopted pervasively. To demonstrate the feasibility of the system, we define seven different on-face gestures and design an end-to-end learning approach based on Convolutional Neural Networks (CNNs) for classifying different gestures. To further improve the generalization capability of the system, adversarial learning mechanism is incorporated in the offline training process to suppress the user-specific features while enhancing gesture-related features. We recruit 20 participants and collect a realworld datasets in a common office environment to evaluate the recognition accuracy. The extensive evaluations show that the average recognition accuracy of Ui-Ear is over 95% and 82.3% in the user-dependent and user-independent tasks, respectively. Moreover, we also show that the pre-trained model (learned from user-independent task) can be fine-tuned with only few training samples of the target user to achieve relatively high recognition accuracy (up to 95%). At last, we implement the personalization and recognition components of Ui-Ear on an off-the-shelf Android smartphone to evaluate its system overhead. The results demonstrate Ui-Ear can achieve real-time response while only brings trivial energy consumption on smartphones
What problem does this paper attempt to address?