Towards Ubiquitous Intelligent Hand Interaction

Chen Liang
2023-08-21
Abstract: The development of ubiquitous computing and sensing devices has brought about novel interaction scenarios such as mixed reality and IoT (e.g., smart homes), which pose new demands on the next generation of natural user interfaces (NUI). The human hand, thanks to its many degrees of freedom, is the medium through which people interact with the external world in their daily lives, and is therefore regarded as the primary entry point for NUI. Unfortunately, current hand tracking systems are largely confined to first-person, vision-based solutions, which suffer from optical artifacts and are not practical in ubiquitous environments. In my thesis, I rethink this problem by analyzing its underlying logic in terms of sensor, behavior, and semantics, which together constitute a research framework for achieving ubiquitous intelligent hand interaction. I then summarize my previous research topics and illustrate future research directions based on this framework.
Human-Computer Interaction, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily aims to address the following issues:
1. **Limitations of existing hand tracking systems**: Current hand tracking systems mostly rely on first-person vision-based solutions, which have limitations related to environment, structure, and resolution. This makes it difficult to achieve highly usable natural user interaction in complex environments (such as low-light conditions or cluttered backgrounds).
2. **Inherent defects of visual systems**: Vision-based hand tracking methods are easily affected by environmental factors such as low light, complex backgrounds, and occlusion. These factors make it difficult for the system to accurately recognize subtle or fast hand gestures. Visual systems also struggle to distinguish real touch events from false positives (for example, a finger that only appears to touch a surface), because touch events usually occur within a very short time and require extremely high spatial resolution to detect.
3. **Lack of practicality**: Existing vision-based hand tracking technologies rely on fixed first-person cameras and other auxiliary hardware. These devices impose physical and computational burdens, which makes them impractical in outdoor or IoT scenarios.
To overcome these issues, the author proposes a research framework aimed at achieving intelligent hand interaction in a wide range of environments through three stages: sensor, behavior, and semantic modeling. Specifically, the framework focuses on how to use different sensors (such as IMUs, RF modules, and acoustic modules) to capture local features of the hand, and combines them with machine learning methods to improve the accuracy and robustness of hand tracking, thereby supporting more natural and richer gesture input.