Abstract: Hand-face interactions play a key role in many everyday tasks, providing insights into user habits, behaviors, intentions, and expressions. However, existing wearable sensing systems often struggle to track these interactions in daily settings due to their reliance on multiple sensors or privacy-sensitive, vision-based approaches. To address these challenges, we propose WristSonic, a wrist-worn active acoustic sensing system that uses speakers and microphones to capture ultrasonic reflections from hand, arm, and face movements, enabling fine-grained detection of hand-face interactions with minimal intrusion. By transmitting and analyzing ultrasonic waves, WristSonic distinguishes a wide range of gestures, such as tapping the temple, brushing teeth, and nodding, using a Transformer-based neural network architecture. This approach achieves robust recognition of 21 distinct actions with a single, low-power, privacy-conscious wearable. Through two user studies with 15 participants in controlled and semi-in-the-wild settings, WristSonic demonstrates high efficacy, achieving macro F1-scores of 93.08% and 82.65%, respectively.
### What problem does this paper attempt to solve?
This paper addresses the difficulty existing wearable devices have in accurately tracking hand-face interactions in everyday environments. Hand-face interactions (such as adjusting glasses, wiping the chin, scratching an itch, or covering the mouth when coughing) play a crucial role in many daily tasks and can reveal important information about users' habits, behaviors, intentions, and expressions. However, existing wearable sensing systems usually rely on multiple sensors or vision-based methods, which face the following challenges:
1. **Dependence on multiple sensors**: Accurately identifying hand-face interactions usually requires sensors on both the head and the arms, which increases the complexity and cost of the system.
2. **Privacy issues**: Vision-based methods (such as cameras) perform well in controlled environments, but they raise privacy concerns in daily life and consume a lot of energy.
3. **Complex postures and movements**: Hand-face interactions involve complex postures and movements, and a single wrist-worn or head-worn device struggles to capture the dynamics of the hand and the face at the same time.
To solve these problems, the authors propose **WristSonic**, a wrist-worn active acoustic sensing system. Its speakers emit ultrasonic waves and its microphones capture the reflections from the hand, arm, and face, which enables fine-grained detection of hand-face interactions. The main goals of WristSonic are:
- **Multi-part tracking with a single device**: A single low-power, privacy-friendly wearable captures the dynamics of the hand and the face simultaneously.
- **Fine-grained hand-face interaction recognition**: A Transformer-based neural network distinguishes 21 distinct actions, including subtle ones such as tapping the temple, brushing teeth, and nodding.
- **High accuracy and robustness**: In two user studies with 15 participants, WristSonic achieved macro F1-scores of 93.08% in the controlled (laboratory) setting and 82.65% in the semi-in-the-wild setting.
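For readers unfamiliar with the metric, the sketch below shows how a macro F1-score is typically computed: the F1-score is calculated separately for each class and then averaged with equal weight, so rare actions count as much as frequent ones. The function name `macro_f1`, the label names, and the toy predictions are hypothetical and chosen only for illustration; they are not the paper's data or evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 if the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three made-up action labels (not results from the paper).
y_true = ["tap_temple", "tap_temple", "brush_teeth", "brush_teeth", "nod", "nod"]
y_pred = ["tap_temple", "nod", "brush_teeth", "brush_teeth", "nod", "nod"]
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

In this toy case the per-class F1-scores are 2/3 for `tap_temple`, 1.0 for `brush_teeth`, and 0.8 for `nod`, so the macro average is about 0.82. With 21 classes, as in WristSonic, the same equal-weight averaging means that poor performance on any single action lowers the overall score.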
### Summary
The core problem of the paper is how to build a wearable device that efficiently and accurately tracks hand-face interactions in everyday environments while avoiding the limitations of existing approaches: dependence on multiple sensors, privacy concerns, and difficulty capturing complex motion. The paper reports that WristSonic meets this goal by combining active acoustic sensing with a Transformer-based deep learning model.