MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses

Saif Mahmud, Devansh Agarwal, Ashwin Ajit, Qikang Liang, Thalia Viranda, Francois Guimbretiere, Cheng Zhang
DOI: https://doi.org/10.1145/3675095.3676619
2024-08-03
Abstract: We introduce MunchSonic, an AI-powered active acoustic sensing system integrated into eyeglasses to track fine-grained dietary actions. MunchSonic emits inaudible ultrasonic waves from the eyeglass frame, with the reflected signals capturing detailed positions and movements of body parts, including the mouth, jaw, arms, and hands involved in eating. These signals are processed by a deep learning pipeline to classify six actions: hand-to-mouth movements for food intake, chewing, drinking, talking, face-hand touching, and other activities (null). In an unconstrained study with 12 participants, MunchSonic achieved a 93.5% macro F1-score in a user-independent evaluation with a 2-second resolution in tracking these actions, also demonstrating its effectiveness in tracking eating episodes and food intake frequency within those episodes.
Human-Computer Interaction, Emerging Technologies
What problem does this paper attempt to address?
This paper addresses how to track and recognize fine-grained eating behaviors (such as food intake, chewing, and drinking) in an unconstrained environment using an active acoustic sensing system integrated into eyeglasses. Current wearable devices can recognize eating events but are limited in distinguishing specific eating actions, such as chewing or hand-to-mouth movements. MunchSonic aims to overcome this limitation by placing active acoustic sensors on the eyeglass frame to capture subtle movements of body parts (especially the mouth, jaw, arms, and hands), enabling accurate recognition of these fine-grained behaviors. Specifically, MunchSonic addresses the problem in the following ways:

1. **Active Acoustic Sensing**: The system emits inaudible ultrasonic waves from speakers on the eyeglass frame and receives the reflected signals through microphones, generating a 2D echogram that captures detailed positions and movements of body parts.
2. **Deep Learning Framework**: A lightweight deep learning model processes the echogram data and classifies six actions: hand-to-mouth food intake, chewing, drinking, talking, face-hand touching, and other activities (null class).
3. **User Study**: In an unconstrained user study with 12 participants, MunchSonic achieved a macro F1-score of 93.5% in a user-independent evaluation at a 2-second resolution, demonstrating its effectiveness in tracking fine-grained eating behaviors.

With these methods, MunchSonic can not only recognize eating moments but also distinguish the specific actions within them, which supports dietary behavior assessment, chronic disease management, and overall health.
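The macro F1-score reported above is the unweighted mean of the per-class F1 scores, so each of the six actions counts equally regardless of how often it occurs. The following is a minimal sketch of that metric, not the authors' evaluation code: the short label strings, the `macro_f1` helper, the zero-score convention for absent classes, and the example windows are all assumptions made for illustration.

```python
from collections import Counter

# The six classes named in the abstract; the short label strings are illustrative.
CLASSES = ["intake", "chewing", "drinking", "talking", "face_touch", "null"]


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Return the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumed convention: a class with no true or predicted windows scores 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(classes)


# Hypothetical labels, one per 2-second window; not data from the study.
y_true = ["chewing", "chewing", "intake", "drinking",
          "talking", "face_touch", "null", "null"]
y_pred = ["chewing", "talking", "intake", "drinking",
          "talking", "face_touch", "null", "null"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

In this toy example a single chewing window is misclassified as talking, which lowers the F1 of both classes. In a user-independent evaluation, the predictions would come from a model that was not trained on the test participant's data.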