Abstract: Smartglasses, in addition to their visual-output capabilities, often contain acoustic sensors for receiving the user's voice. However, operation in noisy environments may significantly degrade the received signal. To address this issue, we propose an acoustic sensor array mounted on the eyeglasses frame. The signals from the array are processed by an algorithm that acquires the user's desired near-field speech signal while suppressing noise signals originating from the environment. The array comprises two acoustic vector-sensors (AVSs) located at the fore of the glasses' temples. Each AVS consists of four collocated subsensors: one pressure sensor (with an omnidirectional response) and three particle-velocity sensors (with dipole responses) oriented in mutually orthogonal directions. The array configuration is designed to boost the input power of the desired signal and to ensure that the characteristics of the noise at the different channels are sufficiently diverse (which lends itself to more effective noise suppression). Since the array moves together with the desired speaker, the relative source-receiver position remains unchanged; hence, there is no need to track fluctuations of the steering vector. Conversely, the spatial statistics of the noise are subject to rapid and abrupt changes due to sudden movement and rotation of the user's head. Consequently, the algorithm must be capable of rapid adaptation. We propose an algorithm which detects the desired speech in the time-frequency domain and uses this information to adaptively update estimates of the noise statistics. Speech detection plays a key role in ensuring the quality of the output signal. We conduct controlled measurements of the array in noisy scenarios. The proposed algorithm performs favorably compared with conventional algorithms.
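
The abstract describes updating the noise statistics adaptively, guided by speech detection in the time-frequency domain, while the steering vector stays fixed. A minimal sketch of that idea for a single time-frequency bin might look as follows. It is an illustration under stated assumptions, not the authors' algorithm: the recursive-averaging rule, the forgetting factor `alpha`, the diagonal `loading`, the function names, and the choice of an MVDR beamformer are all assumptions that do not appear in the abstract. Only the channel count (two AVSs with four subsensors each) comes from the text.

```python
import numpy as np

N_CHANNELS = 8  # two AVSs x four subsensors each, per the abstract


def update_noise_cov(R, x, speech_detected, alpha=0.9):
    """Recursively average the noise covariance for one time-frequency bin.

    R: (M, M) current covariance estimate; x: (M,) complex STFT snapshot.
    The estimate is frozen while speech is detected, so the desired
    signal does not leak into the noise statistics (assumed update rule).
    """
    if speech_detected:
        return R
    return alpha * R + (1.0 - alpha) * np.outer(x, x.conj())


def mvdr_weights(R, d, loading=1e-6):
    """Assumed beamformer: w = R^{-1} d / (d^H R^{-1} d).

    d is the fixed steering vector; a small diagonal loading term keeps
    the linear solve well conditioned.
    """
    M = R.shape[0]
    R_loaded = R + loading * np.real(np.trace(R)) / M * np.eye(M)
    Rinv_d = np.linalg.solve(R_loaded, d)
    return Rinv_d / (d.conj() @ Rinv_d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = np.eye(N_CHANNELS, dtype=complex)
    # Hypothetical steering vector; in practice it would be measured or modeled.
    d = rng.standard_normal(N_CHANNELS) + 1j * rng.standard_normal(N_CHANNELS)
    for _ in range(50):
        noise = rng.standard_normal(N_CHANNELS) + 1j * rng.standard_normal(N_CHANNELS)
        R = update_noise_cov(R, noise, speech_detected=False)
    w = mvdr_weights(R, d)
    print("distortionless response w^H d:", np.vdot(w, d))
```

In this sketch a smaller `alpha` shortens the averaging memory, which is one simple way to follow the abrupt changes in noise statistics caused by head movement, at the cost of a noisier covariance estimate.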

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

Directional Source Separation for Robust Speech Recognition on Smart Glasses

Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments

Near-field signal acquisition for smartglasses using two acoustic vector-sensors

Directional Sound-Capture System with Acoustic Array Based on FPGA

Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization

GazePointAR: A Context-Aware Multimodal Voice Assistant for Pronoun Disambiguation in Wearable Augmented Reality

A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition

Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI

The USTC-NERCSLIP Systems for the CHiME-8 MMCSG Challenge

Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies

Project Aria: A New Tool for Egocentric Multi-Modal AI Research

M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses

MISAR: A Multimodal Instructional System with Augmented Reality

GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer

One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition

Spatial Diarization for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks

Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge

Improved direction of arrival estimations with a wearable microphone array for dynamic environments by reliability weighting

Multi-channel Conversational Speaker Separation via Neural Diarization

Decoding Silent Speech Commands from Articulatory Movements Through Soft Magnetic Skin and Machine Learning