Model-Agnostic Utility-Preserving Biometric Information Anonymization

Chun-Fu Chen, Bill Moriarty, Shaohan Hu, Sean Moran, Marco Pistoia, Vincenzo Piuri, Pierangela Samarati
2024-05-24
Abstract: The recent rapid advancements in both sensing and machine learning technologies have given rise to the universal collection and utilization of people's biometrics, such as fingerprints, voices, retina/facial scans, or gait/motion/gestures data, enabling a wide range of applications including authentication, health monitoring, or much more sophisticated analytics. While providing better user experiences and deeper business insights, the use of biometrics has raised serious privacy concerns due to their intrinsic sensitive nature and the accompanying high risk of leaking sensitive information such as identity or medical conditions.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the following problem: when biometric data (such as fingerprints, voices, retina/face scans, or gait/movement/gesture data) is used for identity verification, health monitoring, or other complex analyses, how can the useful features of the data be preserved while sensitive information (such as identity or medical conditions) is protected? Specifically, the authors propose a new modality-independent data transformation framework that suppresses sensitive attributes in biometric data while retaining the features relevant to downstream machine-learning analyses.

### Core of the Problem

1. **Privacy Protection**: Biometric data is highly sensitive; once leaked, it may expose sensitive information such as personal identity or medical conditions.
2. **Data Utility**: Analysis tasks such as sentiment analysis or activity recognition still need the useful features in the data to produce accurate results.

### Solution

The authors propose a novel modality-independent data transformation framework that addresses the problem in two ways:

- **Sensitive Attribute Suppression**: The data is transformed so that sensitive information (such as identity) cannot be extracted from the result.
- **Utility Preservation**: The transformed data can still be used for valuable analysis tasks and maintains a high accuracy rate.

### Experimental Evaluation

To verify the effectiveness of the proposed method, the authors conducted extensive experimental evaluations using publicly available face, voice, and motion datasets. The results show that the framework effectively suppresses sensitive information while maintaining the utility of the data, thus providing a reliable basis for subsequent analyses.
### Key Formulas

The paper defines two quantities to measure the effect of a data transformation:

- **Utility \( U \)**: The combined identification accuracy on the transformed data for the attribute of interest and the additional attributes:

  \[ U(T(D)) = P(T(D)) + \sum_{i=1}^{n} \alpha_i Q_i(T(D)) \]

  where \( T(D) \) is the transformed biometric dataset, \( \{ \alpha_i \}_{i=1}^{n} \) are the user-supplied weights of the additional attributes, and \( P(\cdot) \) and \( \{ Q_i(\cdot) \}_{i=1}^{n} \) are the corresponding identification models.

- **Confusion Degree \( M \)**: The degree to which the trained sensitive-attribute classification model is confused by the transformed data:

  \[ M(T(D)) = 1 - S(T(D)) \]

  where \( S(\cdot) \) is the sensitive-attribute classification model.

By maximizing \( U \) and \( M \), the optimal anonymization transformation \( T^*(\cdot) \) can be found, so that the transformed data completely confuses the sensitive-attribute classification model while the useful attributes can still be reliably extracted.

### Summary

The main contribution of this paper is a new framework that, for the first time, introduces the concept of utility preservation into general ML-based biometric information anonymization, with its effectiveness and practicality verified through experiments.
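
As an illustration of the utility and confusion-degree formulas above, the following minimal Python sketch evaluates \( U \) and \( M \) from classifier accuracies. It is not the authors' implementation: the function names `utility` and `confusion_degree` are hypothetical, all accuracy values and weights are made-up numbers, and it assumes that \( P \), \( Q_i \), and \( S \) each return an accuracy in the range 0 to 1 on the transformed dataset.

```python
from typing import Sequence


def utility(p_acc: float, q_accs: Sequence[float], alphas: Sequence[float]) -> float:
    """U(T(D)) = P(T(D)) + sum_i alpha_i * Q_i(T(D)), with accuracies in [0, 1]."""
    if len(q_accs) != len(alphas):
        raise ValueError("need exactly one weight alpha_i per additional attribute")
    return p_acc + sum(a * q for a, q in zip(alphas, q_accs))


def confusion_degree(s_acc: float) -> float:
    """M(T(D)) = 1 - S(T(D)), where S is the sensitive-attribute accuracy."""
    return 1.0 - s_acc


# Hypothetical accuracies measured on a transformed dataset T(D).
p_acc = 0.90            # P: attribute of interest
q_accs = [0.80, 0.60]   # Q_1, Q_2: additional attributes
alphas = [0.50, 0.25]   # user-chosen weights alpha_1, alpha_2
s_acc = 0.20            # S: sensitive attribute (e.g. identity)

u = utility(p_acc, q_accs, alphas)   # 0.90 + 0.50*0.80 + 0.25*0.60
m = confusion_degree(s_acc)          # 1 - 0.20
print(f"U = {u:.2f}, M = {m:.2f}")   # prints "U = 1.45, M = 0.80"
```

A higher \( U \) means the useful attributes remain recognizable, and a higher \( M \) means the sensitive-attribute classifier performs worse on the transformed data. How the two quantities are combined when searching for \( T^* \) is not specified in this summary, so the sketch reports them separately.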