PyPulse: A Python Library for Biosignal Imputation

Kevin Gao, Maxwell A. Xu, James M. Rehg, Alexander Moreno
2024-12-09
Abstract: We introduce PyPulse, a Python package for imputing biosignals in both clinical and wearable-sensor settings. Missingness is commonplace in these settings and can arise from multiple causes, such as insecure sensor attachment or data transmission loss. PyPulse provides a modular and extendable framework that is easy to use for a broad user base, including bioresearchers without machine-learning expertise. Specifically, its new capabilities include using pre-trained imputation methods out of the box on custom datasets, running the full workflow of training or testing a baseline method with a single line of code, and comparing baseline methods in an interactive visualization tool. We released PyPulse under the MIT License on GitHub and PyPI. The source code can be found at: https://github.com/rehg-lab/pulseimpute.
Machine Learning, Software Engineering
What problem does this paper attempt to address?
The paper addresses the widespread problem of missing values in biosignal data collected in clinical and wearable-sensor settings. Specifically:

1. **Missing biosignal data**: Missing values frequently occur in biosignal recordings because of loose sensor attachment, data-transmission loss, and similar causes. These gaps pose a major challenge for health monitoring and downstream analysis.
2. **Limitations of existing imputation methods**: Traditional methods such as mean filling or linear interpolation cannot adequately capture the distinctive characteristics of pulse signals, such as their quasi-periodicity and specific morphological features, which carry important clinical meaning.
3. **Lack of large-scale public datasets**: Existing research lacks large-scale public datasets with real-world missingness patterns, which hinders the development and testing of advanced imputation methods.
4. **Poor extensibility of existing tools**: Although earlier work such as PulseImpute has provided valuable insights, its codebase offers no API, and its highly specialized software stack is difficult to extend to user-provided custom datasets.

To address these problems, the paper introduces PyPulse, a Python library for biosignal imputation. Its main features include:

- **Modular and extensible framework**: Flexible, easy-to-use configuration files, with support for custom datasets and missingness mechanisms.
- **Pre-trained models**: Pre-trained imputation methods can be applied directly to custom datasets.
- **Simplified workflow**: The complete training or testing process can be run with a single line of code.
- **Interactive visualization tool**: The imputation results of different baseline methods can be compared side by side, helping researchers understand how each method behaves.
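The summary mentions support for custom missingness mechanisms and names sensor detachment and transmission loss as causes of missing data. The sketch below shows one way such a mechanism could be made pluggable. It is not PyPulse's API: `MissingnessMechanism`, `ContiguousDropout`, `RandomPointDropout`, and `apply_missingness` are hypothetical names, and the two mechanisms (a single contiguous gap, and independent point dropout) are simple assumptions rather than the real-world missingness patterns the paper refers to.

```python
from dataclasses import dataclass
from typing import Protocol

import numpy as np


class MissingnessMechanism(Protocol):
    """Hypothetical interface: return a boolean mask, True where a sample is missing."""

    def mask(self, n_samples: int, rng: np.random.Generator) -> np.ndarray: ...


@dataclass
class ContiguousDropout:
    """Hypothetical mechanism mimicking transmission loss: one contiguous gap per signal."""

    gap_fraction: float = 0.2  # fraction of the signal that goes missing

    def mask(self, n_samples: int, rng: np.random.Generator) -> np.ndarray:
        gap_len = int(round(self.gap_fraction * n_samples))
        # integers() excludes the upper bound, so the gap always fits inside the signal
        start = int(rng.integers(0, n_samples - gap_len + 1))
        m = np.zeros(n_samples, dtype=bool)
        m[start:start + gap_len] = True
        return m


@dataclass
class RandomPointDropout:
    """Hypothetical mechanism: each sample goes missing independently with probability p."""

    p: float = 0.1

    def mask(self, n_samples: int, rng: np.random.Generator) -> np.ndarray:
        return rng.random(n_samples) < self.p


def apply_missingness(signal: np.ndarray, mechanism: MissingnessMechanism, seed: int = 0) -> np.ndarray:
    """Return a float copy of `signal` with the masked samples set to NaN."""
    rng = np.random.default_rng(seed)
    corrupted = signal.astype(float, copy=True)
    corrupted[mechanism.mask(len(signal), rng)] = np.nan
    return corrupted


signal = np.sin(np.linspace(0, 20 * np.pi, 1000))
corrupted = apply_missingness(signal, ContiguousDropout(gap_fraction=0.2), seed=1)
print(int(np.isnan(corrupted).sum()))  # 200
```

Because each mechanism only has to produce a boolean mask, a new missingness pattern can be added without touching the code that corrupts or imputes signals; that separation is the design choice this sketch illustrates.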
Through these features, PyPulse aims to help health and machine-learning researchers quickly compare and improve imputation algorithms, especially when applying them to custom datasets.
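To make point 2 concrete, here is a minimal sketch (not taken from PyPulse) that masks one 2-second gap in a toy quasi-periodic waveform and fills it with the two traditional baselines named above, scoring each by mean absolute error on the masked samples only. The waveform, sampling rate, gap position, and function names (`synthetic_pulse`, `mean_fill`, `linear_fill`, `mae_on_missing`) are all illustrative assumptions.

```python
import numpy as np


def synthetic_pulse(n_samples: int = 1000, fs: float = 100.0, hr_hz: float = 1.2) -> np.ndarray:
    """Toy quasi-periodic waveform: a fundamental plus one harmonic (not a real PPG/ECG model)."""
    t = np.arange(n_samples) / fs
    return np.sin(2 * np.pi * hr_hz * t) + 0.5 * np.sin(4 * np.pi * hr_hz * t)


def mean_fill(x: np.ndarray) -> np.ndarray:
    """Replace every NaN with the mean of the observed samples."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out


def linear_fill(x: np.ndarray) -> np.ndarray:
    """Linearly interpolate NaNs from the nearest observed neighbours (edges held constant)."""
    out = x.copy()
    missing = np.isnan(out)
    idx = np.arange(len(out))
    out[missing] = np.interp(idx[missing], idx[~missing], out[~missing])
    return out


def mae_on_missing(truth: np.ndarray, imputed: np.ndarray, missing: np.ndarray) -> float:
    """Mean absolute error, scored only on the samples that were masked."""
    return float(np.mean(np.abs(truth[missing] - imputed[missing])))


truth = synthetic_pulse()
missing = np.zeros_like(truth, dtype=bool)
missing[400:600] = True  # one 2-second gap at 100 Hz, mimicking a transmission dropout
corrupted = truth.copy()
corrupted[missing] = np.nan

for name, method in [("mean", mean_fill), ("linear", linear_fill)]:
    print(name, round(mae_on_missing(truth, method(corrupted), missing), 3))
```

Neither baseline can reproduce the beats that fall inside the gap: mean filling outputs a constant and linear interpolation a straight line between the gap's neighbours, so the oscillation within the gap is lost in both cases. Recovering that quasi-periodic structure is what motivates the pre-trained imputation methods PyPulse packages.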