DN3: An open-source Python library for large-scale raw neurophysiology data assimilation for more flexible and standardized deep learning

Demetres Kostas, Frank Rudzicz
DOI: https://doi.org/10.1101/2020.12.17.423197
2020-12-17
Abstract: We propose an open-source Python library, called DN3, designed to accelerate deep learning (DL) analysis with encephalographic data. This library focuses on making experimentation rapid and reproducible and facilitates the integration of both public and private datasets. Furthermore, DN3 is designed in the interest of validating DL processes that include, but are not limited to, classification and regression across many datasets to prove capacity for generalization. We explore the effectiveness of this library by presenting a general scheme for person disambiguation called T-Vectors, inspired by speech recognition. These are single vectors created from typically short, though arbitrary in length, electroencephalographic (EEG) data sequences that uniquely identify users relative to others. T-Vectors were trained by classifying nearly 1000 people using sequences as short as 1 second, and they generalize effectively to users never seen during training. Generalized performance is demonstrated on two commonly used and publicly accessible motor imagery task datasets, which are notorious for intra- and inter-subject signal variability. On these datasets, subjects can be identified with accuracies as high as 97.7% by simply adopting the label of the nearest neighbouring T-Vector, with no dependence on the task performed and little dependence on recording session, even when sessions are separated by days. Visualization of the T-Vectors from both datasets shows no conflation of subjects between datasets and indicates a T-Vector manifold where subjects cluster well. We conclude, first, that this is a desirable paradigm shift in EEG-based biometrics and, second, that this manifold deserves further investigation. Our proposed library provides a variety of essential tools that facilitated the development of T-Vectors. The T-Vectors codebase serves as a template for future projects using DN3, and we encourage leveraging our provided model for future work.
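The nearest-neighbour identification step mentioned in the abstract can be illustrated with a minimal sketch. This is not DN3's API or the authors' code: the function name `nearest_neighbour_labels`, the Euclidean distance metric, and the synthetic Gaussian vectors standing in for T-Vectors are all assumptions made purely for illustration.

```python
import numpy as np


def nearest_neighbour_labels(reference_vectors, reference_labels, query_vectors):
    """Give each query vector the label of its closest reference vector.

    Hypothetical helper; Euclidean distance is an assumption, not
    necessarily the metric used for T-Vectors.
    """
    # Pairwise squared distances, shape (n_queries, n_references).
    diffs = query_vectors[:, None, :] - reference_vectors[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Adopt the label of the nearest reference vector for every query.
    return reference_labels[np.argmin(sq_dists, axis=1)]


# Synthetic stand-ins for T-Vectors: one Gaussian cluster per "subject".
rng = np.random.default_rng(0)
n_subjects, per_subject, dim = 5, 20, 16
centres = rng.normal(scale=5.0, size=(n_subjects, dim))
labels = np.repeat(np.arange(n_subjects), per_subject)
vectors = centres[labels] + rng.normal(scale=0.5, size=(labels.size, dim))

# Hold out every second vector as a query; the rest act as references.
is_query = np.arange(labels.size) % 2 == 1
predicted = nearest_neighbour_labels(
    vectors[~is_query], labels[~is_query], vectors[is_query]
)
accuracy = float(np.mean(predicted == labels[is_query]))
print(f"identification accuracy on synthetic data: {accuracy:.3f}")
```

Because the clusters here are synthetic and well separated, the printed accuracy says nothing about real EEG data; the sketch only shows the mechanics of adopting the nearest neighbour's label.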
Author summary: We present a new Python library to train deep learning (DL) models with brain data. This library is tailored, but not limited, to developing neural networks for brain-computer interface (BCI) applications. There is abundant interest in leveraging DL in the wider neuroscience community, but we have found current solutions limiting. Furthermore, both BCI and DL benefit from benchmarking against multiple datasets and sharing parameters. Our library tries to be accessible to DL novices, yet not limiting to experts, while making experiment configurations more easily shareable and flexible for benchmarking. We demonstrate many of the features of our library by developing a deep neural network capable of disambiguating people from arbitrary lengths of electroencephalography data. We identify a variety of future avenues of study for the representations produced by our network, particularly in biometric applications and in addressing the variation in BCI classifier performance. We share our model, our library, and its associated guides and documentation with the community at large.