Spatial encoding of BOLD fMRI time series for categorizing static images across visual datasets: A pilot study on human vision

Vamshi K. Kancharala, Debanjali Bhattacharya, Neelam Sinha
2023-09-07
Abstract: Functional MRI (fMRI) is widely used to examine brain function by detecting changes in oxygenated blood flow that accompany brain activity. In this study, complexity-specific image categorization across different visual datasets is performed using fMRI time series (TS) to understand differences in neuronal activity related to vision. The publicly available BOLD5000 dataset is used for this purpose; it contains fMRI scans acquired while subjects viewed 5254 images of diverse categories, drawn from three standard computer vision datasets: COCO, ImageNet and SUN. To understand vision, it is important to study how the brain functions while looking at different images. To this end, the fMRI BOLD TS are spatially encoded using the classical Gramian Angular Field (GAF) and Markov Transition Field (MTF) to obtain 2D representations of the BOLD TS recorded for images from COCO, ImageNet and SUN. For classification, the individual GAF and MTF features are fed into a regular CNN. Subsequently, a parallel CNN model is employed that uses the combined 2D features to classify images across COCO, ImageNet and SUN. The results of the 2D CNN models are also compared with 1D LSTM and Bi-LSTM models that use the raw fMRI BOLD signal for classification. The parallel CNN model outperforms the other network models, with an improvement of 7% for multi-class classification. Clinical relevance: The results of this analysis establish a baseline for studying how differently the human brain functions while looking at images of diverse complexities.
Image and Video Processing, Artificial Intelligence, Computer Vision and Pattern Recognition, Signal Processing
What problem does this paper attempt to address?
This paper asks whether fMRI time series can be used to identify which visual dataset a viewed static image came from, as a way to understand how brain activity differs when processing visual information of varying complexity. It uses the publicly available BOLD5000 dataset, which contains fMRI scans of subjects viewing 5254 images drawn from three standard computer vision datasets (COCO, ImageNet, and SUN). The main goal is to test whether the brain's functional activity differs when processing images that vary in complexity, object scale, and background. To do so, the paper applies the Gramian Angular Field (GAF) and Markov Transition Field (MTF) methods to encode each 1D fMRI time series as a 2D representation, so that temporal relationships in the signal appear as spatial patterns. These 2D features are fed into a Convolutional Neural Network (CNN) for classification, and the results are compared with 1D LSTM and Bi-LSTM models trained on the raw signal. The parallel CNN model that combines GAF and MTF features outperforms the other models, with a reported improvement of 7% for multi-class classification. The authors present this result as a baseline for studying how the human brain responds to visual information of varying complexity.
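To make the GAF and MTF encodings concrete, here is a minimal NumPy sketch of both transforms as they are commonly defined. It is not the authors' implementation: the function names, the min-max rescaling to [-1, 1], the quantile binning, and the default of 4 bins are all assumptions chosen for illustration, and the paper's actual preprocessing and parameters are not given in the abstract.

```python
import numpy as np


def gramian_angular_field(ts, method="summation"):
    """Encode a 1D series as a 2D Gramian Angular Field (hypothetical helper)."""
    ts = np.asarray(ts, dtype=float)
    lo, hi = ts.min(), ts.max()
    if hi == lo:
        # A constant series carries no shape information; map it to zeros.
        scaled = np.zeros_like(ts)
    else:
        # Assumption: min-max rescale to [-1, 1] so that arccos is defined.
        scaled = 2.0 * (ts - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)
    phi = np.arccos(scaled)  # polar-angle encoding of each sample
    if method == "summation":
        return np.cos(phi[:, None] + phi[None, :])  # GASF
    if method == "difference":
        return np.sin(phi[:, None] - phi[None, :])  # GADF
    raise ValueError("method must be 'summation' or 'difference'")


def markov_transition_field(ts, n_bins=4):
    """Encode a 1D series as a 2D Markov Transition Field (hypothetical helper)."""
    ts = np.asarray(ts, dtype=float)
    # Assumption: quantile bins, so each bin holds roughly the same number of samples.
    edges = np.quantile(ts, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(ts, edges)  # bin index in 0..n_bins-1 for every sample
    # Count first-order transitions between the bins of consecutive samples.
    counts = np.zeros((n_bins, n_bins))
    for a, b in zip(bins[:-1], bins[1:]):
        counts[a, b] += 1
    # Row-normalize counts into transition probabilities (empty rows stay zero).
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Entry (i, j) is the transition probability from the bin of sample i
    # to the bin of sample j.
    return probs[bins[:, None], bins[None, :]]


if __name__ == "__main__":
    # Synthetic stand-in for one BOLD time series; not real fMRI data.
    signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 20))
    print(gramian_angular_field(signal).shape)
    print(markov_transition_field(signal).shape)
```

Each transform turns a series of length n into an n x n array, which is the kind of 2D input a CNN can consume. In a parallel-CNN setup such as the one the paper describes, the GAF and MTF arrays for the same series would go to separate branches whose features are combined before classification.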