Multi-Stream Single Network: Efficient Compressed Video Action Recognition With a Single Multi-Input Multi-Output Network

Hayato Terao, Wataru Noguchi, Hiroyuki Iizuka, Masahito Yamamoto
DOI: https://doi.org/10.1109/access.2024.3363022
IF: 3.9
2024-02-13
IEEE Access
Abstract: Compressed video action recognition classifies actions using the multiple features stored in compressed videos, which avoids decoding RGB frames and shortens computation time. Previous methods mostly used multiple networks to process compressed video features and, to reduce computational complexity further, explored lightweight networks that do not sacrifice accuracy. We focus on a different approach: using only one network to reduce computational complexity. Our previous study proposed the MussNet model, which consists of independent subnetworks within a single network instead of multiple networks. The subnetworks classify compressed video features independently in a single feedforward pass of one network and achieve accuracy competitive with previous studies at lower computational complexity. The remaining issue with the MussNet model is how to fuse the independently processed compressed video features. The current MussNet model makes an independent prediction from each input and fuses the inputs only by averaging these predictions. However, recent studies have shown that intermediate fusion, which fuses features inside the network, improves accuracy. This study proposes the EFS module, which extends the MussNet model to intermediate fusion by disentangling and aggregating the features of the same video in the hidden vectors while keeping the individual subnetworks. Our experiments show that the EFS module improves the MussNet model's accuracy by 0.4 points on UCF-101 and 1.0 points on HMDB-51, while the additional GFLOPs amount to only 1% of those of the MussNet model. These accuracy scores are also competitive with previous studies while maintaining one of the lowest computational complexities.
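The prediction-averaging fusion that the abstract attributes to the current MussNet model can be illustrated with a minimal sketch. This is not the authors' implementation: the stream names (`iframe`, `motion_vector`, `residual`), the three-class setup, the logit values, and the choice to average softmax probabilities rather than raw scores are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_logits):
    """Fuse streams by averaging their independent class probabilities."""
    probs = [softmax(logits) for logits in stream_logits.values()]
    n_streams = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_streams for c in range(n_classes)]


# Hypothetical per-stream outputs for one video and three action classes.
stream_logits = {
    "iframe": [2.0, 0.5, 0.1],
    "motion_vector": [0.3, 1.8, 0.2],
    "residual": [1.5, 0.4, 0.3],
}

fused = late_fusion(stream_logits)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this scheme each stream is classified without seeing the others, and the streams interact only at the final average. Intermediate fusion, as described in the abstract, would instead combine the streams' features in the hidden layers before the prediction is made.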
computer science, information systems; telecommunications; engineering, electrical & electronic