Cross-modal self-supervised representation learning for gesture and skill recognition in robotic surgery

Jie Ying Wu, Aniruddha Tamhane, Peter Kazanzides, Mathias Unberath
DOI: https://doi.org/10.1007/s11548-021-02343-y
2021-03-24
International Journal of Computer Assisted Radiology and Surgery
Abstract: Multi- and cross-modal learning consolidates information from multiple data sources, which may offer a holistic representation of complex scenarios. Cross-modal learning is particularly interesting because synchronized data streams are immediately useful as self-supervisory signals. The prospect of achieving self-supervised continual learning in surgical robotics is exciting, as it may enable lifelong learning that adapts to different surgeons and cases, ultimately leading to a more general machine understanding of surgical processes.