Pyramid multi-loss vision transformer for thyroid cancer classification using cytological smear

Bo Yu, Peng Yin, Hechang Chen, Yifei Wang, Yu Zhao, Xianling Cong, Jouke Dijkstra, Lele Cong
DOI: https://doi.org/10.1016/j.knosys.2023.110721
IF: 8.139
2023-09-01
Knowledge-Based Systems
Abstract: Multi-instance learning, a technique commonly used in artificial intelligence for analyzing slides, can be applied to diagnosing thyroid cancer from cytological smears. Because smears lack the multidimensional histological features found in histopathology slides, mining latent contextual information and feature diversity is crucial for better classification performance. In this paper, we propose a pyramid multi-loss vision transformer model, PyMLViT, a novel algorithm with two core modules that address these issues. Specifically, we design a pyramid token extraction module to capture latent contextual information in smears: the pyramid token structure extracts multi-scale local features, and the vision transformer structure then obtains global information through the self-attention mechanism. Furthermore, we construct a multi-loss fusion module based on the conventional multi-instance learning framework. With carefully designed bag and patch weight allocation strategies, we use slide-level annotations as pseudo-labels for patches during training, thereby increasing the diversity of supervised information. Extensive experiments on a real-world dataset show that PyMLViT achieves high performance with a competitive number of parameters compared with popular methods for diagnosing thyroid cancer from cytological smears.
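The multi-loss fusion idea in the abstract combines a bag-level loss with patch-level losses in which every patch inherits the slide-level label as a pseudo-label. The abstract does not specify the loss functions or the bag and patch weight allocation strategies, so the following is only a minimal Python sketch under stated assumptions: binary cross-entropy for both terms, a single hypothetical `bag_weight` that trades the two terms off, and optional hypothetical per-patch weights normalized to sum to one. The names `bce` and `multi_loss` are illustrative and are not taken from the paper.

```python
import math
from typing import Optional, Sequence


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multi_loss(
    bag_prob: float,
    patch_probs: Sequence[float],
    slide_label: int,
    bag_weight: float = 0.5,
    patch_weights: Optional[Sequence[float]] = None,
) -> float:
    """Hypothetical fusion of a bag-level loss and a pseudo-labelled patch loss.

    ASSUMPTIONS (not from the paper): both terms use binary cross-entropy,
    every patch takes slide_label as its pseudo-label, patch_weights
    (uniform by default) are normalized to sum to 1, and bag_weight mixes
    the bag term with the weighted patch term.
    """
    if not patch_probs:
        raise ValueError("a bag must contain at least one patch")
    if patch_weights is None:
        patch_weights = [1.0] * len(patch_probs)  # uniform patch weights
    if len(patch_weights) != len(patch_probs):
        raise ValueError("need exactly one weight per patch")
    total = sum(patch_weights)
    if total <= 0:
        raise ValueError("patch weights must sum to a positive value")

    # Bag-level term: the slide prediction against the slide annotation.
    bag_loss = bce(bag_prob, slide_label)
    # Patch-level term: each patch against the inherited slide label.
    patch_loss = sum(
        (w / total) * bce(p, slide_label)
        for w, p in zip(patch_weights, patch_probs)
    )
    return bag_weight * bag_loss + (1.0 - bag_weight) * patch_loss
```

For example, `multi_loss(0.9, [0.8, 0.6, 0.95], slide_label=1)` weights the bag term and the averaged patch term equally. Setting `bag_weight=1.0` reduces the sketch to plain bag-level multi-instance learning, which makes it easy to isolate how much the pseudo-labelled patch term contributes.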
computer science, artificial intelligence
What problem does this paper attempt to address?