Abstract: Recent deep learning models have attracted substantial attention in infant brain analysis. These models, such as semi-supervised techniques (e.g., Temporal Ensembling, mean teacher), have achieved state-of-the-art performance. However, they depend on an encoder-decoder structure with stacked local operators to gather long-range information, and the local operators limit efficiency and effectiveness. In addition, MRI data contain different tissue properties (TPs) such as T1 and T2. One major limitation of these models is that they use both types of data as inputs to the segmentation process, i.e., the models are trained on the dataset once, which demands substantial computation and memory during inference. In this work, we address the above limitations by designing a new deep-learning model, called 3D-DenseUNet, which incorporates adaptable global aggregation blocks in down-sampling to solve the issue of spatial information loss. A self-attention module connects the down-sampling blocks to the up-sampling blocks and integrates the feature maps across the three spatial dimensions and the channel dimension, effectively improving the representation potential and discriminating ability of the model. Additionally, we propose a new method called Two Independent Teachers (2IT), which summarizes the model weights instead of label predictions. Each teacher model is trained on a different type of brain data, T1 and T2, respectively. Then, a fuse model is added to improve test accuracy and enable training with fewer parameters and labels compared to the Temporal Ensembling method, without modifying the network architecture. Empirical results demonstrate the effectiveness of the proposed method. The code is available at https://github.com/AfifaKhaled/Two-Independent-Teachers-are-Better-Role-Model.
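
To make the idea of summarizing model weights (rather than label predictions) concrete, the following is a minimal sketch only. The abstract does not state how the fuse model combines the two teachers, so the convex combination and the exponential moving average used here are assumptions, and the names `fuse_weights`, `ema_update`, `alpha`, and `decay` are hypothetical and not taken from the authors' code.

```python
from typing import Dict, List

# A model's weights, flattened per layer: layer name -> list of floats.
Weights = Dict[str, List[float]]


def fuse_weights(teacher_t1: Weights, teacher_t2: Weights, alpha: float = 0.5) -> Weights:
    """Combine two teachers layer by layer as alpha * T1 + (1 - alpha) * T2.

    Assumption: both teachers share one architecture, so their layer names
    and layer sizes match exactly.
    """
    if teacher_t1.keys() != teacher_t2.keys():
        raise ValueError("teachers must have identical layer names")
    fused: Weights = {}
    for name, w1 in teacher_t1.items():
        w2 = teacher_t2[name]
        if len(w1) != len(w2):
            raise ValueError(f"layer {name!r} differs in size between teachers")
        fused[name] = [alpha * a + (1.0 - alpha) * b for a, b in zip(w1, w2)]
    return fused


def ema_update(fuse_model: Weights, target: Weights, decay: float = 0.99) -> Weights:
    """Move the fuse model toward `target` with an exponential moving average.

    Assumption: this mirrors the mean-teacher style of weight averaging;
    the source does not confirm that the fuse model is updated this way.
    """
    return {
        name: [decay * old + (1.0 - decay) * new for old, new in zip(w, target[name])]
        for name, w in fuse_model.items()
    }


if __name__ == "__main__":
    t1 = {"conv1": [1.0, 3.0]}   # hypothetical weights of the T1 teacher
    t2 = {"conv1": [3.0, 5.0]}   # hypothetical weights of the T2 teacher
    combined = fuse_weights(t1, t2)
    fuse = ema_update({"conv1": [0.0, 0.0]}, combined, decay=0.5)
    print(combined, fuse)
```

Because only weights are combined, no per-sample predictions need to be stored, which is the kind of saving over Temporal Ensembling that the abstract alludes to.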

Learning from Multiple Teacher Networks

DCCD: Reducing Neural Network Redundancy Via Distillation

Customizing Student Networks From Heterogeneous Teachers Via Adaptive Knowledge Amalgamation

Learning Student Networks via Feature Embedding

Densely Guided Knowledge Distillation using Multiple Teacher Assistants

Collaborative Teaching with Attention Distillation for Multiple Cross-Domain Few-Shot Learning

Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students

FitNets: Hints for Thin Deep Nets

Learning Student-Friendly Teacher Networks for Knowledge Distillation

Collaborative Teacher-Student Learning via Multiple Knowledge Transfer

MKTN: Adversarial-Based Multifarious Knowledge Transfer Network from Complementary Teachers

Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning

Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks

Two Independent Teachers are Better Role Model

Adaptive Multi-Teacher Multi-level Knowledge Distillation

Improving knowledge distillation via an expressive teacher

Teachers cooperation: team-knowledge distillation for multiple cross-domain few-shot learning

Multi-Teacher Distillation With Single Model for Neural Machine Translation

ShrinkTeaNet: Million-scale Lightweight Face Recognition via Shrinking Teacher-Student Networks