Abstract: We introduce the task of webpage briefing (WB), which summarizes a webpage hierarchically, from its broad topic down to finer-grained key attributes. A straightforward approach is to train a machine learning model that generates topics and extracts key attributes. However, such a model may perform poorly on webpages from domains not seen in the training data. An ideal model should adapt to unseen domains while preserving the knowledge learned from seen domains. Knowledge distillation (KD) offers a potential solution: a teacher pre-trained on specific domains passes its knowledge to a student, and unseen domains can be added during distillation to make the models more robust. However, existing works usually assume that the models have no access to seen domains during distillation, so knowledge of the seen domains may be lost. In our setting, we have access to the generated topics, which contain representative knowledge of seen domains and can help preserve that knowledge during distillation. Moreover, vanilla KD does not pass on knowledge about the location patterns of informative content in webpages, which are essential for identifying the topics to be generated or the key attributes to be extracted. To preserve more knowledge of seen domains and to better utilize the location patterns, we propose a Dual Distillation model consisting of identification distillation (ID) and understanding distillation (UD): ID distills knowledge on identifying informative content under the guidance of the learned topics of seen domains, while UD distills knowledge on topic generation or key attribute extraction. Because Dual Distillation distills topics and key attributes separately in two students, the inherent correlations between them are not utilized. To better exploit these correlations, we propose a Triple Distillation model consisting of a shared ID and two UDs, one for topic generation and the other for key attribute extraction. We further propose a joint model for WB with signal enhancement and exchange among a key attribute extractor, a topic generator, and an informative section predictor. Experiments on real-world webpages show that our models achieve high performance on WB and validate the superiority of Dual Distillation and Triple Distillation in their target settings. Experiments also show that the proposed joint model outperforms single-task baselines and other joint models.

Automatic Webpage Briefing
