Abstract: Knowledge constitutes the accumulated understanding and experience that humans use to gain insight into the world. In deep learning, prior knowledge is essential for mitigating shortcomings of data-driven models, such as data dependence, generalization ability, and compliance with constraints. To enable efficient evaluation of the worth of knowledge, we present a framework inspired by interpretable machine learning. Through quantitative experiments, we assess the influence of data volume and estimation range on the worth of knowledge. Our findings elucidate the complex relationship between data and knowledge, including dependence, synergistic, and substitution effects. Our model-agnostic framework can be applied to a variety of common network architectures, providing a comprehensive understanding of the role of prior knowledge in deep learning models. It can also be used to improve the performance of informed machine learning, as well as distinguish improper prior knowledge.

What problem does this paper attempt to address?

The paper addresses how to evaluate and use prior knowledge effectively in deep learning, given the known weaknesses of purely data-driven models: heavy dependence on data, limited generalization, and failure to respect constraints. The authors propose a framework inspired by interpretable machine learning that quantifies the value of prior knowledge, and they study experimentally how data volume and estimation range affect that value. Specifically, the paper addresses the following key questions:

1. **How to evaluate the value of knowledge?** Drawing on the concept of Shapley values, the authors propose a framework for quantitatively measuring the effect of prior rules in informed machine learning. They introduce "Rule Importance" (RI) to measure the contribution of each rule integrated into informed deep learning.
2. **What is the relationship between data and rules?** Through quantitative experiments, the authors assess how data volume and estimation range affect the value of knowledge, revealing interactions between data and knowledge that include dependency, synergy, and substitution effects.
3. **How to make prior rules work better?** The proposed rule importance measure can be used to adjust regularization parameters in informed machine learning, which helps prevent non-convergence during training and maximizes the value of knowledge. It can also be used to identify inappropriate prior rules.

The paper also explores the underlying relationship between data and rules and the interactions among multiple rules, including dependency, synergy, and substitution. These findings can be applied in practice, for example to improve the performance of informed machine learning models and to identify inappropriate prior rules.
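The Shapley-value idea behind rule importance can be illustrated with a minimal sketch. This is not the authors' implementation, and the paper's exact definition of RI may differ. The rule names and the payoff table below are hypothetical, and `value` stands in for whatever performance measure one obtains by training a model with a given subset of rules (for example, the gain over a data-only baseline).

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player under the payoff function `value`.

    `value` maps a frozenset of players to a number. Exact enumeration is
    exponential in the number of players, so this only suits a few rules.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical payoffs: performance gain over a data-only model when the
# model is trained with each subset of rules. These numbers are made up.
payoff = {
    frozenset(): 0.0,
    frozenset({"rule_A"}): 0.2,
    frozenset({"rule_B"}): 0.1,
    frozenset({"rule_A", "rule_B"}): 0.5,
}

phi = shapley_values(["rule_A", "rule_B"], lambda s: payoff[s])
print(phi)
```

In this toy table the two rules together gain more than the sum of their individual gains, which is the kind of synergy the summary mentions. The resulting values are about 0.3 for `rule_A` and 0.2 for `rule_B`, and they sum to the payoff of using both rules.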
In summary, this research offers a new perspective on assessing the value of knowledge in deep learning and provides practical tools for informed machine learning.

Worth of knowledge in deep learning

Drop Redundant, Shrink Irrelevant: Selective Knowledge Injection for Language Pretraining

Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification

Explaining Knowledge Distillation by Quantifying the Knowledge

Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey

Improving Data-Driven Inferential Sensor Modeling by Industrial Knowledge: A Bayesian Perspective

Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation

A Closer Look at Knowledge Distillation with Features, Logits, and Gradients

Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model

Knowledge Matters: Importance of Prior Information for Optimization

Model Information As an Analysis Tool in Deep Learning

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Interpreting Deep Learning Models for Knowledge Tracing

Knowledge Efficient Deep Learning for Natural Language Processing

Knowledge Infused Learning (K-IL): Towards Deep Incorporation of Knowledge in Deep Learning

Informed Pre-Training on Prior Knowledge

Layerwise Change of Knowledge in Neural Networks

Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

Distilling Model Knowledge

How Does Information Bottleneck Help Deep Learning?

Data-Free Knowledge Transfer: A Survey