MAP- and MLE-Based Teaching

Hans Ulrich Simon, Jan Arne Telle
DOI: https://doi.org/10.48550/arXiv.2307.05252
2023-07-11
Abstract: Imagine a learner L who tries to infer a hidden concept from a collection of observations. Building on the work [4] of Ferri et al., we assume the learner to be parameterized by priors P(c) and by c-conditional likelihoods P(z|c) where c ranges over all concepts in a given class C and z ranges over all observations in an observation set Z. L is called a MAP-learner (resp. an MLE-learner) if it thinks of a collection S of observations as a random sample and returns the concept with the maximum a-posteriori probability (resp. the concept which maximizes the c-conditional likelihood of S). Depending on whether L assumes that S is obtained from ordered or unordered sampling resp. from sampling with or without replacement, we can distinguish four different sampling modes. Given a target concept c in C, a teacher for a MAP-learner L aims at finding a smallest collection of observations that causes L to return c. This approach leads in a natural manner to various notions of a MAP- or MLE-teaching dimension of a concept class C. Our main results are: We show that this teaching model has some desirable monotonicity properties. We clarify how the four sampling modes are related to each other. As for the (important!) special case, where concepts are subsets of a domain and observations are 0,1-labeled examples, we obtain some additional results. First of all, we characterize the MAP- and MLE-teaching dimension associated with an optimally parameterized MAP-learner graph-theoretically. From this central result, some other ones are easy to derive. It is shown, for instance, that the MLE-teaching dimension is either equal to the MAP-teaching dimension or exceeds the latter by 1. It is shown furthermore that these dimensions can be bounded from above by the so-called antichain number, the VC-dimension and related combinatorial parameters. Moreover they can be computed in polynomial time.
Machine Learning
What problem does this paper attempt to address?
This paper asks how a teacher can efficiently get a learner to identify a target concept under different sampling modes. Specifically, it studies how a teacher can select a smallest collection of observations so that a learner based on maximum a-posteriori probability (MAP) or maximum likelihood estimation (MLE) returns the target concept. The analysis focuses on how the sampling mode (ordered or unordered sampling, with or without replacement) affects the number of observations the teacher needs, and on how the resulting teaching dimensions of a concept class relate to each other.

The main contributions of the paper include:

1. **Monotonicity properties**: The MAP teaching model is shown to have two intuitive monotonicity properties: adding new observations can only lower the MAP teaching dimension or leave it unchanged, while adding new concepts can only raise it or leave it unchanged.
2. **Comparison of sampling modes**: Four sampling modes (ordered or unordered sampling, with or without replacement) are analyzed. Two of them, ordered sampling with replacement and unordered sampling with replacement, are equivalent. The other modes are pairwise incomparable: which of them yields the smaller MAP teaching dimension depends on the specific concept class and on the learner's parameter settings.
3. **Results in the special case**: For the important special case where concepts are subsets of a domain and observations are 0,1-labeled examples, the paper provides additional results:
   - The MAP and MLE teaching dimensions associated with an optimally parameterized MAP learner are characterized graph-theoretically.
   - The MLE teaching dimension is proved to be either equal to the MAP teaching dimension or larger by exactly 1.
   - These teaching dimensions are bounded from above by combinatorial parameters such as the so-called antichain number and the VC dimension.
   - These teaching dimensions can be computed in polynomial time from the natural encoding of the concept class.

Together, these results deepen the understanding of machine-teaching theory and provide a theoretical basis for designing teaching strategies in practical applications.
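To make the teaching problem concrete, the following is a minimal brute-force sketch in Python. It is not the paper's algorithm, and it does not implement the polynomial-time computation mentioned above. It assumes sampling with replacement, treats a sample as taught only when the target is the unique maximizer of P(c) times the product of the P(z|c), and uses a hypothetical toy parameterization (likelihoods uniform over the labeled examples consistent with a concept). The names `map_learner` and `smallest_teaching_sample` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod


def map_learner(sample, prior, likelihood):
    """Return the concept maximizing P(c) * prod_z P(z|c) over the sample.

    Assumption of this sketch: ties and all-zero scores count as failure (None).
    """
    scores = {c: prior[c] * prod(likelihood[c][z] for z in sample) for c in prior}
    best = max(scores.values())
    winners = [c for c, s in scores.items() if s == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


def smallest_teaching_sample(target, prior, likelihood, observations, max_size=4):
    """Exhaustively search for a smallest sample that makes the learner return target.

    The product of likelihoods does not depend on the order of the sample,
    so enumerating multisets of observations is enough here.
    """
    for size in range(max_size + 1):
        for sample in combinations_with_replacement(observations, size):
            if map_learner(sample, prior, likelihood) == target:
                return sample
    return None


# Toy class (hypothetical): the empty set and the singletons over the domain {0, 1, 2}.
domain = [0, 1, 2]
EMPTY = frozenset()
concepts = [EMPTY] + [frozenset({x}) for x in domain]
observations = [(x, b) for x in domain for b in (0, 1)]  # 0,1-labeled examples

# Assumed likelihoods: uniform over the labeled examples consistent with c, zero otherwise.
likelihood = {
    c: {(x, b): Fraction(1, len(domain)) if (x in c) == bool(b) else Fraction(0)
        for (x, b) in observations}
    for c in concepts
}

uniform_prior = {c: Fraction(1, len(concepts)) for c in concepts}
skewed_prior = {c: Fraction(2, 5) if c == EMPTY else Fraction(1, 5) for c in concepts}

needs_three = smallest_teaching_sample(EMPTY, uniform_prior, likelihood, observations)
needs_none = smallest_teaching_sample(EMPTY, skewed_prior, likelihood, observations)
one_example = smallest_teaching_sample(frozenset({1}), uniform_prior, likelihood, observations)
```

With the uniform prior, the empty concept needs all three negative examples, whereas the skewed prior lets the learner return it from the empty sample. This illustrates why the teaching dimension depends on the learner's parameterization. A uniform prior makes the MAP learner in this sketch behave like an MLE learner, since the prior then no longer affects the argmax.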