Supplementary Material: Label Distribution Learning by Optimal Transport

Peng Zhao, Zhi-Hua Zhou
2018-01-01
Abstract: This is the supplementary material for the paper Label Distribution Learning by Optimal Transport (Zhao and Zhou 2018), including proofs of the theorems and lemmas in the main paper.

Review of Optimal Transport Distance

In this part, we review some basic concepts and properties of the optimal transport distance.

Definition 1. (Transport Polytope) For two probability vectors $r$ and $c$ in the simplex $\Sigma_d$, we write $U(r, c)$ for the transport polytope of $r$ and $c$, namely the polyhedral set of $d \times d$ matrices,

$$U(r, c) := \{ P \in \mathbb{R}_+^{d \times d} \mid P \mathbf{1}_d = r,\ P^\top \mathbf{1}_d = c \}. \tag{1}$$

Definition 2. (Optimal Transport) Given a $d \times d$ cost matrix $M$, the total cost of mapping from $r$ to $c$ using a transport matrix (or coupling probability) $P$ can be quantified as $\langle P, M \rangle$. The optimal transport (OT) problem is defined as

$$d_M(r, c) := \min_{P \in U(r, c)} \langle P, M \rangle. \tag{2}$$

Theorem 1. (Optimal Transport Distance) $d_M$ defined in (2) is a distance on $\Sigma_d$ whenever $M$ is a metric matrix.

Theorem 1 is proved via the gluing lemma; a detailed proof can be found in Chapter 6 of the seminal book (Villani 2008).

Proof of Optimal Transport with a Pseudo-Metric Cost

In this part, we prove that optimal transport with a pseudo-metric cost matrix preserves the sub-additivity property, which plays a key role in measuring the difference between prediction and ground truth. Moreover, multiplying $d_M$ by $\mathbf{1}_{r \neq c}$ is sufficient to make it a strict distance. The proof is similar to those in (Cuturi 2013; Cuturi and Avis 2014); we provide a detailed proof below for self-containedness.

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Theorem 2. For a pseudo-metric $M$ and probability distributions $r, c \in \Sigma_d$, the function $(r, c) \mapsto \mathbf{1}_{r \neq c}\, d_M(r, c)$ satisfies all four distance axioms, i.e., non-negativity, symmetry, definiteness and sub-additivity (triangle inequality).

Proof.
Non-negativity holds because the coupling matrix $P$ and the cost matrix $M$ are both nonnegative. By the symmetry of $M$, $d_M$ is itself symmetric in its two arguments. Definiteness is a direct result of the $\mathbf{1}_{r \neq c}$ term in the function definition. The main point is to prove sub-additivity.

Let $x, y, z$ be three elements of $\Sigma_d$. Let $P \in U(x, y)$ and $Q \in U(y, z)$ be optimal solutions of the transport problems $d_M(x, y)$ and $d_M(y, z)$, respectively. Let $T$ be a $d \times d \times d$ tensor with entries

$$T_{ijk} = \begin{cases} \dfrac{p_{ij}\, q_{jk}}{y_j} & \text{when } y_j \neq 0, \\ 0 & \text{when } y_j = 0. \end{cases}$$

Define $R := [r_{ik}]$, where $r_{ik} = \sum_{j=1}^{d} T_{ijk}$. Then $R$ is a coupling of $x$ and $z$, i.e., $R \in U(x, z)$. Indeed,
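As a numerical sanity check of the gluing construction (an illustration, not part of the original proof), the sketch below builds $R$ from two couplings and confirms that its row sums equal $x$ and its column sums equal $z$. The helper names (`glue`, `outer`, `cost`) and the example vectors are hypothetical. For simplicity it uses the independent couplings $x y^\top$ and $y z^\top$, which are feasible but generally not optimal; the marginal identities hold for any feasible pair. The last line also prints the inequality $\langle R, M \rangle \le \langle P, M \rangle + \langle Q, M \rangle$, the standard next step of such a gluing argument, which this excerpt does not reach.

```python
def glue(P, Q, y):
    """Glue P in U(x, y) and Q in U(y, z) into R with r_ik = sum_j T_ijk.

    T_ijk = p_ij * q_jk / y_j when y_j != 0, and 0 otherwise.
    """
    d = len(y)
    return [
        [sum(P[i][j] * Q[j][k] / y[j] for j in range(d) if y[j] != 0)
         for k in range(d)]
        for i in range(d)
    ]


def row_sums(A):
    return [sum(row) for row in A]


def col_sums(A):
    return [sum(col) for col in zip(*A)]


def cost(P, M):
    """Frobenius inner product <P, M>."""
    return sum(p * m for p_row, m_row in zip(P, M) for p, m in zip(p_row, m_row))


def outer(u, v):
    """Independent coupling u v^T, a feasible element of U(u, v)."""
    return [[a * b for b in v] for a in u]


# Hypothetical example data; y has a zero entry to exercise the y_j = 0 case.
x = [0.2, 0.3, 0.5]
y = [0.5, 0.5, 0.0]
z = [0.1, 0.6, 0.3]
M = [[abs(i - j) for j in range(3)] for i in range(3)]  # a metric cost matrix

P = outer(x, y)  # feasible (not necessarily optimal) coupling of x and y
Q = outer(y, z)  # feasible (not necessarily optimal) coupling of y and z
R = glue(P, Q, y)

print("row sums of R:", row_sums(R))  # should match x up to rounding
print("col sums of R:", col_sums(R))  # should match z up to rounding
print("<R, M> =", cost(R, M), "<= <P, M> + <Q, M> =", cost(P, M) + cost(Q, M))
```

With optimal $P$ and $Q$ in place of the independent couplings, the same check would bound $d_M(x, z)$ by $d_M(x, y) + d_M(y, z)$; computing optimal couplings would need a linear-programming solver and is omitted here.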