Abstract: This is the supplementary material for the paper Label Distribution Learning by Optimal Transport (Zhao and Zhou 2018), including proofs of the theorems and lemmas in the main paper.

Review of Optimal Transport Distance

In this part, we review some basic concepts and properties of the optimal transport distance.

Definition 1. (Transport Polytope) For two probability vectors $r$ and $c$ in the simplex $\Sigma_d$, we write $U(r, c)$ for the transport polytope of $r$ and $c$, namely the polyhedral set of $d \times d$ matrices,

$$U(r, c) := \{P \in \mathbb{R}_{+}^{d \times d} \mid P\mathbf{1}_d = r,\ P^{\top}\mathbf{1}_d = c\}. \tag{1}$$

Definition 2. (Optimal Transport) Given a $d \times d$ cost matrix $M$, the total cost of mapping from $r$ to $c$ using a transport matrix (or coupling probability) $P$ can be quantified as $\langle P, M \rangle$. The optimal transport (OT) problem is defined as

$$d_M(r, c) := \min_{P \in U(r, c)} \langle P, M \rangle. \tag{2}$$

Theorem 1. (Optimal Transport Distance) $d_M$ defined in (2) is a distance on $\Sigma_d$ whenever $M$ is a metric matrix.

Theorem 1 is proved using the gluing lemma; a detailed proof can be found in Chapter 6 of the seminal book (Villani 2008).

Proof of Optimal Transport with a Pseudo-Metric Cost

In this part, we prove that optimal transport with a pseudo-metric cost matrix preserves the sub-additivity property, which plays a key role in measuring the difference between prediction and ground truth. Moreover, multiplying $d_M$ by $\mathbb{1}_{r \neq c}$ is sufficient to make it a strict distance. The proof is similar to those in (Cuturi 2013; Cuturi and Avis 2014); we provide a detailed proof here for self-containedness.

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Theorem 2. For a pseudo-metric $M$ and probability distributions $r, c \in \Sigma_d$, the function $(r, c) \mapsto \mathbb{1}_{r \neq c}\, d_M(r, c)$ satisfies all four distance axioms, i.e., non-negativity, symmetry, definiteness and sub-additivity (triangle inequality).

Proof.
Non-negativity holds because both the coupling matrix $P$ and the cost matrix $M$ are nonnegative. By the symmetry of $M$, $d_M$ is itself symmetric in its two arguments. Definiteness is a direct result of the $\mathbb{1}_{r \neq c}$ term in the function definition.

The main point is to prove sub-additivity. Let $x, y, z$ be three elements of $\Sigma_d$. Let $P \in U(x, y)$ and $Q \in U(y, z)$ be optimal solutions of the transport problems $d_M(x, y)$ and $d_M(y, z)$, respectively. Let $T$ be a $d \times d \times d$ tensor with entries

$$T_{ijk} = \begin{cases} \dfrac{p_{ij}\, q_{jk}}{y_j} & \text{when } y_j \neq 0, \\ 0 & \text{when } y_j = 0. \end{cases}$$

Define $R \triangleq [r_{ik}]$, where $r_{ik} = \sum_{j=1}^{d} T_{ijk}$. Then $R$ is a coupling of $x$ and $z$, i.e., $R \in U(x, z)$. Indeed,
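The gluing construction above can be checked numerically on a small example. The following is a minimal sketch, not the authors' code: the helper names (`northwest_corner`, `glue`, `marginals`, `cost`) are hypothetical, the cost $M_{ij} = |i - j|$ is an assumed example metric, and the north-west corner rule is used only as an assumed, convenient way to obtain feasible couplings, which need not be optimal. It builds $R$ from $T$, confirms that $R$ has marginals $x$ and $z$, and checks $\langle R, M \rangle \le \langle P, M \rangle + \langle Q, M \rangle$. Exact rational arithmetic avoids floating-point tolerances.

```python
from fractions import Fraction as F


def northwest_corner(r, c):
    """Return a feasible (not necessarily optimal) coupling in U(r, c)."""
    d = len(r)
    P = [[F(0)] * d for _ in range(d)]
    r, c = list(r), list(c)  # remaining mass to ship from i / receive at j
    i = j = 0
    while i < d and j < d:
        m = min(r[i], c[j])
        P[i][j] += m
        r[i] -= m
        c[j] -= m
        if r[i] == 0:
            i += 1  # row i is exhausted
        else:
            j += 1  # column j is full
    return P


def glue(P, Q, y):
    """r_ik = sum_j T_ijk, with T_ijk = p_ij * q_jk / y_j (0 when y_j = 0)."""
    d = len(y)
    R = [[F(0)] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if y[j] == 0:
                continue  # T_ijk = 0 when y_j = 0
            for k in range(d):
                R[i][k] += P[i][j] * Q[j][k] / y[j]
    return R


def marginals(P):
    """Row sums (P 1_d) and column sums (P^T 1_d)."""
    return [sum(row) for row in P], [sum(col) for col in zip(*P)]


def cost(P, M):
    """Frobenius inner product <P, M>."""
    d = len(P)
    return sum(P[i][j] * M[i][j] for i in range(d) for j in range(d))


# Assumed example data; y has a zero entry to exercise the y_j = 0 case.
x = [F(1, 2), F(1, 4), F(1, 4)]
y = [F(1, 2), F(0), F(1, 2)]
z = [F(1, 4), F(1, 4), F(1, 2)]
M = [[F(abs(i - j)) for j in range(3)] for i in range(3)]

P = northwest_corner(x, y)
Q = northwest_corner(y, z)
R = glue(P, Q, y)

print(marginals(R) == (x, z))                   # is R in U(x, z)?
print(cost(R, M) <= cost(P, M) + cost(Q, M))    # cost bound for the glued R
```

Both checks hold for any feasible $P$ and $Q$ when $M$ satisfies the triangle inequality. Taking $P$ and $Q$ optimal, as in the proof, turns the cost bound into $d_M(x, z) \le d_M(x, y) + d_M(y, z)$.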

Supplementary Material: Label Distribution Learning by Optimal Transport
