Abstract: Given a rooted tree and a ranking of its leaves, what is the minimum number of inversions of the leaves that can be attained by ordering the tree? This variation of the problem of counting inversions in arrays originated in mathematical psychology, with the evaluation of the Mann--Whitney statistic for detecting differences between distributions as a special case. We study the complexity of the problem in the comparison-query model, used for problems like sorting and selection. For many types of trees with $n$ leaves, we establish lower bounds close to the strongest known in the model, namely the lower bound of $\log_2(n!)$ for sorting $n$ items. We show: (a) $\log_2((\alpha(1-\alpha)n)!) - O(\log n)$ queries are needed whenever the tree has a subtree that contains a fraction $\alpha$ of the leaves. This implies a lower bound of $\log_2((\frac{k}{(k+1)^2}n)!) - O(\log n)$ for trees of degree $k$. (b) $\log_2(n!) - O(\log n)$ queries are needed in case the tree is binary. (c) $\log_2(n!) - O(k \log k)$ queries are needed for certain classes of trees of degree $k$, including perfect trees with even $k$. The lower bounds are obtained by developing two novel techniques for a generic problem $\Pi$ in the comparison-query model and applying them to inversion minimization on trees. Both techniques can be described in terms of the Cayley graph of the symmetric group with adjacent-rank transpositions as the generating set. Consider the subgraph consisting of the edges between vertices with the same value under $\Pi$. We show that the size of any decision tree for $\Pi$ must be at least: (i) the number of connected components of this subgraph, and (ii) the factorial of the average degree of the complementary subgraph, divided by $n$. Lower bounds on query complexity then follow by taking the base-2 logarithm.
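To make the problem statement concrete, the following is a minimal brute-force sketch of inversion minimization on a small tree. It is an illustration only, not a method from the paper, which concerns lower bounds rather than algorithms. The nested-list tree representation (an integer is a leaf holding its rank, a list is an internal node) and the function names `leaves`, `cross_inversions`, and `min_inversions` are assumptions made for this example. The sketch relies on the fact that inversions between leaves of two different children of a node depend only on the relative order of those two children, so each node can be optimized independently of the others.

```python
from itertools import permutations


def leaves(tree):
    """Return the leaf ranks of a nested-list tree in left-to-right order."""
    if isinstance(tree, int):
        return [tree]
    return [rank for child in tree for rank in leaves(child)]


def cross_inversions(left, right):
    """Count pairs (a, b) with a in left, b in right, and a > b."""
    return sum(1 for a in left for b in right if a > b)


def min_inversions(tree):
    """Minimum number of leaf inversions over all orderings of the tree.

    Brute force over the child orders at each node, so the running time
    is exponential in the degree; intended for small examples only.
    """
    if isinstance(tree, int):
        return 0
    # Inversions inside each child's subtree are minimized recursively.
    total = sum(min_inversions(child) for child in tree)
    groups = [leaves(child) for child in tree]
    k = len(groups)
    # cost[i][j]: inversions between children i and j when i precedes j.
    cost = [[cross_inversions(groups[i], groups[j]) for j in range(k)]
            for i in range(k)]
    best = min(
        sum(cost[order[a]][order[b]]
            for a in range(k) for b in range(a + 1, k))
        for order in permutations(range(k))
    )
    return total + best


# Binary tree with leaf ranks 3, 1 under one child and 2, 4 under the other.
print(min_inversions([[3, 1], [2, 4]]))  # prints 1 (ordering 1, 3, 2, 4)
```

For a binary node the inner minimum reduces to choosing the smaller of the two cross-inversion counts for its pair of children, which is why only the two child orders need to be compared in that case.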

On the hardness of learning queries from tree structured data

Properly Learning Decision Trees with Queries Is NP-Hard

Learning latent tree models with small query complexity

Tree Learning: Optimal Algorithms and Sample Complexity

Conjunctive Queries over Trees

Learning Tree Pattern Transformations

Finding Top-K Min-Cost Connected Trees In Databases

Searching in trees with monotonic query times

Superconstant Inapproximability of Decision Tree Learning

Superpolynomial Lower Bounds for Decision Tree Learning and Testing

Complexity of Equivalence and Learning for Multiplicity Tree Automata

Consistent Query Answering for Primary Keys on Rooted Tree Queries

TreeSpan: efficiently computing similarity all-matching.

Branch Code: A Labeling Scheme for Efficient Query Answering on Trees

Sample-Optimal and Efficient Learning of Tree Ising models

Query Understanding Enhanced by Hierarchical Parsing Structures

Answering Complex Logical Queries on Knowledge Graphs Via Query Computation Tree Optimization

Lower Bound Techniques in the Comparison-Query Model and Inversion Minimization on Trees

Tree-like Queries in OWL 2 QL: Succinctness and Complexity Results

Adaptive Exact Learning of Decision Trees from Membership Queries

Ramsey Theorems for Trees and a General 'Private Learning Implies Online Learning' Theorem