Hidden Ancestor Graphs: Models for Detagging Property Graphs

R. W. R. Darling, Gregory S. Clark, J. D. Tucker
DOI: https://doi.org/10.48550/arXiv.2102.09581
2023-12-14
Abstract: Consider a graph $G$ where each vertex is visibly labelled as a member of a distinct class, but also has a hidden binary state: wild or tame. Edges with end points in the same class are called agreement edges. Premise: an edge connecting vertices in different classes -- a conflict edge -- is allowed only when at least one end point is wild. Interpret wild status as readiness to form connections with any other vertex, regardless of class -- a form of class disaffiliation. The learning goal is to classify each vertex as wild or tame using its neighborhood data. In applications such as communications metadata, bio-informatics, retailing, or bibliography, adjacency in $G$ is typically created by paths of length two in a transactional bipartite graph $B$. Class labelling, imported from a reference data source, is typically assortative, so agreement edges predominate. Conflict edges represent observed behavior (from $B$) inconsistent with prior labelling of $V(G)$. Wild vertices are those whose label is uninformative. The hidden ancestor graph constitutes a natural model for generating agreement edges and conflict edges, depending on a latent tree structure. The model is able to manifest high clustering rates and heavy-tailed degree distributions typical of social and spatial networks. It can be fitted to graph data using a few measurable graph parameters, and supplies a natural statistical classifier for wild versus tame.
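The definitions in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the paper's hidden ancestor model or its classifier: it projects a toy bipartite graph $B$ onto $G$ via paths of length two, splits the edges of $G$ into agreement and conflict edges, and checks the premise that every conflict edge has at least one wild end point. All function names, the toy data, and the per-vertex conflict fraction (a naive neighborhood statistic) are assumptions introduced here.

```python
from itertools import combinations


def project_bipartite(transactions):
    """Edges of G: two vertices are adjacent when they share a transaction,
    i.e. they are joined by a path of length two in B."""
    edges = set()
    for members in transactions.values():
        # sorted() gives each undirected edge one canonical (u, v) form
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))
    return edges


def split_edges(edges, label):
    """Agreement edges join same-class vertices; conflict edges do not."""
    agreement = {(u, v) for u, v in edges if label[u] == label[v]}
    return agreement, edges - agreement


def premise_violations(conflict, wild):
    """Conflict edges whose end points are both tame (forbidden by the premise)."""
    return {(u, v) for u, v in conflict if u not in wild and v not in wild}


def conflict_fraction(edges, label):
    """Share of each vertex's edges that are conflict edges.
    A naive neighborhood statistic, NOT the paper's classifier."""
    total, bad = {}, {}
    for u, v in edges:
        for x in (u, v):
            total[x] = total.get(x, 0) + 1
            if label[u] != label[v]:
                bad[x] = bad.get(x, 0) + 1
    return {x: bad.get(x, 0) / total[x] for x in total}


# Hypothetical toy data: transaction -> participating vertices of G.
transactions = {
    "t1": {"a", "b"},
    "t2": {"b", "c"},
    "t3": {"c", "d"},
    "t4": {"a", "c"},
}
label = {"a": "red", "b": "red", "c": "blue", "d": "blue"}

edges = project_bipartite(transactions)
agreement, conflict = split_edges(edges, label)
fractions = conflict_fraction(edges, label)
```

In this toy example, marking only `c` as wild satisfies the premise, since both conflict edges touch `c`; with no wild vertices, both conflict edges would violate it. Vertex `c` also has the highest conflict fraction, which matches the intuition that its class label is the least informative.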
Probability