Abstract: Object clustering has recently received considerable research attention. However, 1) most existing object clustering methods rely on visual information while ignoring the important tactile modality, which inevitably degrades model performance, and 2) simply concatenating visual and tactile information via a multiview clustering method leaves complementary information not fully explored, since vision and touch differ in many respects. To address these issues, we put forward a graph-based visual–tactile fused object clustering framework with two modules: 1) a modality-specific representation learning module $M_R$ and 2) a unified affinity graph learning module $M_U$.
Specifically, $M_R$ focuses on learning modality-specific representations for visual–tactile data, where deep non-negative matrix factorization (NMF) is adopted to extract the hidden information behind each modality. Meanwhile, we employ an autoencoder-like structure to enhance the robustness of the learned representations, and two graphs to improve their compactness. Furthermore, $M_U$ mitigates the differences between vision and touch and further maximizes the mutual information by adopting a minimizing-disagreement scheme that guides the modality-specific representations toward a unified affinity graph. To achieve ideal clustering performance, a Laplacian rank constraint is imposed to regularize the learned graph so that it has ideal connected components, where noise that causes wrong connections is removed and clustering labels can be obtained directly. Finally, we propose an efficient alternating iterative minimization updating strategy, followed by a theoretical proof of the framework's convergence.
Comprehensive experiments on five public datasets demonstrate the superiority of the proposed framework.
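The Laplacian rank constraint mentioned in the abstract builds on a standard fact from spectral graph theory: for a non-negative affinity graph, the number of zero eigenvalues of its Laplacian equals the number of connected components, so a graph with exactly c components has a Laplacian of rank n - c and its components can be read off directly as cluster labels. The following is only a minimal sketch of that fact, not the authors' optimization procedure; the toy affinity matrix `S` and the helper names `graph_laplacian`, `num_zero_eigenvalues`, and `component_labels` are hypothetical and chosen for illustration.

```python
import numpy as np


def graph_laplacian(affinity):
    """Unnormalized Laplacian L = D - W of a symmetrized affinity matrix."""
    w = (affinity + affinity.T) / 2.0
    return np.diag(w.sum(axis=1)) - w


def num_zero_eigenvalues(lap, tol=1e-8):
    """Count eigenvalues of a symmetric PSD matrix that are numerically zero."""
    return int(np.sum(np.linalg.eigvalsh(lap) < tol))


def component_labels(affinity):
    """Label each node by its connected component using depth-first search."""
    n = affinity.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                connected = affinity[i, j] > 0 or affinity[j, i] > 0
                if labels[j] == -1 and connected:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical toy affinity graph: objects 0-2 and objects 3-4 form two groups
# with no edges between them (i.e. no "wrong connections" remain).
S = np.array([
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.6, 0.0],
])

L = graph_laplacian(S)
c = num_zero_eigenvalues(L)      # number of connected components
labels = component_labels(S)     # cluster labels read directly from the graph
print(c, labels)
```

In this toy case the graph already has the desired block structure; in the framework described above, the constraint is what drives the learned graph toward such a structure, which is why no separate post-processing step (such as k-means on an embedding) is needed to obtain labels.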

Learning Representation on Optimized High-Order Manifold for Visual Classification

Deep Clustering and Representation Learning that Preserves Geometric Structures

Learning Exemplar-Represented Manifolds in Latent Space for Classification.

Deep Manifold Computing and Visualization

Deep Manifold Computing and Visualization Using Elastic Locally Isometric Smoothness

Graph Learning in Low Dimensional Space for Graph Convolutional Networks

Markov-Lipschitz Deep Learning

Constrained Manifold Learning for Hyperspectral Imagery Visualization

Multi-Scale Representation Learning on Hypergraph for 3D Shape Retrieval and Recognition

Hypergraph-Induced Convolutional Networks for Visual Classification

Learning Structured Representations with Hyperbolic Embeddings

Inductive Multi-Hypergraph Learning and Its Application on View-Based 3D Object Classification

Elastic Net Hypergraph Learning for Image Clustering and Semi-supervised Classification

Unsupervised Image Classifier based on Manifold Learning

Robust Dimensionality Reduction via Low-rank Laplacian Graph Learning

Deep Extrinsic Manifold Representation for Vision Tasks

Learning Pose Image Manifolds Using Geometry-Preserving GANs and Elasticae

Multiple Laplacian Graph Regularised Low-Rank Representation with Application to Image Representation

Deep Hypergraph Structure Learning

Exploring the Manifold of Neural Networks Using Diffusion Geometry

Visual-Tactile Fused Graph Learning for Object Clustering