Adversarial Binary Mutual Learning for Semi-Supervised Deep Hashing

Guan’An Wang,Qinghao Hu,Yang Yang,Jian Cheng,Zeng-Guang Hou,Guan'an Wang
DOI: https://doi.org/10.1109/tnnls.2021.3055834
IF: 14.255
2021-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract:Hashing is a popular search algorithm for its compact binary representation and efficient Hamming distance calculation. Benefited from the advance of deep learning, deep hashing methods have achieved promising performance. However, those methods usually learn with expensive labeled data but fail to utilize unlabeled data. Furthermore, the traditional pairwise loss used by those methods cannot explicitly force similar/dissimilar pairs to small/large distances. Both weaknesses limit existing methods' performance. To solve the first problem, we propose a novel semi-supervised deep hashing model named adversarial binary mutual learning (ABML). Specifically, our ABML consists of a generative model <span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="3.518ex" height="2.509ex" style="vertical-align: -0.671ex;" viewBox="0 -791.3 1514.8 1080.4" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-47" x="0" y="0"></use> <use transform="scale(0.707)" xlink:href="#MJMATHI-48" x="1112" y="-213"></use></g></svg></span> and a discriminative model <span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="3.616ex" height="2.509ex" style="vertical-align: -0.671ex;" viewBox="0 -791.3 1556.8 1080.4" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-44" x="0" y="0"></use> <use transform="scale(0.707)" xlink:href="#MJMATHI-48" x="1171" y="-213"></use></g></svg></span> , where <span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="3.616ex" height="2.509ex" style="vertical-align: -0.671ex;" viewBox="0 -791.3 1556.8 1080.4" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-44" x="0" y="0"></use> <use transform="scale(0.707)" xlink:href="#MJMATHI-48" x="1171" y="-213"></use></g></svg></span> learns labeled data in a supervised way and <span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="3.518ex" height="2.509ex" style="vertical-align: -0.671ex;" viewBox="0 -791.3 1514.8 1080.4" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-47" x="0" y="0"></use> <use transform="scale(0.707)" xlink:href="#MJMATHI-48" x="1112" y="-213"></use></g></svg></span> learns unlabeled data by synthesizing real images. We adopt an adversarial learning (AL) strategy to transfer the knowledge of unlabeled data to <span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="3.616ex" height="2.509ex" style="vertical-align: -0.671ex;" viewBox="0 -791.3 1556.8 1080.4" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-44" x="0" y="0"></use> <use transform="scale(0.707)" xlink:href="#MJMATHI-48" x="1171" y="-213"></use></g></svg></span> by making <span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="3.518ex" height="2.509ex" style="vertical-align: -0.671ex;" viewBox="0 -791.3 1514.8 1080.4" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-47" x="0" y="0"></use> <use transform="scale(0.707)" xlink:href="#MJMATHI-48" x="1112" y="-213"></use></g></svg></span> and <span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="3.616ex" height="2.509ex" style="vertical-align: -0.671ex;" viewBox="0 -791.3 1556.8 1080.4" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-44" x="0" y="0"></use> <use transform="scale(0.707)" xlink:href="#MJMATHI-48" x="1171" y="-213"></use></g></svg></span> mutually learn from each other. To solve the second problem, we propose a novel Weibull cross-entropy loss (WCE) by using the Weibull distribution, which can distinguish tiny differences of distances and explicitly force similar/dissimilar distances as small/large as possible. Thus, the learned features are more discriminative. Finally, by incorporating ABML with WCE loss, our model can acquire more semantic and discriminative features. Extensive experiments on four com-on data sets (CIFAR-10, large database of handwritten digits (MNIST), ImageNet-10, and NUS-WIDE) and a large-scale data set ImageNet demonstrate that our approach successfully overcomes the two difficulties above and significantly outperforms state-of-the-art hashing methods.<svg xmlns="http://www.w3.org/2000/svg" style="display: none;"><defs id="MathJax_SVG_glyphs"><path stroke-width="1" id="MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="1" id="MJMATHI-48" d="M228 637Q194 637 192 641Q191 643 191 649Q191 673 202 682Q204 683 219 683Q260 681 355 681Q389 681 418 681T463 682T483 682Q499 682 499 672Q499 670 497 658Q492 641 487 638H485Q483 638 480 638T473 638T464 637T455 637Q416 636 405 634T387 623Q384 619 355 500Q348 474 340 442T328 395L324 380Q324 378 469 378H614L615 381Q615 384 646 504Q674 619 674 627T617 637Q594 637 587 639T580 648Q580 650 582 660Q586 677 588 679T604 682Q609 682 646 681T740 680Q802 680 835 681T871 682Q888 682 888 672Q888 645 876 638H874Q872 638 869 638T862 638T853 637T844 637Q805 636 794 634T776 623Q773 618 704 340T634 58Q634 51 638 51Q646 48 692 46H723Q729 38 729 37T726 19Q722 6 716 0H701Q664 2 567 2Q533 2 504 2T458 2T437 1Q420 1 420 10Q420 15 423 24Q428 43 433 45Q437 46 448 46H454Q481 46 514 49Q520 50 522 50T528 55T534 64T540 82T547 110T558 153Q565 181 569 198Q602 330 602 331T457 332H312L279 197Q245 63 245 58Q245 51 253 49T303 46H334Q340 38 340 37T337 19Q333 6 327 0H312Q275 2 178 2Q144 2 115 2T69 2T48 1Q31 1 31 10Q31 12 34 24Q39 43 44 45Q48 46 59 46H65Q92 46 125 49Q139 52 144 61Q147 65 216 339T285 628Q285 635 228 637Z"></path><path stroke-width="1" id="MJMATHI-44" d="M287 628Q287 635 230 637Q207 637 200 638T193 647Q193 655 197 667T204 682Q206 683 403 683Q570 682 590 682T630 676Q702 659 752 597T803 431Q803 275 696 151T444 3L430 1L236 0H125H72Q48 0 41 2T33 11Q33 13 36 25Q40 41 44 43T67 46Q94 46 127 49Q141 52 146 61Q149 65 218 339T287 628ZM703 469Q703 507 692 537T666 584T629 613T590 629T555 636Q553 636 541 636T512 636T479 637H436Q392 637 386 627Q384 623 313 339T242 52Q242 48 253 48T330 47Q335 47 349 47T373 46Q499 46 581 128Q617 164 640 212T683 339T703 469Z"></path></defs></svg>
computer science, artificial intelligence, theory & methods,engineering, electrical & electronic, hardware & architecture
What problem does this paper attempt to address?