Compact scalable hash from deep learning features aggregation for content de-duplication

Shan Feng, Zhu Li, Yiling Xu, Jun Sun
DOI: https://doi.org/10.1109/MMSP.2017.8122286
2017-01-01
Abstract: Unprecedented growth in media content generation, communication and consumption has taken over the vast majority of storage spaces in devices, network caches, and clouds. How to identify duplications in network caches is an important issue for fast and efficient content delivery network (CDN) communication and storage. In this work, we developed a novel hash scheme which is scalable and robust to typical CDN-induced transcoding and manipulations. The scalable hash is constructed in essentially two stages: images are first represented as 512 channels of thumbnail images from the deep learning VGG-16 network, and then a Fisher Vector aggregation is performed on the features, which offers scalability in both the underlying Gaussian Mixture Model (GMM) PCA embedding and the component posterior likelihood. The hash is generated by directly binarizing the Fisher Vector with component/dimensionality priority optimization. Simulation results have demonstrated that this is a very compact and accurate scheme for CDN content de-duplication.
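The aggregation and binarization stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the VGG-16 features have already been extracted and PCA-reduced, uses only the first-order Fisher Vector (gradient with respect to the GMM means, diagonal covariances), binarizes by sign at zero, and takes the component priority order as a given input. The function names, toy GMM parameters, and toy 2-D features are all hypothetical.

```python
import math

def fisher_vector_means(features, weights, means, sigmas):
    """First-order Fisher Vector: gradient w.r.t. the means of a diagonal GMM."""
    n, d = len(features), len(means[0])
    fv = [[0.0] * d for _ in weights]
    for x in features:
        # log of w_k * N(x | mu_k, diag(sigma_k^2)) for every component k
        logp = []
        for w, mu, sg in zip(weights, means, sigmas):
            ll = math.log(w)
            for xi, mi, si in zip(x, mu, sg):
                ll += -0.5 * ((xi - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2 * math.pi)
            logp.append(ll)
        # posterior of each component, computed stably from the log values
        top = max(logp)
        post = [math.exp(l - top) for l in logp]
        total = sum(post)
        post = [p / total for p in post]
        for k, (mu, sg) in enumerate(zip(means, sigmas)):
            for j in range(d):
                fv[k][j] += post[k] * (x[j] - mu[j]) / sg[j]
    return [[v / (n * math.sqrt(w)) for v in row] for row, w in zip(fv, weights)]

def scalable_hash(fv, component_order, num_bits):
    """Sign-binarize the Fisher Vector, highest-priority components first,
    and keep only the first num_bits bits (assumed scalability mechanism)."""
    bits = []
    for k in component_order:
        bits.extend(1 if v > 0 else 0 for v in fv[k])
    return bits[:num_bits]

def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))

# Toy GMM with 2 components in 2 dimensions (hypothetical parameters).
weights = [0.5, 0.5]
means = [[1.0, 1.0], [-1.0, -1.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]

# Toy local features for an image and a slightly perturbed copy of it.
features_a = [[0.9, 1.1], [-1.2, -0.8], [1.0, 0.7]]
features_b = [[0.95, 1.05], [-1.15, -0.85], [1.0, 0.75]]

order = [0, 1]  # hypothetical component priority order
fv_a = fisher_vector_means(features_a, weights, means, sigmas)
fv_b = fisher_vector_means(features_b, weights, means, sigmas)
hash_a = scalable_hash(fv_a, order, 4)
hash_b = scalable_hash(fv_b, order, 4)
print(hash_a, hash_b, hamming(hash_a, hash_b))
```

Because bits are emitted in priority order, a shorter hash is simply a prefix of a longer one, so two hashes of different lengths can be compared over their common prefix. De-duplication would then flag a pair as duplicates when the Hamming distance falls below a threshold chosen for the application.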