DuMapper: Towards Automatic Verification of Large-Scale POIs with Street Views at Baidu Maps

Miao Fan, Jizhou Huang, Haifeng Wang
2024-11-27
Abstract: With the increased popularity of mobile devices, Web mapping services have become an indispensable tool in our daily lives. To provide user-satisfied services, such as location searches, the point of interest (POI) database is the fundamental infrastructure, as it archives multimodal information on billions of geographic locations closely related to people's lives, such as a shop or a bank. Therefore, verifying the correctness of a large-scale POI database is vital. To achieve this goal, many industrial companies adopt volunteered geographic information (VGI) platforms that enable thousands of crowdworkers and expert mappers to verify POIs seamlessly; but to do so, they have to spend millions of dollars every year. To save the tremendous labor costs, we devised DuMapper, an automatic system for large-scale POI verification with the multimodal street-view data at Baidu Maps. DuMapper takes the signboard image and the coordinates of a real-world place as input to generate a low-dimensional vector, which can be leveraged by approximate nearest neighbor (ANN) algorithms to conduct a more accurate search through billions of archived POIs in the database for verification within milliseconds. It can significantly increase the throughput of POI verification by 50 times. DuMapper has been deployed in production since June 2018, which dramatically improves the productivity and efficiency of POI verification at Baidu Maps. As of December 31, 2021, it has enacted over 405 million iterations of POI verification within a 3.5-year period, representing an approximate workload of 800 high-performance expert mappers.
Artificial Intelligence, Information Retrieval
What problem does this paper attempt to address?
The paper addresses the automatic verification of large-scale point-of-interest (POI) databases. As mobile devices have become widespread, Web mapping services have become an indispensable tool in daily life. The POI database is the basic infrastructure behind these services: it stores multimodal information about geographic locations such as stores and banks, so its accuracy is crucial for returning satisfactory search results. Keeping a POI database in sync with the real world is challenging, however, because the business environment changes continually. Most companies rely on volunteered geographic information (VGI) platforms to maintain the quantity and quality of POI data, but this approach incurs a huge labor cost.

To solve this problem, the paper proposes DuMapper, a system that verifies large-scale POIs automatically from street-view data, significantly reducing labor costs and improving efficiency. DuMapper has two versions:

1. **DuMapper I**: Imitating the workflow of expert mappers, it verifies POIs with a three-stage pipeline: geospatial indexing, optical character recognition, and candidate POI ranking.
2. **DuMapper II**: A new framework that uses deep multimodal embedding (DME) and approximate nearest neighbor (ANN) search to index multimodal street-view data (signboard images and coordinates) directly, significantly speeding up POI verification.

Experimental results show that DuMapper II increases the throughput of POI verification by 50 times compared with DuMapper I. Since its deployment in June 2018, DuMapper has performed more than 405 million POI verification iterations at Baidu Maps, equivalent to the workload of about 800 high-performance expert mappers.
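The retrieval step that DuMapper II relies on can be sketched as a nearest-neighbor lookup over embedding vectors. The sketch below is illustrative only: it scans every archived vector exactly, standing in for the ANN index a production system would use, and the names `verify_poi`, `poi_index`, and `threshold`, as well as the threshold-based accept rule, are assumptions that do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def verify_poi(query_vec, poi_index, threshold=0.9):
    """Find the archived POI whose embedding is most similar to the query.

    poi_index maps a POI id to its embedding (hypothetical layout).
    Returns (poi_id, score) when the best score reaches the threshold,
    otherwise (None, score). This is an exhaustive scan, not a real ANN index.
    """
    best_id, best_score = None, -1.0
    for poi_id, vec in poi_index.items():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_id, best_score = poi_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


# Toy data: two archived POIs with 2-dimensional embeddings.
toy_index = {"poi_a": [1.0, 0.0], "poi_b": [0.0, 1.0]}
print(verify_poi([0.9, 0.1], toy_index))   # close to poi_a
print(verify_poi([1.0, 1.0], toy_index))   # equally far from both
```

An exhaustive scan costs time linear in the number of archived POIs, which is why a database with billions of entries needs an approximate index to answer within milliseconds.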
### Key Formulas

To fuse data of different modalities, DuMapper II uses the following formulas:

- **CNN Feature Extraction**: \[ G' = \text{CNN}(Z) \] where \(G' \in \mathbb{R}^{l \times d}\) and \(Z\) is the raw signboard image.
- **Geohash Encoding**: \[ i' = \text{GeoHash}(x, y) \] where \(i' \in \mathbb{R}^l\), and \(x\) and \(y\) are the longitude and latitude respectively.
- **Random Embedding Expansion**: \[ I' = \text{GeoEmb}(i') \] where \(I' \in \mathbb{R}^{l \times d}\).
- **Cross-Attention Mechanism**: \[ I = \text{Softmax}\left(\frac{Q_G K_I^T}{\sqrt{d}}\right) V_I \] \[ G = \text{Softmax}\left(\frac{Q_I K_G^T}{\sqrt{d}}\right) V_G \] where \(Q\), \(K\), and \(V\) are the query, key, and value matrices of the modality named in the subscript.
- **Average Pooling**: \[ i = \text{Avg\_Pool}(I) \] \[ g = \text{Avg\_Pool}(G) \]
- **Multimodal Embedding**: \[ m = i \oplus g \]
- **Triplet Loss Function**: \[ L_\Delta = \sum_{(m, m^+, m^*) \in \Delta} \max\left\{0, \gamma + \frac{m \cdot m^+}{|m||m^+|} - \frac{m \cdot m^*}{|m||m^*|}\right\} \] where \(\Delta\) is the set of training triplets and \(\gamma\) is the margin.

Through these formulas, DuMapper II maps data of different modalities into the same low-dimensional space and quickly finds the most similar POIs with the ANN algorithm.