Improvement in Semantic Address Matching using Natural Language Processing

Vansh Gupta, Mohit Gupta, Jai Garg, Nitesh Garg

DOI: https://doi.org/10.1109/INCET51464.2021.9456342

2024-04-18

Abstract: Address matching is an important task for many businesses, especially delivery and take-out companies, because it lets them retrieve a specific address from their data warehouse. Existing solutions use string similarity and edit-distance algorithms to find similar addresses in the address database, but these algorithms do not work effectively with redundant, unstructured, or incomplete address data. This paper discusses a semantic address matching technique for finding a particular address in a list of candidate addresses. We also review existing practices and their shortcomings. Semantic address matching is essentially an NLP task in the field of deep learning. With this technique, we can overcome drawbacks of existing methods, such as problems with redundant or abbreviated data. The solution applies OCR to invoices to extract addresses and build a pool of address data. This data is then fed to the BM25 algorithm, which scores the best-matching entries. To obtain the best result, the top-scoring candidates are then passed through BERT, which selects the best match among the similar entries. Our investigation shows that our methodology greatly improves both the accuracy and the recall of existing state-of-the-art techniques.

Computation and Language

What problem does this paper attempt to address?

The paper titled "Improvement in Semantic Address Matching Using Natural Language Processing" aims to solve the problem of accurate address matching, which is crucial for businesses such as delivery and take-out companies. Existing methods, which rely on string similarity and edit distance algorithms, often struggle with redundant, unstructured, or incomplete address data. The authors propose a new method that leverages natural language processing (NLP) techniques to improve the accuracy and efficiency of address matching. The proposed system involves the following steps:

1. **Data Collection**: Addresses from different countries, including the USA, India, and Canada, are collected from invoices obtained through web scraping. Optical character recognition (OCR) is used to extract addresses from these invoices.
2. **BM25 Algorithm**: The BM25 weighting scheme is used to calculate the similarity between addresses, considering factors such as document length and term frequency. This helps in scoring and ranking potential matches.
3. **BERT Algorithm**: When the difference in scores between the top-ranked addresses is not significant, the BERT algorithm is used to perform string matching. BERT, a deep learning model, is employed to understand the semantic similarity between addresses, improving the accuracy of matches.
4. **Results and Improvements**: The system demonstrates improvements over traditional methods by achieving higher accuracy, precision, and recall. It also ca
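The BM25 scoring step and the score-gap fallback described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the parameter values `k1=1.5` and `b=0.75`, the `margin` threshold, and the function names `bm25_scores` and `match_address` are all hypothetical. The `rerank` argument stands in for a BERT-based similarity scorer, which is left as a pluggable callable so the sketch runs without any model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (an assumed, simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per candidate address in docs."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Average document length; fall back to 1.0 if every candidate is empty.
    avgdl = sum(len(t) for t in tokenized) / n or 1.0
    # Document frequency: number of candidates containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def match_address(query, docs, rerank=None, margin=0.5):
    """Return the index of the best candidate.

    If the top two BM25 scores differ by less than margin (an assumed
    threshold) and a rerank callable is given, the near-tied candidates
    are re-scored with rerank(query, candidate) instead.
    """
    scores = bm25_scores(query, docs)
    if not scores:
        raise ValueError("no candidate addresses")
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    top = ranked[0]
    ambiguous = len(ranked) > 1 and scores[top] - scores[ranked[1]] < margin
    if rerank is not None and ambiguous:
        close = [i for i in ranked if scores[top] - scores[i] < margin]
        return max(close, key=lambda i: rerank(query, docs[i]))
    return top


candidates = [
    "221B Baker Street London",
    "10 Downing Street London",
    "1600 Pennsylvania Avenue Washington",
]
best = match_address("baker street london", candidates)
print(candidates[best])
```

In a fuller pipeline, `rerank` could wrap a BERT sentence-pair or embedding-similarity model; keeping it as a plain callable means the semantic model is only invoked when BM25 alone cannot separate the top candidates.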

A deep learning architecture for semantic address matching

DeepAM: Deep Semantic Address Representation for Address Matching.

Deep Transfer Learning Model for Semantic Address Matching

Multi-task deep learning model based on hierarchical relations of address elements for semantic address matching

Methods for Matching English Language Addresses

Deep Contrast Learning Approach for Address Semantic Matching

Geographical Address Representation Learning for Address Matching

GSAM: A Deep Neural Network Model for Extracting Computational Representations of Chinese Addresses Fused with Geospatial Feature

Improving Address Matching using Siamese Transformer Networks

Deep Contextual Embeddings for Address Classification in E-commerce

Address Matching Based On Hierarchical Information

Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach

AddrLLM: Address Rewriting via Large Language Model on Nationwide Logistics Data

A Novel Address-Matching Framework Based on Region Proposal

A hybrid approach of Poisson distribution LDA with deep Siamese Bi-LSTM and GRU model for semantic similarity prediction for text data

An Efficient Post-Processing Approach for Off-Line Handwritten Chinese Address Recognition

Semantic Matching Model based on Layer-Wise Attention Pooling Network and Dynamic Feature Fusion Mechanism

A Post-processing Approach for Handwritten Chinese Address Recognition

Recognition Method of New Address Elements in Chinese Address Matching Based on Deep Learning

Leveraging Subword Embeddings for Multinational Address Parsing