What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses geo-entity linking in noisy multilingual social media data. Specifically, the authors focus on matching user-provided location information (e.g., the "location" field in Twitter profiles) to actual geographic entities.

### Background and Motivation

1. **Importance of Geographic Location**: The real-world location of social media users is crucial for many computational social science tasks, including disaster response, disease monitoring, language variation analysis, and regional attitude comparison.
2. **Limitations of Geotags**: Geotags (such as latitude and longitude coordinates) were deprecated in 2019, and even before that, fewer than 2% of tweets contained them. Inferring location from user profiles and free-text location fields has therefore become increasingly necessary.
3. **Limitations of Existing Tools**: Multilingual geo-entity linking tools are scarce. Existing tools are either rule-based, which makes them brittle on social media text, or based on large language models (LLMs), which are too costly for large-scale datasets.

### Research Objectives

1. **Propose a New Method**: The authors propose a method that represents each real-world location by the average embedding of the user-input location names annotated with it, and that achieves selective prediction through an adjustable cosine similarity threshold.
2. **Performance Evaluation**: The authors evaluate the proposed method on a multilingual global social media dataset and compare it with baseline methods.
3. **Discuss Issues**: The authors discuss the issues encountered when evaluating geo-entity linking at different geographic granularities (country, administrative region, city), particularly the challenges at the city level.

### Main Contributions

1. **New Method**: A method that represents real-world locations through average embeddings and achieves selective prediction using a cosine similarity threshold.
2. **Performance Improvement**: The proposed method outperforms leading baseline methods across all variants on a multilingual global dataset.
3. **Accuracy Upper Bound**: Manual annotation experiments are used to estimate the accuracy upper bound on the dataset, and the issues of geo-entity linking at the city level are discussed.

### Related Work

1. **Geo-entity Linking**: Previous research typically combines the text and context of location mentions, knowledge bases (such as gazetteers and Wikipedia), and coordinate or geometry features, using rule-based, unsupervised, or supervised methods.
2. **Multilingual Research**: Most prior work has focused on English data and news articles, with a few studies covering historical texts and web data.
3. **Social Media Data**: Some previous studies have explored geo-entity linking in social media data, but they are mostly rule-based or rely on large language models.

### Methodology

1. **Task Definition**: Given a target location database, a training set of pairs of user-input location names and real locations, and a test set, the model must predict the best-matching geographic entity for each user input.
2. **Data**: A modified GeoNames database serves as the target location database, and geotagged tweets extracted from the Twitter-Global dataset serve as training and test data.
3. **Method**: The proposed method (UserGeo) computes an embedding for each location in the target location database, then predicts a location by calculating the cosine similarity between the embedding of the user input and each location embedding. If the similarity to every location embedding is below a given threshold, the prediction confidence is considered low and no prediction is made.

### Experimental Results

1. **Performance Comparison**: UserGeo achieves the highest accuracy at the country and administrative region levels, outperforming Carmen 2.0 by 25 and 17 percentage points, respectively. NameGeo achieves the highest accuracy at the city level, outperforming Carmen 2.0 by 5 percentage points.
2. **Precision-Coverage Curve**: UserGeo and NameGeo can trade off precision against coverage by adjusting the threshold, while Carmen 2.0 has lower coverage.
3. **Error Analysis**: UserGeo performs better in handling non-Latin scripts and alternative/informal location names, whereas Carmen 2.0 and NameGeo...
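The UserGeo procedure described under Methodology (average the embeddings of annotated user inputs per location, then predict with a cosine similarity threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `embed` function is a toy hashed character-trigram encoder standing in for whatever multilingual text encoder the paper uses, and all names here (`embed`, `build_location_embeddings`, `predict`, `precision_and_coverage`, `TRAIN_PAIRS`) and the sample data are hypothetical.

```python
import math
import zlib
from collections import defaultdict

DIM = 512  # size of the toy embedding space (assumption, not from the paper)


def embed(text):
    """Toy stand-in for a multilingual text encoder: hashed character trigrams."""
    vec = [0.0] * DIM
    padded = "  " + text.lower().strip() + "  "
    for i in range(len(padded) - 2):
        # crc32 gives a hash that is stable across runs, unlike built-in hash()
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_location_embeddings(train_pairs):
    """Average the embeddings of all user inputs annotated with each location."""
    grouped = defaultdict(list)
    for user_input, location in train_pairs:
        grouped[location].append(embed(user_input))
    return {
        location: [sum(column) / len(vectors) for column in zip(*vectors)]
        for location, vectors in grouped.items()
    }


def predict(user_input, location_embeddings, threshold):
    """Return (location, score); location is None if the best score is below threshold."""
    query = embed(user_input)
    best_location, best_score = None, -1.0
    for location, centroid in location_embeddings.items():
        score = cosine(query, centroid)
        if score > best_score:
            best_location, best_score = location, score
    if best_score < threshold:
        return None, best_score  # selective prediction: abstain
    return best_location, best_score


def precision_and_coverage(test_pairs, location_embeddings, threshold):
    """Precision over answered inputs, and the fraction of inputs answered."""
    answered = correct = 0
    for user_input, gold in test_pairs:
        predicted, _ = predict(user_input, location_embeddings, threshold)
        if predicted is not None:
            answered += 1
            correct += predicted == gold
    coverage = answered / len(test_pairs) if test_pairs else 0.0
    precision = correct / answered if answered else 0.0
    return precision, coverage


# Hypothetical annotated training pairs: (user input, real location)
TRAIN_PAIRS = [
    ("NYC", "New York City"),
    ("new york", "New York City"),
    ("New York, NY", "New York City"),
    ("París", "Paris"),
    ("paris france", "Paris"),
    ("Paris", "Paris"),
]
LOCATIONS = build_location_embeddings(TRAIN_PAIRS)
```

Sweeping `threshold` from low to high in `precision_and_coverage` traces out a precision-coverage trade-off of the kind the summary describes: a higher threshold makes the model abstain more often, which lowers coverage and typically raises precision on the inputs it still answers.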

Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input
