Abstract:The widespread use of GPS-enabled smartphones along with the popularity of micro-blogging and social networking applications, e.g., Twitter and Facebook, has resulted in the generation of huge streams of geo-tagged textual data. Many applications require real-time processing of these streams. For example, location-based e-coupon and ad-targeting systems enable advertisers to register millions of ads to millions of users. The number of users is typically very high and they are continuously moving, and the ads change frequently as well. Hence sending the right ad to the matching users is very challenging. Existing streaming systems are either centralized or are not spatial-keyword aware, and cannot efficiently support the processing of rapidly arriving spatial-keyword data streams. This paper presents Tornado, a distributed spatial-keyword stream processing system. Tornado features routing units to fairly distribute the workload, and furthermore, co-locate the data objects and the corresponding queries at the same processing units. The routing units use the Augmented-Grid, a novel structure that is equipped with an efficient search algorithm for distributing the data objects and queries. Tornado uses evaluators to process the data objects against the queries. The routing units minimize the redundant communication by not sending data updates for processing when these updates do not match any query. By applying dynamically evaluated cost formulae that continuously represent the processing overhead at each evaluator, Tornado is adaptive to changes in the workload. Extensive experimental evaluation using spatio-textual range queries over real Twitter data indicates that Tornado outperforms the non-spatio-textually aware approaches by up to two orders of magnitude in terms of the overall system throughput.

Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

Socialanalysis: A Real-Time Query And Mining System From Social Media Data Streams

Distributed Collaborative Hashing and Its Applications in Ant Financial

Real-time Text Analytics Pipeline Using Open-source Big Data Tools

Processing Long Queries Against Short Text

An Enhanced Batch Query Architecture in Real-time Recommendation

Compact Indexing and Judicious Searching for Billion-Scale Microblog Retrieval.

Exploring Real-Time Data Processing Using Big Data Frameworks

HTwitt: a hadoop-based platform for analysis and visualization of streaming Twitter data

Real-time Twitter data analysis using Hadoop ecosystem

Tencentrec: Real-Time Stream Recommendation In Practice

Real-Time Search over a Microblogging System

Automatic Query Optimization for Retrieving Traffic Tweets

Adaptive Processing of Spatial-Keyword Data Over a Distributed Streaming Cluster

Supporting Real-Time Analytic Queries In Big And Fast Data Environments

Real-time Recommendation for Microblogs

Serving Hybrid-Cloud SQL Interactive Queries at Twitter

Toward real-time data query systems in HEP

Temporal Geo-Social Personalized Keyword Search Over Streaming Data

TeRec: a temporal recommender system over tweet stream

The Unified Logging Infrastructure for Data Analytics at Twitter