BOTTARI: Location based Social Media Analysis with Semantic Web

Irene Celino, Daniele Dell'Aglio, Emanuele Della Valle, Marco Balduini, Yi Huang, Tony Kyung-il Lee, Seon-Ho Kim, Volker Tresp
2011-01-01
Abstract: Location-based services are influencing our lives and the way we experience the surrounding environment; smartphone and tablet applications supply a huge amount of information: shops around us, traffic conditions, etc. A recent trend in this kind of service is to provide personalized information, such as friends’ positions or events users could be interested in. In this paper we present BOTTARI, an Android application that exploits social media and context to provide point of interest (POI) recommendations to users in a specific geographic location. BOTTARI exploits a number of semantic techniques (sentiment analysis, inductive reasoning, stream reasoning) for social media analysis and suggests POIs on the basis of users’ tastes and influential people’s opinions.

1 Description of BOTTARI and its innovation

The Insa-dong area of Seoul is one of the most popular visitor attractions in South Korea: it is the focal point of Korean traditional culture and crafts, with a multitude of shops, restaurants and points of interest (POIs). BOTTARI is a location-based mobile application developed for Android tablets and targeted at Korean users moving around Insa-dong. It provides personalized POI recommendations to help users find their way when they are in a specific location. BOTTARI collects relevant information from social media such as Twitter and blog posts, processes it and provides contextualized suggestions.

The Korean word “bottari” refers to a bundle or container made from patterned cloth that is used to carry one’s belongings when travelling; the BOTTARI application lets the user “transport” the location-specific knowledge, derived from social media, when moving in the physical space.

BOTTARI has two main sources of information.
One is a curated dataset about the Insa-dong area, which collects information about some hundred POIs, each described by a few dozen attributes (location, description, place category, price range, reviews, contacts, etc.); the content of this dataset is quite static and is used as “background” information about the POIs. These data are expressed in RDF, described with regard to an OWL ontology, and sum up to more than 20 thousand triples.

The other dataset is gathered from social media. Apart from blog posts, which are manually collected from Korean web sites, the main source consists of tweets collected from Korean users (i.e., all tweets are written in Korean) since April 2010. Those short messages are acquired by means of the Twitter APIs, further processed to identify the tweets talking about POIs in Insa-dong, and analysed to assess the “sentiment” they express (positive vs. negative rating). The results are expressed in RDF and described with regard to the OWL ontology illustrated in [1]; those triples are then stored in a SOR repository (SOR is a Saltlux product: http://semanticwiki-en.saltlux.com/index.php/SOR); the triple count is currently 0.6 billion, and it is continuously and steadily increasing.

Fig. 1. Screenshots of BOTTARI: (a) augmented reality display of recommended POIs, (b) POI selection, (c) visualization of the selected POI details, (d) trends in user sentiment about the POI.

Figure 1 shows some screenshots of the application. BOTTARI provides its users with four different types of recommendations:

– Interesting recommendations suggest POIs indicated for foreign visitors in Korea; this feature calls for analysis and retrieval of POI attributes;
– Popular recommendations suggest the POIs which show the highest level of reputation on social media; this feature calls for a complete analysis of the social sentiment about POIs;
– Emerging recommendations suggest the most popular POIs in a delimited period of time (e.g. last 6 months); this feature calls for the identification of “hypes” and new trends in the social sentiment;
– For me recommendations suggest POIs of interest for the current user; this feature calls for personalized recommendations.

The innovation brought by BOTTARI consists in offering a location-based service through a simple and intuitive interface for a natural user experience; the application provides advanced semantic features, hiding the complexity of their computation from the user’s sight. The details of the internal functioning of the BOTTARI application are given in the following section.

2 Semantic features in BOTTARI

The four types of recommendations provided by BOTTARI require different levels of semantic technologies. In this section, we explain how we addressed the different challenges.

2.1 Semantic Information Retrieval to get Interesting POIs

The first kind of recommendation requires suggesting to the user a subset of the POIs that matches (1) the user’s current location and (2) the category of “attractions of interest for foreign visitors”. To provide those recommendations, we used the Semantic Information Retrieval features of SOR, which provides a geographic extension of SPARQL that allows querying both the “semantic” description of POIs and their physical location.

2.2 Sentiment Analysis of social media to get Popular POIs

The second type of recommendation requires an analysis of the social media. The tweets and blog posts are processed by a sentiment analysis algorithm that detects whether the message talks about a POI and, if so, whether it expresses a positive or negative rating of the POI. We adopted a twofold approach to compute the “sentiment”; we applied the two methods both separately and in cooperation, to improve the precision of results.
On the one hand, we used a pure machine learning approach using SVMs with a syllable kernel. On the other hand, we used a rule-based approach: the messages are analysed with respect to some rules about structure and language. Those rules are both manually coded and generated by machine learning algorithms; this is an NLP technology of Saltlux, specialized in the Korean language. Once the sentiment is elicited, this information is attached as metadata to the message description in the triple store.

The popular recommendations are then generated by querying the knowledge base and suggesting the POIs with the highest number of positive ratings; the geographic features of SOR are also used to filter POIs and recommend only those around the user’s current location.

2.3 Stream Reasoning to get Emerging POIs

The opinion of users on POIs can change over time: the third kind of recommendation suggests to users the POIs that are “in fashion” in the latest period of time. To this end, we adopted stream reasoning [2] to identify trends and changes in the sentiment about the POIs. Figure 2 illustrates how we processed the social media stream to derive synthetic and aggregated information that helps the user in choosing the POI to visit. Because of the sentiment analysis processing, the stream of messages annotated with the user sentiment is not real-time, but is “re-streamed” from its storage. The queries enabled by the C-SPARQL Engine [3, 4] let us find the emerging opinion of users about POIs: from left to right in the figure, our engine counts the positive opinions about a POI for each day, so as to create top-10 lists; the positive messages per day can be further aggregated by week or by month and visualized as plot lines or heatmaps.

Fig. 2. Query chain to elaborate social media streams with C-SPARQL

2.4 Inductive Reasoning to get For me POIs

Finally, POI recommendations can be personalized: the user can be offered POIs that could be interesting for her. To this end, we adopted inductive reasoning on social media to compute BOTTARI’s for me recommendations. We exploited the SUNS approach (Statistical Unit Node Set) described in [5, 6]. SUNS is a machine learning approach for exploiting the regularities in large data sets in relational and semantic domains. The approach can be used to detect interesting data patterns and predict unknown but potentially true statements. In BOTTARI we applied SUNS to estimate the probability that a user will like a POI, based on the sentiment the same user expressed about other POIs and the opinion that other users expressed about that POI. In this sense, we provide a personalized collaborative filtering recommendation engine, to suggest to users the most interesting POIs with respect to their preferences.

3 Design and development of BOTTARI

The front-end of BOTTARI is a dedicated Android application which uses localization and augmented reality to provide its functionality to mobile users in the Insa-dong area of Seoul; the visual appearance of the front-end is exemplified in Figure 1.

The BOTTARI back-end is where the semantic technologies are adopted for social media analysis. To design and develop the back-end we exploited the potential of the LarKC platform [7, 8]. The LarKC platform, developed in the EU project of the same name, is aimed at reasoning on massive heterogeneous information such as social media data. The platform consists of a framework to build workflows, i.e. sequences of connected components (plug-ins) able to consume and process data. Each plug-in exploits techniques and heuristics from diverse areas such as databases, machine learning and the Semantic Web.
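
The notion of a workflow as a sequence of connected plug-ins can be illustrated with a minimal Python sketch. It does not use the LarKC API and is not the BOTTARI implementation: the `Opinion` record, the plug-in functions `keep_positive`, `count_by_poi` and `top_k`, the runner `run_workflow` and the sample data are all hypothetical, introduced here only for illustration. Each plug-in is modelled as a plain function that consumes the output of the previous one, and the chain mimics a ranking of POIs by number of positive opinions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Opinion:
    """Hypothetical record: one social media message about a POI."""
    poi: str        # name of the point of interest the message talks about
    positive: bool  # assumed outcome of a previous sentiment analysis step


def run_workflow(plugins: Sequence[Callable], data):
    """Feed the output of each plug-in into the next one, in order."""
    for plugin in plugins:
        data = plugin(data)
    return data


def keep_positive(opinions):
    """Plug-in 1: discard messages with a negative rating."""
    return [o for o in opinions if o.positive]


def count_by_poi(opinions):
    """Plug-in 2: count the remaining messages per POI."""
    counts = {}
    for o in opinions:
        counts[o.poi] = counts.get(o.poi, 0) + 1
    return counts


def top_k(k):
    """Plug-in factory: build a plug-in that keeps the k most cited POIs."""
    def rank(counts):
        # Sort by descending count, then by name so that ties are deterministic.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return rank


# Invented sample data, for illustration only.
opinions = [
    Opinion("tea house", True),
    Opinion("tea house", True),
    Opinion("gallery", True),
    Opinion("gallery", False),
    Opinion("noodle bar", False),
]

ranking = run_workflow([keep_positive, count_by_poi, top_k(2)], opinions)
print(ranking)
```

Because every step shares the same "data in, data out" shape, a step can be replaced or reordered without touching the others, which is the property the plug-in architecture relies on.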
In BOTTARI, we designed a workflow that makes use of a number of plug-ins that implement the features explained in Section 2 (semantic information retrieval, stream reasoning, inductive reasoning). Every time a user of the mobile application requests a type of recommendation, the BOTTARI LarKC workflow is invoked and instantiates the plug-ins to compute the requested data. Each plug-in encapsulates a part of the data processing and interacts with the other plug-ins through well-defined interfaces; the plug-ins used in BOTTARI are described in several technical reports. The adoption of La