Alexandria: Extensible Framework for Rapid Exploration of Social Media

Fenno F. Heath III, Richard Hull, Elham Khabiri, Matthew Riemer, Noi Sukaviriya, Roman Vaculin
DOI: https://doi.org/10.48550/arXiv.1507.06667
2015-07-24
Abstract: The Alexandria system under development at IBM Research provides an extensible framework and platform for supporting a variety of big-data analytics and visualizations. The system is currently focused on enabling rapid exploration of text-based social media data. The system provides tools to help with constructing "domain models" (i.e., families of keywords and extractors to enable focus on tweets and other social media documents relevant to a project), to rapidly extract and segment the relevant social media and its authors, to apply further analytics (such as finding trends and anomalous terms), and to visualize the results. The system architecture is centered around a variety of REST-based service APIs to enable flexible orchestration of the system capabilities; these are especially useful to support knowledge-worker-driven iterative exploration of social phenomena. The architecture also enables rapid integration of Alexandria capabilities with other social media analytics systems, as has been demonstrated through an integration with IBM Research's SystemG. This paper describes a prototypical usage scenario for Alexandria, along with the architecture and key underlying analytics.
Information Retrieval, Computers and Society, Human-Computer Interaction, Social and Information Networks
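The pipeline the abstract describes (build a domain model, extract the relevant posts, segment them and their authors, then look for anomalous terms) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Alexandria's implementation: the domain model is reduced to hypothetical keyword families (`DOMAIN_MODEL`), the paper's extractors are omitted, the tokenizer is deliberately naive, and `anomalous_terms` uses an add-alpha smoothed log ratio of term frequencies, a generic technique swapped in for whatever scoring the system actually uses. All function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical domain model: named families of keywords. Alexandria's domain
# models also contain extractors, which this sketch omits.
DOMAIN_MODEL = {
    "coffee": {"coffee", "espresso", "latte"},
    "price": {"price", "expensive", "cheap"},
}

TOKEN_RE = re.compile(r"[a-z']+")


@dataclass
class Tweet:
    author: str
    text: str


def tokenize(text):
    """Naive lowercase word tokenizer (an assumption, not the paper's)."""
    return TOKEN_RE.findall(text.lower())


def matching_families(tokens, model):
    """Names of the keyword families that share at least one token."""
    token_set = set(tokens)
    return {name for name, words in model.items() if token_set & words}


def extract_relevant(tweets, model):
    """Extraction: keep tweets that hit at least one keyword family."""
    return [t for t in tweets if matching_families(tokenize(t.text), model)]


def segment_authors(tweets, model):
    """Segmentation: map each family name to the authors who mention it."""
    segments = defaultdict(set)
    for t in tweets:
        for name in matching_families(tokenize(t.text), model):
            segments[name].add(t.author)
    return dict(segments)


def anomalous_terms(relevant, background, top_k=3, alpha=1.0):
    """Rank terms over-represented in `relevant` relative to `background`
    by the log ratio of add-alpha smoothed relative frequencies."""
    rel = Counter(w for t in relevant for w in tokenize(t.text))
    bg = Counter(w for t in background for w in tokenize(t.text))
    vocab_size = len(set(rel) | set(bg))
    rel_total = sum(rel.values()) + alpha * vocab_size
    bg_total = sum(bg.values()) + alpha * vocab_size
    scores = {
        w: math.log((rel[w] + alpha) / rel_total)
        - math.log((bg[w] + alpha) / bg_total)
        for w in rel
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


tweets = [
    Tweet("ann", "My latte was too expensive today"),
    Tweet("bob", "Espresso beats any latte"),
    Tweet("cat", "Watching the game tonight"),
    Tweet("ann", "Cheap coffee near the office?"),
]
background = [
    Tweet("dan", "The game was great"),
    Tweet("eve", "The office is quiet today"),
]
relevant = extract_relevant(tweets, DOMAIN_MODEL)
print(len(relevant))  # 3
print(sorted(segment_authors(relevant, DOMAIN_MODEL)["coffee"]))  # ['ann', 'bob']
print(anomalous_terms(relevant, background, top_k=1))  # ['latte']
```

Keeping each stage a separate function mirrors the abstract's point that capabilities are exposed individually for flexible orchestration; in a service-based design each function would plausibly sit behind its own endpoint, though the paper's actual REST APIs are not reproduced here.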
What problem does this paper attempt to address?
The problem this paper attempts to address is how to integrate multiple big-data analysis capabilities so that business analysts can explore and analyze large datasets quickly, collaboratively, and iteratively. Specifically, the paper introduces Alexandria, a system that provides an extensible framework and platform for a variety of big-data analyses and visualizations, with a particular focus on rapid exploration of text-based social media data.

### Main Issues

1. **How to integrate multiple data analysis tools**: Existing research mostly focuses on perfecting individual tools, whereas Alexandria provides an integrated environment in which domain models can be built quickly.
2. **How to support business users in data analysis**: Unlike most research, which focuses on scalable performance or on specific application domains, Alexandria aims to provide a high-level platform for business users, helping them make better use of analysis results, discover actionable insights, and incorporate those insights into ongoing business processes.
3. **How to support the data analysis lifecycle from exploration to application**: Alexandria supports not only the initial exploration phase but also the evolution of analysis methods from exploration to production use.
4. **How to support collaborative production environments for multidisciplinary teams**: Data analysis is no longer the isolated work of a few data scientists; it is carried out by multidisciplinary teams who mine the data together and find ways to integrate analytical insights into existing business processes at production level.
5. **How to achieve scalability in large-scale data processing**: Alexandria must handle billions of tweets and forum comments, so it supports integration with systems such as Spark, Titan, and Hadoop-based distributed data processing platforms to keep big-data analysis fast.
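The scale requirement in item 5 is usually met with the data-parallel map/reduce pattern that Hadoop and Spark implement. The sketch below shows that pattern for term counting, a statistic that term-based analytics such as trend or anomaly detection typically build on. It is a generic illustration in plain Python, not Alexandria's code; the round-robin partitioning and all function names are hypothetical, and the map step runs sequentially here rather than on separate workers.

```python
import re
from collections import Counter
from functools import reduce

TOKEN_RE = re.compile(r"[a-z']+")


def partition(docs, n):
    """Split documents round-robin into n partitions (hypothetical scheme)."""
    return [docs[i::n] for i in range(n)]


def map_partition(docs):
    """Map step: count terms within a single partition."""
    counts = Counter()
    for doc in docs:
        counts.update(TOKEN_RE.findall(doc.lower()))
    return counts


def merge_counts(a, b):
    """Reduce step: add two partial counts. The operation is associative
    and commutative, so partials can be merged in any order."""
    merged = Counter(a)
    merged.update(b)
    return merged


def term_counts(docs, n_partitions=4):
    # Sequential here; a cluster framework would run each map call on a worker.
    partials = map(map_partition, partition(docs, n_partitions))
    return reduce(merge_counts, partials, Counter())


docs = ["Coffee prices rise", "coffee is cheap", "The game", "Coffee again"]
print(term_counts(docs, n_partitions=3)["coffee"])  # 3
```

Because the merge is associative and commutative, the result does not depend on how documents are partitioned, which is what allows a framework to distribute the work. In PySpark the same computation would look roughly like `rdd.flatMap(tokenize).map(lambda w: (w, 1)).reduceByKey(operator.add)`, where `tokenize` is again a hypothetical tokenizer.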
### Solutions

- **Extensible platform**: Supports various styles of analysis, enabling business users to make more effective use of analysis results.
- **Support for the data analysis lifecycle**: Supports the evolution of analysis methods from exploration to application.
- **Collaborative production environment**: Supports collaboration among multidisciplinary teams, improving the efficiency and quality of data analysis.
- **Scalability**: Supports large-scale data processing and can handle billions of tweets and forum comments.

Through these solutions, Alexandria aims to advance the state of the art in social media analysis, providing a platform that lets business users explore and analyze large datasets quickly, collaboratively, and iteratively.