A Topic-wise Exploration of the Telegram Group-verse

Alessandro Perlo, Giordano Paoletti, Nikhil Jha, Luca Vassio, Jussara Almeida, Marco Mellia
2024-09-04
Abstract: Although currently one of the most popular instant messaging apps worldwide, Telegram has been largely understudied in the past years. In this paper, we aim to address this gap by presenting an analysis of publicly accessible groups covering discussions encompassing different topics, as diverse as Education, Erotic, Politics, and Cryptocurrencies. We engineer and offer an open-source tool to automate the collection of messages from Telegram groups, a non-straightforward problem. We use it to collect more than 50 million messages from 669 groups. Here, we present a first-of-its-kind, per-topic analysis, contrasting the characteristics of the messages sent on the platform from different angles: the language, the presence of bots, and the type and volume of shared media content. Our results confirm some anecdotal evidence, e.g., clues that Telegram is used to share possibly illicit content, and unveil some unexpected findings, e.g., the different sharing patterns of video and stickers in groups of different topics. While preliminary, we hope that our work paves the road for several avenues of future research on the understudied Telegram platform.
Social and Information Networks
### What problems does this paper attempt to solve?

This paper aims to fill a gap in research on the Telegram platform. By exploring publicly accessible Telegram groups across multiple topics, it analyzes how user behavior differs from topic to topic. Specifically, the authors attempt to answer the following core questions:

1. **Diversity of platform use**: How do communication styles and content differ across Telegram groups on different topics (such as Education, Erotic, Politics, and Cryptocurrencies)? Do these differences reflect behavior patterns unique to specific topics?
2. **Automated data collection**: How can an open-source tool be designed and implemented to automatically collect large volumes of messages from Telegram groups? This addresses the technical problem of data acquisition.
3. **Characteristics of user activity**:
   - **Language use**: How are the languages used by members distributed across topics?
   - **Bot use**: How frequently are bots used in Telegram groups, and what is their impact?
   - **Multimedia sharing**: How often, and with what characteristics, are different media types (e.g., photos, videos, GIFs, stickers) shared in different topics?
   - **External links**: How do users share links to external websites in groups, and which platforms do these links point to?
4. **Illegal or sensitive content**: Is there evidence that Telegram is used to spread potentially illegal content? For example, do dark-web groups contain more long-text messages, and do these messages involve illegal transactions?
5. **Cross-topic comparison**: Comparing user behavior across topics, which patterns are universal and which are specific to a particular topic?
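The external-links question (which platforms do shared links point to?) reduces to extracting URLs from message text and tallying their host domains per topic. The Python sketch below assumes each message is available as a hypothetical `(topic, text)` pair; the regex-based URL matching, the `www.` normalization, and the helper names `normalize_domain` and `domains_by_topic` are illustrative choices, not the authors' pipeline. Telegram messages may also carry links as formatting entities rather than plain text, which this sketch ignores.

```python
# Hypothetical sketch: tally external link domains per topic.
# Assumes messages arrive as (topic, text) pairs; not the authors' pipeline.
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Plain-text URL pattern (assumption): links stored only as Telegram
# formatting entities, or written without a scheme, are not matched.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def normalize_domain(url: str) -> str:
    """Return the lowercased host of a URL, minus a leading 'www.'."""
    # Drop trailing punctuation that often follows a link in prose.
    host = urlparse(url.rstrip(".,;:!?)")).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domains_by_topic(messages):
    """Map each topic to a Counter of linked domains."""
    counts = defaultdict(Counter)
    for topic, text in messages:
        for url in URL_RE.findall(text or ""):
            domain = normalize_domain(url)
            if domain:
                counts[topic][domain] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("Technology", "see https://github.com/foo/bar and https://www.youtube.com/watch?v=x"),
        ("Technology", "fork it: https://github.com/baz/qux."),
        ("Politics", "thread: https://x.com/someone/status/1"),
    ]
    for topic, counter in domains_by_topic(sample).items():
        print(topic, counter.most_common(3))
```

Calling `most_common(n)` on one topic's counter then lists its top linked platforms; merging aliases of the same platform (for example a short-link domain and the main site) would need an extra, hand-made mapping.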
### Research methods

To answer these questions, the authors designed a two-stage crawler: it first obtains a list of public Telegram groups from the TGStat website, then joins these groups to collect more than 50 million messages. Analyzing this data, they compare the characteristics of groups on different topics from multiple perspectives (language, bot activity, multimedia sharing, and more).

### Main findings

- **Bot activity**: The proportion of bot-generated messages is relatively high in some topics (such as language learning) and relatively low in political groups, reflecting different levels of user participation.
- **Message length**: Messages in dark-web groups are generally longer, while messages in language-learning and political groups are shorter.
- **Multimedia use**: The frequency and type of shared media differ significantly across topics. For example, videos shared in video and movie groups have relatively high resolution, while videos in Erotic groups are of even higher quality but fewer in number.
- **External links**: Links to social networks (such as X, YouTube, and Instagram) are the most frequently shared, while technology-related groups link to GitHub pages comparatively often.

Overall, through this multi-angle analysis, the paper reveals the diverse usage patterns of groups on different topics on Telegram, providing valuable data and insights for further research.
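The bot-activity and message-length comparisons boil down to two per-topic aggregates: the fraction of messages whose sender is a bot, and a typical text length. The Python sketch below assumes a hypothetical `Message` record holding a topic label, a boolean bot flag (Telegram user objects expose one, e.g. `is_bot` in the Bot API), and the message text. Using the median rather than the mean, and skipping media-only messages with empty text, are choices of this sketch, not details taken from the paper.

```python
# Hypothetical sketch: per-topic bot-message share and median text length.
# The Message fields are assumptions, not the paper's data schema.
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Message:
    topic: str           # topic label of the group (assumed, e.g. a TGStat category)
    sender_is_bot: bool  # True if the sending account is a bot (assumed field)
    text: str            # message text; "" for media-only messages


def topic_summary(messages):
    """Return {topic: (bot_share, median_text_length)}.

    bot_share counts every message; the median ignores empty texts,
    so media-only messages do not pull it toward zero.
    """
    by_topic = defaultdict(list)
    for m in messages:
        by_topic[m.topic].append(m)

    summary = {}
    for topic, msgs in by_topic.items():
        bot_share = sum(m.sender_is_bot for m in msgs) / len(msgs)
        lengths = [len(m.text) for m in msgs if m.text]
        summary[topic] = (bot_share, median(lengths) if lengths else 0)
    return summary
```

A median is less sensitive than a mean to a handful of extremely long messages, so it better reflects the typical message in each topic.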