A novel technique for identification and classification of HIV/AIDS related social media data using LD-KMEANS and DBN-LSTM
DOI: https://doi.org/10.1007/s11042-024-19283-9
IF: 2.577
2024-05-11
Multimedia Tools and Applications
Abstract: Online social networks provide an effective platform for understanding the mass behaviour of people, which aids in developing techniques for the surveillance of Human Immunodeficiency Virus/Acquired Immunodeficiency Syndrome (HIV/AIDS). With the rapid growth of social sites such as Facebook, Twitter, and blogs, social networking has become one of the most promising sources for HIV/AIDS investigation. Most existing works implement various frameworks to classify HIV/AIDS-related information using Social Media (SM) data. However, traditional techniques do not generalize well enough to handle the complex structure of SM data, and existing models are less effective because they lack annotation processes and pre-processing strategies. This paper proposes a novel strategy to identify and classify HIV/AIDS-related SM data using the Levenshtein Distance-KMeans (LD-KMeans) algorithm and a Deep Belief Network-Long Short-Term Memory (DBN-LSTM) model. The proposed work focuses on discussions of HIV- and AIDS-related issues taking place on Twitter, and classifies HIV/AIDS-related tweets through the following steps. First, tweets are extracted using the Twitter API and pre-processed. Then, annotation extraction is performed, and the tweets are separated into organization tweets and person tweets based on the annotation; the proposed work gives priority to organization tweets. Next, text normalization is performed, which yields cleaned, structured tweets. The hashtags related to HIV and AIDS are then identified and grouped together using the LD-KMeans algorithm. Thereafter, word embedding is performed by means of M-Word2Vec, and the most important features are selected by the LS-DFO algorithm. Finally, on the basis of the selected features, the HIV/AIDS-related tweets are classified into categories such as symptoms, awareness, medicine, and reason. The outcomes obtained by the proposed methodology on Twitter data are compared with those of prevailing algorithms. The analysis shows that the proposed methodology obtained an accuracy, sensitivity, and specificity of 94.65%, 94.56%, and 94.25%, respectively, and reached a tweet identification time of 72154 ms. The experimental outcomes demonstrate that, in terms of sensitivity, specificity, and accuracy, the proposed model outperformed the prevailing systems in classifying HIV/AIDS-related tweets.
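The hashtag-grouping step named in the abstract can be illustrated with a minimal sketch. The abstract does not describe how LD-KMeans is defined, so the code below is an assumption, not the authors' method: it uses the standard Levenshtein edit distance as the dissimilarity and a k-medoids-style update (each centre is replaced by the member with the smallest total distance to the others), because a mean of strings is not defined. The names `levenshtein`, `medoid`, `assign`, and `group_hashtags`, the seed centres, and the case-insensitive comparison are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def medoid(members: Sequence[str]) -> str:
    """Member with the smallest total edit distance to all other members."""
    return min(members,
               key=lambda m: sum(levenshtein(m.lower(), o.lower())
                                 for o in members))


def assign(tags: Sequence[str], centres: Sequence[str]) -> List[List[str]]:
    """Put each tag into the cluster of its nearest centre (ties: first)."""
    clusters: List[List[str]] = [[] for _ in centres]
    for tag in tags:
        dists = [levenshtein(tag.lower(), c.lower()) for c in centres]
        clusters[dists.index(min(dists))].append(tag)
    return clusters


def group_hashtags(tags: Sequence[str], seeds: Sequence[str],
                   max_iter: int = 10) -> List[Tuple[str, List[str]]]:
    """Assumed k-medoids-style loop; not the paper's LD-KMeans definition."""
    centres = list(seeds)
    for _ in range(max_iter):
        clusters = assign(tags, centres)
        # An empty cluster keeps its previous centre.
        updated = [medoid(m) if m else c for c, m in zip(centres, clusters)]
        if updated == centres:
            break
        centres = updated
    return list(zip(centres, assign(tags, centres)))
```

Calling `group_hashtags(tags, seeds)` with a list of normalized hashtags and a few hand-picked seed hashtags returns one `(centre, members)` pair per seed. How the paper initializes centres, chooses the number of clusters, or combines the edit distance with the K-means update is not stated in the abstract and is left open here.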
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering