Enhancing web service clustering using Length Feature Weight Method for service description document vector space representation

Neha Agarwal, Geeta Sikka, Lalit Kumar Awasthi
DOI: https://doi.org/10.1016/j.eswa.2020.113682
IF: 8.5
2020-12-01
Expert Systems with Applications
Abstract: Due to the rapid growth of web services in repositories, discovering the required web service is becoming an increasingly cumbersome task. This has raised the demand for efficient web service clustering algorithms. When related web services are stored in clusters in a service repository, the discovery process improves because both the search space and the search time are reduced. Many researchers in this field have used the Term Frequency – Inverse Document Frequency (TF-IDF) method to represent web services in vector space. In general, the TF-IDF approach has several limitations: 1) it is not efficient for large documents; 2) it ignores the position of a term and its co-occurrences; 3) it cannot analyze how terms are dispersed across different documents. In the web service scenario, services are described in short text form. TF-IDF does not work well for web service representation because it cannot effectively determine the importance of a term with respect to its occurrence in other documents. If we compare two service documents 's1' and 's2', the first having a large number of terms and the second a small number, TF-IDF does not reflect that terms in 's1' carry less importance than terms in 's2'. Therefore, it is not possible to assign effective weights to the terms. Without an effective vector space representation, the performance of the clustering algorithm also degrades. In this paper, we propose a new approach, LFW+K, which uses Length Feature Weight (LFW) for the vectorized representation of services, followed by K-Means clustering. The proposed approach identifies the informative terms in a web service and assigns term weights by considering parameters such as the dimension of the web service document, the maximum frequency of a term in the document, and the occurrences of a term in other documents.
LFW+K is applied to datasets of real-world web services, and its performance is measured using standard criteria (precision, recall, F1-score, and accuracy). The results are compared with K-Means clustering on the TF-IDF representation (TF-IDF+K). They show that the proposed method outperforms clustering based on the TF-IDF vector space representation of web services.
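For orientation, the TF-IDF weighting used by the baseline can be sketched in a few lines. This is a minimal illustration of one common textbook variant (tf = count / document length, idf = log(N / df)), which is an assumption; the abstract does not state the exact TF-IDF formula used, and it does not give the LFW formula, so LFW is not shown. The function name `tf_idf` and the toy service descriptions are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    This is a common textbook form, not necessarily the paper's.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy service descriptions (not from the paper's datasets).
services = [
    "get weather forecast by city name and country code".split(),
    "get weather".split(),
    "book hotel room".split(),
]
weights = tf_idf(services)
```

Each entry of `weights` is a sparse vector for one service description. In the TF-IDF+K baseline, vectors of this kind would be passed to K-Means; LFW+K replaces only the weighting step.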
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science