ProSwats: A Proxy-based Scientific Workflow Retrieval Approach by Bridging the Gap Between Textual and Structural Semantics

Yang Gu, Jian Cao, Shiyou Qian, Nengjun Zhu, Wei Guan
DOI: https://doi.org/10.1109/icws62655.2024.00111
2024-01-01
Abstract: It is time-consuming and knowledge-intensive for scientists to find practical workflows from the massive number of scientific workflow models. Currently, the retrieval approaches are mainly based on text matching between natural language queries and the descriptions of candidate workflows. Notably, the workflow structure also provides essential semantics, but the challenge lies in effectively matching these two pieces of heterogeneous information. To address this issue, we propose a Proxy-based Scientific workflow retrieval approach, ProSwats, which selects a workflow as the Proxy for each text query to bridge the gap between textual and structural semantics. ProSwats consists of two stages: workflow pre-selection based on text similarity and workflow ranking based on a matching degree prediction model. The textual and structural features are integrated by the proxy in this model, which is used to predict and rank the degree of semantic matching between the user query and candidate workflows. Crucially, ProSwats incorporates a confidence-aware learning mechanism to adapt to the varying reliability of proxies, enhancing generalizability. Extensive experimental results on two real-world datasets demonstrate that ProSwats outperforms state-of-the-art methods with statistical significance.
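The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' model: the abstract does not specify the similarity functions, the proxy selection rule, or the form of the confidence mechanism, so everything below is an assumption made for illustration. Here text similarity is bag-of-words cosine, structural similarity is Jaccard overlap of workflow edges, the proxy is taken to be the top text match, and "confidence" is simply the proxy's text similarity used as a mixing weight in place of a learned matching-degree predictor. All function names, field names, and data are hypothetical.

```python
import math
from collections import Counter


def text_sim(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts (assumed stand-in for text matching)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def struct_sim(edges_a: set, edges_b: set) -> float:
    """Jaccard overlap of edge sets (assumed stand-in for structural matching)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


def retrieve(query: str, workflows: list, k: int = 3) -> list:
    # Stage 1: pre-select the k workflows whose descriptions best match the query.
    candidates = sorted(
        workflows, key=lambda w: text_sim(query, w["description"]), reverse=True
    )[:k]

    # Assumption: the best text match acts as the query's structural proxy.
    proxy = candidates[0]
    # Assumption: proxy reliability is approximated by its own text similarity.
    confidence = text_sim(query, proxy["description"])

    # Stage 2: rank candidates by a confidence-weighted mix of text similarity
    # to the query and structural similarity to the proxy.
    def score(w: dict) -> float:
        t = text_sim(query, w["description"])
        s = struct_sim(proxy["edges"], w["edges"])
        return (1 - confidence) * t + confidence * s

    return [w["name"] for w in sorted(candidates, key=score, reverse=True)]


# Hypothetical toy data: each workflow has a description and a set of edges.
workflows = [
    {"name": "align_reads",
     "description": "align sequencing reads to a reference genome",
     "edges": {("fetch", "align"), ("align", "sort")}},
    {"name": "call_variants",
     "description": "call variants from aligned genome data",
     "edges": {("fetch", "align"), ("align", "sort"), ("sort", "call")}},
    {"name": "plot_weather",
     "description": "plot weather station temperature",
     "edges": {("load", "plot")}},
    {"name": "genome_report",
     "description": "reference genome summary report",
     "edges": {("load", "report")}},
]
query = "align reads to reference genome"
ranking = retrieve(query, workflows, k=3)
print(ranking)
```

In this toy data, genome_report matches the query text better than call_variants, but call_variants shares two of three edges with the proxy, so the structural term moves it ahead in the final ranking. A weaker proxy (lower text similarity) would shift weight back toward plain text matching, which is the intuition behind weighting by confidence in this sketch.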