Semi-automatic rule-based domain terminology and software feature-relevant information extraction from natural language user manuals

Thomas Quirchmayr, Barbara Paech, Roland Kohl, Hannes Karey, Gunar Kasdepke
DOI: https://doi.org/10.1007/s10664-018-9597-6
IF: 3.762
2018-02-19
Empirical Software Engineering
Abstract: Mature software systems comprise a vast number of heterogeneous system capabilities which are usually requested by different groups of stakeholders and which evolve over time. Software features describe and bundle low-level capabilities logically on an abstract level and thus provide a structured and comprehensive overview of the entire capabilities of a software system. Software features are often not explicitly managed. Quite the contrary, feature-relevant information is often spread across several software engineering artifacts (e.g., user manuals, issue tracking systems). It requires huge manual effort to identify and extract feature-relevant information from these artifacts in order to make feature knowledge explicit. In this paper we present a two-step approach to extract feature-relevant information from a user manual: First, we semi-automatically extract a domain terminology from a natural language user manual based on linguistic patterns. Then, we apply natural language processing techniques based on the extracted domain terminology and structural sentence information. Our approach is able to extract atomic feature-relevant information with an F1-score of at least 92.00%. We describe the implementation of the approach as well as evaluations based on example sections of a user manual taken from industry.
computer science, software engineering