Automatic and Accurate Expansion of Abbreviations in Parameters
Yanjie Jiang, Hui Liu, Jiaqi Zhu, Lu Zhang
DOI: https://doi.org/10.1109/tse.2018.2868762
IF: 7.4
2020-01-01
IEEE Transactions on Software Engineering
Abstract: Abbreviations are widely used in identifiers. However, they have a severe negative impact on program comprehension and IR-based software maintenance activities, e.g., concept location, software clustering, and recovery of traceability links. Consequently, a number of efficient approaches have been proposed to expand abbreviations in identifiers. Most of these approaches rely heavily on dictionaries and rarely exploit the specific, fine-grained context of identifiers. As a result, they are less accurate in expanding abbreviations (especially short ones) that may match multiple dictionary words. To this end, in this paper we propose an automatic approach that improves the accuracy of abbreviation expansion by exploiting the specific and fine-grained context. It focuses on a special but common category of abbreviations (abbreviations in parameter names), and thus it can exploit the specific and fine-grained context, i.e., the type of the enclosing parameter as well as the corresponding formal (or actual) parameter name. A recent empirical study on parameters suggests that actual parameters are often lexically similar to their corresponding formal parameters. Consequently, it is likely that an abbreviation in a formal parameter can find its full terms in the corresponding actual parameter, and vice versa. Based on this assumption, a series of heuristics is proposed to look for full terms in the corresponding actual (or formal) parameter names. To the best of our knowledge, we are the first to expand abbreviations by exploiting the lexical similarity between actual and formal parameters. We also search for full terms in the data type of the enclosing parameter. Only if all such heuristics fail does the approach turn to traditional abbreviation dictionaries. We evaluate the proposed approach on seven well-known open-source projects. Evaluation results suggest that when only parameter abbreviations are involved, the proposed approach can improve precision from 26 to 95 percent and recall from 26 to 65 percent compared with the state-of-the-art general-purpose approach. Consequently, the proposed approach could be employed as a useful supplement to existing approaches for expanding parameter abbreviations.
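The lookup order described in the abstract (corresponding parameter name first, then the parameter's data type, then a dictionary) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `split_terms`, `is_abbreviation_of`, and `expand` are hypothetical. The matching rule (same first letter plus subsequence match) and the "accept a dictionary entry only when it is unambiguous" fallback are simplifying assumptions that stand in for the paper's actual heuristics.

```python
import re


def split_terms(identifier):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def is_abbreviation_of(abbr, term):
    """Assumed rule: abbr is shorter than term, shares its first letter,
    and its letters appear in term in the same order (subsequence)."""
    abbr, term = abbr.lower(), term.lower()
    if not abbr or len(abbr) >= len(term) or abbr[0] != term[0]:
        return False
    remaining = iter(term)
    # `ch in remaining` consumes the iterator, so order is enforced.
    return all(ch in remaining for ch in abbr)


def expand(abbr, counterpart_name, type_name, dictionary):
    """Try the corresponding parameter name, then the data type,
    and only then fall back to the abbreviation dictionary."""
    for source in (counterpart_name, type_name):
        for term in split_terms(source):
            if is_abbreviation_of(abbr, term):
                return term
    # Assumed fallback: accept a dictionary entry only if it is unambiguous.
    candidates = dictionary.get(abbr.lower(), [])
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical dictionary in which "str" is ambiguous.
abbreviations = {"str": ["string", "stream"], "idx": ["index"]}

# The formal parameter "str" is called with the actual parameter "inputStream".
print(expand("str", "inputStream", "InputStream", abbreviations))
# The context gives no hint, so the dictionary is consulted.
print(expand("idx", "i", "int", abbreviations))
```

In the first call the ambiguity between "string" and "stream" is resolved by the actual parameter name before the dictionary is consulted, which is the situation the abstract highlights for short abbreviations that match multiple dictionary words.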