Mapping APIs in Dynamic-typed Programs by Leveraging Transfer Learning

Zhenfei Huang, Junjie Chen, Jiajun Jiang, Yihua Liang, Hanmo You, Fengjie Li
DOI: https://doi.org/10.1145/3641848
IF: 3.685
2024-01-22
ACM Transactions on Software Engineering and Methodology
Abstract: Application Programming Interface (API) migration is a common task for adapting software across different programming languages and platforms, where manually constructing the mapping relations between APIs is indeed time-consuming and error-prone. To facilitate this process, many automated API mapping approaches have been proposed. However, existing approaches were mainly designed and evaluated for mapping APIs of statically-typed languages, while their performance on dynamically-typed languages remains unexplored. In this paper, we conduct the first extensive study to explore existing API mapping approaches’ performance for mapping APIs in dynamically-typed languages, for which we have manually constructed a high-quality dataset. According to the empirical results, we have summarized several insights. In particular, the source code implementations of APIs can significantly improve the effectiveness of API mapping. However, due to the confidentiality policy, they may not be available in practice. To overcome this, we propose a novel API mapping approach, named Matl, which leverages the transfer learning technique to learn the semantic embeddings of source code implementations from large-scale open-source repositories and then transfers the learned model to facilitate the mapping of APIs. In this way, Matl can produce more accurate API embedding of its functionality for more effective mapping without knowing the source code of the APIs. To evaluate the performance of Matl, we have conducted an extensive study by comparing Matl with state-of-the-art approaches. The results demonstrate that Matl is indeed effective as it improves the state-of-the-art approach by at least 18.36% for mapping APIs of dynamically-typed language and by 30.77% for mapping APIs of the statically-typed language.
computer science, software engineering
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses API mapping in dynamically-typed languages. Specifically, it focuses on the following points:

1. **Limitations of existing methods**:
   - Existing API mapping methods were mainly designed and evaluated for statically-typed languages (such as Java and C#); their performance on dynamically-typed languages (such as Python and JavaScript) has not been studied in depth.
   - API mapping in dynamically-typed languages is harder because type information is missing, which may make methods based on usage patterns or documentation less effective.
2. **Actual requirements**:
   - With the rapid development of deep learning, more and more deep-learning models need to be deployed on different platforms, which requires developers to perform API migration efficiently.
   - The need is especially urgent for dynamically-typed languages like Python, which is currently one of the most popular programming languages and is widely used in fields such as deep learning and data science.
3. **Proposed new method**:
   - The paper proposes a new method named Matl, which uses transfer learning to learn the semantic embeddings of source-code implementations and applies them to API mapping.
   - Matl learns semantic features from large-scale open-source code repositories, so it can generate more accurate API embeddings without access to the source code of the APIs being mapped, which improves the effectiveness of API mapping.

### Specific problem description

- **Background**:
  - API migration is a common task in software development, especially when the platform changes or a library is upgraded. The APIs on which the original software depends must be updated to fit the new execution environment.
  - Existing automated program-conversion tools (such as Java2Csharp, 2to3 scripts, Patl4J, etc.) usually require external API mapping relationships as input, but building these mappings is time-consuming and error-prone.
- **Challenges**:
  - Providing API mapping relationships is labor-intensive because it requires developers to understand the functionality of the different APIs in depth.
  - The lack of type information in dynamically-typed languages may make existing API mapping methods ineffective.
- **Objectives**:
  - Evaluate the performance of existing API mapping methods on dynamically-typed languages.
  - Propose a new API mapping method that works effectively without access to the API source code.

### Solutions

- **Research methods**:
  - The paper first conducts an extensive empirical study to evaluate the performance of existing API mapping methods on dynamically-typed languages.
  - The study finds that methods based on usage patterns perform poorly on dynamically-typed languages, while methods based on documentation perform better but remain limited when source code is unavailable.
- **New method Matl**:
  - Use transfer learning to learn the semantic embeddings of source-code implementations from large-scale open-source repositories, then transfer the learned model to the API mapping task.
  - Perform API mapping based on the similarity of API embeddings through joint learning, achieving more accurate mapping without access to the API source code.

### Experimental results

- **Performance evaluation**:
  - Compared with the state-of-the-art approach, Matl improves Top-1 accuracy by at least 18.36% on dynamically-typed languages (such as Python) and improves accuracy by 30.77% on statically-typed languages (such as Java and Swift).
  - The experimental results show that Matl can significantly improve the accuracy of API mapping without relying on source-code implementations.
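
The idea of mapping APIs by embedding similarity, and of scoring the result with Top-1 accuracy, can be illustrated with a minimal Python sketch. This is not Matl's model: the bag-of-words `embed` function is a stand-in for learned embeddings, and every API name (`lib_a.*`, `lib_b.*`), documentation string, and helper name below is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; camelCase is split, punctuation and '_' act as separators."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def embed(name, doc):
    """Toy embedding (assumption): token counts over the API name and its documentation."""
    return Counter(tokenize(name) + tokenize(doc))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_api, target_apis):
    """Return target API names sorted by descending similarity to source_api."""
    src_vec = embed(*source_api)
    scored = [(cosine(src_vec, embed(name, doc)), name) for name, doc in target_apis]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
    return [name for _, name in scored]


def top_k_accuracy(source_apis, target_apis, ground_truth, k=1):
    """Fraction of source APIs whose true counterpart is among the top-k candidates."""
    hits = 0
    for source_api in source_apis:
        ranked = rank_candidates(source_api, target_apis)
        if ground_truth[source_api[0]] in ranked[:k]:
            hits += 1
    return hits / len(source_apis)


# Hypothetical APIs from two made-up libraries, as (name, documentation) pairs.
SOURCE_APIS = [
    ("lib_a.concat", "Concatenate a sequence of tensors along a given dimension."),
    ("lib_a.relu", "Apply the rectified linear unit function element-wise."),
]
TARGET_APIS = [
    ("lib_b.concatenate", "Join a list of tensors along an existing axis (concatenate)."),
    ("lib_b.rectifiedLinear", "Rectified linear unit activation applied element-wise."),
    ("lib_b.reshape", "Change the shape of a tensor without changing its data."),
]
GROUND_TRUTH = {
    "lib_a.concat": "lib_b.concatenate",
    "lib_a.relu": "lib_b.rectifiedLinear",
}

if __name__ == "__main__":
    print(rank_candidates(SOURCE_APIS[0], TARGET_APIS))
    print(top_k_accuracy(SOURCE_APIS, TARGET_APIS, GROUND_TRUTH, k=1))
```

In this toy setup, swapping `embed` for a better representation changes the ranking without touching the rest of the pipeline, which is the role a transferred code-semantics model would play in an approach of this kind.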

### Conclusions

- **Contributions**:
  - Conducted the first empirical study on the performance of existing API mapping methods on dynamically-typed languages.
  - Constructed a high-quality benchmark dataset and summarized a series of research findings, providing a basis for future research.
  - Proposed a new API mapping method, Matl, that can effectively perform API mapping without accessing the API source code.
  - Open-sourced all experimental data and implementations to promote future research.