Software Internationalization and Localization: an Industrial Experience
Xin Xia, David Lo, Feng Zhu, Xinyu Wang, Bo Zhou
DOI: https://doi.org/10.1109/iceccs.2013.40
2013-01-01
Abstract: Software internationalization and localization are important steps in distributing and deploying software to different regions of the world. Internationalization is the process of reengineering a system so that it can support various languages and regions without further modification. Localization is the process of adapting internationalized software to a specific language or region. For various reasons, many large legacy systems did not consider internationalization and localization in the early stages of development. In this paper, we present our experience with software internationalization and localization, and propose a process with tool support for both. We reengineer a large legacy commercial financial system called PAM of State Street Corporation, which is written in C/C++ and contains 30 different modules and more than 5 million lines of source code. We propose a source code ranker that recovers the important source code to be analyzed. From this code, we extract general patterns of source code that needs to be reengineered for internationalization. We divide the patterns into two categories: convertible patterns and suspicious patterns. To locate the source code that needs to be modified, we develop an automated tool, I18nLocator, that consumes these patterns and outputs the locations that match them. Source code matching the convertible patterns is converted automatically, while source code matching the suspicious patterns is converted by developers, who take the surrounding context into account. For localization, we extract hard-coded strings, translate them, and store them in resource data files. Of the 504 thousand lines of source code modified using our approach, 79.76% are modified automatically, saving considerable developer time. The quality of the resulting system is also good: the number of bugs found during user acceptance testing and deployment to the production environment is 0.000218 per modified line of code (bugs/LOC).
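The pattern-based location step and the string extraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' I18nLocator: the two regular-expression patterns, the `Match` record, the function names `locate` and `extract_strings`, and the `STR_0001`-style resource keys are all assumptions made for illustration, since the abstract does not list the actual patterns or the resource file format.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical patterns, for illustration only. The abstract does not
# enumerate the real convertible and suspicious patterns.
CONVERTIBLE = {
    # A byte-oriented call that could be rewritten mechanically.
    "strlen_call": re.compile(r"\bstrlen\s*\("),
}
SUSPICIOUS = {
    # A fixed-size char buffer whose size may be too small for
    # multi-byte text; a developer has to judge from context.
    "char_buffer": re.compile(r"\bchar\s+\w+\s*\[\s*\d+\s*\]"),
}

# Matches a double-quoted C string literal, allowing escaped characters.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)+)"')


@dataclass(frozen=True)
class Match:
    line_no: int   # 1-based line number in the scanned source
    category: str  # "convertible" or "suspicious"
    pattern: str   # name of the pattern that matched
    text: str      # the matching source line, stripped


def locate(source: str) -> list[Match]:
    """Report every line that matches a convertible or suspicious pattern."""
    matches = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for category, patterns in (("convertible", CONVERTIBLE),
                                   ("suspicious", SUSPICIOUS)):
            for name, regex in patterns.items():
                if regex.search(line):
                    matches.append(Match(line_no, category, name, line.strip()))
    return matches


def extract_strings(source: str) -> dict[str, str]:
    """Collect distinct hard-coded string literals under generated keys."""
    table: dict[str, str] = {}
    seen: set[str] = set()
    for literal in STRING_LITERAL.findall(source):
        if literal not in seen:
            seen.add(literal)
            table[f"STR_{len(table) + 1:04d}"] = literal
    return table
```

Run on a small C fragment, `locate` separates lines that could be converted mechanically from lines that need a developer's judgment, and `extract_strings` produces a key-to-text table that could be handed to translators and written out as a resource data file. A line-by-line regular-expression scan is only a rough stand-in: it cannot see through macros, comments, or multi-line statements, which a tool working on 5 million lines of C/C++ would presumably need to handle.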