A Technique to Pre-trained Neural Network Language Model Customization to Software Development Domain

Pavel V. Dudarin, Vadim G. Tronin, Kirill V. Svyatov
DOI: https://doi.org/10.1007/978-3-030-30763-9_14
2019-01-01
Abstract: According to the Standish Group's CHAOS reports for 1992–2017, the success rate of projects developing software-intensive systems (SIS) has changed insignificantly, with the rate of inconsistency with the initial requirements (finance, time, and functionality) remaining at about 50% for medium-sized projects. Annual worldwide financial losses due to outright project failures are on the order of hundreds of billions of dollars. Most information about software projects is stored as text, and analyzing this information is vital for understanding project status and revealing problems at an early stage. Today, most tasks in the NLP field are solved by means of neural network language models, which have already shown state-of-the-art results in classification, translation, named entity recognition, and other tasks. Pre-trained models are available on the internet, but a real-life problem domain can differ from the domain on which the network was originally trained. This paper presents an approach to vocabulary expansion for a neural network language model by means of hierarchical clustering. The technique allows a pre-trained language model to be adapted to a different domain.
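The abstract does not spell out the procedure, so the following is only a minimal sketch of one way vocabulary expansion via hierarchical clustering could work, not the authors' method. It assumes that every term, old or new, has some domain-derived feature vector (the hypothetical `features` dictionary, for example co-occurrence statistics from project texts), clusters terms with average-linkage agglomerative clustering, and initializes each out-of-vocabulary term with the mean pre-trained embedding of the in-vocabulary terms in its cluster. All names, vectors, and the threshold are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clusters(features, threshold):
    """Average-linkage agglomerative clustering over the terms in `features`.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (an assumed, hand-picked cut-off).
    """
    clusters = [[term] for term in features]

    def linkage(a, b):
        total = sum(cosine_distance(features[x], features[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def expand_vocabulary(pretrained, features, threshold=0.3):
    """Give each new term the mean embedding of known terms in its cluster."""
    expanded = dict(pretrained)
    for cluster in agglomerative_clusters(features, threshold):
        known = [pretrained[t] for t in cluster if t in pretrained]
        if not known:
            continue  # no in-vocabulary neighbour: leave the term uninitialized
        dim = len(known[0])
        centroid = [sum(vec[d] for vec in known) / len(known) for d in range(dim)]
        for term in cluster:
            if term not in pretrained:
                expanded[term] = centroid
    return expanded


# Toy data (hypothetical): 2-d "pre-trained" embeddings and 3-d domain features.
pretrained = {"bug": [1.0, 0.0], "error": [0.8, 0.2], "meeting": [0.0, 1.0]}
features = {
    "bug": [1.0, 0.1, 0.0],
    "error": [0.9, 0.2, 0.0],
    "hotfix": [0.95, 0.15, 0.05],   # new domain term
    "meeting": [0.0, 0.1, 1.0],
    "standup": [0.05, 0.0, 0.9],    # new domain term
}
expanded = expand_vocabulary(pretrained, features)
```

In this toy run the new term `hotfix` falls into the cluster with `bug` and `error` and inherits their mean embedding, while `standup` inherits the embedding of `meeting`. In a real model the resulting vectors would be appended to the embedding matrix and then fine-tuned on domain text.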