Program Comprehension Using Information Retrieval and Probabilistic Finite-State Automata

陈华,王灿,陈纯,唐文彬,钱剑飞
DOI: https://doi.org/10.3785/j.issn.1008-973x.2008.12.013
2008-01-01
Abstract: To improve the accuracy of information retrieval (IR) based program comprehension, a new two-stage method is proposed, consisting of an IR stage and a probabilistic finite-state automata (PFA) recognition stage. The method uses PFAs to address the imprecision that arises when IR is applied to program comprehension directly. In turn, applying IR first makes it possible to construct many simple PFAs rather than one large, complex PFA, which greatly improves the scalability of recognition. In the training stage, PFAs are learned from clusters generated by latent semantic analysis (LSA). In the recognition stage, a source code segment undergoes lexical processing and is then used as an IR query to retrieve n candidate plans. The PFAs corresponding to those plans are then looked up, and the PFA with the maximum probability is chosen. Finally, the code segment is labeled with the semantics of the chosen PFA.
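
The recognition stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain term-count cosine similarity stands in for the LSA-based retrieval, the PFAs are written by hand rather than learned from clusters, and every name here (`PFA`, `recognize`, the `plans` dictionary, the token classes `ASSIGN`, `LOOP`, `ADD`) is hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PFA:
    """A small deterministic probabilistic finite-state automaton.

    transitions maps (state, symbol) -> (next_state, probability);
    final maps state -> probability of stopping in that state.
    """

    def __init__(self, start, transitions, final):
        self.start = start
        self.transitions = transitions
        self.final = final

    def probability(self, tokens):
        """Probability that this automaton generates the token sequence."""
        state, prob = self.start, 1.0
        for tok in tokens:
            step = self.transitions.get((state, tok))
            if step is None:
                return 0.0  # no matching transition: sequence is rejected
            state, p = step
            prob *= p
        return prob * self.final.get(state, 0.0)


def recognize(tokens, plans, n=2):
    """Stage 1: retrieve the n most similar plans. Stage 2: pick the best PFA."""
    query = Counter(tokens)
    ranked = sorted(plans, key=lambda name: cosine(query, plans[name]["terms"]),
                    reverse=True)
    candidates = ranked[:n]
    return max(candidates, key=lambda name: plans[name]["pfa"].probability(tokens))


# Hand-written toy plans (assumed for illustration only).
plans = {
    "swap": {
        "terms": Counter({"ASSIGN": 3}),
        "pfa": PFA(0, {(0, "ASSIGN"): (1, 1.0), (1, "ASSIGN"): (2, 1.0),
                       (2, "ASSIGN"): (3, 1.0)}, {3: 1.0}),
    },
    "sum_loop": {
        "terms": Counter({"ASSIGN": 1, "LOOP": 1, "ADD": 1}),
        "pfa": PFA(0, {(0, "ASSIGN"): (1, 1.0), (1, "LOOP"): (2, 1.0),
                       (2, "ADD"): (2, 0.5)}, {2: 0.5}),
    },
    "copy": {
        "terms": Counter({"LOOP": 1, "ASSIGN": 1}),
        "pfa": PFA(0, {(0, "LOOP"): (1, 1.0), (1, "ASSIGN"): (1, 0.6)}, {1: 0.4}),
    },
}

# A code segment after lexical processing, reduced to token classes.
segment = ["ASSIGN", "LOOP", "ADD"]
label = recognize(segment, plans, n=2)
```

In this toy setup the retrieval step narrows three plans down to two candidates, so only two small automata are evaluated instead of one automaton covering every plan, which is the scalability argument the abstract makes.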