Code layer semantic analysis driving-based privacy data identification method

Yang Min, Yang Zhemin, Nan Yuhong, Zhang Yuan, Zhu Donglai
2018-01-01
Abstract: The invention belongs to the technical field of program information security detection and discloses a privacy data identification method driven by code-layer semantic analysis. The method comprises two steps. (1) Privacy-related semantic analysis and code segment localization based on natural language processing: string constant identifiers are extracted from the code and preprocessed, the semantic information in each string constant is matched against a predefined privacy-related semantic dictionary, and the part-of-speech tags of the words in the string constant, together with the dependency relationships between words in the phrase, are used to judge whether the constant refers to specific privacy data. (2) Machine-learning-based identification of privacy-related code segments: a support vector machine (SVM) model is applied to features extracted from the code behavior that uses the privacy data, in order to judge whether a given piece of code contains privacy data of concern to the system. Once identified, the privacy data is marked as a sensitive data source, which lowers the risk of leaking user privacy data.
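The first step (string constant extraction, preprocessing, and dictionary matching) can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the patented implementation: the dictionary contents, the helper names (`extract_string_constants`, `preprocess`, `match_privacy_terms`, `analyze`), and the restriction to double-quoted literals are all hypothetical. The part-of-speech and dependency-relation checks are omitted here because they would require an NLP parser such as spaCy.

```python
import re

# Hypothetical privacy dictionary: category -> term sequences (token tuples).
# The actual predefined dictionary is not given in the abstract.
PRIVACY_DICT = {
    "phone_number": [("phone", "number"), ("mobile", "number"), ("telephone",)],
    "location": [("location",), ("latitude",), ("longitude",), ("address",)],
    "credentials": [("password",), ("passcode",)],
}

# Matches double-quoted string literals, allowing escaped characters inside.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def extract_string_constants(source: str) -> list[str]:
    """Extract the body of every double-quoted string constant in the code."""
    return STRING_LITERAL.findall(source)


def preprocess(text: str) -> list[str]:
    """Split camelCase and snake_case, keep letter runs, and lowercase them."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # "phoneNumber" -> "phone Number"
    return [tok.lower() for tok in re.findall(r"[A-Za-z]+", text)]


def match_privacy_terms(tokens: list[str]) -> set[str]:
    """Return each category whose term sequence occurs contiguously in tokens."""
    hits = set()
    for category, terms in PRIVACY_DICT.items():
        for term in terms:
            n = len(term)
            if any(tuple(tokens[i:i + n]) == term for i in range(len(tokens) - n + 1)):
                hits.add(category)
    return hits


def analyze(source: str) -> dict[str, set[str]]:
    """Map each string constant that mentions privacy data to its categories."""
    result = {}
    for constant in extract_string_constants(source):
        hits = match_privacy_terms(preprocess(constant))
        if hits:
            result[constant] = hits
    return result


if __name__ == "__main__":
    java_snippet = '''
    phone.setHint("Please enter your phone number");
    String key = "user_password";
    Log.d("TAG", "button clicked");
    '''
    for constant, categories in analyze(java_snippet).items():
        print(constant, "->", sorted(categories))
```

Matching on token sequences rather than raw substrings keeps a term such as "address" from firing inside unrelated identifiers; in the full method, the part-of-speech and dependency analysis would further filter these candidate matches.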
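The second step classifies code segments with an SVM. The abstract does not state which code features or which SVM variant are used, so the following sketch is hypothetical throughout. It encodes each code segment as four made-up binary behavior features and trains a linear SVM without a bias term using the Pegasos stochastic sub-gradient method on the hinge loss, written in plain Python so that it has no dependencies. A real system would more likely rely on a library implementation such as scikit-learn's `SVC` or `LinearSVC`.

```python
import random

# Hypothetical binary features describing how a code segment uses a value.
FEATURES = [
    "read_from_sensitive_api",   # value comes from a device/account data getter
    "sent_over_network_or_log",  # value reaches a network or logging call
    "used_only_for_ui_layout",   # value only styles or positions widgets
    "is_constant_resource_id",   # value is a compile-time resource id
]


def train_linear_svm(samples, labels, lam=0.01, iterations=2000, seed=0):
    """Train a linear SVM (no bias term) with Pegasos stochastic sub-gradient
    descent on the regularized hinge loss. Labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Shrink w: sub-gradient step for the L2 regularizer.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # If the hinge loss is active for this sample, step toward y * x.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (privacy-related segment) or -1 (not privacy-related)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


if __name__ == "__main__":
    X = [
        [1, 1, 0, 0],  # reads a sensitive value and sends it out
        [1, 0, 0, 0],  # reads a sensitive value
        [0, 0, 1, 0],  # value only used for UI layout
        [0, 0, 1, 1],  # resource id used only for layout
        [0, 0, 0, 1],  # resource id
    ]
    y = [1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print(dict(zip(FEATURES, w)))
    print([predict(w, x) for x in X])
```

Each learned weight indicates how strongly a behavior feature pushes a segment toward the privacy-related class; segments predicted as +1 would then be marked as sensitive data sources.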