Abstract: Large-scale sets of real user passwords are widely regarded as important in system security research, because they are used to evaluate the efficacy of password-guessing algorithms and to detect defects in existing password protection mechanisms, among other uses. At present, researchers can obtain real passwords in several ways, such as accidental or malicious password disclosures, voluntary user contributions, or websites that voluntarily share data for research purposes. However, collecting user password sets in these ways has serious limitations. For example, password sets obtained from disclosures may have been tampered with, so their quality cannot be guaranteed. Moreover, the types of password sets available are limited. As a result, it remains difficult for researchers to gain access to large-scale clear-text user passwords in a systematic manner. To address this issue, this paper presents a sample perturbation based password generation algorithm (SPPG for short). The algorithm uses a given small-scale sample of real user passwords, which is relatively easy to obtain, as a training set to build a probability model that can then be used to generate large-scale password sets. To improve the authenticity of the simulated password sets, the SPPG algorithm is designed around the idea of sample perturbation. On the one hand, it uses the Probabilistic Context-Free Grammar to parse the sample and then generates passwords that have the same structures as passwords in the sample. On the other hand, it applies rules that users frequently use to deform their passwords and then generates passwords that are similar to passwords in the sample. To evaluate the efficacy of the SPPG algorithm, this paper presents a set of criteria for the quality of simulated password sets: the coverage rate of the real passwords, the goodness of fit to the Zipf distribution, the similarity of password structure distributions, and the proportion of special patterns. Finally, this paper compares the SPPG algorithm with popular probability models for password guessing, including the Probabilistic Context-Free Grammar and several variants of the Markov model. In the experiments, small-scale samples are randomly selected from real user password sets and then used by the different models to generate simulated password sets. The experimental results show that the SPPG algorithm performs better. On average, its coverage of the real passwords is improved by 9.58% and 72.79% compared with the Probabilistic Context-Free Grammar and the 4-order Markov model, respectively, and is 10.34 times more than that of the 3-order Markov model and 13.41 times more than that of the 1-order Markov model. In addition, the goodness of fit to the Zipf distribution remains at a high level of no less than 0.9. As for the password structure distribution and the proportion of special patterns, the simulated password sets generated by the SPPG algorithm are also more similar to the real password sets than those generated by the other models.
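
To make two of the notions in the abstract concrete, the following is a minimal Python sketch, not the SPPG algorithm itself. It assumes the usual PCFG-style convention of splitting a password into runs of letters (L), digits (D), and special characters (S), and it assumes that the coverage rate means the fraction of distinct real passwords that also appear in the generated set; the paper may define either differently. All function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def char_class(ch: str) -> str:
    """Map a character to an assumed PCFG-style class: L, D, or S."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def base_structure(password: str) -> str:
    """Return a structure string, e.g. 'L8D3' for 'password123'."""
    return "".join(
        f"{cls}{len(list(run))}"
        for cls, run in groupby(password, key=char_class)
    )


def structure_distribution(passwords):
    """Relative frequency of each base structure in a list of passwords."""
    counts = Counter(base_structure(p) for p in passwords)
    total = sum(counts.values())
    return {structure: n / total for structure, n in counts.items()}


def coverage_rate(generated, real):
    """Assumed definition: share of distinct real passwords that were generated."""
    real_set = set(real)
    if not real_set:
        return 0.0
    return len(real_set & set(generated)) / len(real_set)
```

Comparing `structure_distribution` on a real set and on a generated set gives one rough way to look at the similarity of password structure distributions, while `coverage_rate` corresponds to the first criterion under the assumption stated above.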

#Segments: A Dominant Factor of Password Security to Resist against Data-driven Guessing

TransPCFG: Transferring the Grammars From Short Passwords to Guess Long Passwords Effectively

Mangling Rules Generation with Density-Based Clustering for Password Guessing

Chunk-Level Password Guessing: Towards Modeling Refined Password Composition Representations

Improved Wordpcfg for Passwords with Maximum Probability Segmentation

Corpora-based Password Guessing: an Efficient Approach for Small Training Sets

Using personal information to aid in guessing passwords of Chinese webs

Improved Probabilistic Context-Free Grammars for Passwords Using Word Extraction

Modified Password Guessing Methods Based on TarGuess-I

Targeted Online Password Guessing

A New Targeted Password Guessing Model

SE#PCFG: Semantically Enhanced PCFG for Password Analysis and Cracking

Password Guessing Time Based on Guessing Entropy and Long-Tailed Password Distribution in the Large-Scale Password Dataset

Special Characters Usage and Its Effect On Password Security

Understanding Passwords of Chinese Users: A Survey and Empirical Analysis

Password guessers under a microscope: an in-depth analysis to inform deployments

A New Targeted Online Password Guessing Algorithm Based on Old Password

Digit Semantics Based Optimization for Practical Password Cracking Tools

The Effect of Domain Terms on Password Security

An Efficient Algorithm to Generate Password Sets Based on Samples

Improving Real-world Password Guessing Attacks via Bi-directional Transformers