Abstract: Password guessing plays an important role in studying the vulnerability of passwords in order to improve security. Modern password guessing methods discover the password patterns of users in specific regions from a large number of leaked passwords. Most existing methods, such as PCFG, Markov models, and deep learning methods, rely only on the training set. Unlike other application areas of machine learning, the training sets for password guessing come from real leaked password sets, such as Rockyou, CSDN, and VK. These approaches are effective when the training set is large. However, the password sets leaked from users of less widely spoken languages or from users of specific organizations are very small, which makes it difficult for methods that rely only on the training set to discover enough of the words used in passwords. To solve this problem, this paper proposes a corpus-based password guessing method. First, we analyzed the common words and their categories in the leaked password sets of users from three different countries. On this basis, we proposed a method for organizing corpora in multiple languages and constructed corpora of more than 3 million words. Second, we improved the traditional PCFG password segmentation method and described password structure based on the corpora. Third, we used Laplace smoothing to estimate the probability of words that are in the corpora but do not appear in the training set. Experiments show that our method produces a finer-grained structure than PCFG. When the size of the training set decreases, the cracking rate of PCFG decreases significantly, whereas our method is much less affected and its cracking rate remains significantly higher than that of PCFG.
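The third step can be illustrated with a minimal sketch of additive (Laplace) smoothing over a word vocabulary. It is not the paper's implementation: the function name `laplace_word_probs`, the parameter `alpha`, and the toy word lists are hypothetical, and the sketch assumes the vocabulary is the union of the corpus words and the words seen in the training set.

```python
from collections import Counter


def laplace_word_probs(training_words, corpus_words, alpha=1.0):
    """Estimate word probabilities with additive (Laplace) smoothing.

    Hypothetical helper: words that appear in the corpus but never in the
    training set still receive a small non-zero probability.
    """
    counts = Counter(training_words)
    # Assumed vocabulary: every corpus word plus every word seen in training.
    vocab = set(corpus_words) | set(counts)
    total = sum(counts.values()) + alpha * len(vocab)
    # Counter returns 0 for unseen words, so they get alpha / total.
    return {word: (counts[word] + alpha) / total for word in vocab}


# Toy data (illustrative only, not taken from any leaked password set).
training_words = ["love", "love", "dragon"]
corpus_words = ["love", "dragon", "monkey", "shadow"]

probs = laplace_word_probs(training_words, corpus_words)
for word in sorted(probs, key=probs.get, reverse=True):
    print(f"{word}: {probs[word]:.3f}")
```

With `alpha=1.0`, the corpus-only words `monkey` and `shadow` each receive probability 1/7 in this toy example, so a guesser could still rank candidates built from them even though they never occur in the small training set.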

GENPass: A Multi-Source Deep Learning Model for Password Guessing

GENPass: A General Deep Learning Model for Password Guessing with PCFG Rules and Adversarial Generation

PassTCN-PPLL: A Password Guessing Model Based on Probability Label Learning and Temporal Convolutional Neural Network

Password Guessing Based on GAN with Gumbel-Softmax

Mangling Rules Generation with Density-Based Clustering for Password Guessing

Password Guessing Based on Semantic Analysis and Neural Networks

PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer

Search-based Ordered Password Generation of Autoregressive Neural Networks

Corpora-based Password Guessing: an Efficient Approach for Small Training Sets

The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

PASS2EDIT: A Multi-Step Generative Model for Guessing Edited Passwords

Password Guessing Based on LSTM Recurrent Neural Networks

PassGPT: Password Modeling and (Guided) Generation with Large Language Models

A New Targeted Password Guessing Model

Recurrent Neural Network Based Password Generation for Group Attribute Context-Ware Applications

Modified Password Guessing Methods Based on TarGuess-I

Improving Real-world Password Guessing Attacks via Bi-directional Transformers

TransPCFG: Transferring the Grammars From Short Passwords to Guess Long Passwords Effectively

Using personal information to aid in guessing passwords of Chinese webs

Comprehensive overview of plaintext password generation models

Targeted Online Password Guessing