Zipf's Law in Passwords

Ding Wang, Gaopeng Jian, Ping Wang
2014-01-01
Abstract: Despite more than thirty years of intensive research, textual passwords remain poorly understood. In this work, we take a substantial step forward in understanding the underlying distributions of passwords. By conducting linear regressions on a corpus of 97.2 million passwords (a mass of chaotic data), we show for the first time that user-generated passwords closely follow Zipf’s law, determine the corresponding exact distribution functions, and investigate some fundamental implications of these observations for password policies and password-based cryptographic protocols (e.g., authentication, encryption and signature). As one specific application of this law, we propose using the number of unique passwords in the regression together with the absolute value of the slope of the regression line as a metric for assessing the strength of password datasets, and we prove its correctness in a mathematically rigorous manner. In addition, extensive experiments (including optimal attacks, simulated optimal attacks and state-of-the-art cracking sessions) are performed to demonstrate the practical effectiveness of our metric. In two of four cases, our metric outperforms Bonneau’s α-guesswork in simplicity, and to the best of our knowledge, it is the first metric that is both easy to approximate and accurate enough to facilitate comparisons. It gives security administrators a useful tool for gaining a precise grasp of the strength of their password datasets and for adjusting password policies more reasonably.
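The regression described in the abstract can be illustrated with a minimal Python sketch. It assumes the usual log-log form of Zipf's law, log10(f_r) = log10(C) - s * log10(r), where f_r is the frequency of the r-th most common password. The function name `fit_zipf`, the `min_count` cutoff, and the synthetic sample are hypothetical choices made for illustration; the abstract does not specify the authors' exact fitting procedure, so this should not be read as their implementation.

```python
import math
from collections import Counter


def fit_zipf(passwords, min_count=1):
    """Ordinary least-squares fit of log10(freq) = log10(C) - s * log10(rank).

    Returns (n, s, C): the number of unique passwords used in the
    regression, the absolute value of the fitted slope, and the fitted
    constant C. `min_count` is an illustrative assumption: passwords
    seen fewer than `min_count` times are left out of the fit.
    """
    counts = sorted(
        (c for c in Counter(passwords).values() if c >= min_count),
        reverse=True,
    )
    if len(counts) < 2:
        raise ValueError("need at least two distinct passwords to fit a line")

    # Rank 1 is the most frequent password.
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return n, abs(slope), 10 ** intercept


# Synthetic sample (not real data): the password of rank r occurs 60 // r times.
sample = [f"pw{r}" for r in range(1, 7) for _ in range(60 // r)]
n, s, c = fit_zipf(sample)
print(f"unique passwords in fit: {n}, |slope|: {s:.3f}, C: {c:.1f}")
```

The pair (n, s) returned here corresponds to the two quantities the abstract proposes as a strength metric; how they are compared across datasets is defined in the paper itself and is not reproduced in this sketch.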