DMTG: A Human-Like Mouse Trajectory Generation Bot Based on Entropy-Controlled Diffusion Networks

Jiahua Liu, Zeyuan Cui, Wenhan Ge, Pengxiang Zhan
2024-10-24
Abstract: CAPTCHAs protect against resource misuse and data theft by distinguishing human activity from automated bots. Advances in machine learning have made traditional image- and text-based CAPTCHAs vulnerable to attacks, leading modern CAPTCHAs, such as GeeTest and Akamai, to incorporate behavioral analysis like mouse trajectory detection. Existing bypass techniques struggle to fully mimic human behavior, making it difficult to evaluate the effectiveness of anti-bot measures. To address this, we propose a diffusion model-based mouse trajectory generation framework (DMTG), which controls trajectory complexity and produces realistic human-like mouse movements. DMTG also provides white-box and black-box testing methods to assess its ability to bypass CAPTCHA systems. In experiments, DMTG reduces bot detection accuracy by 4.75% to 9.73% compared to other models. Additionally, it mimics physical human behaviors, such as slow initiation and directional force differences, demonstrating improved performance in both simulation and real-world CAPTCHA scenarios.
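The abstract credits DMTG with reproducing "slow initiation": a cursor that accelerates gradually instead of starting at full speed. As a minimal sketch of how that property could be measured on a trajectory of `(x, y, t)` samples, the code below divides the mean speed over the first steps by the overall mean speed. The function names `step_speeds` and `initiation_ratio` and the 20% head window are hypothetical choices for illustration, not metrics defined in the paper.

```python
import math
from typing import List, Tuple

# One cursor sample: (x, y, t), with x/y in pixels and t in seconds.
Point = Tuple[float, float, float]


def step_speeds(traj: List[Point]) -> List[float]:
    """Speed in pixels per second between each pair of consecutive samples."""
    if len(traj) < 2:
        raise ValueError("need at least two samples")
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(traj, traj[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds


def initiation_ratio(traj: List[Point], head_fraction: float = 0.2) -> float:
    """Mean speed over the first `head_fraction` of steps divided by the overall
    mean speed (hypothetical metric). Values well below 1 indicate a slow start."""
    speeds = step_speeds(traj)
    overall_mean = sum(speeds) / len(speeds)
    if overall_mean == 0:
        raise ValueError("trajectory never moves")
    n_head = max(1, int(len(speeds) * head_fraction))
    head_mean = sum(speeds[:n_head]) / n_head
    return head_mean / overall_mean


if __name__ == "__main__":
    # Eased-in movement: x grows quadratically with time, so speed starts low.
    eased = [(100 * (i / 10) ** 2, 0.0, i * 0.01) for i in range(11)]
    # Constant-speed straight line, as produced by naive linear interpolation.
    linear = [(10.0 * i, 0.0, i * 0.01) for i in range(11)]
    print(round(initiation_ratio(eased), 3))   # 0.2
    print(round(initiation_ratio(linear), 3))  # 1.0
```

A ratio well below 1 suggests an eased-in start; a constant-speed straight line scores about 1. A real evaluation would compute such a feature over many human and generated trajectories and compare the resulting distributions.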
Cryptography and Security
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address two main issues:

1. **Anthropomorphic Stylization**:
   - Existing methods struggle to fully simulate human operational styles when generating mouse trajectories, including the non-repeatability of human operations, the characteristics of ineffective operations, and differences in the initial stage of a movement.
   - To improve realism, generated mouse trajectories need to better mimic the historical operational styles of different users, further blurring the distinction between machine and human operations.
2. **Practical Testing Limitation**:
   - Current methods rely mainly on white-box testing and quality evaluation metrics such as accuracy, with too few experiments in large-scale, real-world black-box CAPTCHA environments.
   - This leaves the robustness and generalization ability of most methods unverified, so extensive validation across commercial CAPTCHA systems is needed to assess their effectiveness in real-world scenarios.

### Specific Research Questions

To address these issues, the paper poses three core research questions (RQs):

1. **How can mouse trajectories be constructed that are both random and purposeful?**
   - Generated trajectories must exhibit the randomness of human operation while still achieving a specific goal.
2. **How can these trajectories be made to conform to human operational styles, so that they are indistinguishable from human trajectories?**
   - Generated trajectories must resemble human operational habits in all respects, including the density, mean, and distribution of the trajectories.
3. **How can the model's ability to bypass commercial CAPTCHA systems be evaluated?**
   - The model must be tested in the real world to determine whether it can bypass commercial-grade CAPTCHA detectors.
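RQ2 asks that generated trajectories match human ones in the density, mean, and distribution of their features. One standard way to compare a single feature's distribution between a human set and a generated set (for example, per-step speeds) is the two-sample Kolmogorov-Smirnov (KS) statistic. The summary does not say which distribution measure the paper uses, so treat this as an illustrative stand-in; `ks_statistic` is a hypothetical helper written directly from the KS definition, and the sample values are made up.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap between
    the empirical CDFs of samples `a` and `b` (0 = identical, 1 = fully separated)."""
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at sample values,
    # so checking every pooled sample value finds the maximum gap.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


if __name__ == "__main__":
    # Hypothetical per-step speed samples, not data from the paper.
    human_speeds = [1.0, 2.0, 3.0, 4.0]
    generated_speeds = [3.0, 4.0, 5.0, 6.0]
    print(ks_statistic(human_speeds, generated_speeds))  # 0.5
```

Values near 0 mean the two samples are hard to separate on that feature alone. SciPy's `scipy.stats.ks_2samp` computes the same statistic together with a p-value if a library implementation is preferred.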
### Solution

The paper proposes DMTG, a mouse trajectory generation framework based on an entropy-controlled diffusion network, built in the following steps:

1. **Data Collection**:
   - Use the SapiMouse dataset, which contains a large number of human mouse-movement samples, together with the Open Images V7 dataset as training data.
2. **Model Construction**:
   - Build a generator that produces trajectories with controllable randomness, and design a series of evaluators to assess bypassing and mimicking capability.
   - The core of the generator is the α-DDIM model, which controls the randomness of generated trajectories through the complexity control factor α.
3. **System Development**:
   - Implement a browser proxy robot that simulates a real operating environment and deploys the α-DDIM model.
   - The quality evaluators include simulation confidence evaluation and commercial CAPTCHA verification.
4. **Evaluation**:
   - Design white-box and black-box tests to evaluate, respectively, the human similarity of the generated trajectories and their ability to bypass commercial CAPTCHAs.

Through these steps, the DMTG framework generates highly realistic mouse trajectories while improving its robustness and generalization ability in practical applications.
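The generator's randomness control can be illustrated with a single reverse step of DDIM (denoising diffusion implicit models, Song et al.), which the name α-DDIM indicates the generator builds on. The summary does not specify how the complexity control factor α enters the sampler, so this sketch uses standard DDIM's η parameter as a stand-in: η = 0 gives a deterministic step, and larger η injects more fresh noise. Note that `alpha_bar_t` below is the diffusion noise schedule, not the paper's α; the function name `ddim_step` and the `(n_points, 2)` trajectory shape are assumptions.

```python
import numpy as np


def ddim_step(
    x_t: np.ndarray,
    eps_pred: np.ndarray,
    alpha_bar_t: float,
    alpha_bar_prev: float,
    eta: float,
    rng: np.random.Generator,
) -> np.ndarray:
    """One reverse DDIM update x_t -> x_{t-1} for a trajectory array.

    x_t:            noisy trajectory, e.g. shape (n_points, 2) for (x, y)
    eps_pred:       noise predicted by a trained denoiser for x_t (same shape)
    alpha_bar_t:    cumulative noise-schedule product at step t
    alpha_bar_prev: cumulative noise-schedule product at step t-1
    eta:            0 gives deterministic DDIM; 1 matches DDPM-style noise levels
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must be in [0, 1]")
    # Clean trajectory implied by the current sample and the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Standard deviation of the fresh noise injected at this step, scaled by eta.
    sigma = eta * np.sqrt(
        (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * (1.0 - alpha_bar_t / alpha_bar_prev)
    )
    # Deterministic component along the predicted noise direction
    # (clamped at 0 to absorb tiny negative values from rounding).
    dir_xt = np.sqrt(max(1.0 - alpha_bar_prev - sigma**2, 0.0)) * eps_pred
    noise = rng.standard_normal(x_t.shape)
    return np.sqrt(alpha_bar_prev) * x0_pred + dir_xt + sigma * noise
```

With η = 0 and an exact noise prediction, the step lands on the forward-process sample at t-1; raising η trades that determinism for more sample-to-sample variation, which is the kind of trade-off the paper describes α as controlling.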