A Click Ahead: Real-Time Forecasting of Keyboard and Mouse Actions using RNNs and Computer Vision

Fabio Matti, Pierre Dillenbourg, Ludovico Novelli
2023-09-21
Abstract: Computer input is more complex than a sequence of single mouse clicks and keyboard presses. We introduce a novel method to identify and represent the user interactions and build a system which predicts, in real time, the action a user is most likely going to take next. For this, a recurrent neural network (RNN) is trained on a person's usage of the computer. We demonstrate that it is enough to train the RNN on a user's activity over approximately a week to achieve an accuracy of 34.63% when predicting the next action from a set of almost 500 possible actions. Specific examples of how these predictions may be leveraged to build tools for improving and speeding up the workflows of computer users are discussed.
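The following Python sketch shows one possible way to "identify and represent" interactions as discrete actions. Raw input events become tokens: a key combination is put into a canonical order, and a click is labeled with the class of the screen region it lands in. The `UserAction`, `key_action`, `click_action` and `Vocabulary` names, the `(x_min, y_min, x_max, y_max, label)` box format and the smallest-box rule are all hypothetical. The boxes stand in for the output of a detector such as YOLOv5. This is not the paper's actual encoding.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical detection format: (x_min, y_min, x_max, y_max, label), standing in
# for the boxes an object detector such as YOLOv5 might return for a screenshot.
Box = Tuple[int, int, int, int, str]

# Assumed modifier keys, listed in a fixed canonical order.
MODIFIERS = ("ctrl", "alt", "shift", "cmd")


@dataclass(frozen=True)
class UserAction:
    """One discrete user action (hypothetical encoding)."""
    kind: str  # "key" or "click"
    name: str  # e.g. "ctrl+c" or "left:button"


def key_action(pressed: Set[str]) -> UserAction:
    """Map a set of simultaneously pressed keys to one canonical action."""
    mods = [m for m in MODIFIERS if m in pressed]
    rest = sorted(k for k in pressed if k not in MODIFIERS)
    return UserAction("key", "+".join(mods + rest))


def click_action(x: int, y: int, button: str, boxes: List[Box]) -> UserAction:
    """Label a click with the class of the detected region it falls inside."""
    hits = [b for b in boxes if b[0] <= x <= b[2] and b[1] <= y <= b[3]]
    if not hits:
        return UserAction("click", f"{button}:background")
    # If regions overlap, prefer the smallest (most specific) one.
    best = min(hits, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
    return UserAction("click", f"{button}:{best[4]}")


class Vocabulary:
    """Assigns each distinct action an integer id for use as model input."""

    def __init__(self) -> None:
        self.ids: Dict[UserAction, int] = {}

    def encode(self, action: UserAction) -> int:
        if action not in self.ids:
            self.ids[action] = len(self.ids)
        return self.ids[action]


# Usage: two detected regions and a short interaction sequence.
boxes = [(10, 10, 90, 30, "link"), (100, 100, 180, 130, "button")]
vocab = Vocabulary()
seq = [
    key_action({"ctrl", "c"}),
    click_action(120, 110, "left", boxes),
    key_action({"c", "ctrl"}),  # same combination, pressed in another order
]
ids = [vocab.encode(a) for a in seq]
print(ids)  # [0, 1, 0]
```

With this scheme, tokens depend on region classes rather than on pixel positions or application names. A click on a button therefore maps to the same id in different programs, which is one way to obtain the application independence the paper aims for.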
Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses how to predict, in real time, the keyboard and mouse actions a user is about to perform while using a computer. Specifically, the authors propose a new method to identify and represent users' interactions. They build a system that uses the user's past activity to predict the most likely next action in real time.

### Main research questions:

1. **Complexity of user input**: Traditional input models often reduce user input to single mouse clicks or key presses and ignore more complex interaction patterns. This paper introduces a method that captures user input more completely, including combinations of mouse and keyboard operations.
2. **Real-time prediction**: How to use deep learning, in particular recurrent neural networks (RNNs), to accurately predict the user's next action in a real-time setting.
3. **Scope of application**: How to make the prediction work across many different applications and tasks rather than a single task or application.

### Solutions:

- **Representation of user actions**: The authors define a "user action" to represent input behavior. User actions include single key presses and mouse clicks. They also include key combinations (such as `Ctrl + C`) and clicks on interaction areas of the screen (buttons, links, text boxes, etc.).
- **Data collection and processing**: The authors recorded a user's computer usage over roughly one week to build a dataset of user actions. To make the data independent of specific applications, they use computer vision techniques (such as the YOLOv5 model) to detect and extract the interaction areas on the screen.
- **Model training**: RNNs (specifically LSTM and GRU variants) are trained on the time series of user actions to predict a probability distribution over the next action.
- **Real-time application**: A small application predicts and displays the five most likely upcoming user actions in real time while the user works, with the aim of helping the user work more efficiently.

### Experimental results:

- **Accuracy**: After less than five minutes of training, the GRU model reached an accuracy of 34.63% on the validation set. Out of nearly 500 possible user actions, the model therefore correctly predicts the user's next action in slightly more than one third of cases.
- **Application scenarios**: The paper discusses how these predictions could improve everyday computer use. Examples include automatically completing repetitive input and giving visually impaired users more convenient ways to reach buttons. Another is improving pointing precision by attracting the mouse cursor to the button most likely to be clicked.

Taken together, these methods demonstrate the feasibility and potential value of predicting users' keyboard and mouse actions in real time.
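The prediction step can be sketched in pure Python. This minimal, untrained GRU reads a history of action ids and outputs a softmax distribution over the next action. It then ranks the five most likely candidates, the list the real-time application displays, and measures top-1 accuracy on held-out sequences. `TinyGRU`, `top_k`, `top1_accuracy`, the random initialization, the omitted biases and the hidden size are illustrative assumptions. The paper's actual architecture, training procedure and hyperparameters are not reproduced here, and training (backpropagation through time) is omitted.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRU:
    """Hypothetical, untrained GRU over one-hot action ids with a softmax output.

    Weights are random, so its predictions are meaningless until trained.
    """

    def __init__(self, n_actions, hidden, seed=0):
        rng = random.Random(seed)
        self.n, self.h = n_actions, hidden
        # Input-to-hidden weights (hidden x n_actions); a one-hot input selects a column.
        self.W_r, self.W_z, self.W_n = (rand_matrix(hidden, n_actions, rng) for _ in range(3))
        # Hidden-to-hidden weights (hidden x hidden).
        self.U_r, self.U_z, self.U_n = (rand_matrix(hidden, hidden, rng) for _ in range(3))
        # Output projection: one logit per action.
        self.V = rand_matrix(n_actions, hidden, rng)

    def step(self, action_id, h):
        """One GRU update (biases omitted): reset gate r, update gate z, candidate n."""
        col = lambda W: [row[action_id] for row in W]  # equals W @ one_hot(action_id)
        ur, uz, un = matvec(self.U_r, h), matvec(self.U_z, h), matvec(self.U_n, h)
        r = [sigmoid(a + b) for a, b in zip(col(self.W_r), ur)]
        z = [sigmoid(a + b) for a, b in zip(col(self.W_z), uz)]
        n = [math.tanh(a + ri * b) for a, ri, b in zip(col(self.W_n), r, un)]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def predict(self, history):
        """Return a probability distribution over the next action id."""
        h = [0.0] * self.h
        for a in history:
            h = self.step(a, h)
        logits = matvec(self.V, h)
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]


def top_k(probs, k=5):
    """Ids of the k most likely next actions, most likely first (ties keep id order)."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def top1_accuracy(model, sequences):
    """Fraction of positions where the most likely prediction equals the actual next action."""
    hits = total = 0
    for seq in sequences:
        for t in range(1, len(seq)):
            hits += top_k(model.predict(seq[:t]), 1)[0] == seq[t]
            total += 1
    return hits / total if total else 0.0


# Usage: 500 possible actions, as in the paper's setting.
model = TinyGRU(n_actions=500, hidden=16)
probs = model.predict([3, 17, 3, 42])
print(top_k(probs, 5))  # five candidate ids (arbitrary, since the model is untrained)
```

A live tool would more likely keep the hidden state between events and call `step` once per new action instead of re-running `predict` over the whole history. For scale, guessing uniformly among 500 actions would be correct 1/500 = 0.2% of the time, so the reported 34.63% is far above chance.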