Abstract: Computer input is more complex than a sequence of single mouse clicks and keyboard presses. We introduce a novel method to identify and represent the user interactions and build a system which predicts - in real-time - the action a user is most likely going to take next. For this, a recurrent neural network (RNN) is trained on a person's usage of the computer. We demonstrate that it is enough to train the RNN on a user's activity over approximately a week to achieve an accuracy of 34.63% when predicting the next action from a set of almost 500 possible actions. Specific examples for how these predictions may be leveraged to build tools for improving and speeding up workflows of computer users are discussed.

What problem does this paper attempt to address?

This paper addresses how to predict, in real time, a user's upcoming keyboard and mouse operations while they use a computer. Specifically, it proposes a new method to identify and represent users' interaction behaviors, and it builds a system that predicts the user's most likely next operation in real time from their past activity data.

### Main research questions:

1. **Complexity of user input**: Traditional user input models often simplify input behavior to a single mouse click or key press and ignore more complex interaction patterns. This paper aims to capture input behavior more completely, including combinations of mouse clicks and keyboard operations.
2. **Real-time prediction**: How to use deep learning techniques (especially recurrent neural networks, RNNs) to accurately predict the user's next operation in a real-time environment.
3. **Scope of application**: How to apply this prediction technique across different applications and tasks, rather than to one specific task or application.

### Solutions:

- **Representation of user actions**: The paper defines a "user action" to represent input behavior. Actions include not only single key presses and mouse clicks but also combined operations (such as `Ctrl + C`) and clicks on interaction areas on the screen (buttons, links, text boxes, etc.).
- **Data collection and processing**: User action data is collected by recording a user's computer usage over approximately a week. To make the data independent of specific applications, computer vision techniques (such as the YOLOv5 model) are used to identify and extract interaction areas on the screen.
- **Model training**: RNNs (especially LSTM and GRU) are trained on the time series of user actions to predict a probability distribution over the next user action.
- **Real-time application**: A small application predicts and visualizes the five most likely upcoming user actions in real time while the user works, with the goal of improving work efficiency.

### Experimental results:

- **Accuracy**: After less than five minutes of training, the GRU model achieved an accuracy of 34.63% on the validation set. Among nearly 500 possible user actions, the model therefore predicts the user's next operation correctly in more than one-third of cases.
- **Application scenarios**: The paper discusses how these predictions could improve the computing experience, for example by automatically completing repeated inputs, giving visually impaired users more convenient ways to reach buttons, and improving pointing precision by attracting the mouse cursor to the button most likely to be clicked.

Through these methods and techniques, the paper demonstrates the feasibility and potential value of predicting users' keyboard and mouse operations in a real-time environment.
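The data flow described in the summary (actions encoded as discrete tokens, a model trained on action sequences, and a top-five readout) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the action strings such as `key:h` and `click:button`, the helper names, the context length, and the score vector standing in for an RNN's output are all hypothetical.

```python
import math


def build_vocab(actions):
    """Assign each distinct action string an integer ID in order of first appearance."""
    vocab = {}
    for action in actions:
        vocab.setdefault(action, len(vocab))
    return vocab


def make_windows(ids, context_len):
    """Slide a window over the ID sequence, yielding (context, next_action) training pairs."""
    return [
        (ids[i:i + context_len], ids[i + context_len])
        for i in range(len(ids) - context_len)
    ]


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, id_to_action, k=5):
    """Return the k most probable actions with their probabilities, best first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id_to_action[i], probs[i]) for i in ranked]


# Hypothetical recorded session; the naming scheme is an assumption for illustration.
log = ["key:h", "key:i", "combo:Ctrl+C", "click:button",
       "combo:Ctrl+V", "key:h", "key:i"]

vocab = build_vocab(log)
id_to_action = {i: a for a, i in vocab.items()}
ids = [vocab[a] for a in log]

# Pairs of (previous 3 actions, next action) that a sequence model would be trained on.
windows = make_windows(ids, context_len=3)

# Stand-in for the model's output scores, one per action in the vocabulary.
scores = [0.1, 2.0, 0.5, -1.0, 1.2]
probs = softmax(scores)
best = top_k(probs, id_to_action, k=2)
```

In a real system the score vector would come from the trained GRU or LSTM and the vocabulary would hold roughly 500 actions. For scale, uniform random guessing over 500 actions would be correct 0.2% of the time, which is the baseline against which the reported 34.63% should be read.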

A Click Ahead: Real-Time Forecasting of Keyboard and Mouse Actions using RNNs and Computer Vision

Learning Efficient Representations of Mouse Movements to Predict User Attention

Virtual Mouse and Keyboard for Computer Interaction by Hand Gestures Using Machine Learning

Predicting human decision making in psychological tasks with recurrent neural networks.

Predicting Mouse Click Position Using Long Short-Term Memory Model Trained by Joint Loss Function

Enabling Pointing Assistance in Adaptive Interfaces Using Mouse Pointing Intention Prediction

Typing on Any Surface: A Deep Learning-based Method for Real-Time Keystroke Detection in Augmented Reality

On human motion prediction using recurrent neural networks

Implementing a Real Time Virtual Mouse System and Fingertip Detection based on Artificial Intelligence

FABEL: Forecasting Animal Behavioral Events with Deep Learning-Based Computer Vision

Detection of upper limb abrupt gestures for human–machine interaction using deep learning techniques

Behavior Recognition in Mouse Videos Using Contextual Features Encoded by Spatial-temporal Stacked Fisher Vectors.

Implementation of Gesture-Based Virtual Keyboard and Mouse

I-Keyboard: Fully Imaginary Keyboard on Touch Devices Empowered by Deep Neural Decoder

Survey on Gesture-Based Virtual Keyboard and Mouse

English Sentence Recognition using Artificial Neural Network through Mouse-based Gestures

Computing a human-like reaction time metric from stable recurrent vision models

Using Motion Forecasting for Behavior-Based Virtual Reality (VR) Authentication

Deep Learning-Based Real-Time AI Virtual Mouse System Using Computer Vision to Avoid COVID-19 Spread

Forecasting Player Behavioral Data and Simulating in-Game Events