Reliability Enhancement in Cloud Computing Via Optimized Job Scheduling Implementing Reinforcement Learning Algorithm and Queuing Theory

Husamelddin A. M. Balla, Chen Guang Sheng, Weipeng Jing
DOI: https://doi.org/10.1109/ICDIS.2018.00027
2018-01-01
Abstract: Reliability in cloud systems is an important aspect of delivering stable cloud services to users. Focusing on improving the successful execution of tasks under resource constraints, this work proposes an enhanced and effective resource management method to achieve reliability within the cloud environment. The proposed method employs an adaptive reinforcement learning algorithm merged with queuing theory to schedule user requests. The cloud environment undergoes many dynamic changes in resource availability and attributes, which make reliable task execution difficult to guarantee. As a solution to this problem, our approach employs a task scheduler that can effectively adapt to those dynamic changes and successfully schedule user requests. We developed an adaptive action-selection method that controls action selection (i.e., the choice of a suitable virtual machine) dynamically, taking into account the queue buffer size and an uncertainty value function. To evaluate the performance of our approach, we conducted several experiments comparing it with greedy and random job scheduling policies in terms of successful task execution, utilization rate, and response time. The numerical results demonstrate the efficiency of our method.
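The abstract does not give the learning rule, the state encoding, or the exact adaptive action-selection formula, so the following is only a minimal Python sketch under stated assumptions: tabular Q-learning over per-VM queue lengths, an exploration rate that shrinks as queue buffers fill and as a visit-count uncertainty estimate falls, and a reward drawn from a standard queuing-theory result (under memoryless service with completion probability p per slot, a task joining a queue of n tasks expects (n + 1)/p slots until completion). The class `AdaptiveQScheduler`, the helper `simulate`, and all parameter values are hypothetical illustrations, not the authors' method.

```python
import random
from collections import defaultdict


class AdaptiveQScheduler:
    """Hypothetical tabular Q-learning VM selector (illustrative sketch only)."""

    def __init__(self, num_vms, buffer_size, alpha=0.1, gamma=0.9, seed=0):
        self.num_vms = num_vms
        self.buffer_size = buffer_size    # assumed finite queue capacity per VM
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor
        self.q = defaultdict(float)       # Q[(state, vm)], starts at 0
        self.visits = defaultdict(int)    # visit counts N[(state, vm)]
        self.rng = random.Random(seed)

    def allowed_actions(self, state):
        # A VM can accept a request only while its queue buffer has space.
        return [a for a in range(self.num_vms) if state[a] < self.buffer_size]

    def uncertainty(self, state, action):
        # Assumed uncertainty measure: decays as the pair is tried more often.
        return 1.0 / (1.0 + self.visits[(state, action)])

    def epsilon(self, state):
        # Assumed adaptive rule: explore while buffers are empty and estimates
        # are uncertain; exploit as buffers fill and rejections become likely.
        occupancy = sum(state) / (self.num_vms * self.buffer_size)
        mean_unc = sum(self.uncertainty(state, a)
                       for a in range(self.num_vms)) / self.num_vms
        return max(0.01, (1.0 - occupancy) * mean_unc)

    def select_vm(self, state):
        allowed = self.allowed_actions(state)
        if not allowed:
            return None                   # every buffer is full: reject
        if self.rng.random() < self.epsilon(state):
            return self.rng.choice(allowed)
        return max(allowed, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update, maximizing over feasible VMs in the next
        # state (0.0 if none is feasible).
        self.visits[(state, action)] += 1
        best_next = max((self.q[(next_state, a)]
                         for a in self.allowed_actions(next_state)), default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def simulate(service_probs, buffer_size=5, steps=2000, seed=1):
    """Toy discrete-time cloud (assumed): one request arrives per slot and each
    busy VM i finishes its head-of-line task with probability service_probs[i]."""
    rng = random.Random(seed)
    sched = AdaptiveQScheduler(len(service_probs), buffer_size, seed=seed)
    queues = [0] * len(service_probs)
    dispatched = [0] * len(service_probs)
    rejected = 0
    for _ in range(steps):
        state = tuple(queues)
        vm = sched.select_vm(state)
        if vm is None:
            rejected += 1
        else:
            # Queuing-theory reward: with memoryless (geometric) service, a task
            # joining n queued tasks expects (n + 1) / p slots until completion.
            reward = -(queues[vm] + 1) / service_probs[vm]
            queues[vm] += 1
            dispatched[vm] += 1
        # Each busy VM independently completes its head-of-line task.
        for i, p in enumerate(service_probs):
            if queues[i] > 0 and rng.random() < p:
                queues[i] -= 1
        if vm is not None:
            sched.update(state, vm, reward, tuple(queues))
    return sched, dispatched, rejected


if __name__ == "__main__":
    _, dispatched, rejected = simulate([0.9, 0.3])
    print("tasks per VM:", dispatched, "rejected:", rejected)
```

Calling `simulate([0.9, 0.3])` trains the sketch on a two-VM toy system, one fast and one slow, and reports how many requests each VM received and how many were dropped. Tying exploration to buffer occupancy is one plausible reading of "considering the queue buffer size": the scheduler experiments while queues are short, where a poor VM choice is cheap, and leans on learned values near capacity, where a wrong choice risks rejection.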