Abstract: In this paper, we consider the general scenario of resource sharing in a decentralized system when the resource rewards/qualities are time-varying and unknown to the users, and using the same resource by multiple users leads to reduced quality due to resource sharing. Firstly, we consider a user-independent reward model with no communication between the users, where a user gets feedback about the congestion level in the resource it uses. Secondly, we consider user-specific rewards and allow costly communication between the users. The users have a cooperative goal of achieving the highest system utility. There are multiple obstacles in achieving this goal, such as the decentralized nature of the system, unknown resource qualities, and communication, computation and switching costs. We propose distributed learning algorithms with logarithmic regret with respect to the optimal allocation. Our logarithmic regret result holds under both i.i.d. and Markovian reward models, as well as under communication, computation and switching costs.

What problem does this paper attempt to address?

This paper addresses how to maximize system utility through distributed learning algorithms in a decentralized multi-user resource allocation setting where resource rewards change over time and are unknown. Specifically, the paper focuses on the following challenges:

1. **Temporal Variation and Uncertainty of Resource Rewards**: The reward a user obtains from a resource is random and changes over time, and its distribution is unknown to the users.
2. **Performance Degradation Caused by Multiple Users Sharing Resources**: Multiple users using the same resource simultaneously reduce its quality, which lowers the reward each of them receives.
3. **Decentralized Characteristics of the System**: There is no central coordinator. In the first model the users cannot communicate at all, so they cannot directly coordinate their actions.
4. **Communication, Computation, and Switching Costs**: Even when communication is allowed, it is costly, and the users also incur computation costs and resource-switching costs.

To address these challenges, the paper studies two main models:

### 1. User-Independent Reward Model (No Communication)

In this model, users cannot communicate with each other, but each user receives feedback on the congestion level of the resource it selects (i.e., how many other users are also using the same resource). Each user selects resources based on its own past observations and actions. The paper proves that a distributed learning algorithm achieves logarithmic regret in this setting, that is, the gap in total expected reward relative to the optimal static allocation grows logarithmically in time.

#### Formulas:

- The goal of the system is to maximize the total expected utility of all users: \[ v(n)=\sum_{k = 1}^{K}n_k\mu_{k,n_k} \] where \(n_k\) is the number of users using resource \(k\), and \(\mu_{k,n_k}\) is the average reward of resource \(k\) when \(n_k\) users are using it.
- The value of the optimal allocation is: \[ v^*=\max_{n\in N}v(n) \] where \(N = \{n=(n_1,n_2,\ldots,n_K):n_k\geq0,n_1 + n_2+\ldots+n_K = M\}\) is the set of all possible user-to-resource allocations of the \(M\) users.

### 2. User-Specific Reward Model (With Communication)

In this model, resource rewards are user-specific: different users may obtain different rewards when using the same resource. Users can communicate with each other, but communication is costly. The paper proposes a distributed online learning algorithm that also achieves logarithmic regret in this setting.

#### Formulas:

- The immediate reward for each user \(i\) when using resource \(k\) is \(r_i^k(t)\), and its expected reward is: \[ \mu_i^{k,n}=\int r_i^k(s,n)F(ds)\quad\text{(for i.i.d. model)} \] or \[ \mu_i^{k,n}=\sum_{s\in S_k}\mu_k^s r_i^k(s,n)\quad\text{(for Markovian model)} \]
- The value of the optimal allocation is: \[ v^*=\max_{\alpha\in K^M}\sum_{i = 1}^{M}\mu_i^{\alpha_i,n_{\alpha_i}(\alpha)} \] where \(\alpha_i\) is the resource selected by user \(i\) and \(n_{\alpha_i}(\alpha)\) is the number of users on that resource under allocation \(\alpha\).

### Summary

The main contribution of the paper is to show that resource allocation algorithms with logarithmic regret can be designed even under limited feedback and costly communication. This provides an effective solution to the decentralized multi-user resource allocation problem, especially in application scenarios such as opportunistic spectrum access in cognitive radio networks.
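To make the two optimal-allocation definitions concrete, the following is a minimal sketch that evaluates \(v(n)\) and \(v^*\) by brute-force enumeration when the mean rewards are known. It is not the paper's learning algorithm, which must estimate these means online. The function names, the nested-list layout of the mean rewards, and the numeric values are all illustrative assumptions.

```python
from collections import Counter
from itertools import product


def compositions(M, K):
    """Yield every n = (n_1, ..., n_K) with n_k >= 0 and n_1 + ... + n_K == M."""
    if K == 1:
        yield (M,)
        return
    for first in range(M + 1):
        for rest in compositions(M - first, K - 1):
            yield (first,) + rest


def value_user_independent(n, mu):
    """v(n) = sum_k n_k * mu[k][n_k].

    Assumed layout: mu[k][j] is the mean reward of resource k when j users
    share it (index 0 is an unused placeholder).
    """
    return sum(n_k * mu[k][n_k] for k, n_k in enumerate(n) if n_k > 0)


def best_user_independent(mu, M):
    """Return (v*, n*) by enumerating the whole allocation set N."""
    return max((value_user_independent(n, mu), n)
               for n in compositions(M, len(mu)))


def value_user_specific(alpha, mu_i):
    """Sum over users i of mu_i[i][alpha_i][n_{alpha_i}(alpha)].

    Assumed layout: mu_i[i][k][j] is user i's mean reward on resource k
    when j users share it.
    """
    counts = Counter(alpha)  # n_k(alpha): number of users on each resource
    return sum(mu_i[i][k][counts[k]] for i, k in enumerate(alpha))


def best_user_specific(mu_i, K):
    """Return (v*, alpha*) by enumerating all K**M user-to-resource maps."""
    M = len(mu_i)
    return max((value_user_specific(a, mu_i), a)
               for a in product(range(K), repeat=M))


# Hypothetical mean rewards: 2 resources, quality drops as congestion grows.
MU = [
    [0.0, 1.0, 0.5, 0.25],   # resource 0 with 1, 2, 3 users
    [0.0, 0.75, 0.5, 0.25],  # resource 1 with 1, 2, 3 users
]

# Hypothetical user-specific means: 2 users, 2 resources.
MU_I = [
    [[0.0, 1.0, 0.5], [0.0, 0.5, 0.25]],   # user 0
    [[0.0, 0.75, 0.5], [0.0, 1.0, 0.25]],  # user 1
]

print(best_user_independent(MU, M=3))
print(best_user_specific(MU_I, K=2))
```

Enumeration is only feasible for small \(M\) and \(K\), since the user-specific search space has \(K^M\) elements. The sketch is meant to pin down what the regret benchmark \(v^*\) is, not how to reach it without knowing the means.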

Online Learning in Decentralized Multiuser Resource Sharing Problems

Decentralized Online Learning: Take Benefits from Others’ Data without Sharing Your Own to Track Global Trend

Distributed Online Learning via Cooperative Contextual Bandits

Decentralized Scheduling with QoS Constraints: Achieving O(1) QoS Regret of Multi-Player Bandits

Distributed Online Private Learning of Convex Nondecomposable Objectives

Pricing Mechanism for Resource Sustainability in Competitive Online Learning Multi-Agent Systems

Impact of Decentralized Learning on Player Utilities in Stackelberg Games

Distributed Online Learning for Joint Regret with Communication Constraints

Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations

Scale-Robust Timely Asynchronous Decentralized Learning

Online Discrete Optimization in Social Networks in the Presence of Knightian Uncertainty

Active Learning for Fair and Stable Online Allocations

Online Learning Schemes for Power Allocation in Energy Harvesting Communications

Decentralized Online Learning for Noncooperative Games in Dynamic Environments

Distributed Autonomous Online Learning: Regrets And Intrinsic Privacy-Preserving Properties

Asynchronous Decentralized Online Learning

No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution

Decentralized Multi-Task Online Convex Optimization Under Random Link Failures

Decentralized Multitask Online Convex Optimization Under Random Link Failures

Reinforcement Learning in Decentralized Stochastic Control Systems with Partial History Sharing