Abstract: In this paper, we consider the general scenario of resource sharing in a decentralized system when the resource rewards/qualities are time-varying and unknown to the users, and using the same resource by multiple users leads to reduced quality due to resource sharing. Firstly, we consider a user-independent reward model with no communication between the users, where a user gets feedback about the congestion level in the resource it uses. Secondly, we consider user-specific rewards and allow costly communication between the users. The users have a cooperative goal of achieving the highest system utility. There are multiple obstacles in achieving this goal such as the decentralized nature of the system, unknown resource qualities, communication, computation and switching costs. We propose distributed learning algorithms with logarithmic regret with respect to the optimal allocation. Our logarithmic regret result holds under both i.i.d. and Markovian reward models, as well as under communication, computation and switching costs.
What problem does this paper attempt to address?
This paper asks how users in a decentralized multi-user system can maximize total system utility through distributed learning when resource rewards are unknown and vary over time. Specifically, it addresses the following challenges:
1. **Temporal Variation and Uncertainty of Resource Rewards**: The reward a user obtains from a resource is random and changes over time, and its statistics are unknown to the users.
2. **Performance Degradation Caused by Multiple Users Sharing Resources**: Multiple users using the same resource simultaneously will lead to a decline in resource quality, thus affecting rewards.
3. **Decentralized Characteristics of the System**: There is no central controller, so each user must decide on its own. In the first model users cannot communicate at all; in the second they can coordinate only through costly communication.
4. **Communication, Computation, and Switching Costs**: Even when communication is allowed, it is costly, and users also incur costs for computation and for switching between resources.
To address these challenges, the paper studies two models:
### 1. User-Independent Reward Model (No Communication)
In this model, users cannot communicate with each other, but each user receives feedback on the congestion level of the resource it selects (i.e., how many users are using that resource). Each user selects resources based only on its own past observations and actions. The paper proves that a distributed learning algorithm achieves logarithmic regret in this setting: the gap between its total expected reward and that of the optimal static allocation grows only logarithmically in time.
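To make "logarithmic regret" concrete, one common way to write it is sketched below. This is an illustrative formalization, not necessarily the paper's exact definition (which may also account for switching and other costs). The shorthand \(r_i(t)\) for the reward user \(i\) receives at time \(t\), the horizon \(T\), and the regret symbol \(R(T)\) are assumptions introduced here; \(M\) is the number of users and \(v^*\) is the value of the optimal allocation.
\[
R(T)=T\,v^*-\mathbb{E}\left[\sum_{t = 1}^{T}\sum_{i = 1}^{M}r_i(t)\right],\qquad R(T)=O(\log T)
\]
Under this reading, the time-averaged regret \(R(T)/T\) vanishes as \(T\) grows, so the average system reward approaches \(v^*\).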
#### Formulas:
- The goal of the system is to maximize the total expected utility of all users:
\[
v(n)=\sum_{k = 1}^{K}n_k\mu_{k,n_k}
\]
where \(n_k\) is the number of users using resource \(k\), and \(\mu_{k,n_k}\) is the average reward of resource \(k\) when \(n_k\) users are using it.
- The value of the optimal allocation is:
\[
v^*=\max_{n\in N}v(n)
\]
where \(N = \{n=(n_1,n_2,\ldots,n_K):n_k\geq0,n_1 + n_2+\ldots+n_K = M\}\) is the set of all possible user-to-resource allocations, \(K\) is the number of resources, and \(M\) is the number of users.
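The two formulas above can be checked on a small instance. The following is a minimal sketch that enumerates every allocation in \(N\) and evaluates \(v(n)\) directly. It assumes the mean rewards are already known, which is precisely what the users in the paper do not have, so it only illustrates the benchmark \(v^*\) and not the learning algorithm. The function names and the reward table `mu` are hypothetical.

```python
def allocations(num_users, num_resources):
    """Yield every n = (n_1, ..., n_K) with n_k >= 0 and n_1 + ... + n_K = M."""
    if num_resources == 1:
        yield (num_users,)
        return
    for first in range(num_users + 1):
        for rest in allocations(num_users - first, num_resources - 1):
            yield (first,) + rest


def value(n, mu):
    """v(n) = sum_k n_k * mu[k][n_k]; unused resources contribute nothing."""
    return sum(n_k * mu[k][n_k] for k, n_k in enumerate(n) if n_k > 0)


def optimal_allocation(num_users, mu):
    """Return an allocation attaining v* by exhaustive search over N."""
    return max(allocations(num_users, len(mu)), key=lambda n: value(n, mu))


# Hypothetical mean per-user rewards: mu[k][j] is the mean reward of
# resource k when j users share it (decreasing in j to model congestion).
mu = [
    {1: 1.0, 2: 0.4, 3: 0.2},  # resource 0: good alone, degrades quickly
    {1: 0.6, 2: 0.5, 3: 0.1},  # resource 1: tolerates two users well
]

best = optimal_allocation(3, mu)
print(best, value(best, mu))
```

With these made-up numbers, placing one user on resource 0 and two on resource 1 beats crowding everyone onto the resource with the highest single-user reward, which is the effect the congestion term \(\mu_{k,n_k}\) captures.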
### 2. User-Specific Reward Model (With Communication)
In this model, resource rewards are user-specific: different users may obtain different rewards from the same resource. Users can communicate with each other, but communication is costly. The paper proposes a distributed online learning algorithm that achieves logarithmic regret in this setting as well.
#### Formulas:
- The immediate reward for each user \(i\) when using resource \(k\) is \(r_i^k(t)\), and its expected reward is:
\[
\mu_i^{k,n}=\int r_i^k(s,n)F(ds)\quad\text{(for i.i.d. model)}
\]
or
\[
\mu_i^{k,n}=\sum_{s\in S_k}\mu_k^s r_i^k(s,n)\quad\text{(for Markovian model)}
\]
- The value of the optimal allocation is:
\[
v^*=\max_{\alpha\in K^M}\sum_{i = 1}^{M}\mu_i^{\alpha_i,n_{\alpha_i}(\alpha)}
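\]
where \(\alpha=(\alpha_1,\ldots,\alpha_M)\) assigns resource \(\alpha_i\) to user \(i\), and \(n_{\alpha_i}(\alpha)\) is the number of users sharing resource \(\alpha_i\) under \(\alpha\).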
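As a rough illustration of the user-specific benchmark, the sketch below evaluates the sum inside the maximum for every assignment in \(K^M\) and keeps the best one. It assumes the expected rewards \(\mu_i^{k,n}\) are given as a table, and exhaustive search is used only because the instance is tiny; this is not the paper's distributed algorithm. The names `system_value`, `best_assignment`, and the table `mu` are hypothetical.

```python
from collections import Counter
from itertools import product


def system_value(alpha, mu):
    """Sum over users i of mu[i][k][n], where k = alpha[i] and n is the load on k."""
    load = Counter(alpha)  # load[k] = number of users assigned to resource k
    return sum(mu[i][k][load[k]] for i, k in enumerate(alpha))


def best_assignment(mu, num_resources):
    """Exhaustive search over all K^M assignments (illustration only)."""
    num_users = len(mu)
    return max(
        product(range(num_resources), repeat=num_users),
        key=lambda alpha: system_value(alpha, mu),
    )


# mu[i][k][n]: hypothetical expected reward of user i on resource k
# when n users share that resource.
mu = [
    [{1: 0.9, 2: 0.3}, {1: 0.2, 2: 0.1}],  # user 0 strongly prefers resource 0
    [{1: 0.8, 2: 0.3}, {1: 0.7, 2: 0.2}],  # user 1 is nearly indifferent
]

alpha_star = best_assignment(mu, 2)
print(alpha_star, system_value(alpha_star, mu))
```

Both users individually prefer resource 0 here, yet the system-optimal assignment sends user 1 to resource 1. Because rewards are user-specific, who goes where matters and not just how many users share each resource, which is why this model needs some coordination between users.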
### Summary
The main contribution of the paper is to show that distributed resource allocation algorithms with logarithmic regret can be designed even under limited feedback and with communication, computation, and switching costs. This provides an effective approach to decentralized multi-user resource allocation, for example in opportunistic spectrum access in cognitive radio networks.