Abstract: We consider the problem of power allocation over a time-varying channel with unknown distribution in energy harvesting communication systems. In this problem, the transmitter has to choose the transmit power based on the amount of stored energy in its battery, with the goal of maximizing the average rate obtained over time. We model this problem as a Markov decision process (MDP) with the transmitter as the agent, the battery status as the state, the transmit power as the action and the rate obtained as the reward. The average reward maximization problem over the MDP can be solved by a linear program (LP) that uses the transition probabilities for the state-action pairs and their reward values to choose a power allocation policy. Since the rewards associated with the state-action pairs are unknown, we propose two online learning algorithms, UCLP and Epoch-UCLP, that learn these rewards and adapt their policies along the way. The UCLP algorithm solves the LP at each step to decide its current policy using the upper confidence bounds on the rewards, while the Epoch-UCLP algorithm divides time into epochs, solves the LP only at the beginning of each epoch and follows the obtained policy for the rest of that epoch. We prove that the reward losses, or regrets, incurred by both algorithms are upper bounded by constants. Epoch-UCLP incurs a higher regret than UCLP but reduces the computational requirements substantially. We also show that, with minor changes, the presented algorithms work for online learning in cost minimization problems such as packet scheduling with a power-delay tradeoff.
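For reference, a standard way to write the average-reward LP mentioned above is over occupation measures $x(s,a)$, the long-run fraction of slots spent in battery state $s$ while using transmit power $a$. The symbols below ($x$, $r$, $P$, $\mathcal{A}(s)$, $\rho^*$, $\hat r_t$, $n_t$) are illustrative and not necessarily the paper's notation, and the confidence bonus shown is the familiar UCB1 form, assumed here rather than taken from the paper.

$$
\begin{aligned}
\rho^* = \max_{x \ge 0}\quad & \sum_{s}\sum_{a \in \mathcal{A}(s)} r(s,a)\, x(s,a) \\
\text{s.t.}\quad & \sum_{a \in \mathcal{A}(s')} x(s',a) = \sum_{s}\sum_{a \in \mathcal{A}(s)} P(s' \mid s,a)\, x(s,a) \quad \forall s', \\
& \sum_{s}\sum_{a \in \mathcal{A}(s)} x(s,a) = 1 .
\end{aligned}
$$

A policy is read off as $\pi(a \mid s) = x^*(s,a) / \sum_{a'} x^*(s,a')$ for states with positive occupation. A UCLP-style learner solves this LP at slot $t$ with $r(s,a)$ replaced by an optimistic estimate such as

$$
\bar r_t(s,a) = \hat r_t(s,a) + \sqrt{\frac{2 \ln t}{n_t(s,a)}},
$$

where $\hat r_t(s,a)$ is the sample-mean rate and $n_t(s,a)$ the number of times the pair has been tried. The regret after $T$ slots is then $R(T) = T\rho^* - \mathbb{E}\big[\sum_{t=1}^{T} r_t\big]$.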


### What problem does this paper attempt to solve?

This paper studies how to allocate transmit power so as to maximize the average transmission rate in energy-harvesting communication systems when the channel distribution is unknown. Specifically:

1. **Research Background**:
   - Energy-harvesting communication systems use energy obtained from the environment (e.g., solar or thermal energy) for data transmission.
   - The performance of such systems depends on how effectively they use the energy currently stored in the battery together with the energy to be harvested in the future.
2. **Problem Description**:
   - In the single-channel case, the sender selects the transmit power according to the energy stored in its battery, with the goal of maximizing the average transmission rate over time.
   - In the multi-channel case, the sender must select not only the transmit power but also the channel used for transmission.
3. **Modeling Method**:
   - The problem is modeled as a Markov decision process (MDP) in which the sender is the agent, the battery status is the state, the transmit power is the action, and the transmission rate is the reward.
   - The average reward maximization problem can be written as a linear program (LP) that uses the transition probabilities of the state-action pairs and their reward values to select the power allocation policy.
4. **Challenges**:
   - Because the channel is uncertain, the average rewards of the state-action pairs are unknown.
   - Online learning algorithms are therefore needed to learn these rewards gradually and adjust the policy accordingly.
5. **Solutions**:
   - Two online learning algorithms are proposed: LPSM (linear programming based on sample means) and Epoch-LPSM.
   - For both algorithms, the regret is proved to be upper-bounded by a constant.
   - The LPSM algorithm exactly matches the optimal policy within a finite expected time.
   - Although Epoch-LPSM incurs a higher regret, it significantly reduces the computational requirements.
   - For the multi-channel case, the MC-LPSM algorithm is proposed; it explores the different channels and uses this information to solve the LP. Its regret grows logarithmically with time and linearly with the number of channels.
6. **Contributions**:
   - A constant-regret learning algorithm for MDPs with unknown average rewards is proposed for the first time.
   - The LPSM algorithm is proved to exactly match the optimal policy within a finite expected time.
   - The Epoch-LPSM algorithm, which has lower computational complexity, is proposed and its performance trade-off is analyzed.
   - The approach is extended to the multi-channel scenario through the MC-LPSM algorithm, which is proved to be asymptotically optimal.

Through these methods, the paper addresses power allocation in energy-harvesting communication systems, using the harvested energy efficiently and maximizing the transmission rate even under uncertain channel conditions.
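The learn-then-plan loop described above (estimate rewards, re-plan with the estimates, either every slot or once per epoch) can be sketched on a toy instance. This is an illustration under stated assumptions, not the paper's method: the battery size `B`, harvest probability `Q_HARVEST`, channel gains `GAINS`, rate cap `R_MAX` and all function names are hypothetical; rewards are assumed to depend only on the transmit power; the UCB1-style bonus and the doubling epochs (re-planning at slots 1, 2, 4, ...) are assumptions; and relative value iteration is swapped in for the LP so that the sketch needs no external solver.

```python
import math
import random

# Hypothetical toy instance (not from the paper): the battery holds 0..B
# energy units, the transmit power p is an integer in 0..b, one unit is
# harvested per slot with known probability Q_HARVEST, and the channel gain
# is drawn uniformly from GAINS, which the learner never observes directly.
B = 3
Q_HARVEST = 0.5
GAINS = (0.5, 2.0)
R_MAX = math.log2(1 + B * max(GAINS))  # assumed known upper bound on the rate


def actions(b):
    """Feasible transmit powers when the battery holds b units."""
    return range(b + 1)


def transitions(b, p):
    """Known battery dynamics as (next_state, probability) pairs."""
    rest = b - p
    return [(min(rest + 1, B), Q_HARVEST), (rest, 1.0 - Q_HARVEST)]


def mean_rate(p):
    """True mean rate E[log2(1 + p*g)]; used only to evaluate, never to learn."""
    return sum(math.log2(1 + p * g) for g in GAINS) / len(GAINS)


def solve_average_reward(reward, iters=200):
    """Relative value iteration, standing in for the LP solve.

    reward maps (battery, power) to a mean (or optimistic) rate. Returns the
    estimated optimal average reward and a greedy deterministic policy.
    """
    h = [0.0] * (B + 1)
    gain, q = 0.0, {}
    for _ in range(iters):
        q = {(b, p): reward[(b, p)]
             + sum(pr * h[n] for n, pr in transitions(b, p))
             for b in range(B + 1) for p in actions(b)}
        new_h = [max(q[(b, p)] for p in actions(b)) for b in range(B + 1)]
        gain = new_h[0]  # battery level 0 is the reference state
        h = [v - gain for v in new_h]
    policy = {b: max(actions(b), key=lambda p: q[(b, p)]) for b in range(B + 1)}
    return gain, policy


def run_uclp(horizon, epoch_based, seed=0):
    """UCB-on-rates learner that re-plans every slot (UCLP-like) or only at
    slots 1, 2, 4, 8, ... (Epoch-UCLP-like). Returns (average rate, #solves)."""
    rng = random.Random(seed)
    counts = [0] * (B + 1)  # transmissions observed at each power level
    sums = [0.0] * (B + 1)  # summed observed rates per power level
    b, total, solves, next_solve, policy = 0, 0.0, 0, 1, {}
    for t in range(1, horizon + 1):
        if not epoch_based or t == next_solve:
            # Optimistic index per power level, capped at R_MAX; zero power
            # always yields rate 0, so it needs no estimate.
            index = [0.0] + [
                min(R_MAX, sums[p] / counts[p]
                    + math.sqrt(2 * math.log(t) / counts[p]))
                if counts[p] else R_MAX
                for p in range(1, B + 1)
            ]
            _, policy = solve_average_reward(
                {(s, p): index[p] for s in range(B + 1) for p in actions(s)})
            solves += 1
            next_solve = 2 * t
        p = policy[b]
        rate = math.log2(1 + p * rng.choice(GAINS))  # observed reward
        if p > 0:
            counts[p] += 1
            sums[p] += rate
        total += rate
        harvest = 1 if rng.random() < Q_HARVEST else 0
        b = min(b - p + harvest, B)
    return total / horizon, solves


if __name__ == "__main__":
    rho_star, _ = solve_average_reward(
        {(s, p): mean_rate(p) for s in range(B + 1) for p in actions(s)})
    avg, solves = run_uclp(3000, epoch_based=True)
    print(f"optimum {rho_star:.4f}, learned {avg:.4f}, solves {solves}")
```

In this toy model the mean rate is concave in `p`, so spending one unit whenever the battery is non-empty is an optimal policy and the optimum is log2(4.5)/4 ≈ 0.5425. With `horizon=3000`, the epoch variant re-plans 12 times (slots 1, 2, 4, ..., 2048) instead of 3000 times, which mirrors the computation-versus-regret trade-off described above.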

Online Learning Schemes for Power Allocation in Energy Harvesting Communications

Reinforcement Learning Approaches for IoT Networks with Energy Harvesting

Joint Transmit and Jamming Power Optimization for Secrecy in Energy Harvesting Networks: A Reinforcement Learning Approach

Optimal Fairness-Aware Time and Power Allocation in Wireless Powered Communication Networks

Online Power Allocation at Energy Harvesting Transmitter for Multiple Receivers with and without Individual Rate Constraints for OMA and NOMA Transmissions

Delay-optimal Random Access in Large-Scale Energy Harvesting IoT Networks Based on Mean Field Game

A Dynamic Power Allocation Scheme in Power-Domain NOMA Using Actor-Critic Reinforcement Learning

Online Time Sharing Policy in Energy Harvesting Cognitive Radio Network with Channel Uncertainty

Balancing Delay and Energy Efficiency in Energy Harvesting Cognitive Radio Networks: A Stochastic Stackelberg Game Approach

Learning Non-myopic Power Allocation in Constrained Scenarios

Distributive Stochastic Learning for Delay-Optimal OFDMA Power and Subband Allocation

Online Power Control for Distributed Multitask Learning Over Noisy Fading Wireless Channels

Reinforcement Learning Based Power Control for Reliable Mission-Critical Wireless Transmission

Matching-Driven Deep Reinforcement Learning for Energy-Efficient Transmission Parameter Allocation in Multi-Gateway LoRa Networks

Online Power Control and Optimization for Energy Harvesting Communication System Based on State of Charge

Distributed Power Control for Large Energy Harvesting Networks: A Multi-Agent Deep Reinforcement Learning Approach

Low-Latency and Energy-Efficient Wireless Communications with Energy Harvesting

Competitive Ratio Analysis of Online Algorithms to Minimize Data Transmission Time in Energy Harvesting Communication System

Reinforcement Learning based Multi-Access Control and Battery Prediction with Energy Harvesting in IoT Systems

Learning and Fairness in Energy Harvesting: A Maximin Multi-Armed Bandits Approach

Importance-Aware Fresh Delivery of Versions over Energy Harvesting MACs