Abstract: This study investigates the problem of joint channel and power allocation in stochastic underwater acoustic communication networks. Multi-armed bandit (MAB) theory is employed to model the problem, which involves unknown variables. Two two-tier learning algorithms are presented that require no prior information about the environment. In the upper tier, the user plays the predicted best strategy and learns from the strategy actually played. In the lower tier, the user learns from “outdated virtual learning information,” which can be derived from the information on the strategy actually played. This two-tier actual–virtual learning greatly enriches the available learning information and effectively improves the learning ability. A multidimensional learning method is also presented to ease the difficulty caused by the coupling within the joint strategy. As learning proceeds, the emphasis of learning shifts from the channel power sub-strategy dimension to the entire power strategy dimension. Owing to this learning manner, the algorithms are highly tolerant of delay and incomplete information. Simulation results demonstrate the high performance and adaptability of the proposed learning algorithms.

MAB-based Two-Tier Learning Algorithms for Joint Channel and Power Allocation in Stochastic Underwater Acoustic Communication Networks