Abstract: In deregulated electricity markets, a specific incarnation of cyber-physical-social systems, market gaming behaviors may significantly affect the cost of the electricity delivered to the market. On the supply side in particular, the primary goal of power generating companies (PGCs) is to develop bidding strategies that maximize their profits in long-term trading under intrinsic uncertainty. In such repeated and dynamic settings, one fundamental challenge is that a PGC neither has prior knowledge of its unknown opponents' incentives nor observes their strategies or profits. In the common setting, once a bidding auction has taken place, the PGC observes only the market clearing price (MCP) of that round and whether its bid won or lost. While it is typical to assume a perfect or bounded rationality model for PGCs, their real behaviors do not follow such assumptions because of incomplete information, computational intractability, imperfect execution, and similar factors. We formulate the problem of sequentially optimizing a PGC's bids as an adversarial multi-armed bandit (MAB) problem. Specifically, at each round, a PGC plays against all other opponents by choosing from an infinite set of possible strategies, which the MCPs observed so far split into continuous intervals. At the end of each round, the PGC observes the outcome of the auction, updates its estimate of the expected fitness of bids in each interval (i.e., how much of the interval's expected profit could be achieved), and selects the bid for the next round using the proposed algorithm Exp3C (exponential-weight algorithm for exploration and exploitation with continuous values). Experimental results on a real dataset demonstrate that Exp3C outperforms other heuristic schemes, including pure greedy, ε-greedy, and MCP-prediction-based bidding schemes.
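The per-round loop described above (choose an interval, observe the auction outcome, update the estimates, choose again) follows the exponential-weight pattern of the classic Exp3 algorithm. The sketch below is plain Exp3 over a fixed list of bid intervals, not Exp3C itself: it does not re-split intervals at newly observed MCPs. The market rule (a unit offer wins when its price does not exceed the MCP and is then paid the MCP), the cost and max_profit parameters, and all function names are hypothetical assumptions made for illustration.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix normalized exponential weights with uniform exploration (classic Exp3)."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, probs, chosen, reward, gamma):
    """Importance-weighted exponential update for the chosen arm; reward must lie in [0, 1]."""
    k = len(weights)
    # Unbiased estimate of the chosen arm's reward; unplayed arms get an estimate of 0.
    estimated = reward / probs[chosen]
    weights[chosen] *= math.exp(gamma * estimated / k)
    # Rescale so the largest weight is 1.0: avoids overflow, leaves probabilities unchanged.
    top = max(weights)
    for i in range(k):
        weights[i] /= top


def run_exp3_bidding(intervals, mcps, cost, max_profit, gamma=0.1, seed=0):
    """Hypothetical bidding loop: one Exp3 arm per bid interval (lo, hi), one unit per round."""
    rng = random.Random(seed)
    weights = [1.0] * len(intervals)
    total_profit = 0.0
    for mcp in mcps:
        probs = exp3_probabilities(weights, gamma)
        arm = rng.choices(range(len(intervals)), weights=probs)[0]
        lo, hi = intervals[arm]
        bid = rng.uniform(lo, hi)  # draw a concrete bid inside the chosen interval
        # Assumed market rule: an offer priced at or below the MCP is accepted,
        # and accepted offers are paid the MCP (uniform-price clearing).
        profit = (mcp - cost) if bid <= mcp else 0.0
        total_profit += profit
        # Exp3 needs rewards in [0, 1], so normalize and clip the profit.
        reward = min(max(profit / max_profit, 0.0), 1.0)
        exp3_update(weights, probs, arm, reward, gamma)
    return weights, total_profit
```

Mixing in the uniform term gamma / k keeps every probability at least gamma / k, so the importance-weighted estimate reward / p stays bounded. For example, run_exp3_bidding([(10.0, 20.0), (20.0, 30.0), (30.0, 40.0)], [35.0] * 500, cost=10.0, max_profit=30.0) returns the final arm weights and the accumulated profit for a constant-MCP toy market.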
Moreover, we theoretically prove an upper bound on the average per-round regret of Exp3C in terms of T, the total number of rounds. In summary, the proposed Exp3C has two distinctive advantages. First, it is distributed, since its decisions depend only on the PGC's own past decisions and profits. Second, it is rational, since it gives a PGC guarantees on its own accumulated profit regardless of the other PGCs' behaviors.
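For reference, the standard notion of average (pseudo-)regret used in adversarial bandit analyses compares the bidder's accumulated reward with that of the best fixed bid in hindsight. The notation here is assumed and may differ from the paper's own definition: \mathcal{B} is the set of possible bids, r_t(b) is the reward bid b would have earned at round t, and b_t is the bid actually played. A small average regret means the PGC earns, per round, nearly as much as the best fixed bid would have, whatever the opponents do.

```latex
\bar{R}_T \;=\; \frac{1}{T}\left( \max_{b \in \mathcal{B}} \sum_{t=1}^{T} r_t(b) \;-\; \mathbb{E}\!\left[ \sum_{t=1}^{T} r_t(b_t) \right] \right)
```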

Deep Inverse Reinforcement Learning for Objective Function Identification in Bidding Models

Deep Reinforcement Learning for Strategic Bidding in Electricity Markets

Bidding Strategy Evolution Analysis Based on Multi-Task Inverse Reinforcement Learning

Multi-Market Bidding Behavior Analysis of Energy Storage System Based on Inverse Reinforcement Learning

A Deep Reinforcement Learning Bidding Algorithm on Electricity Market

Deep reinforcement learning-based optimal bidding strategy for real-time multi-participant electricity market with short-term load

A Reinforcement Learning Method for Power Suppliers' Strategic Bidding with Insufficient Information

Reinforcement Learning Based Bidding Framework with High-dimensional Bids in Power Markets

Bidding Strategic of Virtual Power Plant Based on End-to-End Deep Reinforcement Learning

Bidding Strategy of Two-Layer Optimization Model for Electricity Market Considering Renewable Energy Based on Deep Reinforcement Learning

A data‐driven method for microgrid bidding optimization in electricity market

High-dimensional Bid Learning for Energy Storage Bidding in Energy Markets

Deep Reinforcement Learning for Joint Bidding and Pricing of Load Serving Entity

Neural Fitted Q Iteration based Optimal Bidding Strategy in Real Time Reactive Power Market

Analysis of Evolutionary Dynamics for Bidding Strategy Driven by Multi-Agent Reinforcement Learning

Earning While Learning: An Adversarial Multi-Armed Bandit Based Real-Time Bidding Scheme in Deregulated Electricity Market

A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding

Applying Opponent Modeling for Automatic Bidding in Online Repeated Auctions

Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising

Reinforcement Learning for Bidding Strategy Optimization in Day-Ahead Energy Market

Large Language Model Assisted Optimal Bidding of BESS in FCAS Market: An AI-agent based Approach