Abstract: Recent advancements in reinforcement learning (RL) for analog circuit optimization have demonstrated significant potential for improving sample efficiency and generalization across diverse circuit topologies and target specifications. However, challenges remain, such as high computational overhead and the need for a bespoke model for each circuit. To address them, we propose M3, a novel Model-based RL (MBRL) method employing the Mamba architecture and effective scheduling. The Mamba architecture, known as a strong alternative to the transformer architecture, enables multi-circuit optimization with distinct parameters and target specifications. The effective scheduling strategy enhances sample efficiency by adjusting crucial MBRL training parameters. To the best of our knowledge, M3 is the first method for multi-circuit optimization that leverages both the Mamba architecture and MBRL with effective scheduling. As a result, it significantly improves sample efficiency compared to existing RL methods.

What problem does this paper attempt to address?

The paper addresses challenges in analog circuit optimization, in particular the multi-circuit optimization problem. Although current reinforcement learning (RL) methods have shown significant potential in analog circuit optimization, they face the following challenges when handling circuits with different topologies and target specifications:

1. **High computational cost**: Traditional model-based reinforcement learning (MBRL) methods require substantial computational resources to generate and process synthetic data.
2. **A customized model for each circuit**: Existing RL methods usually train a dedicated model for each individual circuit, which wastes resources and is inefficient.

To solve these problems, the paper proposes M3 (Mamba-assisted Multi-Circuit Optimization via MBRL with Effective Scheduling), a new model-based RL method that uses the Mamba architecture and an effective scheduling strategy to optimize multiple circuits. The main contributions of M3 are as follows:

- **The first online learning framework for multi-circuit optimization**: M3 processes multiple circuits with different parameters and target specifications in a single neural network, improving resource utilization and optimization efficiency.
- **An effective scheduling strategy**: By gradually adjusting key MBRL training parameters (the ratio of real to synthetic data, the number of update iterations per environment step, and the number of rollouts), M3 better balances exploration and exploitation, improving sample efficiency.
- **The first use of the Mamba architecture for multi-circuit optimization**: Mamba's linear complexity in sequence length and constant memory usage during inference let M3 handle multi-circuit optimization efficiently.

### Formula Summary

1. **Reward function**:
   \[ r = \begin{cases} \text{FoM} & \text{if } \text{FoM} < -0.02 \\ 10 & \text{if } \text{FoM} \geq -0.02 \end{cases} \]
   where
   \[ \text{FoM} = \sum_{i=1}^{K} \min\left\{\frac{m_{c,i}-n_{c,i}}{m_{c,i}+n_{c,i}},\, 0\right\} \]
2. **Rollout number adjustment**:
   \[ R(t) = \max\left(\min\left(e_{R}(t),\, R_M\right),\, R_m\right) \]
   where
   \[ e_{R}(t) = R_I + t \cdot \frac{R_F - R_I}{\text{scale}} \]
3. **Ratio of real to synthetic data adjustment**:
   \[ \alpha(t) = \max\left(\min\left(e_{\alpha}(t),\, \alpha_M\right),\, \alpha_m\right) \]
   where
   \[ e_{\alpha}(t) = \alpha_I + t \cdot \frac{\alpha_F - \alpha_I}{\text{scale}} \]
4. **Update iteration number adjustment**:
   \[ T_a(t) = \max\left(\min\left(e_{T_a}(t),\, T_{a,M}\right),\, T_{a,m}\right) \]
   where
   \[ e_{T_a}(t) = T_{a,I} + t \cdot \frac{T_{a,F} - T_{a,I}}{\text{scale}} \]

Through these formulas and strategies, M3 achieves efficient and flexible multi-circuit optimization, significantly improving both sample efficiency and computational efficiency.
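The reward and the three training-parameter schedules are simple enough to sketch directly. The Python below is a minimal, non-authoritative sketch, not the authors' code. The summary does not define the symbols, so the sketch makes these assumptions: m_{c,i} is the measured value and n_{c,i} the target value of the i-th of K specifications for circuit c; every specification is oriented so that larger is better; m_{c,i} + n_{c,i} > 0; t is the training step; and the subscripts I, F, M, and m denote initial, final, maximum, and minimum values. The class name `ClippedLinearSchedule` and every numeric setting in the demo are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


def figure_of_merit(measured: Sequence[float], target: Sequence[float]) -> float:
    """Sum of min((m_i - n_i) / (m_i + n_i), 0) over the K specifications.

    Assumes larger-is-better specs and m_i + n_i > 0; the result is 0 when
    every measured value meets or exceeds its target and negative otherwise.
    """
    if len(measured) != len(target):
        raise ValueError("measured and target must both have K entries")
    return sum(min((m - n) / (m + n), 0.0) for m, n in zip(measured, target))


def reward(measured: Sequence[float], target: Sequence[float],
           threshold: float = -0.02, bonus: float = 10.0) -> float:
    """Return the FoM itself while it is below the threshold, else the bonus 10."""
    fom = figure_of_merit(measured, target)
    return fom if fom < threshold else bonus


@dataclass
class ClippedLinearSchedule:
    """Hypothetical helper: a linear ramp from `initial` toward `final`, clipped.

    value(t) = max(min(e(t), upper), lower), with e(t) = initial + t * (final - initial) / scale.
    The same form covers the rollout number R, the real-data ratio alpha,
    and the update-iteration count T_a.
    """
    initial: float
    final: float
    scale: float
    lower: float  # minimum value (subscript m)
    upper: float  # maximum value (subscript M)

    def __call__(self, t: int) -> float:
        e = self.initial + t * (self.final - self.initial) / self.scale
        return max(min(e, self.upper), self.lower)


if __name__ == "__main__":
    # Hypothetical specs: the second measured value misses its target.
    print(reward([1.2, 0.8], [1.0, 1.0]))   # FoM < -0.02, so the reward is the FoM
    print(reward([1.0, 0.99], [1.0, 1.0]))  # FoM >= -0.02, so the reward is 10.0

    # Hypothetical schedule settings, chosen only for illustration.
    rollouts = ClippedLinearSchedule(initial=1, final=10, scale=1000, lower=1, upper=10)
    real_ratio = ClippedLinearSchedule(initial=0.9, final=0.1, scale=1000, lower=0.1, upper=0.9)
    for t in (0, 500, 1000, 2000):
        print(t, rollouts(t), real_ratio(t))
```

The `max(min(...))` clipping keeps each parameter inside its bounds once the ramp runs past `scale` steps, so a single class serves all three schedules. Because R and T_a count rollouts and update iterations, a caller would presumably round them to integers; the summary does not say how, so the sketch leaves that to the caller.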

M3: Mamba-assisted Multi-Circuit Optimization via MBRL with Effective Scheduling
