Abstract: Recent advancements in reinforcement learning (RL) for analog circuit optimization have demonstrated significant potential for improving sample efficiency and generalization across diverse circuit topologies and target specifications. However, challenges remain, such as high computational overhead and the need for a bespoke model for each circuit. To address them, we propose M3, a novel Model-based RL (MBRL) method employing the Mamba architecture and effective scheduling. The Mamba architecture, known as a strong alternative to the transformer architecture, enables multi-circuit optimization with distinct parameters and target specifications. The effective scheduling strategy enhances sample efficiency by adjusting crucial MBRL training parameters. To the best of our knowledge, M3 is the first method for multi-circuit optimization that leverages both the Mamba architecture and MBRL with effective scheduling. As a result, it significantly improves sample efficiency compared to existing RL methods.
What problem does this paper attempt to address?
This paper addresses challenges in analog circuit optimization, in particular multi-circuit optimization. Although current reinforcement learning (RL) methods have shown significant potential for analog circuit optimization, they face the following challenges when handling circuits with different topologies and target specifications:
1. **High computational cost**: Traditional model-based reinforcement learning (MBRL) methods require substantial computational resources to generate and process synthetic data.
2. **Each circuit requires a customized model**: Existing RL methods usually train a dedicated model for each individual circuit, which wastes resources.
To solve these problems, the paper proposes M3 (Mamba-assisted Multi-Circuit Optimization via MBRL with Effective Scheduling), a new model-based reinforcement learning method that uses the Mamba architecture and an effective scheduling strategy to optimize multiple circuits. The main contributions of M3 are as follows:
- **First online learning framework for multi-circuit optimization**: M3 handles multiple circuits with different parameters and target specifications in a single neural network, improving resource utilization and optimization efficiency.
- **An effective scheduling strategy**: By gradually adjusting key MBRL training parameters (the ratio of real to synthetic data, the number of update iterations per environment step, and the number of rollouts), M3 better balances exploration and exploitation, which improves sample efficiency.
- **First use of the Mamba architecture for multi-circuit optimization**: The Mamba architecture, with its linear complexity and constant memory usage, lets M3 handle multi-circuit optimization efficiently.
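One of the scheduled parameters is the ratio of real to synthetic data. The summary does not say how that ratio enters training, so the following is only a minimal sketch under an assumption: the ratio is read as the fraction of each training batch drawn from a buffer of real (simulator) transitions, with the rest drawn from model-generated transitions. The function name `sample_mixed_batch`, its parameters, and the two buffers are hypothetical and do not come from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_mixed_batch(
    real: Sequence[T],
    synthetic: Sequence[T],
    batch_size: int,
    real_fraction: float,
    rng: random.Random,
) -> List[T]:
    """Draw a batch in which about `real_fraction` of the items are real.

    ASSUMPTION: treating the real-to-synthetic ratio as a per-batch
    fraction is an illustrative reading, not the paper's stated method.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must lie in [0, 1]")
    n_real = round(batch_size * real_fraction)
    n_synthetic = batch_size - n_real
    # Sampling with replacement, so small buffers still fill the batch.
    batch = rng.choices(real, k=n_real) + rng.choices(synthetic, k=n_synthetic)
    rng.shuffle(batch)
    return batch
```

Under this reading, a schedule that changes the fraction over training shifts how much each update relies on simulator data versus the learned model.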
### Formula Summary
1. **Reward Function**:
\[
r =
\begin{cases}
\text{FoM} & \text{if FoM} < - 0.02 \\
10 & \text{if FoM} \geq - 0.02
\end{cases}
\]
where,
\[
\text{FoM} = \sum_{i = 1}^{K} \min\left\{\frac{m_{c,i}-n_{c,i}}{m_{c,i}+n_{c,i}},0\right\}
\]
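The reward definition above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the lists `m` and `n` hold the values $m_{c,i}$ and $n_{c,i}$ for the $K$ specifications of one circuit, and the function and constant names are hypothetical.

```python
from typing import Sequence

FOM_THRESHOLD = -0.02   # threshold from the piecewise reward definition
SUCCESS_REWARD = 10.0   # reward once the FoM reaches the threshold


def figure_of_merit(m: Sequence[float], n: Sequence[float]) -> float:
    """Sum of min((m_i - n_i) / (m_i + n_i), 0) over the K specifications."""
    if len(m) != len(n):
        raise ValueError("m and n must have the same length")
    total = 0.0
    for m_i, n_i in zip(m, n):
        # ASSUMPTION: m_i + n_i is nonzero; the text does not cover that case.
        total += min((m_i - n_i) / (m_i + n_i), 0.0)
    return total


def reward(m: Sequence[float], n: Sequence[float]) -> float:
    """Return the FoM itself below the threshold, otherwise the fixed bonus."""
    fom = figure_of_merit(m, n)
    return SUCCESS_REWARD if fom >= FOM_THRESHOLD else fom
```

Because each term is capped at 0, the FoM is never positive: a specification with a non-negative normalized difference contributes nothing, and only terms with a negative difference pull the FoM below zero.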
2. **Rollout Number Adjustment**:
\[
R(t)=\max\left(\min\left(\tilde{R}(t),R_M\right),R_m\right)
\]
where,
\[
\tilde{R}(t)=R_I + t\cdot\frac{R_F - R_I}{\text{scale}}
\]
3. **Ratio of Real to Synthetic Data Adjustment**:
\[
\alpha(t)=\max\left(\min\left(\tilde{\alpha}(t),\alpha_M\right),\alpha_m\right)
\]
where,
\[
\tilde{\alpha}(t)=\alpha_I + t\cdot\frac{\alpha_F - \alpha_I}{\text{scale}}
\]
4. **Update Iteration Number Adjustment**:
\[
T_a(t)=\max\left(\min\left(\tilde{T}_a(t),T_{a,M}\right),T_{a,m}\right)
\]
where,
\[
\tilde{T}_a(t)=T_{a,I}+t\cdot\frac{T_{a,F}-T_{a,I}}{\text{scale}}
\]
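The three adjustment formulas share one shape: a linear ramp in $t$ that is then clipped to a lower and an upper bound. The following sketch captures that shape once. The class name `LinearSchedule` and its field names are hypothetical, the numeric values are made up for illustration, and the summary does not say whether integer-valued quantities such as the rollout number are rounded, so no rounding is applied here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearSchedule:
    """Clipped linear schedule: max(min(initial + t * slope, upper), lower)."""

    initial: float  # value of the ramp at t = 0 (the subscript-I term)
    final: float    # value the ramp reaches at t = scale (the subscript-F term)
    lower: float    # lower clipping bound (the subscript-m term)
    upper: float    # upper clipping bound (the subscript-M term)
    scale: float    # number of steps over which the ramp goes initial -> final

    def __call__(self, t: float) -> float:
        unclipped = self.initial + t * (self.final - self.initial) / self.scale
        return max(min(unclipped, self.upper), self.lower)


# Illustrative values only; these are not taken from the paper.
rollouts = LinearSchedule(initial=1.0, final=5.0, lower=1.0, upper=5.0, scale=100.0)
```

With these made-up values, `rollouts(0)` sits at the initial value, `rollouts(50)` is halfway along the ramp, and any `t` beyond `scale` stays pinned at the upper bound. The same class could serve for the data ratio and the update iteration count by changing only the five constants.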
Through these formulas and strategies, M3 achieves efficient and flexible multi-circuit optimization, significantly improving both sample efficiency and computational efficiency.