Neural Network Activation Functions: SiLU vs GELU

cyrus

Recommended image: Basic Image: ubuntu:22.04-py3.10-pytorch2.0
Recommended machine type: c12_m46_1 * NVIDIA GPU B
Conclusion: On CPU, SiLU is significantly faster than GELU (more than 10×), and k can be tuned so that SiLU matches GELU almost exactly. On GPU, the two are essentially equally efficient.
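For reference (a definition added here; the original jumps straight to code), the two activations compared below are

$$\mathrm{GELU}(x) \approx \tfrac{1}{2}\,x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^3\bigr)\right)\right)$$

$$\mathrm{SiLU}_k(x) = x\,\sigma(kx) = \frac{x}{1 + e^{-kx}}$$

where k = 1 is the standard SiLU/Swish, and x·σ(1.702x) is the classical sigmoid approximation of GELU from the original GELU paper; cell [6] below uses k = 1.8.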
[1]
%%bash
pip install matplotlib
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: matplotlib in /opt/mamba/lib/python3.10/site-packages (3.8.3)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.4.5)
Requirement already satisfied: pillow>=8 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (10.2.0)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (3.1.2)
Requirement already satisfied: python-dateutil>=2.7 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: contourpy>=1.0.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.2.0)
Requirement already satisfied: packaging>=20.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (23.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (4.50.0)
Requirement already satisfied: cycler>=0.10 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: numpy<2,>=1.21 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.24.2)
Requirement already satisfied: six>=1.5 in /opt/mamba/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[2]
import numpy as np
import matplotlib.pyplot as plt
[3]
def relu(x):
    return np.maximum(x, 0.)

def gelu(x):
    return .5 * x * (1. + np.tanh(np.sqrt(2/np.pi) * (x + 0.044715 * x ** 3)))

def silu(x, k=1.):
    return x * 1. / (1 + np.exp(-k * x))
[4]
x = np.linspace(-5, 5, 1000)
y1 = relu(x)
y2 = gelu(x)
ys = [silu(x, k) for k in range(1, 5)]
[5]
plt.plot(x, y1, label="relu")
plt.plot(x, y2, label="gelu")
for i, y in enumerate(ys):
    plt.plot(x, y, label=f"silu({i+1})")
_ = plt.legend()
[6]
k = 1.8
x = np.linspace(-2.5, .5, 1000)
y = relu(x)
y1 = gelu(x)
y2 = silu(x, k)
plt.plot(x, y, label="relu")
plt.plot(x, y1, label="gelu")
plt.plot(x, y2, label=f"silu({k})")
plt.legend()
<matplotlib.legend.Legend at 0x7f2937300bb0>
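To put a number on how close the two curves are (a quick check added here, not in the original notebook), the maximum pointwise gap on this interval can be computed with the functions already defined:

# added check: maximum pointwise gap between gelu and silu(·, 1.8) on [-2.5, 0.5]
print(np.max(np.abs(gelu(x) - silu(x, k))))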
[14]
x = np.random.randn(100000)
%timeit relu(x)
%timeit silu(x, 1)
%timeit gelu(x)
162 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
417 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
8.14 ms ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
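An 8 ms vs 0.4 ms gap is large for two functions that are both a handful of elementwise ops; a quick way to see where gelu's time goes (a check added here, not part of the original run) is to time its sub-expressions separately:

# added check: time the pieces of gelu() individually
%timeit np.tanh(x)
%timeit x + 0.044715 * x ** 3
%timeit np.exp(-x)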
[8]
import torch
import torch.nn as nn
import torch.nn.functional as F
trelu = F.relu
tgelu = F.gelu
tsilu = F.silu
# k=1.8 variant; note this computes relu(x) * sigmoid(1.8x), which zeroes the negative tail,
# rather than x * sigmoid(1.8x) as in the numpy silu(x, k=1.8) above
tsiluk = lambda x: F.sigmoid(1.8*x) * F.relu(x)
[9]
x = torch.randn(1024, requires_grad=True)
%timeit trelu(x)
%timeit tgelu(x)
%timeit tsilu(x)
%timeit tsiluk(x)
print("done")
4.79 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
181 µs ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
5.56 µs ± 74.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
24.1 µs ± 83.5 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
done
On CPU, torch's gelu implementation is roughly 30× slower than silu (181 µs vs 5.6 µs above).
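One variable worth controlling for (not done in the original run): since PyTorch 1.12, F.gelu takes an approximate= argument. The default "none" uses the exact erf formulation, while "tanh" matches the numpy approximation above and may benchmark differently on CPU:

# added comparison: tanh-approximated GELU (assumes PyTorch >= 1.12)
tgelu_tanh = lambda x: F.gelu(x, approximate="tanh")
%timeit tgelu_tanh(x)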
[12]
# GPU test
x = torch.randn(1024, requires_grad=True).cuda()
%timeit tgelu(x)
%timeit trelu(x)
%timeit tsilu(x)
%timeit tsiluk(x)
print("done")
11.4 µs ± 39.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
13.5 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
11.8 µs ± 50.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
55.8 µs ± 225 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
done
[13]
gfn = torch.autograd.grad
x = torch.randn(1024, requires_grad=True).cuda()
%timeit y = tgelu(x).sum(); g = gfn(y, x)
%timeit y = trelu(x).sum(); g = gfn(y, x)
%timeit y = tsilu(x).sum(); g = gfn(y, x)
%timeit y = tsiluk(x).sum(); g = gfn(y, x)
print("done")
120 µs ± 2.54 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
125 µs ± 868 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
119 µs ± 495 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
263 µs ± 865 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
done
On GPU, GELU and SiLU are comparably efficient in both the forward and backward passes.
Somewhat counterintuitively, ReLU is slightly slower than both GELU and SiLU here, possibly because of the thresholding (discretization) operation it involves.
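One caveat on the GPU numbers: CUDA kernels launch asynchronously, so %timeit on 1024-element tensors largely measures launch overhead. A minimal synchronized timing sketch (an addition, not part of the original measurements; the cuda_time helper is hypothetical) would be:

# hypothetical helper: event-based GPU timing with explicit synchronization
def cuda_time(fn, x, iters=1000, warmup=10):
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):      # warm-up launches
        fn(x)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

for name, fn in [("relu", trelu), ("gelu", tgelu), ("silu", tsilu), ("silu(1.8)", tsiluk)]:
    print(name, cuda_time(fn, x))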