Neural Network Activation Functions: SiLU vs GELU
Deep Learning
cyrus
Published 2024-03-20
Recommended image: Basic Image:ubuntu:22.04-py3.10-pytorch2.0
Recommended machine type: c12_m46_1 * NVIDIA GPU B

Conclusion: On CPU, SiLU is significantly faster than GELU (more than 10×), and k can be tuned so that SiLU matches GELU almost exactly. On GPU, the two are roughly equal in computational efficiency.

[1]
%%bash
pip install matplotlib
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: matplotlib in /opt/mamba/lib/python3.10/site-packages (3.8.3)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.4.5)
Requirement already satisfied: pillow>=8 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (10.2.0)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (3.1.2)
Requirement already satisfied: python-dateutil>=2.7 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: contourpy>=1.0.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.2.0)
Requirement already satisfied: packaging>=20.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (23.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (4.50.0)
Requirement already satisfied: cycler>=0.10 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: numpy<2,>=1.21 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.24.2)
Requirement already satisfied: six>=1.5 in /opt/mamba/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[2]
import numpy as np
import matplotlib.pyplot as plt
[3]
def relu(x):
    return np.maximum(x, 0.)

def gelu(x):
    # tanh approximation of GELU (Hendrycks & Gimpel, 2016)
    return .5 * x * (1. + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def silu(x, k=1.):
    # SiLU (swish) with a tunable slope k: x * sigmoid(k * x)
    return x / (1. + np.exp(-k * x))
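
The gelu above is the common tanh approximation; the exact definition is x·Φ(x), with Φ the standard normal CDF. A quick sketch to check how close the two are (assumes scipy is available for erf):

[ ]
from scipy.special import erf

def gelu_exact(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1. + erf(x / np.sqrt(2.)))

xs = np.linspace(-5, 5, 1000)
# The tanh approximation stays within about 1e-3 of the exact form.
print(np.max(np.abs(gelu(xs) - gelu_exact(xs))))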
[4]
x = np.linspace(-5, 5, 1000)
y1 = relu(x)
y2 = gelu(x)
ys = [silu(x, k) for k in range(1, 5)]
[5]
plt.plot(x, y1, label="relu")
plt.plot(x, y2, label="gelu")
for i, y in enumerate(ys):
    plt.plot(x, y, label=f"silu({i+1})")
_ = plt.legend()
[6]
k = 1.8
x = np.linspace(-2.5, .5, 1000)
y = relu(x)
y1 = gelu(x)
y2 = silu(x, k)
plt.plot(x, y, label="relu")
plt.plot(x, y1, label="gelu")
plt.plot(x, y2, label=f"silu({k})")
plt.legend()
<matplotlib.legend.Legend at 0x7f2937300bb0>
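
The k = 1.8 above was read off the plot by eye; it can also be fit numerically. A minimal sketch that grid-searches k to minimize the worst-case deviation from gelu (the grid bounds here are arbitrary choices):

[ ]
ks = np.linspace(1.0, 2.5, 301)
xs = np.linspace(-5., 5., 2001)
# Worst-case |silu(x, k) - gelu(x)| for each candidate k.
errs = [np.max(np.abs(silu(xs, k) - gelu(xs))) for k in ks]
# Expect k close to 1.70, the well-known x * sigmoid(1.702 x) approximation of GELU.
print(ks[int(np.argmin(errs))], min(errs))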
[14]
x = np.random.randn(100000)
%timeit relu(x)
%timeit silu(x, 1)
%timeit gelu(x)
162 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
417 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
8.14 ms ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
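
Most of this gap is the cost of the elementwise transcendentals: relu is a single comparison, silu adds one exp, while gelu evaluates a cubic polynomial plus a tanh, with numpy allocating a temporary array at every intermediate step. A quick sketch to see where the time goes (same x as above):

[ ]
%timeit np.tanh(x)             # the tanh call alone
%timeit np.exp(-x)             # the exp inside the sigmoid
%timeit x + 0.044715 * x ** 3  # the polynomial argument of the tanh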
[8]
import torch
import torch.nn as nn
import torch.nn.functional as F
trelu = F.relu
tgelu = F.gelu
tsilu = F.silu
# SiLU with k = 1.8, matching the numpy silu(x, k) above
tsiluk = lambda x: x * torch.sigmoid(1.8 * x)

[9]
x = torch.randn(1024, requires_grad=True)
%timeit trelu(x)
%timeit tgelu(x)
%timeit tsilu(x)
%timeit tsiluk(x)
print("done")
4.79 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
181 µs ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
5.56 µs ± 74.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
24.1 µs ± 83.5 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
done

On CPU, the torch gelu implementation is roughly 30× slower than silu.

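A likely cause: by default F.gelu computes the exact erf-based form, while F.silu is a single fused sigmoid-multiply. Since PyTorch 1.12, F.gelu also exposes the tanh approximation, which may close part of the CPU gap (a sketch; the actual speedup depends on the build):

[ ]
x = torch.randn(1024, requires_grad=True)
# Same operation as tgelu above, but using the tanh approximation instead of erf.
%timeit F.gelu(x, approximate='tanh')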
[12]
# GPU test
x = torch.randn(1024, requires_grad=True).cuda()

%timeit tgelu(x)
%timeit trelu(x)
%timeit tsilu(x)
%timeit tsiluk(x)
print("done")
11.4 µs ± 39.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
13.5 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
11.8 µs ± 50.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
55.8 µs ± 225 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
done
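
A caveat with these GPU numbers: CUDA kernels launch asynchronously, and at 1024 elements %timeit mostly measures launch overhead. A sketch that synchronizes inside the timed statement and uses a larger tensor, so that kernel time dominates (the size is an arbitrary choice):

[ ]
x_big = torch.randn(1 << 22, device='cuda')  # ~4.2M elements
%timeit tgelu(x_big); torch.cuda.synchronize()
%timeit tsilu(x_big); torch.cuda.synchronize()
%timeit trelu(x_big); torch.cuda.synchronize()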
[13]
gfn = torch.autograd.grad
x = torch.randn(1024, requires_grad=True).cuda()

%timeit y = tgelu(x).sum(); g = gfn(y, x)
%timeit y = trelu(x).sum(); g = gfn(y, x)
%timeit y = tsilu(x).sum(); g = gfn(y, x)
%timeit y = tsiluk(x).sum(); g = gfn(y, x)
print("done")
120 µs ± 2.54 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
125 µs ± 868 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
119 µs ± 495 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
263 µs ± 865 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
done

On GPU, GELU and SiLU are comparably efficient in both the forward and backward passes.

Somewhat counterintuitively, ReLU is slower than both GELU and SiLU here, possibly because of its thresholding (branching) operation.
