Neural Network Activation Functions: SiLU vs GELU
Deep Learning
cyrus
Published 2024-03-20
Recommended image: Basic Image:ubuntu:22.04-py3.10-pytorch2.0
Recommended machine type: c12_m46_1 * NVIDIA GPU B

Conclusion: On CPU, SiLU is significantly faster than GELU (more than 10×), and k can be tuned so that SiLU matches GELU almost exactly. On GPU, the two are roughly equal in computational efficiency.

[1]
%%bash
pip install matplotlib
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: matplotlib in /opt/mamba/lib/python3.10/site-packages (3.8.3)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.4.5)
Requirement already satisfied: pillow>=8 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (10.2.0)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (3.1.2)
Requirement already satisfied: python-dateutil>=2.7 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: contourpy>=1.0.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.2.0)
Requirement already satisfied: packaging>=20.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (23.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (4.50.0)
Requirement already satisfied: cycler>=0.10 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: numpy<2,>=1.21 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.24.2)
Requirement already satisfied: six>=1.5 in /opt/mamba/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[2]
import numpy as np
import matplotlib.pyplot as plt
[3]
def relu(x):
    return np.maximum(x, 0.)

def gelu(x):
    # tanh approximation of GELU (Hendrycks & Gimpel, 2016)
    return .5 * x * (1. + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def silu(x, k=1.):
    # SiLU (swish) with a tunable slope k: x * sigmoid(k * x)
    return x / (1. + np.exp(-k * x))
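
The gelu above is the common tanh approximation; the exact definition is x·Φ(x), with Φ the standard normal CDF. A quick sketch to check how close the two are (assumes scipy is available for erf):

[ ]
from scipy.special import erf

def gelu_exact(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1. + erf(x / np.sqrt(2.)))

xs = np.linspace(-5, 5, 1000)
# The tanh approximation stays within about 1e-3 of the exact form.
print(np.max(np.abs(gelu(xs) - gelu_exact(xs))))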
[4]
x = np.linspace(-5, 5, 1000)
y1 = relu(x)
y2 = gelu(x)
ys = [silu(x, k) for k in range(1, 5)]
[5]
plt.plot(x, y1, label="relu")
plt.plot(x, y2, label="gelu")
for i, y in enumerate(ys):
    plt.plot(x, y, label=f"silu({i+1})")
_ = plt.legend()
[6]
k = 1.8
x = np.linspace(-2.5, .5, 1000)
y = relu(x)
y1 = gelu(x)
y2 = silu(x, k)
plt.plot(x, y, label="relu")
plt.plot(x, y1, label="gelu")
plt.plot(x, y2, label=f"silu({k})")
plt.legend()
<matplotlib.legend.Legend at 0x7f2937300bb0>
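
The k = 1.8 above was read off the plot by eye; it can also be fit numerically. A minimal sketch that grid-searches k to minimize the worst-case deviation from gelu (the grid bounds here are arbitrary choices):

[ ]
ks = np.linspace(1.0, 2.5, 301)
xs = np.linspace(-5., 5., 2001)
# Worst-case |silu(x, k) - gelu(x)| for each candidate k.
errs = [np.max(np.abs(silu(xs, k) - gelu(xs))) for k in ks]
# Expect k close to 1.70, the well-known x * sigmoid(1.702 x) approximation of GELU.
print(ks[int(np.argmin(errs))], min(errs))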
[14]
x = np.random.randn(100000)
%timeit relu(x)
%timeit silu(x, 1)
%timeit gelu(x)
162 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
417 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
8.14 ms ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
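
Most of this gap is the cost of the elementwise transcendentals: relu is a single comparison, silu adds one exp, while gelu evaluates a cubic polynomial plus a tanh, with numpy allocating a temporary array at every intermediate step. A quick sketch to see where the time goes (same x as above):

[ ]
%timeit np.tanh(x)             # the tanh call alone
%timeit np.exp(-x)             # the exp inside the sigmoid
%timeit x + 0.044715 * x ** 3  # the polynomial argument of the tanh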
[8]
import torch
import torch.nn as nn
import torch.nn.functional as F
trelu = F.relu
tgelu = F.gelu
tsilu = F.silu
# SiLU with k = 1.8, matching the numpy silu(x, k) above
tsiluk = lambda x: x * torch.sigmoid(1.8 * x)

[9]
x = torch.randn(1024, requires_grad=True)
%timeit trelu(x)
%timeit tgelu(x)
%timeit tsilu(x)
%timeit tsiluk(x)
print("done")
4.79 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
181 µs ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
5.56 µs ± 74.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
24.1 µs ± 83.5 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
done

On CPU, the torch gelu implementation is roughly 30× slower than silu.

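A likely cause: by default F.gelu computes the exact erf-based form, while F.silu is a single fused sigmoid-multiply. Since PyTorch 1.12, F.gelu also exposes the tanh approximation, which may close part of the CPU gap (a sketch; the actual speedup depends on the build):

[ ]
x = torch.randn(1024, requires_grad=True)
# Same operation as tgelu above, but using the tanh approximation instead of erf.
%timeit F.gelu(x, approximate='tanh')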
[12]
# GPU test
x = torch.randn(1024, requires_grad=True).cuda()

%timeit tgelu(x)
%timeit trelu(x)
%timeit tsilu(x)
%timeit tsiluk(x)
print("done")
11.4 µs ± 39.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
13.5 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
11.8 µs ± 50.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
55.8 µs ± 225 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
done
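
A caveat with these GPU numbers: CUDA kernels launch asynchronously, and at 1024 elements %timeit mostly measures launch overhead. A sketch that synchronizes inside the timed statement and uses a larger tensor, so that kernel time dominates (the size is an arbitrary choice):

[ ]
x_big = torch.randn(1 << 22, device='cuda')  # ~4.2M elements
%timeit tgelu(x_big); torch.cuda.synchronize()
%timeit tsilu(x_big); torch.cuda.synchronize()
%timeit trelu(x_big); torch.cuda.synchronize()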
[13]
gfn = torch.autograd.grad
x = torch.randn(1024, requires_grad=True).cuda()

%timeit y = tgelu(x).sum(); g = gfn(y, x)
%timeit y = trelu(x).sum(); g = gfn(y, x)
%timeit y = tsilu(x).sum(); g = gfn(y, x)
%timeit y = tsiluk(x).sum(); g = gfn(y, x)
print("done")
120 µs ± 2.54 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
125 µs ± 868 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
119 µs ± 495 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
263 µs ± 865 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
done

On GPU, GELU and SiLU are comparably efficient in both the forward and backward passes.

Somewhat counterintuitively, ReLU is slower than both GELU and SiLU here, possibly because of its thresholding (branching) operation.
