Cite this article: ZHAO Huizhen, LIU Fuxian, LI Longyue. A novel softplus linear unit for deep CNN[J]. Journal of Harbin Institute of Technology, 2018, 50(4): 117. DOI: 10.11918/j.issn.0367-6234.201703117
Abstract:
The rectified linear unit (ReLU) is the activation function most commonly used in deep convolutional neural networks. However, when the input is negative, ReLU outputs zero, causing the zero-gradient problem; when the input is positive, ReLU passes it through unchanged, so the mean activation is always greater than zero, causing a bias shift. Together these limit the learning rate and learning performance of deep convolutional neural networks. To address the zero-gradient problem and the bias shift of ReLU, this paper improves it according to the principle that activation functions whose mean output is close to zero improve the learning performance of neural networks, and proposes the softplus linear unit (SLU). First, negative inputs are processed with the softplus function, so that SLU outputs negative values for negative inputs, pushing the mean output closer to zero and alleviating the bias shift. Second, to keep gradients stable, the parameters of SLU are constrained and the parameter of the positive part is fixed. Finally, the parameters of the negative part are adjusted according to the treatment of the positive part, ensuring that the activation function is continuous and differentiable at zero so that information can propagate in both directions. A deep auto-encoder model is designed for unsupervised learning on the MNIST dataset, and a network-in-network convolutional neural network model is designed for supervised learning on the CIFAR-10 dataset. Experimental results show that, compared with ReLU and its variants, neural network models based on the SLU function have better feature-learning ability and higher learning accuracy.
Keywords: deep learning; convolutional neural network; activation function; softplus function; rectified linear unit
DOI: 10.11918/j.issn.0367-6234.201703117
CLC number: TP183
Document code: A
Foundation item: National Natural Science Foundation of China (61601499)
A novel softplus linear unit for deep CNN
ZHAO Huizhen,LIU Fuxian,LI Longyue
(School of Air and Missile Defense, Air Force Engineering University, Xi’an 710051, China)
Abstract:
Currently, the most popular activation function for deep convolutional neural networks is the rectified linear unit (ReLU). The ReLU activation function outputs zero for negative inputs, inducing the death of some neurons, and passes positive inputs through unchanged, inducing a bias shift. According to the theory that zero-mean activations improve learning ability, the softplus linear unit (SLU) is introduced as an adaptive activation function that addresses these two problems. First, negative inputs are processed with the softplus function, pushing the mean output of the activation function toward zero and reducing the bias shift. Second, the parameter of the positive component is fixed to control vanishing gradients. Third, to maintain continuity and differentiability at zero, the parameters of the negative part are updated according to the positive part. Experiments are conducted on the MNIST dataset for unsupervised learning with deep auto-encoder networks, and on the CIFAR-10 dataset for supervised learning with deep convolutional neural networks. The experiments show faster convergence and better image-classification performance for SLU-based networks compared with rectified activation functions.
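The construction described above can be sketched directly from the stated constraints. Assuming the negative part has the form a·softplus(x) + b and the positive part is the identity (slope fixed to 1), continuity at zero gives a·ln 2 + b = 0 and matching derivatives gives a·σ(0) = a/2 = 1, so a = 2 and b = −2 ln 2. Note these parameter values are derived here from the abstract's constraints, not quoted from the paper, so treat them as an assumption:

```python
import numpy as np

def slu(x):
    """Softplus linear unit (SLU), a sketch based on the abstract.

    Positive part: identity, with its slope fixed to 1.
    Negative part: a * softplus(x) + b, where a and b solve the two
    constraints stated in the abstract:
        a * ln(2) + b = 0   # continuity at zero: a * softplus(0) + b = 0
        a * 1/2 = 1         # differentiability at zero: a * sigmoid(0) = 1
    giving a = 2, b = -2 * ln(2) (derived values, assumed not quoted).
    """
    a = 2.0
    b = -2.0 * np.log(2.0)
    # log1p(exp(x)) is the softplus function ln(1 + e^x)
    return np.where(x >= 0.0, x, a * np.log1p(np.exp(x)) + b)
```

With these parameters, negative inputs saturate toward −2 ln 2 ≈ −1.386 rather than being clipped to zero, which is what pulls the mean activation toward zero and alleviates the bias shift the abstract describes.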
Key words: deep learning; deep convolutional neural network; activation function; softplus function; rectified linear unit