Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement

Longbiao Cheng1, Ashutosh Pandey2, Buye Xu2, Tobi Delbruck1, Shih-Chii Liu1

1Institute of Neuroinformatics, University of Zurich and ETH Zurich
2Reality Labs Research, Meta

Introduction

As shown below, when a Recurrent Neural Network (RNN) processes natural signals such as speech, the activations of some neurons change slowly across time steps.


Activation patterns of neurons in a Gated Recurrent Unit (GRU) layer over time steps. The GRU layer is from a speech enhancement model. Each row represents a neuron, while columns show activations at different time steps.

From this observation, we propose a new method that reduces the computation of conventional RNNs by updating only a selected subset of neurons at each step.

Dynamic Gated RNN (DG-RNN)

In conventional RNN models, every neuron in the hidden state is updated at each step. In contrast, DG-RNN introduces a novel component: a binary select gate \(\boldsymbol{g}_t\). This gate dynamically determines which subset of neurons should be updated at each step \(t\).

Neurons that are not selected by the select gate skip their update at that step and keep their values from the previous hidden state. This selective updating reduces computation.


Illustration of the update processes of (A) conventional RNN and (B) DG-RNN at step \(t\).
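Conceptually, a single DG-RNN step can be sketched as follows. This is a minimal NumPy sketch with hypothetical names; for clarity the conventional update is computed for all neurons and then masked, whereas an efficient implementation would compute it only for the selected neurons, which is where the savings come from.

```python
import numpy as np

def dg_rnn_step(x_t, h_prev, rnn_step, g_t):
    """One DG-RNN step with a binary select gate g_t (1 = update, 0 = keep).

    rnn_step(x_t, h_prev) is the update of the underlying conventional RNN cell.
    Here the update is computed for all neurons and then masked for clarity;
    an efficient implementation computes it only for the selected neurons.
    """
    h_candidate = rnn_step(x_t, h_prev)            # conventional update
    return g_t * h_candidate + (1 - g_t) * h_prev  # unselected neurons keep h_{t-1}

# Toy usage: a 4-neuron state where only the first two neurons are updated.
h_prev = np.zeros(4)
g_t = np.array([1.0, 1.0, 0.0, 0.0])
h_t = dg_rnn_step(np.ones(3), h_prev, lambda x, h: np.full(4, 0.5), g_t)
# h_t == [0.5, 0.5, 0.0, 0.0]
```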

Dynamic GRU (D-GRU)

When applying DG-RNN to the GRU, no extra parameters are needed, because the GRU's built-in update gate can be reused to select which neurons to update.

The GRU hidden state update equations at step \(t\) are: \[\begin{align} \color{blue}{\text{Reset Gate: }} \boldsymbol{r}_t &= \sigma(\mathbf{W}_{ir}\boldsymbol{x}_t + \boldsymbol{b}_{ir} + \mathbf{W}_{hr}\boldsymbol{h}_{t-1} + \boldsymbol{b}_{hr}) \\ \color{blue}{\text{Candidate State: }} \boldsymbol{c}_t &= \tanh(\mathbf{W}_{ic}\boldsymbol{x}_t + \boldsymbol{b}_{ic} + \boldsymbol{r}_t \ast (\mathbf{W}_{hc}\boldsymbol{h}_{t-1} + \boldsymbol{b}_{hc})) \\ \color{green}{\text{Update Gate: }} \boldsymbol{z}_t &= \sigma(\mathbf{W}_{iz}\boldsymbol{x}_t + \boldsymbol{b}_{iz} + \mathbf{W}_{hz}\boldsymbol{h}_{t-1} + \boldsymbol{b}_{hz}) \\ \text{State Update: } \boldsymbol{h}_t &= \boldsymbol{z}_t \ast \boldsymbol{c}_t + (1 - \boldsymbol{z}_t) \ast \boldsymbol{h}_{t-1} \end{align}\]

For a neuron \(j\), when \(z^j_t\) is close to 1, the hidden state \(h^j_t\) is largely replaced by the candidate state \(c^j_t\). Conversely, when \(z^j_t\) is close to 0, \(h^j_t\) stays close to \(h^j_{t-1}\).

In the proposed D-GRU, we update only the neurons with the top-\(A\) largest values in the update gate \(\boldsymbol{z}_t\). For neurons that are not selected, the computation of the reset gate \(r_t^j\) and the candidate state \(c_t^j\) can be skipped.
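A minimal single-step NumPy sketch of this selection (unbatched, with hypothetical parameter names; an actual implementation would be vectorized over frames and batches):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def d_gru_step(x_t, h_prev, params, A):
    """One D-GRU step. The update gate z_t is always computed and acts as the
    select gate; only the top-A neurons (largest z) compute their reset gate
    and candidate state, while the rest keep their previous hidden-state values.

    params is a dict of the standard GRU weights/biases (hypothetical layout):
    W_iz, b_iz, W_hz, b_hz, W_ir, b_ir, W_hr, b_hr, W_ic, b_ic, W_hc, b_hc.
    """
    p = params
    # Update gate (always computed, see the z_t equation above)
    z = sigmoid(p["W_iz"] @ x_t + p["b_iz"] + p["W_hz"] @ h_prev + p["b_hz"])
    sel = np.argsort(z)[-A:]              # indices of the top-A update-gate values

    h_t = h_prev.copy()                   # unselected neurons keep h_{t-1}
    # Reset gate and candidate state, computed only for the selected rows
    r = sigmoid(p["W_ir"][sel] @ x_t + p["b_ir"][sel]
                + p["W_hr"][sel] @ h_prev + p["b_hr"][sel])
    c = np.tanh(p["W_ic"][sel] @ x_t + p["b_ic"][sel]
                + r * (p["W_hc"][sel] @ h_prev + p["b_hc"][sel]))
    # Standard GRU interpolation, applied only to the selected neurons
    h_t[sel] = z[sel] * c + (1.0 - z[sel]) * h_prev[sel]
    return h_t
```

Setting \(A\) to the full hidden size recovers the conventional GRU update.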

Since the update gate \(\boldsymbol{z}_t\) must always be computed, it accounts for one third of the GRU computation; the reset gate and candidate state (the remaining two thirds) are computed only for the selected fraction of neurons. The total computation of the D-GRU is therefore \((1+2\mathcal{P})/3\) of that of a conventional GRU, where \(\mathcal{P}\) is the ratio of selected neurons.
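For example, with \(\mathcal{P} = 50\%\) the D-GRU requires \((1 + 2 \times 0.5)/3 \approx 67\%\) of the GRU computation, and with \(\mathcal{P} = 25\%\) only \((1 + 2 \times 0.25)/3 = 50\%\).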

Demo

Audio examples comparing the speech enhancement performance of D-GRU-based networks (with \(\mathcal{P} \in \{25\%, 50\%, 75\%\}\)) and conventional GRU-based networks (\(\mathcal{P} = 100\%\)).

Scroll down for more samples. Zoom in to see the spectrogram details.


Cite

@inproceedings{cheng2024dynamic,
title={Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement},
author={Cheng, Longbiao and Pandey, Ashutosh and Xu, Buye and Delbruck, Tobi and Liu, Shih-Chii},
year=2024,
booktitle={Proc. INTERSPEECH 2024},
}