Design and implementation of Recurrent Neural Network (RNN) acceleration SoC on FPGA
-
Abstract
To accelerate inference of Recurrent Neural Networks (RNNs), the elapsed time on a CPU, the sparsity of input vectors, and the parameter size of RNNs are analyzed. An RNN acceleration core for parallel matrix-sparse-vector multiplication is designed. Multiple input vectors are stored on-chip so that each fetched portion of the weight matrix is reused across them, reducing the data bandwidth between DDR and on-chip SRAM. The RNN acceleration core is implemented in RTL using Verilog HDL, and a behavioral simulation environment is built using the parameters of a speech recognition algorithm, DeepSpeech2, as inputs to the acceleration core. An acceleration SoC is built on an FPGA with a MicroBlaze CPU and the RNN acceleration core; the MicroBlaze is responsible for computations such as activation functions and element-wise multiplication of vectors. When accelerating the RNN part of DeepSpeech2, a 23x speedup and 9.4x higher energy efficiency are achieved compared to the MicroBlaze alone.
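The weight-reuse scheme described above can be sketched in software. The following minimal Python model (an illustration only, not the paper's RTL design) streams the weight matrix column by column and applies each fetched column to every buffered sparse input vector before moving on, so the matrix is read from "DDR" only once per batch; the `{index: value}` sparse-vector representation and function name are assumptions for this sketch.

```python
def batched_sparse_mv(W, sparse_batch):
    """Compute y_b = W @ x_b for a batch of sparse input vectors.

    W is a dense weight matrix given as a list of rows. Each column of W
    is "fetched" once and reused across every vector in the batch,
    mimicking the on-chip weight-reuse scheme (illustrative only).
    """
    n_out, n_in = len(W), len(W[0])
    ys = [[0.0] * n_out for _ in sparse_batch]
    for col in range(n_in):
        w_col = [W[row][col] for row in range(n_out)]  # one column fetch
        for b, x in enumerate(sparse_batch):
            v = x.get(col, 0.0)  # sparse vector stored as {index: value}
            if v:                # skip zero entries (input sparsity)
                for row in range(n_out):
                    ys[b][row] += v * w_col[row]
    return ys
```

Batching the input vectors trades on-chip storage for off-chip bandwidth: the inner loop reuses `w_col` once per buffered vector, while the sparsity check skips all-zero contributions.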