End-to-end audiovisual speech recognition based on attention fusion of SDBN and BLSTM
An end-to-end audiovisual speech recognition algorithm was proposed. In the algorithm, a sparse DBN (SDBN) was constructed by introducing a mixed L1/2-norm and L1-norm penalty into a Deep Belief Network (DBN) with a bottleneck structure. The resulting sparse bottleneck features reduce the dimensionality of the data. A BLSTM (bidirectional long short-term memory network) was then used to model these features as a time series. Next, an attention mechanism was used to automatically align and fuse the visual (lip) information with the auditory (audio) information.
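A minimal sketch of the sparse bottleneck idea follows. It assumes the mixed penalty is applied to the bottleneck activations h as lam_half * Σ|h_j|^(1/2) + lam_one * Σ|h_j|. The abstract gives neither the exact form of the penalty nor where it is applied, so the penalty placement, the function names, and the parameters lam_half and lam_one are all hypothetical. The L1/2 term is nonconvex and pulls small activations toward zero more strongly than L1 alone. In a real sparse DBN, this gradient would be added to the layer-wise RBM training updates, which the sketch omits.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bottleneck_forward(x: Vector, W: List[Vector], b: Vector) -> Vector:
    """Sigmoid hidden layer; using fewer rows in W than entries in x reduces dimension."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]


def mixed_sparsity_penalty(h: Vector, lam_half: float, lam_one: float) -> float:
    """Hypothetical mixed penalty: lam_half * sum(|h_j|**0.5) + lam_one * sum(|h_j|)."""
    l_half = sum(math.sqrt(abs(v)) for v in h)
    l_one = sum(abs(v) for v in h)
    return lam_half * l_half + lam_one * l_one


def mixed_sparsity_grad(h: Vector, lam_half: float, lam_one: float, eps: float = 1e-8) -> Vector:
    """Per-unit subgradient of the penalty; eps keeps the L1/2 term finite near zero,
    and 0 is used at h_j == 0 by convention."""
    grads = []
    for v in h:
        sign = math.copysign(1.0, v) if v != 0 else 0.0
        grads.append(lam_half * 0.5 * sign / math.sqrt(abs(v) + eps) + lam_one * sign)
    return grads


if __name__ == "__main__":
    x = [0.2, 0.9, 0.1, 0.7]  # toy 4-dim input frame
    W = [[0.5, -0.3, 0.8, 0.1], [-0.2, 0.4, 0.0, 0.6]]  # 2-unit bottleneck
    b = [0.0, -0.1]
    h = bottleneck_forward(x, W, b)
    print(len(h))  # 2
    print(mixed_sparsity_penalty([0.0, 0.25, 1.0], 1.0, 1.0))  # 2.75
```

Inactive units (h_j = 0) add nothing to the penalty, so minimizing it alongside the reconstruction objective favors bottleneck codes in which only a few units are active.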
Finally, the fused audiovisual information was classified and recognized by a BLSTM followed by a Softmax layer. Experiments show that the algorithm can effectively recognize visual and auditory information and achieves a good recognition rate and robustness compared with similar algorithms.
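The attention-based fusion and final classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes that the audio and visual frames already share one feature dimension (for example, after their respective BLSTMs) and that each audio frame acts as the query in dot-product attention over all lip frames. It also replaces the classifier BLSTM with simple mean pooling followed by a linear Softmax layer. All function names and weights are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(audio: List[Vector], visual: List[Vector]) -> List[Vector]:
    """For each audio frame, weight all lip frames by dot-product attention and
    concatenate the audio vector with the resulting visual context vector."""
    dim = len(visual[0])
    fused = []
    for a in audio:
        weights = softmax([dot(a, v) for v in visual])  # soft alignment over visual frames
        context = [sum(w * v[k] for w, v in zip(weights, visual)) for k in range(dim)]
        fused.append(a + context)  # list concatenation: [audio | visual context]
    return fused


def mean_pool(frames: List[Vector]) -> Vector:
    """Stand-in for the BLSTM: average the fused frames into one utterance vector."""
    n = len(frames)
    return [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]


def classify(feature: Vector, W: List[Vector], b: Vector) -> Vector:
    """Linear layer followed by Softmax; returns one probability per class."""
    return softmax([dot(row, feature) + bj for row, bj in zip(W, b)])


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 audio frames, dim 2
    visual = [[1.0, 0.0], [0.0, 1.0]]  # 2 lip frames, dim 2
    fused = attention_fuse(audio, visual)
    print(len(fused), len(fused[0]))  # 3 4
    W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # hypothetical 2-class weights
    probs = classify(mean_pool(fused), W, [0.0, 0.0])
    print(probs)
```

The attention weights are computed over the whole visual sequence, so the two streams do not need the same number of frames. This is one way attention can align audio and video that are sampled at different frame rates.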