END-TO-END AUDIOVISUAL SPEECH RECOGNITION BASED ON ATTENTION FUSION OF SDBN AND BLSTM

An end-to-end audiovisual speech recognition algorithm is proposed. In the algorithm, a sparse DBN (SDBN) is constructed by introducing a mixed l<sub>1/2</sub> norm and l<sub>1</sub> norm into a Deep Belief Network with a bottleneck structure, so that it extracts sparse bottleneck features and reduces the dimensionality of the data. A BLSTM is then used to model these features as a time series. Next, an attention mechanism automatically aligns and fuses the visual (lip) information with the audio information.
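The alignment-and-fusion step can be illustrated with a minimal NumPy sketch. The abstract does not say which attention scoring function is used, so scaled dot-product attention is assumed here, with audio frames acting as queries over the lip-feature frames. The function names `softmax` and `attention_fuse`, the feature size of 16, and the sequence lengths of 100 and 25 frames are all hypothetical choices for illustration, not details taken from the algorithm itself.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_fuse(audio, visual):
    """Align visual frames to audio frames and fuse the two streams.

    audio:  (T_a, d) array of per-frame audio features
    visual: (T_v, d) array of per-frame lip features
    Returns the fused (T_a, 2 * d) sequence and the (T_a, T_v) weights.

    Assumption: scaled dot-product scoring with audio frames as queries.
    """
    d = audio.shape[1]
    # Similarity between every audio frame and every visual frame.
    scores = audio @ visual.T / np.sqrt(d)
    # Each audio frame gets a distribution over the visual frames.
    weights = softmax(scores, axis=1)
    # Visual context vector aligned to each audio frame.
    context = weights @ visual
    # Fuse by concatenating audio features with the aligned visual context.
    fused = np.concatenate([audio, context], axis=1)
    return fused, weights


# Illustrative random features; the two streams have different frame counts.
rng = np.random.default_rng(0)
audio = rng.standard_normal((100, 16))
visual = rng.standard_normal((25, 16))
fused, weights = attention_fuse(audio, visual)
print(fused.shape, weights.shape)
```

Because the attention weights form a soft alignment, the two streams do not need matching frame rates: each audio frame simply receives a weighted average of the lip features. In a full system the fused sequence would then be passed to the classifier.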

Finally, the fused audiovisual information is classified by a BLSTM with an attached Softmax layer. Experiments show that the algorithm can effectively recognize visual and auditory information, and that it achieves a good recognition rate and robustness among similar algorithms.
