Improving Deep Neural Networks Based Speech Recognition System For Far-field Speech

tanzita/tf_asr

tf_asr

This code is part of my master's thesis, 'Improving Deep Neural Networks Based Speech Recognition System For Far-field Speech'.

Here, I have built acoustic models based on multilayer perceptrons and convolutional neural networks using TensorFlow.
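To make the idea of a multilayer perceptron acoustic model concrete, here is a minimal sketch of a forward pass that maps one feature frame to class posteriors. It is written in plain Python rather than TensorFlow so it runs without dependencies, and it is not the code from this repository. The layer sizes, the random initialisation, the helper names, and the slope `alpha = 1/3` for the very leaky rectified linear unit mentioned in the abstract are all assumptions made for illustration.

```python
import math
import random


def very_leaky_relu(x, alpha=1.0 / 3.0):
    """Leaky ReLU with a large negative slope; alpha = 1/3 is an assumed value."""
    return x if x > 0 else alpha * x


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_posteriors(frame, layers):
    """Map one acoustic feature frame to posteriors over output classes."""
    hidden = frame
    for weights, biases in layers[:-1]:
        hidden = [very_leaky_relu(h) for h in dense(hidden, weights, biases)]
    out_weights, out_biases = layers[-1]
    return softmax(dense(hidden, out_weights, out_biases))


# Hypothetical sizes: a 40-dimensional feature frame, two hidden layers, 5 classes.
FEATURE_DIM, HIDDEN_DIM, NUM_CLASSES = 40, 16, 5
rng = random.Random(0)
layers = [
    init_layer(FEATURE_DIM, HIDDEN_DIM, rng),
    init_layer(HIDDEN_DIM, HIDDEN_DIM, rng),
    init_layer(HIDDEN_DIM, NUM_CLASSES, rng),
]
frame = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
posteriors = mlp_posteriors(frame, layers)
```

In a real system the output layer would cover far more classes and the weights would be trained on labelled frames. The sketch only shows the data flow from a feature frame to a posterior distribution.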

Data set: AMI corpus (http://groups.inf.ed.ac.uk/ami/corpus/), which consists of 100 hours of meeting recordings.

The master's thesis was submitted on 15 March 2018.

Thesis Title

Improving Deep Neural Networks Based Speech Recognition System For Far-field Speech

Abstract

Nowadays, the research focus of the automatic speech recognition (ASR) task is shifting from the close-talk scenario towards the far-field scenario. The far-field scenario is considered a more practical but more challenging task, as the input data contains noise, reverberation, or overlapped speech. This work aims to improve overall performance in the far-field speech recognition task by applying the latest deep learning technologies. To achieve this goal, this work experiments with beamforming on multi-array microphone data, improvement of the language model module, and development of the acoustic model module for a far-field ASR system using multi-layer perceptrons, convolutional neural networks, and very deep convolutional neural networks. This work demonstrates the effectiveness of the feature space maximum likelihood linear regression based feature extraction technique and the very leaky rectified linear unit activation function over the alternatives in achieving better accuracy and word error based performance in the far-field domain. This work also shows that improving the language model and the pronunciation dictionary is crucial to overall system performance, as the acoustic model is not strong with the relatively sparse far-field data.

Index Terms: deep neural networks, deep learning, far-field speech recognition, automatic speech recognition, natural language processing, very deep CNN

Full text web link: https://goo.gl/D8h7KL