Commit 17b9ad2: implementation done
ppwwyyxx committed Jan 3, 2014
1 parent deb7d3e commit 17b9ad2
Showing 6 changed files with 156 additions and 39 deletions.
26 changes: 1 addition & 25 deletions doc/Final-Report-Complete/feature.tex
@@ -1,32 +1,8 @@
%File: feature.tex
%Date: Fri Jan 03 17:40:07 2014 +0800
%Date: Fri Jan 03 20:48:19 2014 +0800
%Author: Yuxin Wu <ppwwyyxxc@gmail.com>

\subsection{Feature Extraction}
%We extract \textbf{Mel-frequency cepstral coefficients} and \textbf{Linear Predictive
%Coding} features using following parameter are found to be
%optimal, according to our experiments in \secref{result}:
%\begin{itemize}
%\item Common parameters:
%\begin{itemize}
%\item Frame size: 32ms
%\item Frame shift: 16ms
%\item Preemphasis coefficient: 0.95
%\end{itemize}
%\item MFCC parameters:
%\begin{itemize}
%\item number of cepstral coefficient: 15
%\item number of filter banks: 55
%\item maximal frequency of the filter bank: 6000
%\end{itemize}
%\item LPC Parameters:
%\begin{itemize}
%\item number of coefficient: 23
%\end{itemize}
%\end{itemize}

%and then concatenate the two feature vectors of the same frame forming
%a larger feature vector of 15 + 23 = 38 dimension.

\subsubsection{MFCC}
\label{sec:mfcc}
1 change: 1 addition & 0 deletions doc/Final-Report-Complete/gui.tex
@@ -1,4 +1,5 @@
\section{GUI}
\label{sec:gui}
The GUI contains the following tabs:
\begin{itemize}
\item \textbf{Enrollment} \\
95 changes: 94 additions & 1 deletion doc/Final-Report-Complete/implementation.tex
@@ -1,5 +1,98 @@
%File: implementation.tex
%Date: Fri Jan 03 18:37:07 2014 +0800
%Date: Fri Jan 03 21:07:55 2014 +0800
%Author: Yuxin Wu <ppwwyyxxc@gmail.com>

\section{Implementation}
The system is written mainly in Python, together with some code in C++ and MATLAB.
It relies heavily on the NumPy\cite{numpy} and SciPy\cite{scipy} libraries.

\begin{enumerate}
\item VAD

Three types of VAD filters are located in \verb|src/filters/|.

\verb|silence.py| implements an energy-based VAD algorithm.
\verb|ltsd.py| is a wrapper for the LTSD algorithm, relying on pyssp\cite{pyssp}.
\verb|noisered.py| is a wrapper for the SoX noise reduction tools, relying on SoX \cite{sox} being
installed on the system.

\item Feature

Implementations of feature extraction are located in \verb|src/feature/|.

\verb|MFCC.py| is a self-implemented MFCC feature extractor.
\verb|BOB.py| is a wrapper for the MFCC feature extraction in the bob \cite{bob2012} library.
\verb|LPC.py| is an LPC feature extractor, relying on \verb|scikits.talkbox| \cite{talkbox}.
All three extractors share the same interface, with configurable parameters.

In the implementation, we tried different parameters for these features;
the test script can be found at \verb|src/test/test-feature.py|.
According to our experiments, the following parameters are optimal:
\begin{itemize}
\item Common parameters:
\begin{itemize}
\item Frame size: 32ms
\item Frame shift: 16ms
\item Preemphasis coefficient: 0.95
\end{itemize}
\item MFCC parameters:
\begin{itemize}
\item number of cepstral coefficients: 15
\item number of filter banks: 55
\item maximal frequency of the filter bank: 6000Hz
\end{itemize}
\item LPC Parameters:
\begin{itemize}
\item number of coefficients: 23
\end{itemize}
\end{itemize}

\item GMM

We tried the GMM implementations from scikit-learn \cite{scikit-learn} as well as pypr \cite{pypr}, but
both suffer from inefficiency.
For speed, a C++ version of GMM with K-MeansII initialization and
concurrency support
was implemented, located in \verb|src/gmm/|. It requires \verb|g++ >= 4.7| to compile.
This GMM implementation also provides a Python binding whose interface is similar to the GMM in
scikit-learn.

The new GMM implementation is enhanced in both speed and accuracy; a more detailed discussion
appears in \secref{result}.

In the end, we used a GMM with 32 components, which we found to be optimal in our experiments.
The covariance matrix of every Gaussian component is assumed to be diagonal,
since the dimensions of the feature vector are assumed to be independent.

\item CRBM

CRBM is implemented in C++, located in \verb|src/nn|. It also has concurrency support.

\item JFA

From our investigation, we found that the original algorithm \cite{jfa-se} for training the JFA model is
too complicated and hard to implement.
Therefore, we use the simpler algorithm presented in \cite{jfa-study}
to train the JFA model.
This JFA implementation is based on JFA cookbook\cite{cookbook}.
To generate feature files for JFA, \verb|test/gen-features-file.py| should be used.
After \verb|train.lst, test.lst, enroll.lst| are properly located in \verb|jfa/feature-data|,
the script \verb|run_all.m| will do the training and testing, and \verb|exp/gen_result.py|
will calculate the accuracy.

However, from the results, JFA does not seem to outperform our enhanced MFCC and GMM algorithms
(though it does outperform our old algorithms). We suspect that training a JFA model needs more data than
we have provided, since JFA needs data from various sources to account for different types of variability.
Therefore, we might need extra data for training JFA, while keeping the same data scale in the enrollment stage,
to get a better result.

It is also worth mentioning that training JFA takes much longer than our old method,
since the estimation of $ u, v, d$ does not converge quickly. As a result, it might not be practical to add
the JFA approach to our GUI system, but we will keep evaluating its performance against other methods.

\item GUI

The GUI is implemented with PyQt\cite{pyqt} and PyAudio\cite{pyaudio}.
\verb|gui.py| is the entry point. The usage of the GUI is introduced in \secref{gui}.
\end{enumerate}
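To make the energy-based VAD of item 1 concrete, here is a minimal NumPy sketch of the idea behind \verb|silence.py|: keep only the frames whose short-time energy exceeds a fraction of the peak frame energy. The function name, parameters, and threshold rule are hypothetical illustrations; the actual implementation may differ.

```python
import numpy as np

def energy_vad(signal, sr, frame_ms=32, shift_ms=16, threshold_ratio=0.1):
    """Energy-based VAD sketch: drop frames quieter than a peak-relative threshold."""
    frame = int(sr * frame_ms / 1000)   # frame size in samples (32ms, as above)
    shift = int(sr * shift_ms / 1000)   # frame shift in samples (16ms, as above)
    starts = range(0, max(len(signal) - frame, 1), shift)
    # Short-time energy of each frame
    energies = [float(np.sum(signal[s:s + frame] ** 2)) for s in starts]
    threshold = threshold_ratio * max(energies)
    voiced = [s for s, e in zip(starts, energies) if e > threshold]
    if not voiced:
        return signal[:0]
    # Concatenate the retained (voiced) frames back into one signal
    return np.concatenate([signal[s:s + frame] for s in voiced])
```

A real VAD would additionally smooth the frame decisions (e.g. hangover frames) to avoid clipping word boundaries.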
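As a side note on the diagonal-covariance assumption in item 3: with a diagonal covariance, the per-component Gaussian density factorizes across dimensions, so scoring a frame against a $k$-component, $d$-dimensional GMM costs only $O(kd)$. A minimal NumPy sketch of the scoring step (a hypothetical illustration, not the C++ implementation in \verb|src/gmm/|; training via EM is omitted):

```python
import numpy as np

def diag_gmm_loglik(X, weights, means, variances):
    """Log-likelihood of each row of X under a diagonal-covariance GMM.

    X: (n, d) feature matrix; weights: (k,); means, variances: (k, d).
    """
    n, d = X.shape
    diff = X[:, None, :] - means[None, :, :]              # (n, k, d)
    # log N(x; mu, diag(var)) with the diagonal covariance exploited:
    exponent = -0.5 * np.sum(diff ** 2 / variances, axis=2)
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_comp = np.log(weights) + log_norm + exponent      # (n, k), per-component
    # Log-sum-exp over components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()
```

Speaker identification then reduces to evaluating this per-frame log-likelihood under each enrolled speaker's GMM and picking the speaker with the highest total.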

14 changes: 1 addition & 13 deletions doc/Final-Report-Complete/model.tex
@@ -1,5 +1,5 @@
%File: model.tex
%Date: Fri Jan 03 18:35:53 2014 +0800
%Date: Fri Jan 03 21:06:49 2014 +0800
%Author: Yuxin Wu <ppwwyyxxc@gmail.com>

\subsection{GMM}
@@ -142,15 +142,3 @@ \subsection{JFA}
The parameters $ R_s $ and $ R_c$, also referred to as ``Speaker Rank'' and ``Channel Rank'', are two empirical constants selected beforehand.
The training of JFA is to calculate the best $ u, v, d$ to fit all the training data.
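For reference, the decomposition that these parameters define, as presented in the JFA literature cited above (notation adapted here, so treat the exact symbols as an illustration), is:

```latex
% Speaker- and channel-dependent mean supervector for speaker s on channel c:
%   m        : speaker-independent UBM mean supervector
%   v y_s    : speaker factors (v has R_s columns)
%   u x_{s,c}: channel factors (u has R_c columns)
%   d z_s    : speaker-specific residual (d is diagonal)
M_{s,c} = m + v\,y_s + u\,x_{s,c} + d\,z_s
```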

After our investigation, we found that the original algorithm \cite{jfa-se} for training the JFA model is
too complicated and hard to implement.
Therefore, we use the simpler algorithm presented in \cite{jfa-study}
to train the JFA model. However, from the results, JFA does not seem to outperform our enhanced MFCC and GMM algorithms
(though it does outperform our old algorithms). We suspect that training a JFA model needs more data than
we have provided, since JFA needs data from various sources to account for different types of variability.
Therefore, we might need extra data for training JFA, while keeping the same data scale in the enrollment stage,
to get a better result.

It is also worth mentioning that training JFA takes much longer than our old method,
since the estimation of $ u, v, d$ does not converge quickly. As a result, it might not be practical to add
the JFA approach to our GUI system, but we will keep evaluating its performance against other methods.
24 changes: 24 additions & 0 deletions doc/Final-Report-Complete/refs.bib
@@ -26,6 +26,22 @@ @ONLINE{sox
title = {SoX - Sound eXchange},
url = {http://sox.sourceforge.net/}
}
@ONLINE{cookbook,
title = {Joint Factor Analysis Matlab Demo},
url = {http://speech.fit.vutbr.cz/software/joint-factor-analysis-matlab-demo}
}
@ONLINE{pyqt,
title = {The GPL licensed Python bindings for the Qt application framework},
url = {http://sourceforge.net/projects/pyqt/}
}
@ONLINE{pyaudio,
title = {PyAudio: PortAudio v19 Python Bindings},
url = {http://people.csail.mit.edu/hubert/pyaudio/}
}
@ONLINE{talkbox,
title = {Talkbox, a set of python modules for speech/signal processing},
url = {http://scikits.appspot.com/talkbox}
}

@ONLINE{SRwiki,
title = {Speaker Recognition - Wikipedia, the free encyclopedia},
@@ -79,6 +95,14 @@ @ONLINE{numpy
title = {NumPy -- Numpy},
url = {http://www.numpy.org/}
}
@ONLINE{scipy,
title = {Scientific Computing Tools for Python},
url = {http://www.scipy.org/}
}
@ONLINE{pyssp,
title = {python speech signal processing library for education},
url = {https://pypi.python.org/pypi/pyssp}
}
@ONLINE{UBM,
title = {Universal Background Models},
url = {http://www.ll.mit.edu/mission/communications/ist/publications/0802_Reynolds_Biometrics_UBM.pdf}
35 changes: 35 additions & 0 deletions src/exp/gen_result.py
@@ -0,0 +1,35 @@
#!/usr/bin/env python2
# -*- coding: UTF-8 -*-
# File: gen_result.py
# Date: Tue Dec 10 17:32:13 2013 +0800
# Author: Yuxin Wu <ppwwyyxxc@gmail.com>

import operator
import string

# Labels for the enrolled speakers and the test utterances.
enroll_names = map(string.strip,
                   open("scores_enroll_labels.txt").readlines())
test_names = map(string.strip,
                 open("scores_test_labels.txt").readlines())

# scores[i][j] is the score of test utterance j against enrolled speaker i.
scores = []
with open("scores.txt") as f:
    for line in f:
        scores.append(map(float, line.strip().split()))

cnt = 0    # total number of test utterances
right = 0  # number of correctly identified utterances
for tst in xrange(len(test_names)):
    # Pick the enrolled speaker with the highest score for this utterance.
    match = max([(idx, score[tst]) for (idx, score) in
                 enumerate(scores)], key=operator.itemgetter(1))
    print test_names[tst], enroll_names[match[0]],
    if test_names[tst] != enroll_names[match[0]]:
        print " wrong"
    else:
        print
    cnt += 1
    if int(test_names[tst]) == int(enroll_names[match[0]]):
        right += 1
print right, cnt, float(right) / cnt

