Commit 17b9ad2: implementation done
ppwwyyxx committed Jan 3, 2014
1 parent deb7d3e commit 17b9ad2
Showing 6 changed files with 156 additions and 39 deletions.
26 changes: 1 addition & 25 deletions doc/Final-Report-Complete/feature.tex
@@ -1,32 +1,8 @@
%File: feature.tex
%Date: Fri Jan 03 17:40:07 2014 +0800
%Date: Fri Jan 03 20:48:19 2014 +0800
%Author: Yuxin Wu <ppwwyyxxc@gmail.com>

\subsection{Feature Extraction}
%We extract \textbf{Mel-frequency cepstral coefficients} and \textbf{Linear Predictive
%Coding} features using following parameter are found to be
%optimal, according to our experiments in \secref{result}:
%\begin{itemize}
%\item Common parameters:
%\begin{itemize}
%\item Frame size: 32ms
%\item Frame shift: 16ms
%\item Preemphasis coefficient: 0.95
%\end{itemize}
%\item MFCC parameters:
%\begin{itemize}
%\item number of cepstral coefficient: 15
%\item number of filter banks: 55
%\item maximal frequency of the filter bank: 6000
%\end{itemize}
%\item LPC Parameters:
%\begin{itemize}
%\item number of coefficient: 23
%\end{itemize}
%\end{itemize}

%and then concatenate the two feature vectors of the same frame forming
%a larger feature vector of 15 + 23 = 38 dimension.

\subsubsection{MFCC}
\label{sec:mfcc}
1 change: 1 addition & 0 deletions doc/Final-Report-Complete/gui.tex
@@ -1,4 +1,5 @@
\section{GUI}
\label{sec:gui}
The GUI contains the following tabs:
\begin{itemize}
\item \textbf{Enrollment} \\
95 changes: 94 additions & 1 deletion doc/Final-Report-Complete/implementation.tex
@@ -1,5 +1,98 @@
%File: implementation.tex
%Date: Fri Jan 03 18:37:07 2014 +0800
%Date: Fri Jan 03 21:07:55 2014 +0800
%Author: Yuxin Wu <ppwwyyxxc@gmail.com>

\section{Implementation}
The system is written mainly in Python, together with some code in C++ and MATLAB.
It relies heavily on the NumPy\cite{numpy} and SciPy\cite{scipy} libraries.

\begin{enumerate}
\item VAD

Three types of VAD filters are located in \verb|src/filters/|.

\verb|silence.py| implements an energy-based VAD algorithm.
\verb|ltsd.py| is a wrapper for the LTSD algorithm, relying on pyssp\cite{pyssp}.
\verb|noisered.py| is a wrapper for the SoX noise reduction tools, relying on SoX \cite{sox} being
installed on the system.

\item Feature

Implementations of feature extraction are located in \verb|src/feature/|.

\verb|MFCC.py| is a self-implemented MFCC feature extractor.
\verb|BOB.py| is a wrapper for the MFCC feature extraction in the bob \cite{bob2012} library.
\verb|LPC.py| is an LPC feature extractor, relying on \verb|scikits.talkbox| \cite{talkbox}.
All three extractors share the same interface, with configurable parameters.

In the implementation, we tried different parameters for these features;
the test script can be found at \verb|src/test/test-feature.py|.
According to our experiments, the following parameters are optimal:
\begin{itemize}
\item Common parameters:
\begin{itemize}
\item Frame size: 32ms
\item Frame shift: 16ms
\item Preemphasis coefficient: 0.95
\end{itemize}
\item MFCC parameters:
\begin{itemize}
\item number of cepstral coefficients: 15
\item number of filter banks: 55
\item maximal frequency of the filter bank: 6000Hz
\end{itemize}
\item LPC Parameters:
\begin{itemize}
\item number of coefficients: 23
\end{itemize}
\end{itemize}

\item GMM

We tried the GMM implementations from scikit-learn \cite{scikit-learn} as well as pypr \cite{pypr}, but
both suffer from inefficiency.
For speed, a C++ version of GMM with K-MeansII initialization and
concurrency support
was implemented, located in \verb|src/gmm/|. It requires \verb|g++ >= 4.7| to compile.
This GMM implementation also provides a Python binding whose interface is similar to the GMM in
scikit-learn.

The new GMM implementation is enhanced in both speed and accuracy; a more detailed discussion
appears in \secref{result}.

In the end, we used a GMM with 32 components, which we found to be optimal in our experiments.
The covariance matrix of every Gaussian component is assumed to be diagonal,
since the dimensions of the feature vector are assumed to be independent.

\item CRBM

CRBM is implemented in C++, located in \verb|src/nn|. It also has concurrency support.

\item JFA

From our investigation, we found that the original algorithm \cite{jfa-se} for training the JFA model is
too complicated and hard to implement.
Therefore, we use the simpler algorithm presented in \cite{jfa-study}
to train the JFA model.
This JFA implementation is based on JFA cookbook\cite{cookbook}.
To generate feature files for JFA, \verb|test/gen-features-file.py| should be used.
After \verb|train.lst, test.lst, enroll.lst| are properly located in \verb|jfa/feature-data|,
the script \verb|run_all.m| will do the training and testing, and \verb|exp/gen_result.py|
will calculate the accuracy.

However, from the results, JFA does not seem to outperform our enhanced MFCC and GMM algorithms
(though it does outperform our old algorithms). We suspect that training a JFA model needs more data than
we have provided, since JFA needs data from various sources to account for different types of variability.
Therefore, we might need extra data for training JFA, while keeping the same data scale in the enrollment stage,
to get a better result.

It is also worth mentioning that training JFA takes much longer than our old method,
since the estimation of $ u, v, d$ does not converge quickly. As a result, it might not be practical to add
the JFA approach to our GUI system, but we will keep evaluating its performance against other methods.

\item GUI

The GUI is implemented with PyQt\cite{pyqt} and PyAudio\cite{pyaudio}.
\verb|gui.py| is the entry point. The usage of the GUI is introduced in \secref{gui}.
\end{enumerate}
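To make the energy-based VAD of item 1 concrete, here is a minimal NumPy sketch of the idea behind \verb|silence.py|: keep only the frames whose short-time energy exceeds a fraction of the peak frame energy. The function name, parameters, and threshold rule are hypothetical illustrations; the actual implementation may differ.

```python
import numpy as np

def energy_vad(signal, sr, frame_ms=32, shift_ms=16, threshold_ratio=0.1):
    """Energy-based VAD sketch: drop frames quieter than a peak-relative threshold."""
    frame = int(sr * frame_ms / 1000)   # frame size in samples (32ms, as above)
    shift = int(sr * shift_ms / 1000)   # frame shift in samples (16ms, as above)
    starts = range(0, max(len(signal) - frame, 1), shift)
    # Short-time energy of each frame
    energies = [float(np.sum(signal[s:s + frame] ** 2)) for s in starts]
    threshold = threshold_ratio * max(energies)
    voiced = [s for s, e in zip(starts, energies) if e > threshold]
    if not voiced:
        return signal[:0]
    # Concatenate the retained (voiced) frames back into one signal
    return np.concatenate([signal[s:s + frame] for s in voiced])
```

A real VAD would additionally smooth the frame decisions (e.g. hangover frames) to avoid clipping word boundaries.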
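As a side note on the diagonal-covariance assumption in item 3: with a diagonal covariance, the per-component Gaussian density factorizes across dimensions, so scoring a frame against a $k$-component, $d$-dimensional GMM costs only $O(kd)$. A minimal NumPy sketch of the scoring step (a hypothetical illustration, not the C++ implementation in \verb|src/gmm/|; training via EM is omitted):

```python
import numpy as np

def diag_gmm_loglik(X, weights, means, variances):
    """Log-likelihood of each row of X under a diagonal-covariance GMM.

    X: (n, d) feature matrix; weights: (k,); means, variances: (k, d).
    """
    n, d = X.shape
    diff = X[:, None, :] - means[None, :, :]              # (n, k, d)
    # log N(x; mu, diag(var)) with the diagonal covariance exploited:
    exponent = -0.5 * np.sum(diff ** 2 / variances, axis=2)
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_comp = np.log(weights) + log_norm + exponent      # (n, k), per-component
    # Log-sum-exp over components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()
```

Speaker identification then reduces to evaluating this per-frame log-likelihood under each enrolled speaker's GMM and picking the speaker with the highest total.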

14 changes: 1 addition & 13 deletions doc/Final-Report-Complete/model.tex
@@ -1,5 +1,5 @@
%File: model.tex
%Date: Fri Jan 03 18:35:53 2014 +0800
%Date: Fri Jan 03 21:06:49 2014 +0800
%Author: Yuxin Wu <ppwwyyxxc@gmail.com>

\subsection{GMM}
@@ -142,15 +142,3 @@ \subsection{JFA}
The parameters $ R_s $ and $ R_c$, also referred to as ``Speaker Rank'' and ``Channel Rank'', are two empirical constants selected beforehand.
The training of JFA is to calculate the best $ u, v, d$ to fit all the training data.
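For reference, the decomposition that these parameters define, as presented in the JFA literature cited above (notation adapted here, so treat the exact symbols as an illustration), is:

```latex
% Speaker- and channel-dependent mean supervector for speaker s on channel c:
%   m        : speaker-independent UBM mean supervector
%   v y_s    : speaker factors (v has R_s columns)
%   u x_{s,c}: channel factors (u has R_c columns)
%   d z_s    : speaker-specific residual (d is diagonal)
M_{s,c} = m + v\,y_s + u\,x_{s,c} + d\,z_s
```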

After our investigation, we found that the original algorithm \cite{jfa-se} for training the JFA model is
too complicated and hard to implement.
Therefore, we use the simpler algorithm presented in \cite{jfa-study}
to train the JFA model. However, from the results, JFA does not seem to outperform our enhanced MFCC and GMM algorithms
(though it does outperform our old algorithms). We suspect that training a JFA model needs more data than
we have provided, since JFA needs data from various sources to account for different types of variability.
Therefore, we might need extra data for training JFA, while keeping the same data scale in the enrollment stage,
to get a better result.

It is also worth mentioning that training JFA takes much longer than our old method,
since the estimation of $ u, v, d$ does not converge quickly. As a result, it might not be practical to add
the JFA approach to our GUI system, but we will keep evaluating its performance against other methods.
24 changes: 24 additions & 0 deletions doc/Final-Report-Complete/refs.bib
@@ -26,6 +26,22 @@ @ONLINE{sox
title = {SoX - Sound eXchange},
url = {http://sox.sourceforge.net/}
}
@ONLINE{cookbook,
title = {Joint Factor Analysis Matlab Demo},
url = {http://speech.fit.vutbr.cz/software/joint-factor-analysis-matlab-demo}
}
@ONLINE{pyqt,
title = {The GPL licensed Python bindings for the Qt application framework},
url = {http://sourceforge.net/projects/pyqt/}
}
@ONLINE{pyaudio,
title = {PyAudio: PortAudio v19 Python Bindings},
url = {http://people.csail.mit.edu/hubert/pyaudio/}
}
@ONLINE{talkbox,
title = {Talkbox, a set of python modules for speech/signal processing},
url = {http://scikits.appspot.com/talkbox}
}

@ONLINE{SRwiki,
title = {Speaker Recognition - Wikipedia, the free encyclopedia},
@@ -79,6 +95,14 @@ @ONLINE{numpy
title = {NumPy -- Numpy},
url = {http://www.numpy.org/}
}
@ONLINE{scipy,
title = {Scientific Computing Tools for Python},
url = {http://www.scipy.org/}
}
@ONLINE{pyssp,
title = {python speech signal processing library for education},
url = {https://pypi.python.org/pypi/pyssp}
}
@ONLINE{UBM,
title = {Universal Background Models},
url = {http://www.ll.mit.edu/mission/communications/ist/publications/0802_Reynolds_Biometrics_UBM.pdf}
35 changes: 35 additions & 0 deletions src/exp/gen_result.py
@@ -0,0 +1,35 @@
#!/usr/bin/env python2
# -*- coding: UTF-8 -*-
# File: gen_result.py
# Date: Tue Dec 10 17:32:13 2013 +0800
# Author: Yuxin Wu <ppwwyyxxc@gmail.com>

import operator
import string

# Labels for the enrolled speakers and the test utterances.
enroll_names = map(string.strip,
                   open("scores_enroll_labels.txt").readlines())
test_names = map(string.strip,
                 open("scores_test_labels.txt").readlines())

# scores[i][j] is the score of test utterance j against enrolled speaker i.
scores = []
with open("scores.txt") as f:
    for line in f:
        scores.append(map(float, line.strip().split()))

cnt = 0    # total number of test utterances
right = 0  # number of correctly identified utterances
for tst in xrange(len(test_names)):
    # Pick the enrolled speaker with the highest score for this utterance.
    match = max([(idx, score[tst]) for (idx, score) in
                 enumerate(scores)], key=operator.itemgetter(1))
    print test_names[tst], enroll_names[match[0]],
    if test_names[tst] != enroll_names[match[0]]:
        print " wrong"
    else:
        print
    cnt += 1
    if int(test_names[tst]) == int(enroll_names[match[0]]):
        right += 1
print right, cnt, float(right) / cnt

