Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 46 changed files with 1,117 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
TARGET=report
TEX=xelatex -shell-escape
BIBTEX=biber
READER=mupdf

all: rebuild

rebuild output/$(TARGET).pdf: *.tex *.bib output
	cd output && rm -f *.tex *.bib && ln -fs ../*.tex ../*.bib ../img .
# Skip the build if the TeX engine is already running. pgrep gets only the
# program name (not its flags), and the parentheses keep the whole build
# chain on the right-hand side of ||.
	pgrep -x $(firstword $(TEX)) > /dev/null || (cd output && $(TEX) $(TARGET).tex && $(BIBTEX) $(TARGET)) #&& $(TEX) $(TARGET).tex

output:
	mkdir output
	cd output && rm -f data res src && ln -s ../img .

# Open the PDF and make the reader reload it whenever the file is rewritten.
view: output/$(TARGET).pdf
	$(READER) output/$(TARGET).pdf &
	(inotifywait -mqe CLOSE_WRITE output/$(TARGET).pdf | while read; do killall -SIGHUP $(READER); done)

clean:
	rm -rf output

run: view

dist: output/$(TARGET).pdf
	rm -rf paper
	mkdir paper
	cp output/$(TARGET).pdf paper/
	7z a -tzip paper.zip paper
	rm -rf paper

.PHONY: all view run clean rebuild dist
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
\section{Algorithms} | ||
In this section we present our approach to the speaker recognition problem.

An utterance from each user is collected during the enrollment procedure.
The utterance is then processed in the following steps:
\subsection{VAD} | ||
The signal must first be filtered to remove the silent parts; otherwise the
training might be seriously biased. Therefore \textbf{Voice Activity Detection} (VAD) must
be performed first.
|
||
We observed that the provided corpus is nearly noise-free.
Therefore we use a simple energy-based approach
to remove the silent parts: we discard every frame whose average
energy is below 0.01 times the average energy of the whole utterance.
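The rule above can be written out as follows (a sketch; the symbols $E_t$, $\bar{E}$, $x_t[n]$, $L$ and $F$
are notation introduced here, not taken from the implementation). With frame $t$ holding $L$ samples $x_t[n]$
and the utterance consisting of $F$ frames,
\[ E_t = \dfrac{1}{L} \sum_{n=0}^{L-1} x_t[n]^2 , \qquad \bar{E} = \dfrac{1}{F} \sum_{t=1}^{F} E_t , \]
and frame $t$ is kept if and only if $E_t \ge 0.01\, \bar{E}$.
Because the threshold is relative to $\bar{E}$, the rule does not depend on the recording gain,
but it implicitly assumes that the background is quiet compared with the speech.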
|
||
This energy-based method works well on the database, but not
on recordings made through the GUI.
For the GUI we use the LTSD (Long-Term Spectral Divergence) algorithm \cite{ltsd1}\cite{ltsd2},
together with the noise reduction technique from SoX \cite{sox}, to obtain better results.
|
||
The LTSD algorithm splits an utterance into overlapping frames and scores each frame by
the probability that it contains voice activity. These scores are accumulated
to extract all the intervals with voice activity. The principle of LTSD is illustrated below:
|
||
\begin{figure}[H] | ||
\centering | ||
\includegraphics[width=0.6\textwidth]{img/ltsd.png} | ||
\end{figure} | ||
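For orientation, the quantities behind this score are usually defined as follows in the LTSD literature
(a sketch under that assumption; the report itself does not state them, and the window order $N$,
the FFT size $N_{\mathrm{FFT}}$ and the noise spectrum $\nu(k)$ are notation introduced here).
With $X(k, l)$ the amplitude spectrum of frame $l$ at frequency bin $k$, the long-term spectral envelope is
\[ \mathrm{LTSE}_N(k, l) = \max_{-N \le j \le N} X(k, l + j), \]
and its divergence from the average noise spectrum $\nu(k)$ is
\[ \mathrm{LTSD}_N(l) = 10 \log_{10} \left( \dfrac{1}{N_{\mathrm{FFT}}} \sum_{k=0}^{N_{\mathrm{FFT}}-1} \dfrac{\mathrm{LTSE}_N^2(k, l)}{\nu^2(k)} \right). \]
A frame is then marked as speech when $\mathrm{LTSD}_N(l)$ exceeds a chosen threshold.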
|
||
\input{feature} | ||
\input{model} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
\section{Dataset} | ||
The dataset provided by the teacher comprises 102 speakers, of whom 60 are
female and the rest are male, with three different speaking styles: Spontaneous,
Reading and Whisper. Summary statistics are as follows:
\begin{table}[!ht] | ||
\centering | ||
\begin{tabular}{|c|c|c|c|} | ||
\hline | ||
& Spontaneous & Reading & Whisper \\\hline | ||
Average Duration & 202s & 205s & 221s \\\hline | ||
Female Average Duration & 205s & 202s & 217s \\\hline | ||
Male Average Duration & 200s & 203s & 223s \\\hline | ||
\end{tabular} | ||
\end{table} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
%File: feature.tex | ||
%Date: Fri Jan 03 17:40:07 2014 +0800 | ||
%Author: Yuxin Wu <ppwwyyxxc@gmail.com> | ||
|
||
\subsection{Feature Extraction} | ||
%We extract \textbf{Mel-frequency cepstral coefficients} and \textbf{Linear Predictive | ||
%Coding} features using following parameter are found to be | ||
%optimal, according to our experiments in \secref{result}: | ||
%\begin{itemize} | ||
%\item Common parameters: | ||
%\begin{itemize} | ||
%\item Frame size: 32ms | ||
%\item Frame shift: 16ms | ||
%\item Preemphasis coefficient: 0.95 | ||
%\end{itemize} | ||
%\item MFCC parameters: | ||
%\begin{itemize} | ||
%\item number of cepstral coefficient: 15 | ||
%\item number of filter banks: 55 | ||
%\item maximal frequency of the filter bank: 6000 | ||
%\end{itemize} | ||
%\item LPC Parameters: | ||
%\begin{itemize} | ||
%\item number of coefficient: 23 | ||
%\end{itemize} | ||
%\end{itemize} | ||
|
||
%and then concatenate the two feature vectors of the same frame forming | ||
%a larger feature vector of 15 + 23 = 38 dimension. | ||
|
||
\subsubsection{MFCC} | ||
\label{sec:mfcc} | ||
\textbf{Mel-Frequency Cepstral Coefficients} (MFCC) are a representation of the short-term power spectrum of a sound,
based on a linear cosine transform of a log power spectrum on a nonlinear mel-scale of frequency \cite{mfcc}.
MFCC is the most widely used feature in Automatic Speech Recognition (ASR), and it can also be applied to the speaker recognition task.
|
||
|
||
The process of extracting MFCC features is as follows:
\begin{figure}[H] | ||
\centering | ||
\includegraphics[width=\textwidth]{img/MFCC.png} | ||
\end{figure} | ||
|
||
First, the input speech is divided into successive short-time frames of length $L$,
with neighboring frames overlapping by $R$.
Each frame is then multiplied by a Hamming window, as shown in \figref{framming}.
\begin{figure}[H] | ||
\centering | ||
\includegraphics[width=0.7\textwidth]{img/MFCC-windowing-frames.png} | ||
\caption{Framing and Windowing \label{fig:framming}} | ||
\end{figure} | ||
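As a concrete reading of the framing step (a sketch, assuming $L$ and $R$ are counted in samples,
$0 \le R < L$, and the utterance has $T \ge L$ samples; $T$, $S$ and $w[n]$ are notation introduced here),
consecutive frames start $S = L - R$ samples apart, so the number of full frames is
\[ \left\lfloor \dfrac{T - L}{L - R} \right\rfloor + 1 , \]
and the $i$-th windowed frame is $x_i[n] = x[iS + n]\, w[n]$ with the usual Hamming window
\[ w[n] = 0.54 - 0.46 \cos\left(\dfrac{2\pi n}{L - 1}\right), \qquad n = 0, 1, \cdots, L - 1. \]
For example, at an assumed sampling rate of 8 kHz, a 32 ms frame with a 16 ms shift gives $L = 256$ and $R = 128$,
so one second of speech ($T = 8000$) yields $\lfloor 7744 / 128 \rfloor + 1 = 61$ frames.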
|
||
Then we perform a Discrete Fourier Transform (DFT) on each windowed frame to compute its spectrum.
For each of the $N$ discrete frequency bins we get a complex number $X[k]$ representing
the magnitude and phase of that frequency component in the original signal.
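For reference, the transform in question can be written as follows (a sketch; the frame index $i$ and the
windowed samples $x_i[n]$ are notation assumed here, chosen to match the $X_i[k]$ used later in this section,
and the frame is taken to be zero-padded to $N$ samples if it is shorter):
\[ X_i[k] = \sum_{n=0}^{N-1} x_i[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \cdots, N - 1. \]
The magnitude is $|X_i[k]|$, the phase is $\arg X_i[k]$, and the power $|X_i[k]|^2$ is the quantity that
the filter-bank step operates on; the phase is discarded.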
|
||
Human hearing is not equally sensitive to all frequency bands; in particular,
it has lower resolution at higher frequencies.
Scaling methods such as the Mel-scale warp the frequency axis to better fit human auditory perception.
The Mel-scale is approximately linear below 1 kHz and logarithmic above 1 kHz, as shown below:
\begin{figure}[H] | ||
\centering | ||
\includegraphics[width=0.5\textwidth]{img/mel-scale.png} | ||
\end{figure} | ||
|
||
In MFCC, the Mel-scale is applied to the spectra of the signals.
The Mel-scale warping is given by:
\[ M(f) = 2595 \log_{10}\left(1 + \dfrac{f}{700}\right) \]
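As a quick check of this formula (worked values, rounded), with the inverse mapping obtained by solving for $f$:
\[ M(1000) = 2595 \log_{10}\left(1 + \dfrac{1000}{700}\right) \approx 1000, \qquad
   M(4000) = 2595 \log_{10}\left(1 + \dfrac{4000}{700}\right) \approx 2146, \]
\[ f = 700 \left( 10^{M / 2595} - 1 \right). \]
Going from 1 kHz to 4 kHz quadruples the frequency but only roughly doubles the mel value,
which is the compression of high frequencies described above.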
|
||
\begin{figure}[H] | ||
\centering | ||
\includegraphics[width=0.5\textwidth]{img/bank.png} | ||
\caption{Filter Banks (6 filters) \label{fig:bank}} | ||
\end{figure} | ||
Then we apply the bank of filters spaced according to the Mel-scale to the spectrum,
calculate the logarithm of the energy under each filter by $E_i[m] = \log \left(\sum_{k=0}^{N-1}{|X_i[k]|^2 H_m[k]}\right)$,
where $X_i[k]$ is the spectrum of the $i$-th frame and $H_m[k]$ is the $m$-th filter, and apply the Discrete
Cosine Transform (DCT) to $E_i[m]\ (m = 1, 2, \cdots, M)$ to get an array $c_i$:
\[ c_i[n] = \sum_{m=1}^{M}{E_i[m]\cos\left(\dfrac{\pi n}{M}\left(m - \dfrac{1}{2}\right)\right)} \]
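The filters $H_m[k]$ are not defined in the text. A common choice, assumed here, is a bank of triangular
filters whose boundary bins $f[0] < f[1] < \cdots < f[M+1]$ are equally spaced on the Mel-scale
(the symbol $f[m]$ is notation introduced for this sketch). For $m = 1, 2, \cdots, M$:
\[ H_m[k] = \begin{cases}
  \dfrac{k - f[m-1]}{f[m] - f[m-1]}, & f[m-1] \le k \le f[m], \\[2ex]
  \dfrac{f[m+1] - k}{f[m+1] - f[m]}, & f[m] < k \le f[m+1], \\[2ex]
  0, & \text{otherwise}.
\end{cases} \]
Each filter equals 1 at its centre bin $f[m]$ and falls linearly to 0 at the centres of its two neighbours,
which matches the overlapping triangles in \figref{bank}.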
|
||
The first $k$ terms of $c_i$ can then be used as features for training
(here $k$ denotes the number of coefficients kept, not the frequency index used above).
The value of $k$ varies between cases; we discuss the choice of $k$ further in \secref{result}.
|
||
\subsubsection{LPC} | ||
\textbf{Linear predictive coding} (LPC) is a tool used mostly in audio signal processing and speech
processing for representing the spectral envelope of a
digital speech signal in compressed form, using the information of a linear predictive model \cite{lpc}.
|
||
The basic assumption in LPC is that,
over a short period, the $n$th sample is a linear combination of the previous $p$ samples:
\[ \hat{x}(n) = \sum_{i=1}^{p} a_i x(n-i) \]
Therefore, to estimate the coefficients $a_i$, we have to minimize the mean squared error
$\text{E}\left[ \left( \hat{x}(n) - x(n) \right)^2 \right]$.
This optimization can be done with the Levinson-Durbin algorithm \cite{levinson-durbin}.
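To make the optimization concrete (a sketch assuming the autocorrelation method, with $r(j)$ the
autocorrelation of the frame; $r$ is notation introduced here): setting the derivative of the mean squared
prediction error $\text{E}\left[ \left( \hat{x}(n) - x(n) \right)^2 \right]$ with respect to each $a_j$ to zero gives the normal equations
\[ \sum_{i=1}^{p} a_i\, r(|i - j|) = r(j), \qquad j = 1, 2, \cdots, p, \qquad r(j) = \sum_{n} x(n)\, x(n - j). \]
The coefficient matrix is symmetric Toeplitz, and this is the structure the Levinson-Durbin recursion exploits
to solve the system in $O(p^2)$ operations instead of the $O(p^3)$ of general elimination.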
|
||
Therefore, we first split the input signal into frames, as is done in MFCC feature extraction (\secref{mfcc}).
Then we calculate the order-$k$ LPC coefficients for the signal in each frame (that is, $p = k$ in the model above).
Since the coefficients are a compressed description of the original audio signal,
they are also a good feature for speech/speaker recognition.
The choice of $k$ will also be further discussed in \secref{result}.
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
\section{GUI} | ||
The GUI contains the following tabs:
\begin{itemize} | ||
\item \textbf{Enrollment} \\ | ||
|
||
\begin{figure}[H] | ||
\centering | ||
\includegraphics[width=0.8\textwidth]{img/enrollment.png} | ||
\end{figure} | ||
|
||
A new user starts by clicking the
Enrollment tab. New users can provide personal information
such as name, sex, and age, then upload a personal avatar to
build up their own profile. Existing users can choose themselves from
the user list and update their information.
|
||
Next the user needs to provide an utterance for
the enrollment and training process.
|
||
There are two ways to enroll a user: | ||
\begin{itemize} | ||
\item \textbf{Enroll by Recording} | ||
Click Record and start talking, then click Stop to stop
and save. There is no restriction on the content of the utterance,
but it is highly recommended that the user speak long enough
to provide sufficient data for the enrollment.
|
||
\item \textbf{Enroll from Wav Files} | ||
The user can upload a pre-recorded voice sample of a speaker (*.wav recommended).
The system accepts the given recording and the enrollment of the speaker is done.
\end{itemize} | ||
|
||
The user can train, dump or load his/her voice features after enrollment. | ||
|
||
\item \textbf{Recognition of a user} \\ | ||
\begin{figure}[H] | ||
\centering | ||
\includegraphics[width=0.8\textwidth]{img/recognition.png} | ||
\end{figure} | ||
|
||
An enrolled user presents or records an utterance, and
the system tells who the person is and shows the user's avatar.
Recognition of multiple pre-recorded files can be done as well.
|
||
\item \textbf{Conversation Recognition Mode} \\ | ||
\begin{figure}[H] | ||
\centering | ||
\includegraphics[width=0.8\textwidth]{img/conversation.png} | ||
\caption{\label{fig:}} | ||
\end{figure} | ||
|
||
In Conversation Recognition mode, multiple users can have a conversation
together near the microphone. The recording procedure is the same as above.
The system continuously collects voice data and determines
who is speaking at the moment. The current speaker's avatar is shown
on screen; if no avatar is available, the name is shown instead. The conversation
audio can be downloaded and saved.
There are several ways to visualize the speaker distribution in the
conversation.
\begin{itemize} | ||
\item \textbf{Conversation log} | ||
A detailed log is generated, including the start time, stop time
and current speaker of each period.
\item \textbf{Conversation flow graph} | ||
\begin{figure}[H] | ||
\centering | ||
\includegraphics[width=0.8\textwidth]{img/conversationgraph.png} | ||
\end{figure} | ||
|
||
A timeline of the conversation is shown as a series of
joined talking-clouds, labeled with start time, stop time
and the users' avatars. Different users are shown
in different colors. The timeline flows to the left dynamically
as time elapses. This functionality is still under development.
\end{itemize} | ||
|
||
\end{itemize} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/MFCC-mel-filterbank.png |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/MFCC-windowing-frames.png |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/MFCC.png |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/a0.png |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/a1.png |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/crbm.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/gmm-compare.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/gmm.png |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/lpc-frame-len.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/lpc-nceps.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/ltsd.png |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/mel-scale.png |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/mfcc-frame-len.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/mfcc-nceps.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/mfcc-nfilter.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/nmixture.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/performance.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/rbm-original.png |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/rbm-reconstruct.png |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/reading.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/spont.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/time-comp-small.pdf |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../Presentation/res/whisper.pdf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
%File: implementation.tex | ||
%Date: Fri Jan 03 17:18:14 2014 +0800 | ||
%Author: Yuxin Wu <ppwwyyxxc@gmail.com> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
%File: intro.tex | ||
%Date: Fri Jan 03 17:03:58 2014 +0800 | ||
%Author: Yuxin Wu <ppwwyyxxc@gmail.com> | ||
|
||
|
||
\section{Introduction} | ||
\textbf{Speaker recognition} is the identification of the person who is speaking by characteristics
of their voice (voice biometrics), also called voice recognition \cite{SRwiki}.
|
||
\textbf{Speaker recognition} tasks can be classified by different criteria:
Text-dependent or Text-independent, Verification (decide whether the person is who he or she claims to be) or
Identification (decide who the person is from his or her voice) \cite{SRwiki}.
|
||
Speech is a complicated signal produced as a result of several transformations occurring at
different levels: semantic, linguistic and acoustic.
Differences in these transformations may lead to differences in the acoustic properties of the signals.
The recognizability of a speaker can be affected not only by the linguistic message
but also by the age, health, emotional state and effort level of the speaker.
Background noise and the performance of the recording device also interfere with
the classification process.
|
||
Speaker recognition is an important part of Human-Computer Interaction (HCI).
As the trend toward wearable computers shows,
the Voice User Interface (VUI) has become a vital part of such computers.
As these devices are particularly small, they are more likely to be lost or stolen.
In these scenarios, speaker recognition is not only a good HCI technique,
but also combines seamless interaction with the computer with a security safeguard
when the device is lost.
The need for personal identity validation will become more acute in the future.
Speaker verification may be essential in business telecommunications.
Telephone banking and telephone reservation services will develop rapidly
once secure means of authentication are available.
|
||
Also, the identity of a speaker is quite often at issue in court cases.
A crime victim may have heard but not seen the perpetrator,
but claim to recognize the perpetrator as someone whose voice was previously familiar;
or there may be recordings of a criminal whose identity is unknown.
Speaker recognition techniques may provide a reliable scientific determination.
|
||
Furthermore, these techniques can be used in environments that demand high security.
They can be combined with other biometrics to form a multi-modal authentication system.
|
||
In this task, we have built a proof-of-concept text-independent speaker recognition system with
GUI support. It is fast and accurate, based on our tests on a large corpus.
The GUI program requires only a very short utterance to respond quickly.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
% $File: mint-defs.tex | ||
% $Date: Thu Sep 26 22:11:33 2013 +0800 | ||
% $Author: Xinyu Zhou <zxytim@gmail.com> | ||
|
||
\newcommand{\inputmintedConfigured}[3][]{\inputminted[fontsize=\footnotesize, | ||
label=#3,linenos,frame=lines,framesep=0.8em,tabsize=4,#1]{#2}{#3}} | ||
|
||
\newcommand{\txtsrc}[2][]{\inputmintedConfigured[#1]{text}{#2}} | ||
\newcommand{\txtsrcpart}[4][]{\txtsrc[firstline=#3,firstnumber=#3,lastline=#4,#1]{#2}} | ||
|
||
\newcommand{\cppsrc}[2][]{\inputmintedConfigured[#1]{cpp}{#2}} | ||
\newcommand{\cppsrcpart}[4][]{\cppsrc[firstline=#3,firstnumber=#3,lastline=#4,#1]{#2}} | ||
|
||
\newcommand{\javasrc}[2][]{\inputmintedConfigured[#1]{java}{#2}} | ||
\newcommand{\javasrcpart}[4][]{\javasrc[firstline=#3,firstnumber=#3,lastline=#4,#1]{#2}} | ||
|
||
\newcommand{\matlabsrc}[2][]{\inputmintedConfigured[#1]{matlab}{#2}} | ||
\newcommand{\matlabsrcpart}[4][]{\matlabsrc[firstline=#3,firstnumber=#3,lastline=#4,#1]{#2}} |