Description
Describe the question
For the diarization task, I train on the AMI train+dev sets and the ICSI corpus, and test on the AMI test set. Both datasets contain audios with 3-5 speakers, each 50-70 minutes long. My d-vector embedding is trained on VoxCeleb 1 and 2 with EER = 4.55%. I train UIS-RNN with a 240 ms window, 50% overlap, and 400 ms segments. The results are poor on both the training and test sets.
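For reference, here is a rough sketch of how I aggregate the sliding-window d-vectors into segment embeddings (the VoxCeleb-trained network itself is not shown; `frame_dvectors` is a placeholder for its window-level outputs, and the helper name is mine):

```python
import numpy as np

def segment_dvectors(frame_dvectors, window_s=0.24, overlap=0.5, segment_s=0.4):
    """Average overlapping-window d-vectors into fixed-length segments.

    frame_dvectors: (num_windows, dim) array of window-level embeddings,
        where windows are `window_s` long with `overlap` fractional overlap.
    Returns (num_segments, dim) L2-normalized segment embeddings.
    """
    hop_s = window_s * (1.0 - overlap)                 # 0.12 s hop between windows
    windows_per_segment = max(1, int(round(segment_s / hop_s)))
    num_segments = len(frame_dvectors) // windows_per_segment
    segments = []
    for i in range(num_segments):
        chunk = frame_dvectors[i * windows_per_segment:(i + 1) * windows_per_segment]
        emb = chunk.mean(axis=0)
        segments.append(emb / (np.linalg.norm(emb) + 1e-8))
    return np.stack(segments)
```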
I also read through all of the UIS-RNN code, and there are two things I do not understand: 1) Why do you split up the original utterances, concatenate them by speaker, and then use that as the training input? 2) Why does the input ignore which audio each utterance belongs to, merging all utterances as if they came from a single audio? This process seems completely different from the inference process, and it also limits the usable batch size when one speaker talks too much. See the sketch below for what I mean.
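To make questions 1) and 2) concrete, this is my understanding of the training data preparation as a simplified sketch (it reflects my reading of the code, not the actual implementation, and the function name is mine):

```python
from collections import defaultdict
import numpy as np

def build_training_sequence(utterances):
    """Simplified view of the preprocessing I am asking about.

    utterances: list of dicts, each like
        {"audio_id": str, "speaker": str, "segments": (N, dim) np.ndarray}

    As I read it, the code (1) regroups segments by speaker, concatenating each
    speaker's segments regardless of where they occurred, and (2) merges all
    speakers from all audios into one long sequence, dropping the audio_id.
    """
    by_speaker = defaultdict(list)
    for utt in utterances:
        by_speaker[utt["speaker"]].append(utt["segments"])   # audio_id ignored

    train_sequence, train_cluster_id = [], []
    for speaker, chunks in by_speaker.items():
        speaker_segments = np.concatenate(chunks, axis=0)    # one block per speaker
        train_sequence.append(speaker_segments)
        train_cluster_id.extend([speaker] * len(speaker_segments))

    # Everything ends up in a single sequence, as if it were one audio.
    return np.concatenate(train_sequence, axis=0), np.array(train_cluster_id)
```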
For a 1-hour audio, the output contains 20-30 speakers instead of 3-5, no matter how small I make crp_alpha.
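For example, this is roughly how I sweep crp_alpha at inference time (the file paths, the d-vector dimension, and the test array come from my setup and are placeholders, not part of this repo):

```python
import numpy as np
import uisrnn

model_args, _, inference_args = uisrnn.parse_arguments()
model_args.observation_dim = 256   # placeholder: must match my d-vector dimension
test_sequence = np.load('ami_test_dvectors.npy').astype(float)  # placeholder path, (num_segments, dim)

for alpha in [1.0, 0.5, 0.1, 0.01]:
    model_args.crp_alpha = alpha
    model = uisrnn.UISRNN(model_args)
    model.load('saved_uisrnn_model.uisrnn')          # placeholder path to my trained model
    predicted_ids = model.predict(test_sequence, inference_args)
    print(alpha, len(set(predicted_ids)))            # still 20-30 clusters for ~1 h of audio
```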
My background
Have I read the README.md file?
- yes
Have I searched for similar questions from closed issues?
- yes
Have I tried to find the answers in the paper Fully Supervised Speaker Diarization?
- yes
Have I tried to find the answers in the reference Speaker Diarization with LSTM?
- yes
Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification?
- yes