
Add Metagenomics section #200

Merged
merged 12 commits into from
Feb 16, 2017
Update 04_study.md
fixed have plenty-> has plenty
more specific about convergence issues
defined the improvement in performance of mrzelj
gailrosen authored Feb 7, 2017
commit 2087ce4ca8f408e99d17b7d24bbd04dca44274c1
18 changes: 10 additions & 8 deletions sections/04_study.md
@@ -132,7 +132,7 @@ genomic fragments and showed that it has better performance than K-means;
however, other methods based on interpolated Markov models [Salzberg] have
performed better. Due to the complexity of the problem, neural networks have
been applied more to gene annotation (e.g. Orphelia [Hoff]), which usually
have plenty of training examples. Representations (similar to Word2Vec [ref]
has plenty of training examples. Representations (similar to Word2Vec [ref]
in natural language processing) for protein family classification have been
note for @brettbj - think this connects to some of your thinking.

introduced and classified with a skip-gram neural network [Asgari].
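The skip-gram idea can be sketched by treating overlapping k-mers of a protein sequence as "words" and learning an embedding for each. Everything below (the toy sequence, k, window size, embedding dimension) is an illustrative assumption, not the setup used in [Asgari]:

```python
import numpy as np

# Hypothetical toy protein sequence; real models train on large databases.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
k, window, dim = 3, 2, 8

# Overlapping k-mers play the role of words in the skip-gram model.
kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
vocab = sorted(set(kmers))
idx = {w: i for i, w in enumerate(vocab)}

# Skip-gram training pairs: (target k-mer, nearby context k-mer).
pairs = [(idx[kmers[i]], idx[kmers[j]])
         for i in range(len(kmers))
         for j in range(max(0, i - window), min(len(kmers), i + window + 1))
         if i != j]

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # embedding table
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # output weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# A few epochs of plain SGD on the full-softmax skip-gram objective.
lr = 0.05
for _ in range(50):
    for t, c in pairs:
        h = W_in[t].copy()
        p = softmax(W_out @ h)
        p[c] -= 1.0                       # gradient of cross-entropy wrt logits
        W_in[t] -= lr * (W_out.T @ p)
        W_out -= lr * np.outer(p, h)

# Each row of W_in is now a learned vector for one k-mer.
print(W_in.shape)
```

After training, k-mers that occur in similar contexts end up with nearby vectors, and a whole protein can be represented, for example, by averaging its k-mer vectors before classification.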
Recurrent neural networks show good performance for homology and protein
@@ -166,13 +166,15 @@ are still a challenge for deep neural networks that require many more
training examples than features to sufficiently converge the weights on the
hidden layers.

In fact, due to convergence issues of neural networks, one would think
that taxonomic classification would be impossible for deep neural networks.
However, with the 16S rRNA having hundreds of thousands of full-sequenced
examples (compared to several thousand fully-sequenced whole-genome
sequences), deep neural networks have been successfully applied to taxonomic
classification of 16S rRNA genes, with convolutional networks outperforming
RNNs and even random forests [Mrzelj].
In fact, due to convergence issues (slowness and instability when large
neural networks model very large datasets [arXiv:1212.0901v2]), one would
think that taxonomic classification would be impossible for deep neural
networks. However, with 16S rRNA databases containing hundreds of thousands
of fully sequenced examples (compared to several thousand fully sequenced
whole genomes), deep neural networks have been successfully applied to
taxonomic classification of 16S rRNA genes, with convolutional networks
providing roughly a 10% improvement in genus-level accuracy over RNNs and
even random forests [Mrzelj].
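As a sketch of the convolutional approach: a 16S read can be one-hot encoded over {A, C, G, T} and scanned with 1-D filters whose pooled activations feed a softmax over taxa. The filter count, filter width, read length, and five-genus head below are illustrative assumptions, not the architecture from [Mrzelj]:

```python
import numpy as np

rng = np.random.default_rng(1)
BASES = "ACGT"

def one_hot(read):
    """Encode a DNA read as a 4 x L matrix, one row per base."""
    m = np.zeros((4, len(read)))
    for j, b in enumerate(read):
        m[BASES.index(b), j] = 1.0
    return m

def conv1d(x, filters):
    """Valid 1-D convolution of a 4 x L input with F filters of width w."""
    F, _, w = filters.shape
    L = x.shape[1] - w + 1
    out = np.empty((F, L))
    for f in range(F):
        for i in range(L):
            out[f, i] = np.sum(filters[f] * x[:, i:i + w])
    return np.maximum(out, 0.0)  # ReLU

# Random 100 bp "read" standing in for a 16S fragment.
read = "".join(rng.choice(list(BASES), size=100))
filters = rng.normal(scale=0.1, size=(16, 4, 8))   # 16 motif detectors, width 8

feat = conv1d(one_hot(read), filters).max(axis=1)  # global max-pool per filter
W = rng.normal(scale=0.1, size=(5, 16))            # toy 5-genus softmax head
logits = W @ feat
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)  # one probability per (toy) genus
```

The max-pooling step is what makes the filters act as position-invariant motif detectors, which is one intuition for why convolutional networks suit marker-gene reads whose informative subsequences can occur at varying offsets.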

### Sequencing and variant calling
