Add Metagenomics section #200

Merged
merged 12 commits into from
Feb 16, 2017
Update 04_study.md
gailrosen authored Jan 18, 2017
commit 37ac5fa70e2cbc5ef3bfb39108da15f59733de8a
29 changes: 15 additions & 14 deletions sections/04_study.md
@@ -83,20 +83,21 @@ use interesting network architectures to approach single-cell data.
### Metagenomics

*@gailrosen will write this:*
Metagenomics (the study of genetic material, 16S rRNA and/or whole-genome
shotgun DNA, from microbial communities) has revolutionized the study of
micro-scale ecosystems within us and around us. There is a growing literature
on applying machine learning to metagenomic analysis. In the late 2000s, many
machine learning methods were applied to classifying DNA sequencing reads
among the thousands of species within a sample. An important problem is genome
assembly from these mixed-organism samples. To do that, the sequences should
first be “binned” by organism before assembly. Binning methods began with
many k-mer techniques [refs]
Collaborator

I think we can merge before adding the references, but we'll have to remember to come back to them. Perhaps add a TODO reminder at the start or end of this sub-section?

and then moved on to other clustering algorithms, such as self-organizing maps
(SOMs). Next came the taxonomic classification problem, with researchers
naturally using BLAST [blast], followed by machine learning techniques such as
SVMs [McHardy] and naive Bayesian classifiers [nbc], to classify each read.
Researchers then began to use techniques that estimate the relative abundances
of an entire sample, instead of the precise but painstakingly slow
read-by-read classification. Relative abundance estimators (a.k.a. diversity
profilers) include MetaPhlAn [ref], (WGS)Quikr [ref], and some configurations
of tools like OneCodex [ref] and LMAT [ref]. While one