Merge branch 'master' into master

yml-blog · Oct 19, 2019 · edea9ba
2 parents b33fe6d + 1977379
commit edea9ba
Showing 120 changed files with 4,790 additions and 2,116 deletions.
diff --git a/ML-on-code-programming-source-code.md b/ML-on-code-programming-source-code.md
@@ -0,0 +1,35 @@
+# ML on Code/Programming/Source Code
+
+- [Talk: Learning to Type by Liam Atkinson](https://lara.epfl.ch/~kuncak/Learning_to_Type_S1360006.mp4) at the [ml4p.org](https://ml4p.org/) conference in 2018
+- [Awesome ML on Source Code](https://github.com/src-d/awesome-machine-learning-on-source-code)
+- [Machine Learning on Go Code](https://medium.com/sourcedtech/machine-learning-on-go-code-829e85e2d2c6)
+- [ML on Source Code](https://github.com/topics/machine-learning-on-source-code)
+- [Introducing Experiments, an ongoing research effort from GitHub](https://github.blog/2018-09-18-introducing-experiments-an-ongoing-research-effort-from-github/)
+- [C# or Java? TypeScript or JavaScript? Machine learning based classification of programming languages](https://github.blog/2019-07-02-c-or-java-typescript-or-javascript-machine-learning-based-classification-of-programming-languages/)
+- [Introducing the CodeSearchNet challenge](https://github.blog/2019-09-26-introducing-the-codesearchnet-challenge/)
+   - [CodeSearchNet Challenge](https://github.com/github/codesearchnet#introduction) | [CodeSearchNet Challenge: Evaluating the State of Semantic Code Search](https://arxiv.org/abs/1909.09436) | [leaderboard](https://app.wandb.ai/github/codesearchnet/benchmark) | [technical report](https://arxiv.org/abs/1909.09436)
+   - [TreeSitter](http://tree-sitter.github.io/tree-sitter/) | [data preprocessing pipeline](https://github.com/github/CodeSearchNet/tree/master/function_parser)
+   - [Transformer](https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html)
+   - [CodeSearchNet Corpus on S3 bucket](https://github.com/github/CodeSearchNet#downloading-data-from-s3)
+   - [Baseline models](https://github.com/github/CodeSearchNet) | [BERT](https://arxiv.org/abs/1810.04805)
+   - [StaQC](https://github.com/LittleYUYU/StackOverflow-Question-Code-Dataset)
+- [Towards Natural Language Semantic Code Search](https://github.blog/2018-09-18-towards-natural-language-semantic-code-search/)
+   - [Semantic Search](https://en.wikipedia.org/wiki/Semantic_search)
+   - [Sequence-to-sequence](https://towardsdatascience.com/how-to-create-data-products-that-are-magical-using-sequence-to-sequence-models-703f86a231f8)
+   - [tree-based LSTMs](https://arxiv.org/pdf/1802.00921.pdf)
+   - [gated-graph networks](https://github.com/Microsoft/gated-graph-neural-network-samples)
+   - [Fine-tuning Deep Learning models in Keras](https://flyyufelix.github.io/2016/10/03/fine-tuning-in-keras-part1.html)
+   - [BLEU Score](https://en.wikipedia.org/wiki/BLEU)
+   - [Universal Sentence Encoder](https://arxiv.org/abs/1803.11175) | [Tensorflow Hub](https://www.tensorflow.org/hub/modules/google/universal-sentence-encoder/1)
+   - [Neural language model](https://en.wikipedia.org/wiki/Language_model) | [fast.ai](https://fast.ai)
+   - [AWD LSTMs](https://arxiv.org/pdf/1708.02182.pdf) | [cyclical learning rates](https://arxiv.org/abs/1506.01186) | [Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/pdf/1801.06146.pdf)
+   - [A python tool for evaluating the quality of sentence embeddings](https://github.com/facebookresearch/SentEval)
+   - [Cosine Proximity Loss](https://keras.io/losses/) | [Efficient Natural Language Response Suggestion for Smart Reply](https://arxiv.org/abs/1705.00652)
+   - [open-source end-to-end tutorial](https://towardsdatascience.com/semantic-code-search-3cd6d244a39c)
+   - [Code Search implemented in Kubeflow](https://github.com/kubeflow/examples/tree/master/code_search) | [kubeflow](https://www.kubeflow.org/)
+   - [Live demo of Semantic Code Search](https://experiments.github.com/semantic-code-search) | [Experiments site](https://blog.github.com/2018-09-18-introducing-experiments-an-ongoing-research-effort-from-github/)
+- [ML for Detecting Code Bugs](https://towardsdatascience.com/machine-learning-for-detecting-code-bugs-a79f37f144b7)
+- [Machine Learning on Source Code](https://ml4code.github.io/)
+- [ML on Code devroom at FOSDEM](https://archive.fosdem.org/2019/schedule/track/ml_on_code/)
+- [The Open Source Show: Machine Learning on Code](https://channel9.msdn.com/Shows/The-Open-Source-Show/Machine-Learning-on-Code) by Rob Caron, Lacey Butler, Allison Cordle
+- [Machine Learning for Programming](https://ml4p.org/) - conference held in 2018 in Oxford, UK
diff --git a/Programming-in-Python.md b/Programming-in-Python.md
@@ -1,6 +1,6 @@
 # Programming in Python
 
-- [Learning](#learning)
+- [Basics / Learning](#basics--learning)
 - [Cheatsheets](#cheatsheets)
 - [Static analysis](#static-analysis)
     - [Focussed packages](#focussed-packages)
@@ -11,7 +11,12 @@
 - [Performance](#performance)
 - [Contributing](#contributing)
 
-## Learning
+## Basics / Learning
+
+- [Lists, Tuples, Dictionaries, Conditionals, Loops, etc...](https://lnkd.in/gWRbc3J)
+- [Data Structures & Algorithms](https://lnkd.in/gYKnJWN)
+- [NumPy Arrays](https://lnkd.in/geeFePh)
+- [Regex](https://lnkd.in/gzUahNV)
 - [Introduction to Python](https://simpliv-wordpress-com.cdn.ampproject.org/c/s/simpliv.wordpress.com/2019/06/27/best-way-to-learn-python-step-by-step-guide/amp/)
 - [Learn Python](https://www.learnpython.org/)
 - [Python 3 Tutorial](https://docs.python.org/3/tutorial/)
@@ -24,12 +29,23 @@
   - [Online Python Turtle Editor](https://repl.it/languages/python_turtle)
   - [Online Python Compiler](https://www.onlinegdb.com/online_python_compiler)
 - [Local machine: Interacting with Python](https://realpython.com/interacting-with-python/)
+- [Python by Chris Albon](https://chrisalbon.com/#python) - topics covered: Basics • Data Wrangling • Data Visualization • Web Scraping • Testing • Logging • Other
+- [Regex resources by Chris Albon](https://chrisalbon.com/#regex)
+- [WTF Python repo](https://github.com/satwikkansal/wtfpython)
 
 ## Cheatsheets
 - [Python Cheatsheet](https://www.pythoncheatsheet.org/)
 - [Pysheeet: Python Cheatsheet](https://www.pythonsheets.com/)
 - [7+ Python Cheat Sheets for Beginners and Experts](https://sinxloud.com/python-cheat-sheet-beginner-advanced/)
 - [Python for Data Science](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf)
+- [30 seconds of python](https://github.com/30-seconds/30-seconds-of-python)
+
+## Database
+##### Databases implemented in Python
+
+- [Pickle DB](https://pythonhosted.org/pickleDB/)
+- [Tinydb](https://github.com/msiemens/tinydb)
+- [ZODB](http://www.zodb.org/en/latest/)
 
 ## Static analysis
 
@@ -58,6 +74,13 @@
 * [multilint](https://github.com/adamchainz/multilint) - a wrapper around `flake8`, `isort` and `modernize`
 * [prospector](https://github.com/PyCQA/prospector) - a wrapper around `pylint`, `pep8`, `mccabe` and others
 
+## Cookiecutter: Python project templates
+
+- [For Python projects](https://cookiecutter.readthedocs.io/en/latest/readme.html#python)
+- [For Data Science projects](https://cookiecutter.readthedocs.io/en/latest/readme.html#data-science)
+- [For Reproducible Data Science projects](https://cookiecutter.readthedocs.io/en/latest/readme.html#reproducible-science)
+- [For Data Driven Journalism projects](https://cookiecutter.readthedocs.io/en/latest/readme.html#data-driven-journalism)
+
 ## Best practices
 
 - [PEP 8 -- Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)
@@ -71,6 +94,12 @@
 - [Python Best Practices: 5 Tips For Better Code - Airbrake Blog](https://airbrake.io/blog/python/python-best-practices)
 - [Python tutorial: Best practices and common mistakes to avoid](https://jaxenter.com/python-tutorial-best-practices-145959.html)
 - [Common mistakes beginners make in Python](https://github.com/qxf2/wtfiswronghere)
+- [Six steps to more professional data science code notebook on Kaggle by Rachael Tatman](https://kaggle.intercom-mail.com/via/e?ob=ktkgCmdq8TTLcC0KcRnaTDrpfyNmo93hWgJS%2Bf3C%2FpeXDMn5IliXwMPCgGeVFtngYeGLq2r3zzpfPOt1R2SLUvPz%2BOZl6ye5CNrx98D279Mjy%2BDCxeLTcN3rL%2BXuXvYPwdMeFoEliM4ujTLctPU1Rb2Kt8AOwN30PYPGdMZPPhxkha%2BlQ9oixCrQILf%2BWqOTvh59huu9yn%2BqmDKPk9wcnA%3D%3D&h=6b05c8a50ab9ff60c7c061020cc5428a92dce16c-23895383563)
+  - [Video: 6 Steps for More Professional Data Science Code | Kaggle](https://www.youtube.com/watch?v=Trar7GZOPl8&feature=youtu.be&utm_medium=email&utm_source=intercom&utm_campaign=modular-code-event)
+  - [Import scripts into notebook kernels](https://www.kaggle.com/product-feedback/91185)
+  - [Kaggle Live Coding: Making code modular | Kaggle](https://www.youtube.com/watch?v=5zgxMgG4A7o)
+  - [Documentation on Python modules](https://docs.python.org/3/tutorial/modules.html)
+  - [DocStrings](https://www.python.org/dev/peps/pep-0257/)
+  - [Don't Repeat Yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)
+  - [PEP 8](https://www.python.org/dev/peps/pep-0008/)
+  - [Joy of Functional programming for Data Science](https://www.youtube.com/watch?v=bzUmK0Y07ck)
+  - [Method Chaining in Python using pyjanitor](https://pyjanitor.readthedocs.io/notebooks/pyjanitor_intro.html#Clean-up-our-data-using-a-pyjanitor-method-chaining-pipeline)
+  - [pyjanitor docs](https://pyjanitor.readthedocs.io/notebooks/pyjanitor_intro.html#Clean-up-our-data-using-a-pyjanitor-method-chaining-pipeline)
+  - [Code reviewing Data Science work](https://medium.com/apteo/code-reviewing-data-science-work-774747248e33)
+  - [Python built-in statement: assert](https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement)
+  - [Code Smells](https://en.wikipedia.org/wiki/Code_smell)
+  - [Kaggle Coffee Chat: Joel Grus | Kaggle: software engineering best practices](https://www.youtube.com/watch?v=Sg6xJ0ACc78)
+  - [Scripting-your-data-validation notebook: Automating Data Pipelines](https://www.kaggle.com/rtatman/automating-data-pipelines-day-2#Scripting-your-data-validation)
+  - [Dashboarding with Notebooks: Day 5](https://www.kaggle.com/rtatman/dashboarding-with-notebooks-day-5)
+  - [Kaggle Scripts](https://www.kaggle.com/kernels?sortBy=hotness&group=everyone&pageSize=20&tagIds=16074)
+  - [Regular Expressions](https://en.wikipedia.org/wiki/Regular_expression)
+- Packages & Libraries: [Cerberus module](http://docs.python-cerberus.org/en/stable/usage.html) | [missingno package](https://github.com/ResidentMario/missingno) | [python-magic module](https://github.com/ahupp/python-magic) | [Python Flashtext](https://flashtext.readthedocs.io/en/latest/) | [Flashtext github](https://github.com/vi3k6i5/flashtext#why-not-regex) | [Forum post embeddings + clustering](https://www.kaggle.com/rtatman/forum-post-embeddings-clustering)
+- [Jason Gorman's](https://twitter.com/jasongorman) Python Code Craft series:
+  - [Code Craft : Part I – Why We Need Code Craft](https://codemanship.wordpress.com/2019/10/01/code-craft-part-i-why-we-need-code-craft/)
+  - [Code Craft : Part II – Version Control is Seat Belts for Programmers](https://codemanship.wordpress.com/2019/10/02/code-craft-part-ii-version-control-is-seat-belts-for-programmers/)
+  - [Code Craft : Part III – Unit Tests are an Early Warning System for Programmers](https://codemanship.wordpress.com/2019/10/04/code-craft-part-iii-unit-tests-are-an-early-warning-system-for-programmers/)
 
 ## Testing
 
@@ -99,6 +128,10 @@
 - [NumPy aware dynamic Python compiler using LLVM](https://github.com/ameroueh/numba) | [Numba](http://numba.pydata.org/)
 - [Profiling in Python](https://github.com/mkunesch/profiling-talk) - by [Markus Kunesch](https://github.com/mkunesch)
 
+## Competitions & coding challenges
+
+See [Competitions > Coding challenges](./competitions.md#coding-challenges)
+
 # Contributing
 
 Contributions are very welcome; please share them back with the wider community (and get credited for it)!