Merge branch 'master' into master
neomatrix369 authored Oct 19, 2019
2 parents b33fe6d + 1977379 commit edea9ba
Showing 120 changed files with 4,790 additions and 2,116 deletions.
35 changes: 35 additions & 0 deletions ML-on-code-programming-source-code.md
@@ -0,0 +1,35 @@
# ML on Code/Programming/Source Code

- [Talk: Learning to Type by Liam Atkinson](https://lara.epfl.ch/~kuncak/Learning_to_Type_S1360006.mp4) at the [ml4p.org](https://ml4p.org/) conference in 2018
- [Awesome ML on Source Code](https://github.com/src-d/awesome-machine-learning-on-source-code)
- [Machine Learning on Go Code](https://medium.com/sourcedtech/machine-learning-on-go-code-829e85e2d2c6)
- [ML on Source Code](https://github.com/topics/machine-learning-on-source-code)
- [Introducing Experiments, an ongoing research effort from GitHub](https://github.blog/2018-09-18-introducing-experiments-an-ongoing-research-effort-from-github/)
- [C# or Java? TypeScript or JavaScript? Machine learning based classification of programming languages](https://github.blog/2019-07-02-c-or-java-typescript-or-javascript-machine-learning-based-classification-of-programming-languages/)
- [Introducing the CodeSearchNet challenge](https://github.blog/2019-09-26-introducing-the-codesearchnet-challenge/)
- [CodeSearchNet Challenge](https://github.com/github/codesearchnet#introduction) | [CodeSearchNet Challenge: Evaluating the State of Semantic Code Search](https://arxiv.org/abs/1909.09436) | [leaderboard](https://app.wandb.ai/github/codesearchnet/benchmark) | [technical report](https://arxiv.org/abs/1909.09436)
- [TreeSitter](http://tree-sitter.github.io/tree-sitter/) | [data preprocessing pipeline](https://github.com/github/CodeSearchNet/tree/master/function_parser)
- [Transformer](https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html)
- [CodeSearchNet Corpus on S3 bucket](https://github.com/github/CodeSearchNet#downloading-data-from-s3)
- [Baseline models](https://github.com/github/CodeSearchNet) | [BERT](https://arxiv.org/abs/1810.04805)
- [StaQC](https://github.com/LittleYUYU/StackOverflow-Question-Code-Dataset)
- [Towards Natural Language Semantic Code Search](https://github.blog/2018-09-18-towards-natural-language-semantic-code-search/)
- [Semantic Search](https://en.wikipedia.org/wiki/Semantic_search)
- [Sequence-to-sequence](https://towardsdatascience.com/how-to-create-data-products-that-are-magical-using-sequence-to-sequence-models-703f86a231f8)
- [tree-based LSTMs](https://arxiv.org/pdf/1802.00921.pdf)
- [gated-graph networks](https://github.com/Microsoft/gated-graph-neural-network-samples)
- [Fine-tuning Deep Learning models in Keras](https://flyyufelix.github.io/2016/10/03/fine-tuning-in-keras-part1.html)
- [BLEU Score](https://en.wikipedia.org/wiki/BLEU)
- [Universal Sentence Encoder](https://arxiv.org/abs/1803.11175) | [Tensorflow Hub](https://www.tensorflow.org/hub/modules/google/universal-sentence-encoder/1)
- [Neural language model](https://en.wikipedia.org/wiki/Language_model) | [fast.ai](https://fast.ai)
- [AWD LSTMs](https://arxiv.org/pdf/1708.02182.pdf) | [cyclical learning rates](https://arxiv.org/abs/1506.01186) | [Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/pdf/1801.06146.pdf)
- [A python tool for evaluating the quality of sentence embeddings](https://github.com/facebookresearch/SentEval)
- [Cosine Proximity Loss](https://keras.io/losses/) | [Efficient Natural Language Response Suggestion for Smart Reply](https://arxiv.org/abs/1705.00652)
- [open-source end-to-end tutorial](https://towardsdatascience.com/semantic-code-search-3cd6d244a39c)
- [Code Search implemented in Kubeflow](https://github.com/kubeflow/examples/tree/master/code_search) | [kubeflow](https://www.kubeflow.org/)
- [Live demo of Semantic Code Search](https://experiments.github.com/semantic-code-search) | [Experiments site](https://blog.github.com/2018-09-18-introducing-experiments-an-ongoing-research-effort-from-github/)
- [ML for Detecting Code Bugs](https://towardsdatascience.com/machine-learning-for-detecting-code-bugs-a79f37f144b7)
- [Machine Learning on Source Code](https://ml4code.github.io/)
- [ML on Code devroom at FOSDEM](https://archive.fosdem.org/2019/schedule/track/ml_on_code/)
- [The Open Source Show: Machine Learning on Code](https://channel9.msdn.com/Shows/The-Open-Source-Show/Machine-Learning-on-Code) by Rob Caron, Lacey Butler, Allison Cordle
- [Machine Learning for Programming](https://ml4p.org/) - conference held in 2018 in Oxford, UK
37 changes: 35 additions & 2 deletions Programming-in-Python.md
@@ -1,6 +1,6 @@
# Programming in Python

- [Learning](#learning)
- [Basics / Learning](#basics-learning)
- [Cheatsheets](#cheatsheets)
- [Static analysis](#static-analysis)
- [Focussed packages](#focussed-packages)
@@ -11,7 +11,12 @@
- [Performance](#performance)
- [Contributing](#contributing)

## Learning
### Basics / learning

- [Lists, Tuples, Dictionaries, Conditionals, Loops, etc...](https://lnkd.in/gWRbc3J)
- [Data Structures & Algorithms](https://lnkd.in/gYKnJWN)
- [NumPy Arrays](https://lnkd.in/geeFePh)
- [Regex](https://lnkd.in/gzUahNV)
- [Introduction to Python](https://simpliv-wordpress-com.cdn.ampproject.org/c/s/simpliv.wordpress.com/2019/06/27/best-way-to-learn-python-step-by-step-guide/amp/)
- [Learn Python](https://www.learnpython.org/)
- [Python 3 Tutorial](https://docs.python.org/3/tutorial/)
@@ -24,12 +29,23 @@
- [Online Python Turtle Editor](https://repl.it/languages/python_turtle)
- [Online Python Compiler](https://www.onlinegdb.com/online_python_compiler)
- [Local machine: Interacting with Python](https://realpython.com/interacting-with-python/)
- [Python by Chris Albon](https://chrisalbon.com/#python) - topics covered: Basics • Data Wrangling • Data Visualization • Web Scraping • Testing • Logging • Other
- [Regex resources by Chris Albon](https://chrisalbon.com/#regex)
- [WTF Python repo](https://github.com/satwikkansal/wtfpython)

## Cheatsheets
- [Python Cheatsheet](https://www.pythoncheatsheet.org/)
- [Pysheeet: Python Cheatsheet](https://www.pythonsheets.com/)
- [7+ Python Cheat Sheets for Beginners and Experts](https://sinxloud.com/python-cheat-sheet-beginner-advanced/)
- [Python for Data Science](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf)
- [30 seconds of python](https://github.com/30-seconds/30-seconds-of-python)

## Database
##### Databases implemented in Python

- [Pickle DB](https://pythonhosted.org/pickleDB/)
- [Tinydb](https://github.com/msiemens/tinydb)
- [ZODB](http://www.zodb.org/en/latest/)

## Static analysis

@@ -58,6 +74,13 @@
* [multilint](https://github.com/adamchainz/multilint) - a wrapper around `flake8`, `isort` and `modernize`
* [prospector](https://github.com/PyCQA/prospector) - a wrapper around `pylint`, `pep8`, `mccabe` and others

## Cookie cutter: Python project templates

- [For Python projects](https://cookiecutter.readthedocs.io/en/latest/readme.html#python)
- [For Data Science projects](https://cookiecutter.readthedocs.io/en/latest/readme.html#data-science)
- [For Reproducible Data Science projects](https://cookiecutter.readthedocs.io/en/latest/readme.html#reproducible-science)
- [For Data Driven Journalism projects](https://cookiecutter.readthedocs.io/en/latest/readme.html#data-driven-journalism)

## Best practices

- [PEP 8 -- Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)
@@ -71,6 +94,12 @@
- [Python Best Practices: 5 Tips For Better Code - Airbrake Blog](https://airbrake.io/blog/python/python-best-practices)
- [Python tutorial: Best practices and common mistakes to avoid](https://jaxenter.com/python-tutorial-best-practices-145959.html)
- [Common mistakes beginners make in Python](https://github.com/qxf2/wtfiswronghere)
- [Six steps to more professional data science code notebook on Kaggle by Rachael Tatman](https://kaggle.intercom-mail.com/via/e?ob=ktkgCmdq8TTLcC0KcRnaTDrpfyNmo93hWgJS%2Bf3C%2FpeXDMn5IliXwMPCgGeVFtngYeGLq2r3zzpfPOt1R2SLUvPz%2BOZl6ye5CNrx98D279Mjy%2BDCxeLTcN3rL%2BXuXvYPwdMeFoEliM4ujTLctPU1Rb2Kt8AOwN30PYPGdMZPPhxkha%2BlQ9oixCrQILf%2BWqOTvh59huu9yn%2BqmDKPk9wcnA%3D%3D&h=6b05c8a50ab9ff60c7c061020cc5428a92dce16c-23895383563) | [Video: 6 Steps for More Professional Data Science Code | Kaggle](https://www.youtube.com/watch?v=Trar7GZOPl8&feature=youtu.be&utm_medium=email&utm_source=intercom&utm_campaign=modular-code-event) | [Import scripts into notebook kernels](https://www.kaggle.com/product-feedback/91185) | [Kaggle Live Coding: Making code modular | Kaggle](https://www.youtube.com/watch?v=5zgxMgG4A7o) | [Documentation on Python modules](https://docs.python.org/3/tutorial/modules.html) | [DocStrings](https://www.python.org/dev/peps/pep-0257/) | [Don't Repeat Yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) | [PEP 8](https://www.python.org/dev/peps/pep-0008/) | [Joy of Functional programming for Data Science](https://www.youtube.com/watch?v=bzUmK0Y07ck) | [Method Chaining in Python using pyjanitor](https://pyjanitor.readthedocs.io/notebooks/pyjanitor_intro.html#Clean-up-our-data-using-a-pyjanitor-method-chaining-pipeline) | [pyjanitor docs](https://pyjanitor.readthedocs.io/notebooks/pyjanitor_intro.html#Clean-up-our-data-using-a-pyjanitor-method-chaining-pipeline) | [Code reviewing Data Science work](https://medium.com/apteo/code-reviewing-data-science-work-774747248e33) | [Python built-in method: assert](https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement) | [Code Smells](https://en.wikipedia.org/wiki/Code_smell) | [Kaggle Coffee Chat: Joel Grus | Kaggle: software engineering best practices](https://www.youtube.com/watch?v=Sg6xJ0ACc78) | [Scripting-your-data-validation notebook: Automating Data Pipelines](https://www.kaggle.com/rtatman/automating-data-pipelines-day-2#Scripting-your-data-validation) | [Dashboarding with Notebooks: Day 5](https://www.kaggle.com/rtatman/dashboarding-with-notebooks-day-5) | [Kaggle Scripts](https://www.kaggle.com/kernels?sortBy=hotness&group=everyone&pageSize=20&tagIds=16074) | [Regular Expressions](https://en.wikipedia.org/wiki/Regular_expression)
- Packages & Libraries: [Cerberus module](http://docs.python-cerberus.org/en/stable/usage.html) | [missingno package](https://github.com/ResidentMario/missingno) | [python-magic module](https://github.com/ahupp/python-magic) | [Python Flashtext](https://flashtext.readthedocs.io/en/latest/) | [Flashtext github](https://github.com/vi3k6i5/flashtext#why-not-regex) | [Forum post embeddings + clustering](https://www.kaggle.com/rtatman/forum-post-embeddings-clustering)
- [Jason Gorman's](https://twitter.com/jasongorman) Python Code Craft series:
  - [Code Craft : Part I – Why We Need Code Craft](https://codemanship.wordpress.com/2019/10/01/code-craft-part-i-why-we-need-code-craft/)
  - [Code Craft : Part II – Version Control is Seat Belts for Programmers](https://codemanship.wordpress.com/2019/10/02/code-craft-part-ii-version-control-is-seat-belts-for-programmers/)
  - [Code Craft : Part III – Unit Tests are an Early Warning System for Programmers](https://codemanship.wordpress.com/2019/10/04/code-craft-part-iii-unit-tests-are-an-early-warning-system-for-programmers/)

## Testing

@@ -99,6 +128,10 @@
- [NumPy aware dynamic Python compiler using LLVM](https://github.com/ameroueh/numba) | [Numba](http://numba.pydata.org/)
- [Profiling in Python](https://github.com/mkunesch/profiling-talk) - by [Markus Kunesch](https://github.com/mkunesch)

## Competitions & coding challenges

See [Competitions > Coding challenges](./competitions.md#coding-challenges)

# Contributing

Contributions are very welcome; please share back with the wider community (and get credited for it)!