GSoC Idea: Machine Learning based Community Health and Communication #1637
Description
Ideas for Google Summer of Code projects
Interested in working with CHAOSS? Below are some project ideas. We describe how to apply to work with CHAOSS and how we select students on a different page: https://github.com/chaoss/community/blob/master/GSoC-interest.md
Idea: Advancing Risk Prediction With Machine Learning in Augur
Currently, Augur uses computational linguistics, dependency mapping, license scanning, topic modeling, social network analysis, and algorithms that target temporal changes in CHAOSS metrics. The aim of this project is to make these insights more accessible by developing Python-based API endpoints that deliver visualizations of machine learning outputs, similar in style to those found in https://github.com/chaoss/augur/augur/routes/pull_request_reports and https://github.com/chaoss/augur/augur/routes/contributor_reports
This work could include optimizing and refining the machine learning workers found under https://github.com/chaoss/augur/workers to generate additional or reporting-optimized data, as well as extending Augur's new front end at https://github.com/augurlabs/augur_view, which is based on Twitter Bootstrap and Flask.
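As a rough illustration of what a visualization endpoint does, the sketch below turns a small data series into an SVG chart and serves it over HTTP. It is not Augur's code: it uses the standard library's WSGI interface in place of Flask so that it runs without extra dependencies, and the names `MONTHLY_MERGED_PRS`, `render_bar_chart_svg`, `report_app`, and the `repo_id` query parameter are all hypothetical. The hard-coded data stands in for a query against Augur's database.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical, hard-coded data standing in for a database query.
# Keys are repo ids; values are (month, merged pull request count) pairs.
MONTHLY_MERGED_PRS = {
    "1": [("2021-01", 12), ("2021-02", 18), ("2021-03", 9)],
}


def render_bar_chart_svg(points, bar_width=40, height=120):
    """Render (label, value) pairs as a small SVG bar chart."""
    # Scale bars against the largest value; fall back to 1 to avoid dividing by zero.
    max_value = max((value for _, value in points), default=0) or 1
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{bar_width * len(points)}" height="{height + 20}">'
    ]
    for index, (label, value) in enumerate(points):
        bar_height = round(height * value / max_value)
        x = index * bar_width
        parts.append(
            f'<rect x="{x + 5}" y="{height - bar_height}" '
            f'width="{bar_width - 10}" height="{bar_height}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="{x + 5}" y="{height + 15}" font-size="9">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "".join(parts)


def report_app(environ, start_response):
    """WSGI app: ?repo_id=<id> returns the chart, or 404 for an unknown repo."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    repo_id = query.get("repo_id", [""])[0]
    points = MONTHLY_MERGED_PRS.get(repo_id)
    if points is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown repo_id"]
    body = render_bar_chart_svg(points).encode("utf-8")
    start_response("200 OK", [("Content-Type", "image/svg+xml")])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, report_app).serve_forever()` serves the chart at `http://localhost:8000/?repo_id=1`. In a Flask application the same rendering function could be called from a route handler instead.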
The aims of the project are as follows:
- Communicate repository and project health insights through visualization
- Identify projects that have similar characteristics, and visualize similarity using spatial proximity metaphors
- Increase awareness of open source project ecosystems, and their component projects.
The aims will require working in a programming language to automate the task, using an API to generate the graphs, and using a graphics editor to prepare the PDF.
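One way to think about the "spatial proximity" aim is to give each project a numeric feature vector and treat the distance between vectors as the distance between projects on screen. The sketch below is only an assumption about how that could start: the feature vectors in `REPO_FEATURES` are invented, and cosine distance is one of several reasonable choices. A layout step that maps these distances to 2D positions is left out.

```python
import math

# Hypothetical per-repository feature vectors, e.g. normalized activity metrics.
REPO_FEATURES = {
    "repo_a": [0.9, 0.1, 0.3],
    "repo_b": [0.8, 0.2, 0.4],
    "repo_c": [0.1, 0.9, 0.7],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        # A zero vector has no direction; treat it as maximally dissimilar.
        return 1.0
    return 1.0 - dot / norm


def nearest_neighbors(name, features):
    """List the other repositories, closest first, as (name, distance) pairs."""
    target = features[name]
    others = [
        (other, cosine_distance(target, vector))
        for other, vector in features.items()
        if other != name
    ]
    return sorted(others, key=lambda pair: pair[1])
```

Calling `nearest_neighbors("repo_a", REPO_FEATURES)` ranks `repo_b` ahead of `repo_c`, because its vector points in nearly the same direction as `repo_a`'s.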
- Difficulty: Medium
- Requirements: Python programming experience, or a strong interest.
- Recommended: Experience with accessing APIs, writing SQL, and a strong interest in Machine Learning.
- Mentors: Sean Goggins, Andrew Brain
Augur is more advanced in its machine learning than basic recommender systems, using TF-IDF, boosting, Latent Dirichlet Allocation over sequenced conversations, and clustering algorithms that look for similar topics across repositories.
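For readers new to TF-IDF, the plain-Python sketch below shows the core idea: a term scores highly in a message when it is frequent there but rare across the other messages. This is an illustration only, not the code in Augur's workers; the function name `tf_idf`, the whitespace tokenizer, and the unsmoothed `log(N / df)` weighting are assumptions made for brevity.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Naive tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(tokenized)
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights
```

With messages such as `"fix build failure"`, `"fix docs typo"`, and `"build release notes"`, the word `failure` outweighs `fix` in the first message, because `fix` also appears in another message. A term that appears in every message gets a weight of zero.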
The GitHub workers that use machine learning are visible in our Workers directory. We use our advanced contributor worker to manage identities, alongside the existing discourse_analysis, message_insights, clustering, insight, and pull_request_worker_analysis workers. Documentation about these existing workers can be found here: https://oss-augur.readthedocs.io/en/dev/development-guide/workers/toc.html
Augur's machine learning workers are now an active part of Augur's growing ecosystem.
The aims will require writing Python code for Flask and the GraphQL API, and for the web app, which has now moved from the Vue.js and Vuetify ecosystems to more robust and sustained projects like Twitter Bootstrap.
Microtasks
To become familiar with Augur, you can start by reading some documentation. You can find useful information in the links below.
Once you're familiar with Augur, you can have a look at the following microtasks.
- Microtask 0: Download and configure Augur, creating a dev environment using the general cautions noted here: https://oss-augur.readthedocs.io/en/dev/getting-started/installation.html and the full documentation here: https://oss-augur.readthedocs.io/en/dev/development-guide/toc.html
- Microtask 1: Work on any open Augur issue.
- Microtask 2: Identify new issues you encounter during installation.
- Microtask 3: Collect data with an existing machine learning worker and build a simple visualization API endpoint following the patterns in the examples from the project description.
- Microtask 4: Anything you want to show us, even if you find bugs in our documentation and want to issue a PR for those!