history rewrite to remove dataset from git
rafaelkallis committed Nov 3, 2018
1 parent 0ab953b commit fd39d37
Showing 19 changed files with 1,360 additions and 166,030 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -1,2 +1,8 @@
node_modules
*.swp
*.env\

dataset.csv
model.vec
test.txt
train.txt
1 change: 1 addition & 0 deletions Procfile
@@ -0,0 +1 @@
web: npm start
78 changes: 77 additions & 1 deletion README.md
@@ -1 +1,77 @@
# ticket-tagger

### Development

#### get started:

```sh
git clone https://github.com/rafaelkallis/ticket-tagger ticket-tagger
cd ticket-tagger
npm install

# run benchmark
npm run benchmark

# run server
npm start
```

#### customize preprocessing:

```js
/* src/preprocess.js */

const stemmer = require('natural').PorterStemmer;

/* example preprocessing method */
module.exports = function(text) {
  const stem = stemmer.tokenizeAndStem(text);
  return stem.join(' ');
};
```

#### generate dataset:

a dataset (with 10k bugs, 10k enhancements and 10k questions) is already included in the repository, and can also be found [here](https://gist.github.com/rafaelkallis/707743843fa0337277ab36b42607c46d).
the dataset was generated from the GitHub Archive, which can be accessed through Google [BigQuery](https://bigquery.cloud.google.com).

add the query below to your BigQuery console and adjust if needed.

```sql
SELECT
  label, CONCAT(title, ' ', REGEXP_REPLACE(body, '(\r|\n|\r\n)',' '))
FROM (
  SELECT
    LOWER(JSON_EXTRACT_SCALAR(payload, '$.issue.labels[0].name')) AS label,
    JSON_EXTRACT_SCALAR(payload, '$.issue.title') AS title,
    JSON_EXTRACT_SCALAR(payload, '$.issue.body') AS body
  FROM
    [githubarchive:day.20180201],
    [githubarchive:day.20180202],
    [githubarchive:day.20180203],
    [githubarchive:day.20180204],
    [githubarchive:day.20180205]
  WHERE
    type = 'IssuesEvent'
    AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed' )
WHERE
  (label = 'bug' OR label = 'enhancement' OR label = 'question')
  AND body != 'null';
```
```
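
the query returns one label and one line of text per issue. a minimal sketch of turning such rows into fastText's supervised training format (`__label__<label> <text>`) is shown below. the helper name `toFastTextLine` and the sample rows are hypothetical and only illustrate the format; they are not taken from the repository.

```js
/* hypothetical helper, not part of the repository */
/* assumes each dataset row is a (label, text) pair as produced by the query */
function toFastTextLine(label, text) {
  // collapse newlines and repeated whitespace so each example stays on one line
  const cleaned = String(text).replace(/\s+/g, ' ').trim();
  // "__label__" is fastText's default label prefix
  return `__label__${String(label).toLowerCase()} ${cleaned}`;
}

/* made-up sample rows, for illustration only */
const rows = [
  ['bug', 'App crashes\non startup'],
  ['Question', 'How do I configure the port?'],
];

const lines = rows.map(([label, text]) => toFastTextLine(label, text));
```

each resulting line can then be written to a training or test file, one example per line.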

#### run serverless app:

you need a `.env` file in order to run the marketplace app.
The file should look like this:

```
GITHUB_CERT=/path/to/cert.private-key.pem
GITHUB_SECRET=123456
GITHUB_APP_ID=123
PORT=3000
```
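
how the app loads this file is not shown here. as a rough illustration of the `KEY=VALUE` format above, the sketch below parses such contents into a plain object. `parseEnv` is a hypothetical helper written for this example and is not part of the repository.

```js
/* hypothetical helper, not part of the repository */
function parseEnv(contents) {
  const env = {};
  for (const line of contents.split(/\r?\n/)) {
    const trimmed = line.trim();
    // skip blank lines and comment lines
    if (trimmed === '' || trimmed.startsWith('#')) continue;
    const idx = trimmed.indexOf('=');
    // skip lines that are not KEY=VALUE pairs
    if (idx === -1) continue;
    const key = trimmed.slice(0, idx).trim();
    const value = trimmed.slice(idx + 1).trim();
    env[key] = value;
  }
  return env;
}

/* same placeholder values as in the example file above */
const example = parseEnv([
  'GITHUB_CERT=/path/to/cert.private-key.pem',
  'GITHUB_SECRET=123456',
  'GITHUB_APP_ID=123',
  'PORT=3000',
].join('\n'));

// values are always strings; convert explicitly where a number is needed
const port = Number(example.PORT);
```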

#### references:

- [Building GitHub Apps](https://developer.github.com/apps/building-github-apps/)
- [Fasttext](https://fasttext.cc)
150,254 changes: 0 additions & 150,254 deletions datasets/github_issues.csv

This file was deleted.

17 changes: 0 additions & 17 deletions datasets/preprocess.js

This file was deleted.

11,749 changes: 0 additions & 11,749 deletions datasets/processed_github_issues.csv

This file was deleted.

76 changes: 0 additions & 76 deletions github-webhook/index.js

This file was deleted.

Binary file added model.bin
Binary file not shown.