[WIP] GitHub issue summarization workflow.

Prerequisites.

Get the input data and upload it to GCS.

Download the input data from this location. In the following, we assume that the downloaded file is ./github-issues.zip
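
As a minimal sketch, assuming the archive is served over HTTP(S) at some URL (the <DATA_URL> placeholder below is hypothetical), it can be fetched with curl:

curl -L -o ./github-issues.zip <DATA_URL>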

Decompress the input data:

unzip ./github-issues.zip

For debugging purposes, consider reducing the size of the input data so that the workflow executes much faster:

head -n 10000 ./github-issues.csv > ./github-issues-medium.csv

Compress the data using gzip (the workflow expects gzip-compressed input):

gzip ./github-issues-medium.csv
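
Optionally, the archive can be sanity-checked before uploading; gzip -t tests integrity without decompressing:

gzip -t ./github-issues-medium.csv.gz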

Upload the data to GCS:

gsutil cp ./github-issues-medium.csv.gz gs://<MY_BUCKET>
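
If the bucket does not exist yet, it can be created first, and the upload verified by listing it (standard gsutil commands):

gsutil mb gs://<MY_BUCKET>
gsutil ls gs://<MY_BUCKET>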

Building the container.

Build the container and tag it so that it can be pushed to a GCP container registry:

docker build -f Dockerfile -t gcr.io/<GCP_PROJECT>/github_issue_summarization:v1 .

Push the container to the GCP container registry:

gcloud docker -- push gcr.io/<GCP_PROJECT>/github_issue_summarization:v1
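
Note that the gcloud docker wrapper is deprecated in recent gcloud releases; as an alternative sketch, register Docker credentials for gcr.io once and push directly:

gcloud auth configure-docker
docker push gcr.io/<GCP_PROJECT>/github_issue_summarization:v1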

Running the workflow.

Run the workflow:

argo submit github_issues_summarization.yaml \
  -p bucket=<BUCKET_NAME> \
  -p bucket-key=<BUCKET_KEY> \
  -p container-image=gcr.io/<GCP_PROJECT>/github_issue_summarization:v1

Where:

  • <BUCKET_NAME> is the name of a GCS bucket where the input data is stored (e.g.: "my_bucket_1234").
  • <BUCKET_KEY> is the path to the input data in csv.gz format (e.g.: "data/github_issues.csv.gz").
  • <GCP_PROJECT> is the name of the GCP project where the container was pushed.
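
For illustration, here is the same command with example values substituted (all values are placeholders, including the my-gcp-project project name):

argo submit github_issues_summarization.yaml \
  -p bucket=my_bucket_1234 \
  -p bucket-key=data/github_issues.csv.gz \
  -p container-image=gcr.io/my-gcp-project/github_issue_summarization:v1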

The data generated by the workflow will be stored in the default artifact repository configured for your Argo installation.

The logs can be read using the argo get and argo logs commands.
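
As a usage sketch (argo submit prints the generated workflow name; substitute it for <WORKFLOW_NAME> below, and note that depending on the Argo version, argo logs takes a workflow name or a pod name):

argo get <WORKFLOW_NAME>
argo logs <WORKFLOW_NAME>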